How to Use MongoDB with PyMongo

Prerequisites

Before you start, make sure that the PyMongo distribution is installed. In the Python shell, the following should run without raising an exception:

>>> import pymongo

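This tutorial also assumes that a MongoDB instance is running on the default host and port. If you have downloaded and installed MongoDB, you can start it like so: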

$ mongod

Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient for the running mongod instance. Doing so is easy:

>>> from pymongo import MongoClient

>>> client = MongoClient()

The code above connects on the default host and port. We can also specify the host and port explicitly, as follows:

>>> client = MongoClient('localhost', 27017)

or use the MongoDB URI format:

>>> client = MongoClient('mongodb://localhost:27017/')

Getting a Database

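A single instance of MongoDB can support multiple independent databases. When working with PyMongo, you access databases using attribute-style access on MongoClient instances: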

>>> db = client.test_database

If your database name is such that attribute-style access won't work (like test-database), you can use dictionary-style access instead:

>>> db = client['test-database']

Getting a Collection

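A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same way as getting a database: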

>>> collection = db.test_collection

or (using dictionary-style access):

>>> collection = db['test-collection']

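An important note about collections (and databases) in MongoDB is that they are created lazily: none of the commands above has actually performed any operation on the MongoDB server. Collections and databases are created when the first document is inserted into them.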

Documents

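Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post: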

>>> import datetime

>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (like datetime.datetime instances), which are automatically converted to and from the appropriate BSON types.

Inserting a Document

To insert a document into a collection we can use the insert_one() method:

>>> posts = db.posts

>>> post_id = posts.insert_one(post).inserted_id

>>> post_id

ObjectId('...')

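When a document is inserted, a special key, "_id", is automatically added if the document doesn't already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.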

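After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database: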

>>> db.list_collection_names()

[u'posts']

Getting a Single Document With find_one()

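The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection: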

>>> import pprint

>>> pprint.pprint(posts.find_one())

{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

The result is a dictionary matching the one that we inserted previously.

Note that the returned document contains an "_id", which was automatically added on insert.

find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author "Mike" we do:

>>> pprint.pprint(posts.find_one({"author": "Mike"}))

{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

If we try with a different author, like "Eliot", we'll get no result (find_one() returns None, which the shell does not print):

>>> posts.find_one({"author": "Eliot"})

Querying By ObjectId

We can also find a post by its _id, which in our example is an ObjectId:

>>> post_id

ObjectId('...')

>>> pprint.pprint(posts.find_one({"_id": post_id}))

{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

Note that an ObjectId is not the same as its string representation:

>>> post_id_as_str = str(post_id)

>>> posts.find_one({"_id": post_id_as_str}) # No result

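A common task in web applications is to get an ObjectId from the request URL and find the matching document. In this case it is necessary to convert the ObjectId from a string before passing it to find_one():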

from bson.objectid import ObjectId
from pymongo import MongoClient

client = MongoClient()

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document

See also: When I query for a document by ObjectId in my web application I get no result

A Note On Unicode Strings

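You probably noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). A short explanation is in order.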

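MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded to UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str. This describes Python 2; in Python 3, str is already a Unicode type, so the same values print without the u prefix.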

Bulk Inserts

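To make querying a little more interesting, let's insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations by passing a list as the first argument to insert_many(). This inserts each document in the list, sending only a single command to the server: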

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]

>>> result = posts.insert_many(new_posts)

>>> result.inserted_ids

[ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

The result of insert_many() now holds two ObjectId instances, one for each inserted document.

new_posts[1] has a different "shape" than the other posts: there is no "tags" field, and we've added a new field, "title". This is what we mean when we say that MongoDB is schema-free.

Querying for More Than One Document

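To get more than a single document as the result of a query, we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection: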

>>> for post in posts.find():
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}

Just as with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is "Mike":

>>> for post in posts.find({"author": "Mike"}):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Counting

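If we just want to know how many documents match a query, we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection: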

>>> posts.count_documents({})

3

or just of those documents that match a specific query:

>>> posts.count_documents({"author": "Mike"})

2

Range Queries

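MongoDB supports many different types of advanced queries. As an example, let's perform a query where we limit results to posts older than a certain date, and also sort the results by author: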

>>> d = datetime.datetime(2009, 11, 12, 12)

>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Here we use the special "$lt" (less than) operator to do a range query, and also call sort() to sort the results by author.
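To make the filter semantics concrete, here is a pure-Python toy that mimics, for these three posts only, what {"date": {"$lt": d}} selects, how sort("author") orders the result, and what count_documents({"author": "Mike"}) counts. It is an illustration under simplifying assumptions (the utcnow() date of the first post is replaced by a hypothetical later date, and only the fields used here are kept), not how the server evaluates queries.

```python
import datetime

# Hypothetical in-memory stand-ins for the three posts (author/date only).
posts = [
    {"author": "Mike", "date": datetime.datetime(2024, 1, 1)},  # stands in for utcnow()
    {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

d = datetime.datetime(2009, 11, 12, 12)


def lt(doc, field, bound):
    # Toy version of {field: {"$lt": bound}}: documents lacking the field never match.
    return field in doc and doc[field] < bound


# Mirrors find({"date": {"$lt": d}}).sort("author")
older = sorted((p for p in posts if lt(p, "date", d)), key=lambda p: p["author"])

# Mirrors count_documents({"author": "Mike"})
mike_count = sum(1 for p in posts if p.get("author") == "Mike")

print([p["author"] for p in older], mike_count)  # ['Eliot', 'Mike'] 2
```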

Indexing

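Adding indexes can help accelerate certain queries and can also add functionality for querying and storing documents. In this example, we'll demonstrate how to create a unique index on a key, which rejects documents whose value for that key already exists in the index.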

First, we need to create the index:

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
...                                   unique=True)

>>> sorted(list(db.profiles.index_information()))

[u'_id_', u'user_id_1']

Notice that we now have two indexes: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id we just created.

Now let's set up some user profiles:

>>> user_profiles = [
...     {'user_id': 211, 'name': 'Luke'},
...     {'user_id': 212, 'name': 'Ziltoid'}]

>>> result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id is already in the collection:

>>> new_profile = {'user_id': 213, 'name': 'Drew'}

>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}

>>> result = db.profiles.insert_one(new_profile)  # This is fine.

>>> result = db.profiles.insert_one(duplicate_profile)

Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }