文档:

http://api.mongodb.com/python/current/tutorial.html

安装:

官网直接下载安装, mac上brew安装的下载太慢, 打算手动安装

使用:

开启服务:

 mongod #默认配置开启服务
mongod -- dpath <db path> # 指定数据库文件路径

连接服务:

 mongo # 默认配置连接
mongo [options] [db address] [file names (ending in .js)]

图形可视化程序:

https://www.robomongo.org/

shell:

 > help
db.help() help on db methods
db.mycoll.help() help on collection methods
sh.help() sharding helpers
rs.help() replica set helpers
help admin administrative help
help connect connecting to a db help
help keys key shortcuts
help misc misc things to know
help mr mapreduce show dbs show database names
show collections show collections in current database
show users show users in current database
show profile show most recent system.profile entries with time >= 1ms
show logs show the accessible logger names
show log [name] prints out the last segment of log in memory, 'global' is default
use <db_name> set current database
db.foo.find() list objects in collection foo
db.foo.find( { a : } ) list objects in foo where a ==
it result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x set default number of items to display on shell
exit quit the mongo shell

more helps...

 > db.help()
DB methods:
db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [just calls db.runCommand(...)]
db.aggregate([pipeline], {options}) - performs a collectionless aggregation on this database; returns a cursor
db.auth(username, password)
db.cloneDatabase(fromhost)
db.commandHelp(name) returns the help for the command
db.copyDatabase(fromdb, todb, fromhost)
db.createCollection(name, {size: ..., capped: ..., max: ...})
db.createView(name, viewOn, [{$operator: {...}}, ...], {viewOptions})
db.createUser(userDocument)
db.currentOp() displays currently executing operations in the db
db.dropDatabase()
db.eval() - deprecated
db.fsyncLock() flush data to disk and lock server for backups
db.fsyncUnlock() unlocks server following a db.fsyncLock()
db.getCollection(cname) same as db['cname'] or db.cname
db.getCollectionInfos([filter]) - returns a list that contains the names and options of the db's collections
db.getCollectionNames()
db.getLastError() - just returns the err msg string
db.getLastErrorObj() - return full status object
db.getLogComponents()
db.getMongo() get the server connection object
db.getMongo().setSlaveOk() allow queries on a replication slave server
db.getName()
db.getPrevError()
db.getProfilingLevel() - deprecated
db.getProfilingStatus() - returns if profiling is on and slow threshold
db.getReplicationInfo()
db.getSiblingDB(name) get the db at the same server as this one
db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set
db.hostInfo() get details about the server's host
db.isMaster() check replica primary status
db.killOp(opid) kills the current operation in the db
db.listCommands() lists all the db commands
db.loadServerScripts() loads all the scripts in db.system.js
db.logout()
db.printCollectionStats()
db.printReplicationInfo()
db.printShardingStatus()
db.printSlaveReplicationInfo()
db.dropUser(username)
db.repairDatabase()
db.resetError()
db.runCommand(cmdObj) run a database command. if cmdObj is a string, turns it into {cmdObj: }
db.serverStatus()
db.setLogLevel(level,<component>)
db.setProfilingLevel(level,slowms) =off =slow =all
db.setWriteConcern(<write concern doc>) - sets the write concern for writes to the db
db.unsetWriteConcern(<write concern doc>) - unsets the write concern for writes to the db
db.setVerboseShell(flag) display extra information in shell output
db.shutdownServer()
db.stats()
db.version() current version of the server
>

DB methods

 > db.mycoll.help()
DBCollection help
db.mycoll.find().help() - show DBCursor help
db.mycoll.bulkWrite( operations, <optional params> ) - bulk execute write operations, optional parameters are: w, wtimeout, j
db.mycoll.count( query = {}, <optional params> ) - count the number of documents that matches the query, optional parameters are: limit, skip, hint, maxTimeMS
db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.
db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command
db.mycoll.createIndex(keypattern[,options])
db.mycoll.createIndexes([keypatterns], <options>)
db.mycoll.dataSize()
db.mycoll.deleteOne( filter, <optional params> ) - delete first matching document, optional parameters are: w, wtimeout, j
db.mycoll.deleteMany( filter, <optional params> ) - delete all matching documents, optional parameters are: w, wtimeout, j
db.mycoll.distinct( key, query, <optional params> ) - e.g. db.mycoll.distinct( 'x' ), optional parameters are: maxTimeMS
db.mycoll.drop() drop the collection
db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : } )
db.mycoll.dropIndexes()
db.mycoll.ensureIndex(keypattern[,options]) - DEPRECATED, use createIndex() instead
db.mycoll.explain().help() - show explain help
db.mycoll.reIndex()
db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
e.g. db.mycoll.find( {x:} , {name:, x:} )
db.mycoll.find(...).count()
db.mycoll.find(...).limit(n)
db.mycoll.find(...).skip(n)
db.mycoll.find(...).sort(...)
db.mycoll.findOne([query], [fields], [options], [readConcern])
db.mycoll.findOneAndDelete( filter, <optional params> ) - delete first matching document, optional parameters are: projection, sort, maxTimeMS
db.mycoll.findOneAndReplace( filter, replacement, <optional params> ) - replace first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument
db.mycoll.findOneAndUpdate( filter, update, <optional params> ) - update first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument
db.mycoll.getDB() get DB object associated with collection
db.mycoll.getPlanCache() get query plan cache associated with collection
db.mycoll.getIndexes()
db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
db.mycoll.insert(obj)
db.mycoll.insertOne( obj, <optional params> ) - insert a document, optional parameters are: w, wtimeout, j
db.mycoll.insertMany( [objects], <optional params> ) - insert multiple documents, optional parameters are: w, wtimeout, j
db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )
db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor
db.mycoll.remove(query)
db.mycoll.replaceOne( filter, replacement, <optional params> ) - replace the first matching document, optional parameters are: upsert, w, wtimeout, j
db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.
db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
db.mycoll.save(obj)
db.mycoll.stats({scale: N, indexDetails: true/false, indexDetailsKey: <index key>, indexDetailsName: <index name>})
db.mycoll.storageSize() - includes free space allocated to this collection
db.mycoll.totalIndexSize() - size in bytes of all the indexes
db.mycoll.totalSize() - storage allocated for all data and indexes
db.mycoll.update( query, object[, upsert_bool, multi_bool] ) - instead of two flags, you can pass an object with fields: upsert, multi
db.mycoll.updateOne( filter, update, <optional params> ) - update the first matching document, optional parameters are: upsert, w, wtimeout, j
db.mycoll.updateMany( filter, update, <optional params> ) - update all matching documents, optional parameters are: upsert, w, wtimeout, j
db.mycoll.validate( <full> ) - SLOW
db.mycoll.getShardVersion() - only for use with sharding
db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster
db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function
db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set
db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection
db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection
db.mycoll.latencyStats() - display operation latency histograms for this collection
>

Collection methods

 > sh.help()
sh.addShard( host ) server:port OR setname/server:port
sh.addShardToZone(shard,zone) adds the shard to the zone
sh.updateZoneKeyRange(fullName,min,max,zone) assigns the specified range of the given collection to a zone
sh.disableBalancing(coll) disable balancing on one collection
sh.enableBalancing(coll) re-enable balancing on one collection
sh.enableSharding(dbname) enables sharding on the database dbname
sh.getBalancerState() returns whether the balancer is enabled
sh.isBalancerRunning() return true if the balancer has work in progress on any mongos
sh.moveChunk(fullName,find,to) move the chunk where 'find' is to 'to' (name of shard)
sh.removeShardFromZone(shard,zone) removes the shard from zone
sh.removeRangeFromZone(fullName,min,max) removes the range of the given collection from any zone
sh.shardCollection(fullName,key,unique,options) shards the collection
sh.splitAt(fullName,middle) splits the chunk that middle is in at middle
sh.splitFind(fullName,find) splits the chunk that find is in at the median
sh.startBalancer() starts the balancer so chunks are balanced automatically
sh.status() prints a general overview of the cluster
sh.stopBalancer() stops the balancer so chunks are not balanced automatically
sh.disableAutoSplit() disable autoSplit on one collection
sh.enableAutoSplit() re-enable autoSplit on one collection
sh.getShouldAutoSplit() returns whether autosplit is enabled
>

sharding helpers

 > rs.help()
rs.status() { replSetGetStatus : } checks repl set status
rs.initiate() { replSetInitiate : null } initiates set with default settings
rs.initiate(cfg) { replSetInitiate : cfg } initiates set with configuration cfg
rs.conf() get the current configuration object from local.system.replset
rs.reconfig(cfg) updates the configuration of a running replica set with cfg (disconnects)
rs.add(hostportstr) add a new member to the set with default attributes (disconnects)
rs.add(membercfgobj) add a new member to the set with extra attributes (disconnects)
rs.addArb(hostportstr) add a new member which is arbiterOnly:true (disconnects)
rs.stepDown([stepdownSecs, catchUpSecs]) step down as primary (disconnects)
rs.syncFrom(hostportstr) make a secondary sync from the given member
rs.freeze(secs) make a node ineligible to become primary for the time specified
rs.remove(hostportstr) remove a host from the replica set (disconnects)
rs.slaveOk() allow queries on secondary nodes rs.printReplicationInfo() check oplog size and time range
rs.printSlaveReplicationInfo() check replica set members and replication lag
db.isMaster() check who is primary reconfiguration helpers disconnect from the database so the shell will display
an error, even if the command succeeds.
>

replica set helpers

 > help admin
ls([path]) list files
pwd() returns current directory
listFiles([path]) returns file list
hostname() returns name of this host
cat(fname) returns contents of text file as a string
removeFile(f) delete a file or directory
load(jsfilename) load and execute a .js file
run(program[, args...]) spawn a program and wait for its completion
runProgram(program[, args...]) same as run(), above
sleep(m) sleep m milliseconds
getMemInfo() diagnostic
>

administrative help

 > help connect

 Normally one specifies the server on the mongo shell command line.  Run mongo --help to see those options.
Additional connections may be opened: var x = new Mongo('host[:port]');
var mydb = x.getDB('mydb');
or
var mydb = connect('host[:port]/mydb'); Note: the REPL prompt only auto-reports getLastError() for the shell command line connection. >

connect db help

 > help keys
Tab completion and command history is available at the command prompt. Some emacs keystrokes are available too:
Ctrl-A start of line
Ctrl-E end of line
Ctrl-K del to end of line Multi-line commands
You can enter a multi line javascript expression. If parens, braces, etc. are not closed, you will see a new line
beginning with '...' characters. Type the rest of your expression. Press Ctrl-C to abort the data entry if you
get stuck. >

shotcut keys

 > help misc
b = new BinData(subtype,base64str) create a BSON BinData value
b.subtype() the BinData subtype (..)
b.length() length of the BinData data in bytes
b.hex() the data as a hex encoded string
b.base64() the data as a base encoded string
b.toString() b = HexData(subtype,hexstr) create a BSON BinData value from a hex string
b = UUID(hexstr) create a BSON BinData value of UUID subtype
b = MD5(hexstr) create a BSON BinData value of MD5 subtype
"hexstr" string, sequence of hex characters (no 0x prefix) o = new ObjectId() create a new ObjectId
o.getTimestamp() return timestamp derived from first bits of the OID
o.isObjectId
o.toString()
o.equals(otherid) d = ISODate() like Date() but behaves more intuitively when used
d = ISODate('YYYY-MM-DD hh:mm:ss') without an explicit "new " prefix on construction
>

misc

 > help mr

 See also http://dochub.mongodb.org/core/mapreduce

 function mapf() {
// 'this' holds current document to inspect
emit(key, value);
} function reducef(key,value_array) {
return reduced_value;
} db.mycollection.mapReduce(mapf, reducef[, options]) options
{[query : <query filter object>]
[, sort : <sort the query. useful for optimization>]
[, limit : <number of objects to return from collection>]
[, out : <output-collection name>]
[, keeptemp: <true|false>]
[, finalize : <finalizefunction>]
[, scope : <object where fields go into javascript global scope >]
[, verbose : true]} >

mr

python驱动

pip install pymongo

scrapy:

settings.py

 ITEM_PIPELINES = ['stack.pipelines.MongoDBPipeline', ]

 MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "stackoverflow"
MONGODB_COLLECTION = "questions"

piplines.py

 import pymongo

 from scrapy.conf import settings
from scrapy.exceptions import DropItem
from scrapy import log class MongoDBPipeline(object): def __init__(self):
connection = pymongo.MongoClient(
settings['MONGODB_SERVER'],
settings['MONGODB_PORT']
)
db = connection[settings['MONGODB_DB']]
self.collection = db[settings['MONGODB_COLLECTION']] def process_item(self, item, spider):
valid = True
for data in item:
if not data:
valid = False
raise DropItem("Missing {0}!".format(data))
if valid:
self.collection.insert(dict(item))
log.msg("Question added to MongoDB database!",
level=log.DEBUG, spider=spider)
return item

scrapy 官方文档 https://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb:

piplines.py

 import pymongo

 class MongoPipeline(object):

     collection_name = 'scrapy_items'

     def __init__(self, mongo_uri, mongo_db):
self.mongo_uri = mongo_uri
self.mongo_db = mongo_db @classmethod
def from_crawler(cls, crawler):
return cls(
mongo_uri=crawler.settings.get('MONGO_URI'),
mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
) def open_spider(self, spider):
self.client = pymongo.MongoClient(self.mongo_uri)
self.db = self.client[self.mongo_db] def close_spider(self, spider):
self.client.close() def process_item(self, item, spider):
self.db[self.collection_name].insert_one(dict(item))
return item

[python]Mongodb的更多相关文章

  1. Python Mongodb接口

    Python Mongodb接口 MongoDB 是一个基于分布式文件存储的数据库.由 C++ 语言编写.旨在为 WEB 应用提供可扩展的高性能数据存储解决方案. 同时,MongoDB 是一个介于关系 ...

  2. python+MongoDB使用示例

    本博客起源于博主的大三NoSQL课程设计,采用python+MongoDB结合方式,将数据从txt文件导入MongoDB之中,再将其取出以作图.主要技术是采用python与MongoDB结合存储读取方 ...

  3. Python MongoDB 教程

    基于菜鸟教程实际操作后总结而来 Python MongoDB MongoDB 是目前最流行的 NoSQL 数据库之一,使用的数据类型 BSON(类似 JSON). MongoDB 数据库安装与介绍可以 ...

  4. 吴裕雄--天生自然python学习笔记:Python MongoDB

    MongoDB 是目前最流行的 NoSQL 数据库之一,使用的数据类型 BSON(类似 JSON). PyMongo Python 要连接 MongoDB 需要 MongoDB 驱动,这里我们使用 P ...

  5. Python MongoDB使用介绍

    MongoDB介绍 MongoDB是一个面向文档的,开源数据库程序,它平台无关.MongoDB像其他一些NoSQL数据库(但不是全部!)使用JSON结构的文档存储数据.这是使得数据非常灵活,不需要的S ...

  6. python&MongoDB爬取图书馆借阅记录(没有验证码)

    题外话:这个爬虫本来是想用java完成然后发布在博客园里的,但是一直用java都失败了,最后看到别人用了python,然后自己就找别人问了问关键的知识点,发现连接那部分,python只用了19行!!! ...

  7. Python mongoDB 的简单操作

    #!/usr/bin/env python # coding:utf-8 # Filename:mongodb.py from pymongo import MongoClient,ASCENDING ...

  8. windows 7下安装python+mongodb

    1. python安装 下载:http://python.org/download/ 直接双击安装,安装完后将路径加入系统环境变量path中. 2. mongodb安装 下载:http://www.m ...

  9. 数据抓取分析(python + mongodb)

    分享点干货!!! Python数据抓取分析 编程模块:requests,lxml,pymongo,time,BeautifulSoup 首先获取所有产品的分类网址: def step(): try: ...

  10. python数据抓取分析(python + mongodb)

    分享点干货!!! Python数据抓取分析 编程模块:requests,lxml,pymongo,time,BeautifulSoup 首先获取所有产品的分类网址: def step(): try: ...

随机推荐

  1. 傅立叶变换—FFT

    FFT(快速傅立叶变换)使用“分而治之”的策略来计算一个n阶多项式的n阶DFT系数的值.定义n为2的整数幂数,为了计算一个n阶多项式f(x),算法定义了连个新的n/2阶多项式,函数f[0](x)包含了 ...

  2. [JavaScript设计模式]惰性单例模式

    惰性单例模式 之前介绍了JS中类的单例模式,这次我们讨论下单例模式的应用.在众多网站中,登录框的实现方式就是一个单例,点击一次就展示一次,所以我们可以在页面加载好的时候就创建一个登录框,点击页面上的登 ...

  3. 【转】常见Java面试题 – 第四部分:迭代(iteration)和递归(recursion)

    ImportNew注: 本文是ImportNew编译整理的Java面试题系列文章之一.你可以从这里查看全部的Java面试系列. Q.请写一段代码来计算给定文本内字符“A”的个数.分别用迭代和递归两种方 ...

  4. docker 批量删除 镜像 容器

    我们在docker构建和测试时,经常会产生很多无用的镜像或者容器,我们可用如下两条命令一个一个删除. docker container rm 容器id #删除容器 可简写: docker rm 容器i ...

  5. java 字典 map 和 list.forEach

    1.keySet  key 集 2.values value 集(为何不叫valueSet)... 3.entrySet key value 集 List<?> 循环 1.Iterable ...

  6. 微信小程序--百度地图坐标转换成腾讯地图坐标

    最近开发小程序时出现一个问题,后台程序坐标采用的时百度地图的坐标,因为小程序地图时采用的腾讯地图的坐标系,两种坐标有一定的误差,导致位置信息显示不正确.现在需要一个可以转换两种坐标的方法,经过查询发现 ...

  7. 通过例子进阶学习C++(六)你真的能写出约瑟夫环么

    本文是通过例子学习C++的第六篇,通过这个例子可以快速入门c++相关的语法. 1.问题描述 n 个人围坐在一个圆桌周围,现在从第 s 个人开始报数,数到第 m 个人,让他出局:然后从出局的下一个人重新 ...

  8. Java 基础(四)| IO 流之使用文件流的正确姿势

    为跳槽面试做准备,今天开始进入 Java 基础的复习.希望基础不好的同学看完这篇文章,能掌握泛型,而基础好的同学权当复习,希望看完这篇文章能够起一点你的青涩记忆. 一.什么是 IO 流? 想象一个场景 ...

  9. Window下安装并使用InfluxDB可视化工具 —— InfluxDBStudio

    下载 直接访问: https://github.com/CymaticLabs/InfluxDBStudio/releases/tag/v0.2.0-beta.1 创建or编辑InfluxDB 这个软 ...

  10. electron教程(番外篇一): 开发环境及插件, VSCode调试, ESLint + Google JavaScript Style Guide代码规范

    我的electron教程系列 electron教程(一): electron的安装和项目的创建 electron教程(番外篇一): 开发环境及插件, VSCode调试, ESLint + Google ...