[python] MongoDB

Documentation:

http://api.mongodb.com/python/current/tutorial.html

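Installation:

Download the installer directly from the official website. On a Mac, installing via brew downloads too slowly, so I plan to install manually.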

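Usage:

Start the server:

 mongod # start the server with the default configuration

 mongod --dbpath <db path> # specify the path to the database files

Connect to the server:

 mongo # connect with the default settings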

 mongo [options] [db address] [file names (ending in .js)]

GUI client:

https://www.robomongo.org/

shell:

 > help

     db.help()                    help on db methods

     db.mycoll.help()             help on collection methods

     sh.help()                    sharding helpers

     rs.help()                    replica set helpers

     help admin                   administrative help

     help connect                 connecting to a db help

     help keys                    key shortcuts

     help misc                    misc things to know

     help mr                      mapreduce

     show dbs                     show database names

     show collections             show collections in current database

     show users                   show users in current database

     show profile                 show most recent system.profile entries with time >= 1ms

     show logs                    show the accessible logger names

     show log [name]              prints out the last segment of log in memory, 'global' is default

     use <db_name>                set current database

     db.foo.find()                list objects in collection foo

     db.foo.find( { a : 1 } )     list objects in foo where a == 1

     it                           result of the last line evaluated; use to further iterate

     DBQuery.shellBatchSize = x   set default number of items to display on shell

     exit                         quit the mongo shell

more helps...
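A few of the shell helpers listed above have close counterparts in PyMongo. The sketch below is only an illustration: `overview` is a hypothetical helper name, and it assumes `client` is a `pymongo.MongoClient` (any object exposing the same methods would work).

```python
def overview(client, db_name, coll_name, query):
    """Mirror a few mongo shell helpers with their PyMongo counterparts."""
    db = client[db_name]  # shell: use <db_name>
    return {
        "databases": client.list_database_names(),   # shell: show dbs
        "collections": db.list_collection_names(),   # shell: show collections
        "matches": list(db[coll_name].find(query)),  # shell: db.foo.find({a: 1})
    }
```

With a local server running, a call such as `overview(pymongo.MongoClient("localhost", 27017), "test", "foo", {"a": 1})` should return the database names, the collections in `test`, and the documents in `foo` where `a == 1`.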

 > db.help()

 DB methods:

     db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [just calls db.runCommand(...)]

     db.aggregate([pipeline], {options}) - performs a collectionless aggregation on this database; returns a cursor

     db.auth(username, password)

     db.cloneDatabase(fromhost)

     db.commandHelp(name) returns the help for the command

     db.copyDatabase(fromdb, todb, fromhost)

     db.createCollection(name, {size: ..., capped: ..., max: ...})

     db.createView(name, viewOn, [{$operator: {...}}, ...], {viewOptions})

     db.createUser(userDocument)

     db.currentOp() displays currently executing operations in the db

     db.dropDatabase()

     db.eval() - deprecated

     db.fsyncLock() flush data to disk and lock server for backups

     db.fsyncUnlock() unlocks server following a db.fsyncLock()

     db.getCollection(cname) same as db['cname'] or db.cname

     db.getCollectionInfos([filter]) - returns a list that contains the names and options of the db's collections

     db.getCollectionNames()

     db.getLastError() - just returns the err msg string

     db.getLastErrorObj() - return full status object

     db.getLogComponents()

     db.getMongo() get the server connection object

     db.getMongo().setSlaveOk() allow queries on a replication slave server

     db.getName()

     db.getPrevError()

     db.getProfilingLevel() - deprecated

     db.getProfilingStatus() - returns if profiling is on and slow threshold

     db.getReplicationInfo()

     db.getSiblingDB(name) get the db at the same server as this one

     db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set

     db.hostInfo() get details about the server's host

     db.isMaster() check replica primary status

     db.killOp(opid) kills the current operation in the db

     db.listCommands() lists all the db commands

     db.loadServerScripts() loads all the scripts in db.system.js

     db.logout()

     db.printCollectionStats()

     db.printReplicationInfo()

     db.printShardingStatus()

     db.printSlaveReplicationInfo()

     db.dropUser(username)

     db.repairDatabase()

     db.resetError()

     db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into {cmdObj: 1}

     db.serverStatus()

     db.setLogLevel(level,<component>)

     db.setProfilingLevel(level,slowms) 0=off 1=slow 2=all

     db.setWriteConcern(<write concern doc>) - sets the write concern for writes to the db

     db.unsetWriteConcern(<write concern doc>) - unsets the write concern for writes to the db

     db.setVerboseShell(flag) display extra information in shell output

     db.shutdownServer()

     db.stats()

     db.version() current version of the server

 >

DB methods

 > db.mycoll.help()

 DBCollection help

     db.mycoll.find().help() - show DBCursor help

     db.mycoll.bulkWrite( operations, <optional params> ) - bulk execute write operations, optional parameters are: w, wtimeout, j

     db.mycoll.count( query = {}, <optional params> ) - count the number of documents that matches the query, optional parameters are: limit, skip, hint, maxTimeMS

     db.mycoll.copyTo(newColl) - duplicates collection by copying all documents to newColl; no indexes are copied.

     db.mycoll.convertToCapped(maxBytes) - calls {convertToCapped:'mycoll', size:maxBytes}} command

     db.mycoll.createIndex(keypattern[,options])

     db.mycoll.createIndexes([keypatterns], <options>)

     db.mycoll.dataSize()

     db.mycoll.deleteOne( filter, <optional params> ) - delete first matching document, optional parameters are: w, wtimeout, j

     db.mycoll.deleteMany( filter, <optional params> ) - delete all matching documents, optional parameters are: w, wtimeout, j

     db.mycoll.distinct( key, query, <optional params> ) - e.g. db.mycoll.distinct( 'x' ), optional parameters are: maxTimeMS

     db.mycoll.drop() drop the collection

     db.mycoll.dropIndex(index) - e.g. db.mycoll.dropIndex( "indexName" ) or db.mycoll.dropIndex( { "indexKey" : 1 } )

     db.mycoll.dropIndexes()

     db.mycoll.ensureIndex(keypattern[,options]) - DEPRECATED, use createIndex() instead

     db.mycoll.explain().help() - show explain help

     db.mycoll.reIndex()

     db.mycoll.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.

                                                   e.g. db.mycoll.find( {x:77} , {name:1, x:1} )

     db.mycoll.find(...).count()

     db.mycoll.find(...).limit(n)

     db.mycoll.find(...).skip(n)

     db.mycoll.find(...).sort(...)

     db.mycoll.findOne([query], [fields], [options], [readConcern])

     db.mycoll.findOneAndDelete( filter, <optional params> ) - delete first matching document, optional parameters are: projection, sort, maxTimeMS

     db.mycoll.findOneAndReplace( filter, replacement, <optional params> ) - replace first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument

     db.mycoll.findOneAndUpdate( filter, update, <optional params> ) - update first matching document, optional parameters are: projection, sort, maxTimeMS, upsert, returnNewDocument

     db.mycoll.getDB() get DB object associated with collection

     db.mycoll.getPlanCache() get query plan cache associated with collection

     db.mycoll.getIndexes()

     db.mycoll.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )

     db.mycoll.insert(obj)

     db.mycoll.insertOne( obj, <optional params> ) - insert a document, optional parameters are: w, wtimeout, j

     db.mycoll.insertMany( [objects], <optional params> ) - insert multiple documents, optional parameters are: w, wtimeout, j

     db.mycoll.mapReduce( mapFunction , reduceFunction , <optional params> )

     db.mycoll.aggregate( [pipeline], <optional params> ) - performs an aggregation on a collection; returns a cursor

     db.mycoll.remove(query)

     db.mycoll.replaceOne( filter, replacement, <optional params> ) - replace the first matching document, optional parameters are: upsert, w, wtimeout, j

     db.mycoll.renameCollection( newName , <dropTarget> ) renames the collection.

     db.mycoll.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name

     db.mycoll.save(obj)

     db.mycoll.stats({scale: N, indexDetails: true/false, indexDetailsKey: <index key>, indexDetailsName: <index name>})

     db.mycoll.storageSize() - includes free space allocated to this collection

     db.mycoll.totalIndexSize() - size in bytes of all the indexes

     db.mycoll.totalSize() - storage allocated for all data and indexes

     db.mycoll.update( query, object[, upsert_bool, multi_bool] ) - instead of two flags, you can pass an object with fields: upsert, multi

     db.mycoll.updateOne( filter, update, <optional params> ) - update the first matching document, optional parameters are: upsert, w, wtimeout, j

     db.mycoll.updateMany( filter, update, <optional params> ) - update all matching documents, optional parameters are: upsert, w, wtimeout, j

     db.mycoll.validate( <full> ) - SLOW

     db.mycoll.getShardVersion() - only for use with sharding

     db.mycoll.getShardDistribution() - prints statistics about data distribution in the cluster

     db.mycoll.getSplitKeysForChunks( <maxChunkSize> ) - calculates split points over all chunks and returns splitter function

     db.mycoll.getWriteConcern() - returns the write concern used for any operations on this collection, inherited from server/db if set

     db.mycoll.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the collection

     db.mycoll.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the collection

     db.mycoll.latencyStats() - display operation latency histograms for this collection

 >

Collection methods

 > sh.help()

     sh.addShard( host )                       server:port OR setname/server:port

     sh.addShardToZone(shard,zone)             adds the shard to the zone

     sh.updateZoneKeyRange(fullName,min,max,zone)      assigns the specified range of the given collection to a zone

     sh.disableBalancing(coll)                 disable balancing on one collection

     sh.enableBalancing(coll)                  re-enable balancing on one collection

     sh.enableSharding(dbname)                 enables sharding on the database dbname

     sh.getBalancerState()                     returns whether the balancer is enabled

     sh.isBalancerRunning()                    return true if the balancer has work in progress on any mongos

     sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard)

     sh.removeShardFromZone(shard,zone)      removes the shard from zone

     sh.removeRangeFromZone(fullName,min,max)   removes the range of the given collection from any zone

     sh.shardCollection(fullName,key,unique,options)   shards the collection

     sh.splitAt(fullName,middle)               splits the chunk that middle is in at middle

     sh.splitFind(fullName,find)               splits the chunk that find is in at the median

     sh.startBalancer()                        starts the balancer so chunks are balanced automatically

     sh.status()                               prints a general overview of the cluster

     sh.stopBalancer()                         stops the balancer so chunks are not balanced automatically

     sh.disableAutoSplit()                   disable autoSplit on one collection

     sh.enableAutoSplit()                    re-enable autoSplit on one collection

     sh.getShouldAutoSplit()                 returns whether autosplit is enabled

 >

sharding helpers

 > rs.help()

     rs.status()                                { replSetGetStatus : 1 } checks repl set status

     rs.initiate()                              { replSetInitiate : null } initiates set with default settings

     rs.initiate(cfg)                           { replSetInitiate : cfg } initiates set with configuration cfg

     rs.conf()                                  get the current configuration object from local.system.replset

     rs.reconfig(cfg)                           updates the configuration of a running replica set with cfg (disconnects)

     rs.add(hostportstr)                        add a new member to the set with default attributes (disconnects)

     rs.add(membercfgobj)                       add a new member to the set with extra attributes (disconnects)

     rs.addArb(hostportstr)                     add a new member which is arbiterOnly:true (disconnects)

     rs.stepDown([stepdownSecs, catchUpSecs])   step down as primary (disconnects)

     rs.syncFrom(hostportstr)                   make a secondary sync from the given member

     rs.freeze(secs)                            make a node ineligible to become primary for the time specified

     rs.remove(hostportstr)                     remove a host from the replica set (disconnects)

     rs.slaveOk()                               allow queries on secondary nodes

     rs.printReplicationInfo()                  check oplog size and time range

     rs.printSlaveReplicationInfo()             check replica set members and replication lag

     db.isMaster()                              check who is primary

     reconfiguration helpers disconnect from the database so the shell will display

     an error, even if the command succeeds.

 >

replica set helpers

 > help admin

     ls([path])                      list files

     pwd()                           returns current directory

     listFiles([path])               returns file list

     hostname()                      returns name of this host

     cat(fname)                      returns contents of text file as a string

     removeFile(f)                   delete a file or directory

     load(jsfilename)                load and execute a .js file

     run(program[, args...])         spawn a program and wait for its completion

     runProgram(program[, args...])  same as run(), above

     sleep(m)                        sleep m milliseconds

     getMemInfo()                    diagnostic

 >

administrative help

 > help connect

 Normally one specifies the server on the mongo shell command line.  Run mongo --help to see those options.

 Additional connections may be opened:

     var x = new Mongo('host[:port]');

     var mydb = x.getDB('mydb');

   or

     var mydb = connect('host[:port]/mydb');

 Note: the REPL prompt only auto-reports getLastError() for the shell command line connection.

 >

connect db help

 > help keys

 Tab completion and command history is available at the command prompt.

 Some emacs keystrokes are available too:

   Ctrl-A start of line

   Ctrl-E end of line

   Ctrl-K del to end of line

 Multi-line commands

 You can enter a multi line javascript expression.  If parens, braces, etc. are not closed, you will see a new line

 beginning with '...' characters.  Type the rest of your expression.  Press Ctrl-C to abort the data entry if you

 get stuck.

 >

shortcut keys

 > help misc

     b = new BinData(subtype,base64str)  create a BSON BinData value

     b.subtype()                         the BinData subtype (0..255)

     b.length()                          length of the BinData data in bytes

     b.hex()                             the data as a hex encoded string

     b.base64()                          the data as a base 64 encoded string

     b.toString()

     b = HexData(subtype,hexstr)         create a BSON BinData value from a hex string

     b = UUID(hexstr)                    create a BSON BinData value of UUID subtype

     b = MD5(hexstr)                     create a BSON BinData value of MD5 subtype

     "hexstr"                            string, sequence of hex characters (no 0x prefix)

     o = new ObjectId()                  create a new ObjectId

     o.getTimestamp()                    return timestamp derived from first 32 bits of the OID

     o.isObjectId

     o.toString()

     o.equals(otherid)

     d = ISODate()                       like Date() but behaves more intuitively when used

     d = ISODate('YYYY-MM-DD hh:mm:ss')    without an explicit "new " prefix on construction

 >

misc
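The `o.getTimestamp()` helper works because an ObjectId starts with a 4-byte big-endian Unix timestamp (seconds since the epoch). As a rough sketch of the same decoding in plain Python, assuming the id is available as a 24-character hex string (`objectid_timestamp` is a hypothetical name, not a driver function):

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes = seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("507f1f77bcf86cd799439011")
```

When PyMongo is installed, `bson.ObjectId(...).generation_time` should give the same information without manual parsing.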

 > help mr

 See also http://dochub.mongodb.org/core/mapreduce

 function mapf() {

   // 'this' holds current document to inspect

   emit(key, value);

 }

 function reducef(key,value_array) {

   return reduced_value;

 }

 db.mycollection.mapReduce(mapf, reducef[, options])

 options

 {[query : <query filter object>]

  [, sort : <sort the query.  useful for optimization>]

  [, limit : <number of objects to return from collection>]

  [, out : <output-collection name>]

  [, keeptemp: <true|false>]

  [, finalize : <finalizefunction>]

  [, scope : <object where fields go into javascript global scope >]

  [, verbose : true]}

 >
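To make the map/reduce contract above concrete, here is a small in-memory emulation in Python. It is only a sketch of the idea, not how MongoDB executes `mapReduce`: the map function yields `(key, value)` pairs instead of calling `emit()`, and `map_reduce`, `map_tags`, and `reduce_count` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_reduce(documents, mapf, reducef):
    """Group the (key, value) pairs produced by mapf, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapf(doc):  # stands in for emit(key, value)
            groups[key].append(value)
    return {key: reducef(key, values) for key, values in groups.items()}


def map_tags(doc):
    # Emit one (tag, 1) pair per tag found in the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_count(key, values):
    # Collapse all values emitted for a key into a single count.
    return sum(values)


docs = [{"tags": ["python", "mongodb"]}, {"tags": ["mongodb"]}, {}]
counts = map_reduce(docs, map_tags, reduce_count)
```

Here `counts` maps each tag to the number of documents that carry it. One difference to keep in mind: this sketch always calls the reduce function, even for keys with a single emitted value.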

Python driver

pip install pymongo
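Once the driver is installed, connecting and running a basic insert/query might look like the sketch below. The helper names `build_mongo_uri` and `add_and_find` are hypothetical; `insert_one` and `find` are PyMongo collection methods, and the collection object is passed in so the helpers do not depend on a running server.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a mongodb:// URI, percent-escaping the credentials if given."""
    if username and password:
        auth = "{}:{}@".format(quote_plus(username), quote_plus(password))
    else:
        auth = ""
    return "mongodb://{}{}:{}".format(auth, host, port)


def add_and_find(collection, document, query):
    """Insert one document, then return every document matching query."""
    collection.insert_one(document)
    return list(collection.find(query))
```

Against a local server this could be used as `client = pymongo.MongoClient(build_mongo_uri())` followed by `add_and_find(client["test"]["foo"], {"a": 1}, {"a": 1})`; the database and collection names here are placeholders.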

scrapy:

settings.py

 ITEM_PIPELINES = {'stack.pipelines.MongoDBPipeline': 300}  # pipeline path -> order (0-1000)

 MONGODB_SERVER = "localhost"

 MONGODB_PORT = 27017

 MONGODB_DB = "stackoverflow"

 MONGODB_COLLECTION = "questions"

pipelines.py

 import pymongo
 from scrapy.exceptions import DropItem
 from scrapy.utils.project import get_project_settings

 class MongoDBPipeline(object):

     def __init__(self):
         # scrapy.conf is not available in recent Scrapy releases,
         # so load the project settings explicitly.
         settings = get_project_settings()
         connection = pymongo.MongoClient(
             settings['MONGODB_SERVER'],
             settings['MONGODB_PORT']
         )
         db = connection[settings['MONGODB_DB']]
         self.collection = db[settings['MONGODB_COLLECTION']]

     def process_item(self, item, spider):
         # Iterating over an item yields field names, so check the values.
         for field in item:
             if not item[field]:
                 raise DropItem("Missing {0}!".format(field))
         # insert_one() replaces Collection.insert(), which PyMongo 4 removed.
         self.collection.insert_one(dict(item))
         # spider.logger replaces the old scrapy.log module.
         spider.logger.debug("Question added to MongoDB database!")
         return item
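When validating items before writing them, note that iterating over a dict-like item yields field names rather than values, so an emptiness check has to look at the values. A minimal sketch of such a check, with `missing_fields` as a hypothetical helper name and a plain dict standing in for a Scrapy item:

```python
def missing_fields(item):
    """Return the names of fields whose values are empty or otherwise falsy."""
    return [name for name, value in dict(item).items() if not value]


question = {"title": "How do I connect to MongoDB?", "url": ""}
empty = missing_fields(question)
```

A pipeline could raise `DropItem` whenever this list is non-empty and include the field names in the message.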

Example from the official Scrapy documentation (https://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb):

pipelines.py

 import pymongo

 class MongoPipeline(object):

     collection_name = 'scrapy_items'

     def __init__(self, mongo_uri, mongo_db):

         self.mongo_uri = mongo_uri

         self.mongo_db = mongo_db

     @classmethod

     def from_crawler(cls, crawler):

         return cls(

             mongo_uri=crawler.settings.get('MONGO_URI'),

             mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')

         )

     def open_spider(self, spider):

         self.client = pymongo.MongoClient(self.mongo_uri)

         self.db = self.client[self.mongo_db]

     def close_spider(self, spider):

         self.client.close()

     def process_item(self, item, spider):

         self.db[self.collection_name].insert_one(dict(item))

         return item
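The pipeline above reads `MONGO_URI` and `MONGO_DATABASE` from the crawler settings, so the project's settings.py needs matching entries. A possible fragment is sketched below; `myproject` is a hypothetical package name, and the URI assumes a local server on the default port.

```python
# settings.py (sketch): "myproject" is a hypothetical project package name.
ITEM_PIPELINES = {
    "myproject.pipelines.MongoPipeline": 300,  # lower numbers run earlier
}
MONGO_URI = "mongodb://localhost:27017"  # read via crawler.settings.get('MONGO_URI')
MONGO_DATABASE = "items"  # same value as the pipeline's fallback default
```

If `MONGO_DATABASE` is omitted, `from_crawler` falls back to the `'items'` default shown in the pipeline code.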
