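Using HBase from Python

The script below compares the genotype calls in local chip files (names ending in "snp") against the values stored in the HBase table chipdata. It connects through happybase, processes the files in parallel with a multiprocessing pool, and prints the match rate for each chip barcode.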

# coding: utf-8
__author__ = 'similarface'

import hashlib
import multiprocessing
import os
import re

import happybase  # Thrift-based HBase client; the HBase Thrift server must be running

basedir = "/tmp/t8"  # directory holding the chip files (*snp)
filterpath = "/Users/similarface/Documents/20170303Morgene999ProductFullSNP.txt"  # list of SNPs to check

pattern_barcode = re.compile(r'[0-9]{3}-[0-9]{4}-[0-9]{4}')  # barcode embedded in the file name
pattern_ls = re.compile(r'\s+')


def func(filepath, snpkey):
    """Compare one chip file's genotypes with the values stored in HBase table 'chipdata'."""
    conn = happybase.Connection(host='192.168.30.250')  # one connection per worker process
    table = conn.table('chipdata')
    barcode = pattern_barcode.findall(filepath)[0]
    column = ('d:' + barcode).encode()  # column family 'd', qualifier = barcode
    matched = 0
    total = 0
    try:
        with open(filepath, encoding='utf-8') as foper:
            for line in foper:
                fields = pattern_ls.split(line.strip())
                if len(fields) < 4:
                    continue  # skip blank or malformed lines
                chrom, pos, genotype = fields[1], fields[2], fields[3]
                if chrom + ":" + pos not in snpkey:
                    continue  # SNP is not in the filter list
                total += 1
                # Row key: MD5 hex digest of the position + ":" + upper-case chromosome
                rowkey = hashlib.md5(pos.encode()).hexdigest() + ":" + chrom.upper()
                # happybase returns a dict with bytes keys and bytes values
                row = table.row(rowkey.encode(), columns=[column])
                if row.get(column) == genotype.encode():
                    matched += 1
    finally:
        conn.close()
    rate = matched / total if total else 0.0  # avoid ZeroDivisionError when nothing matched the filter
    print(f"{barcode}: {rate:.1%} match ({matched}/{total})")


if __name__ == "__main__":
    # Build the set of "chr:pos" keys from the first two columns of the filter file
    snpkey = set()
    with open(filterpath, encoding='utf-8') as oper:
        for line in oper:
            if line.strip():
                fields = pattern_ls.split(line.strip())
                snpkey.add(':'.join(fields[0:2]))

    pool = multiprocessing.Pool(processes=3)
    results = []
    for filename in os.listdir(basedir):
        if filename.endswith("snp"):
            snppath = os.path.join(basedir, filename)
            # The pool keeps `processes` workers busy; when one finishes a task it takes the next queued one
            results.append(pool.apply_async(func, args=(snppath, snpkey)))

    print("Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~")
    pool.close()  # must be called before join(), otherwise join() raises; no new tasks are accepted after close()
    pool.join()   # waits for all worker processes to finish
    for res in results:
        res.get()  # re-raises any exception from a worker, which apply_async would otherwise hide
    print("Sub-process(es) done.")
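The per-file logic in func can be checked without a running HBase cluster by passing the table in as a parameter and replacing it with an in-memory object. This is a minimal sketch under that assumption: `make_rowkey`, `match_rate` and `FakeTable` are hypothetical helpers, not part of happybase. They mirror the row-key layout used in the script (MD5 hex digest of the position, a colon, then the upper-case chromosome) and the `d:<barcode>` column. Only the `row(row, columns=...)` call shape is taken from happybase's `Table` API, which returns bytes keys and values under Python 3.

```python
import hashlib
import re

SPLIT = re.compile(r"\s+")


def make_rowkey(chrom, pos):
    """Row key as in func(): MD5 hex of the position, ':', upper-case chromosome (as bytes)."""
    return (hashlib.md5(pos.strip().encode()).hexdigest() + ":" + chrom.upper()).encode()


def match_rate(lines, snpkey, barcode, table):
    """Count (matched, total) genotype calls for one chip file.

    `table` only needs a happybase-style row(row, columns=...) method that
    returns a dict mapping bytes column names to bytes values.
    """
    column = ("d:" + barcode).encode()
    matched = total = 0
    for line in lines:
        fields = SPLIT.split(line.strip())
        if len(fields) < 4:
            continue  # blank or malformed line
        chrom, pos, genotype = fields[1], fields[2], fields[3]
        if chrom + ":" + pos not in snpkey:
            continue  # SNP not in the filter list
        total += 1
        row = table.row(make_rowkey(chrom, pos), columns=[column])
        if row.get(column) == genotype.encode():
            matched += 1
    return matched, total


class FakeTable:
    """Hypothetical in-memory stand-in for happybase.Table (offline testing only)."""

    def __init__(self, data):
        self.data = data  # {rowkey bytes: {column bytes: value bytes}}

    def row(self, row, columns=None):
        cells = self.data.get(row, {})
        if columns is None:
            return dict(cells)
        return {c: v for c, v in cells.items() if c in columns}


if __name__ == "__main__":
    bc = "123-4567-8901"
    table = FakeTable({
        make_rowkey("chr1", "100"): {b"d:" + bc.encode(): b"AG"},
        make_rowkey("chr1", "200"): {b"d:" + bc.encode(): b"CC"},
    })
    lines = ["rs1 chr1 100 AG", "rs2 chr1 200 CT", "rs3 chr2 300 TT", ""]
    print(match_rate(lines, {"chr1:100", "chr1:200"}, bc, table))  # (1, 2)
```

In the real script, `table` would be `conn.table('chipdata')`; keeping the HBase lookup behind a parameter is what lets the counting logic be tested on its own.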

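The comments in the script's main block describe how the pool schedules tasks and why close() must precede join(). A related pitfall is that an exception raised inside a task submitted with apply_async is silently lost unless the returned result object's get() is called. The sketch below demonstrates both points using `multiprocessing.pool.ThreadPool`, which exposes the same `apply_async`/`close`/`join` interface as `multiprocessing.Pool` but runs tasks in threads, so it needs no pickling or `__main__` guard. `check_file`, `run_all` and `join_without_close` are hypothetical names, and the ValueError from an early join() assumes Python 3.8 or later.

```python
from multiprocessing.pool import ThreadPool


def check_file(name):
    """Hypothetical stand-in for func(); fails on names starting with 'bad'."""
    if name.startswith("bad"):
        raise ValueError("cannot parse " + name)
    return name + ": done"


def run_all(names, processes=3):
    """Submit all tasks, close, join, then collect each result or error."""
    pool = ThreadPool(processes=processes)
    pending = [(n, pool.apply_async(check_file, (n,))) for n in names]
    pool.close()  # no further submissions; required before join()
    pool.join()   # block until every queued task has finished
    results, errors = [], []
    for name, res in pending:
        try:
            results.append(res.get())  # re-raises the task's exception here
        except ValueError as exc:
            errors.append(f"{name}: {exc}")
    return results, errors


def join_without_close():
    """Show that join() on a pool that is still running raises ValueError."""
    pool = ThreadPool(processes=1)
    try:
        pool.join()
        return "no error"
    except ValueError:
        return "ValueError"
    finally:
        pool.terminate()


if __name__ == "__main__":
    print(run_all(["a.snp", "bad.snp", "c.snp"]))
    # (['a.snp: done', 'c.snp: done'], ['bad.snp: cannot parse bad.snp'])
    print(join_without_close())  # ValueError
```

For CPU-bound work such as parsing many chip files, the same calls apply unchanged to `multiprocessing.Pool`; only the worker type differs.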