Generating test data for Sphinx

import json
from random import sample, randint
from uuid import uuid4

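def gen_random_words():
    # Load the dictionary (one word per line) and draw 3000 words
    # without replacement. sample() raises ValueError if the file
    # holds fewer than 3000 lines.
    with open("D:\\exp\\test_data\\dictionary.txt") as f:
        words = [word.strip() for word in f]
    # The with block has already closed the file at this point.
    return sample(words, 3000)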

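total_words = 0  # number of words seen so far across all calls

def sample_words(search_words, random_words):
    # Reservoir sampling: keep a uniform random sample of at most
    # sample_cnt words out of every word passed in so far.
    global total_words
    sample_cnt = 1000
    for word in random_words:
        total_words += 1
        if len(search_words) < sample_cnt:
            # The reservoir is not full yet, so keep the word.
            search_words.append(word)
        elif randint(1, total_words) <= sample_cnt:
            # Keep the new word with probability sample_cnt / total_words,
            # evicting a uniformly chosen existing entry.
            kick_off = randint(0, sample_cnt - 1)
            search_words[kick_off] = word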

def gen_an_event(words, search_words):
    # Title: 1 to 10 random words; content: 1 to 100 random words.
    # Every word used is also offered to the search-word reservoir.
    query_words = sample(words, randint(1, 10))
    sample_words(search_words, query_words)
    title = " ".join(query_words)
    query_words = sample(words, randint(1, 100))
    sample_words(search_words, query_words)
    content = " ".join(query_words)
    event_data = {"title": title, "content": content}
    return event_data

if __name__ == "__main__":
    search_words = []
    for _ in range(1):  # number of output files to generate
        words = gen_random_words()
        lines_cnt = 500000
        es_out_put = [""] * lines_cnt
        for i in range(lines_cnt):
            event = gen_an_event(words, search_words)
            # Ids start at 5 so they follow ids 1 to 4 in the sample SQL below.
            # Each line is one VALUES tuple ending in a comma, apparently meant
            # for the REPLACE INTO statement below; words are not SQL-escaped.
            es_out_put[i] = "        (%d, 2, 9, NOW(), '%s', '%s'), \n" % (i+5, event["title"], event["content"])
        out_puts = [es_out_put]
        file_name = str(uuid4()) + ".txt"
        for i, dir_name in enumerate(["Sphinx"]):
            outfile = "D:\\test_data\\%s\\%s" % (dir_name, file_name)
            with open(outfile, "w") as f:
                for j in range(lines_cnt):
                    f.write(out_puts[i][j])
            print(outfile)
    # Save the sampled search words as JSON.
    outfile = "D:\\test_data\\search_words2.txt"
    with open(outfile, "w") as f:
        f.write(json.dumps(search_words))

sql = '''
DROP TABLE IF EXISTS test.documents;
CREATE TABLE test.documents
(
        id              INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,
        group_id        INTEGER NOT NULL,
        group_id2       INTEGER NOT NULL,
        date_added      DATETIME NOT NULL,
        title           VARCHAR(255) NOT NULL,
        content         TEXT NOT NULL
);

REPLACE INTO test.documents ( id, group_id, group_id2, date_added, title, content ) VALUES
        ( 1, 1, 5, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
        ( 2, 1, 6, NOW(), 'test two', 'this is my test document number two' ),
        ( 3, 2, 7, NOW(), 'another doc', 'this is another group' ),
        ( 4, 2, 8, NOW(), 'doc number four', 'this is to test groups' );

DROP TABLE IF EXISTS test.tags;
CREATE TABLE test.tags
(
        docid INTEGER NOT NULL,
        tagid INTEGER NOT NULL,
        UNIQUE(docid,tagid)
);

INSERT INTO test.tags VALUES
        (1,1), (1,3), (1,5), (1,7),
        (2,6), (2,4), (2,2),
        (3,15),
        (4,7), (4,40);
'''
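
The generator script writes bare value tuples, each ending in a comma, while the SQL above shows the REPLACE INTO statement such tuples belong to. The following is a minimal sketch of joining the two, assuming the generated rows are meant to be added to test.documents with ids starting at 5. The helper names `sql_quote` and `build_replace_statement` are hypothetical and not part of the original script. The escaping shown is a simple illustration for MySQL string literals; a parameterized query through a database driver would be the safer choice for untrusted text.

```python
def sql_quote(text):
    """Escape backslashes and single quotes for a MySQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "''")


def build_replace_statement(events, first_id=5):
    """Build one REPLACE INTO statement from a list of event dicts.

    Each event is a dict with "title" and "content" keys, the same shape
    that gen_an_event() returns in the script above.
    """
    rows = []
    for offset, event in enumerate(events):
        rows.append("        (%d, 2, 9, NOW(), '%s', '%s')" % (
            first_id + offset,
            sql_quote(event["title"]),
            sql_quote(event["content"]),
        ))
    header = ("REPLACE INTO test.documents "
              "( id, group_id, group_id2, date_added, title, content ) VALUES\n")
    # Join rows with commas and end with a semicolon, so no trailing comma remains.
    return header + ",\n".join(rows) + ";\n"


if __name__ == "__main__":
    demo = [
        {"title": "it's a test", "content": "first document"},
        {"title": "another doc", "content": "second document"},
    ]
    print(build_replace_statement(demo))
```

For very large batches, such as the 500000 rows generated above, MySQL's max_allowed_packet limit may require splitting the rows across several statements.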
