Linguistic corpora: seed corpus, objects to analyze, analyzing and updating the corpus
Computational Linguistics
http://matplotlib.org/
https://github.com/matplotlib/matplotlib/blob/master/INSTALL#L59
http://www.nltk.org/book/ch01.html#id9

C:\Users\w>python -m pip install --upgrade pip
Collecting pip
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.reque
Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(
Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB)
100% |████████████████████████████████| 1.3MB 14kB/s
Installing collected packages: pip
Found existing installation: pip 8.1.1
Uninstalling pip-8.1.1:
Successfully uninstalled pip-8.1.1
Successfully installed pip-9.0.1

C:\Users\w>python -m pip install matplotlib
Collecting matplotlib
Downloading matplotlib-1.5.3-cp35-cp35m-win_amd64.whl (6.5MB)
100% |████████████████████████████████| 6.5MB 30kB/s
Collecting pytz (from matplotlib)
Downloading pytz-2016.10-py2.py3-none-any.whl (483kB)
100% |████████████████████████████████| 491kB 35kB/s
Collecting pyparsing!=2.0.4,!=2.1.2,>=1.5.6 (from matplotlib)
Downloading pyparsing-2.1.10-py2.py3-none-any.whl (56kB)
100% |████████████████████████████████| 61kB 29kB/s
Collecting numpy>=1.6 (from matplotlib)
Downloading numpy-1.11.2-cp35-none-win_amd64.whl (7.6MB)
100% |████████████████████████████████| 7.6MB 32kB/s
Collecting cycler (from matplotlib)
Downloading cycler-0.10.0-py2.py3-none-any.whl
Collecting python-dateutil (from matplotlib)
Downloading python_dateutil-2.6.0-py2.py3-none-any.whl (194kB)
100% |████████████████████████████████| 194kB 46kB/s
Collecting six (from cycler->matplotlib)
Downloading six-1.10.0-py2.py3-none-any.whl
Installing collected packages: pytz, pyparsing, numpy, six, cycler, python-dateutil, matplotlib
Successfully installed cycler-0.10.0 matplotlib-1.5.3 numpy-1.11.2 pyparsing-2.1.10 python-dateutil-2.6.0 pytz-2016.10 six-1.10.0
text4.dispersion_plot(["kate","he","she","jack"])
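`dispersion_plot` is the NLTK `Text` method that shows where each word occurs across a text. As a rough illustration of the data behind such a plot, the sketch below collects the token offsets of each target word from a plain list of tokens. The function name `word_offsets` and the sample tokens are made up for this example, and this is not NLTK's own implementation.

```python
from collections import defaultdict


def word_offsets(tokens, targets):
    """Map each target word to the positions where it occurs in tokens.

    Hypothetical helper for illustration; matching is case-sensitive.
    """
    wanted = set(targets)
    offsets = defaultdict(list)
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[token].append(position)
    # Return an entry for every target, even those that never occur.
    return {word: offsets.get(word, []) for word in targets}


if __name__ == "__main__":
    sample = "she said he would go and he did while she stayed".split()
    for word, positions in word_offsets(sample, ["he", "she", "jack"]).items():
        print(word, positions)
```

Plotting each offset list as a row of tick marks along the token axis gives a picture in the spirit of a dispersion plot.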
<p id="w_last" style="color: red; font-size: 6em;">w-WAITING---</p><br>
<?php
include('conn.php'); // expected to provide the mysqli connection $link
//http://www.baidu.com/s?wd=%E5%8F%96%E8%8B%B1%E6%96%87%E5%90%8D

// Pass 1: girls' names from the Baidu API, 153 requests with pn stepping by 5.
$w_db_incr_girl = 0;
for ($w = 0; $w < 153; $w++) {
    $wgirl = 'http://api.open.baidu.com/pae/channel/data/asyncqury?appid=4036&srcid=4036&from_mid=1&format=json&ie=utf-8&oe=utf-8&subtitle=%E8%8B%B1%E6%96%87%E5%90%8D&query=%E8%8B%B1%E6%96%87%E5%90%8D&rn=5&stat1=%E5%A5%B3%E7%94%9F&pn=' . (5 * $w) . '&srcid=4036&cb=jQuery110205654252001601794_1481004786057&_=' . (1481004786059 + $w);
    $w_file = file_get_contents($wgirl);
    $partten = '/\"englishname\"\:\"\w{0,}\"/';
    $w_name = preg_match_all($partten, $w_file, $matches, PREG_SET_ORDER);
    $tmp = 0;
    foreach ($matches AS $one) {
        // Keep only every third match.
        if ($tmp % 3 == 2) {
            // Strip the leading "englishname":" (15 characters) and the closing quote.
            $given_name = substr($one[0], 15, strlen($one[0]) - 15 - 1);
            $sql = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $wgirl . '",0)';
            if (mysqli_query($link, $sql)) {
                $w_db_incr_girl++;
            }
        }
        $tmp++;
    }
}

$w_db_incr_boy = 0;
// Pass 2: boys' names from the same API (stat1 switched to the boys' value).
for ($w = 0; $w < 153; $w++) {
    // $wgirl = 'http://api.open.baidu.com/pae/channel/data/asyncqury?appid=4036&srcid=4036&from_mid=1&format=json&ie=utf-8&oe=utf-8&subtitle=%E8%8B%B1%E6%96%87%E5%90%8D&query=%E8%8B%B1%E6%96%87%E5%90%8D&rn=5&stat1=%E5%A5%B3%E7%94%9F&pn='.(5*$w).'&srcid=4036&cb=jQuery110205654252001601794_1481004786057&_='.(1481004786059+$w);
    $wboy = 'http://api.open.baidu.com/pae/channel/data/asyncqury?appid=4036&srcid=4036&from_mid=1&format=json&ie=utf-8&oe=utf-8&subtitle=%E8%8B%B1%E6%96%87%E5%90%8D&query=%E8%8B%B1%E6%96%87%E5%90%8D&rn=5&pn=' . (5 * $w) . '&srcid=4036&stat1=%E7%94%B7%E7%94%9F&cb=jQuery1102017382318514491035_1481005337608&_=' . (1481004786059 + $w);
    $w_file = file_get_contents($wboy);
    $partten = '/\"englishname\"\:\"\w{0,}\"/';
    $w_name = preg_match_all($partten, $w_file, $matches, PREG_SET_ORDER);
    $tmp = 0;
    foreach ($matches AS $one) {
        // Keep only every third match.
        if ($tmp % 3 == 2) {
            $given_name = substr($one[0], 15, strlen($one[0]) - 15 - 1);
            $sql = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $wboy . '",1)';
            if (mysqli_query($link, $sql)) {
                $w_db_incr_boy++;
            }
        }
        $tmp++;
    }
}

$w_arr = array_merge(range('A', 'Z'));
//http://ename.dict.cn/list/female/R/2
// Pass 3: for each initial letter, read list pages 1 to 7 on ename.dict.cn, girls first, then boys.
foreach ($w_arr AS $w_range) {
    for ($w = 1; $w < 8; $w++) {
        $wgirl = 'http://ename.dict.cn/list/female/' . $w_range . '/' . $w;
        $w_file = file_get_contents($wgirl);
        $partten = '/' . 'href=\"\/\w{0,}\"\>' . '/';
        $w_name = preg_match_all($partten, $w_file, $matches, PREG_SET_ORDER);
        foreach ($matches AS $one) {
            // Strip the leading href="/ (7 characters) and the trailing "> (2 characters).
            $given_name = substr($one[0], 7, strlen($one[0]) - 7 - 2);
            $sql = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $wgirl . '",0)';
            if (mysqli_query($link, $sql)) {
                $w_db_incr_girl++;
            }
        }
    }
    for ($w = 1; $w < 8; $w++) {
        $wboy = 'http://ename.dict.cn/list/male/' . $w_range . '/' . $w;
        $w_file = file_get_contents($wboy);
        $partten = '/' . 'href=\"\/\w{0,}\"\>' . '/';
        $w_name = preg_match_all($partten, $w_file, $matches, PREG_SET_ORDER);
        foreach ($matches AS $one) {
            $given_name = substr($one[0], 7, strlen($one[0]) - 7 - 2);
            $sql = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $wboy . '",1)';
            if (mysqli_query($link, $sql)) {
                $w_db_incr_boy++;
            }
        }
    }
}
?>
<script>
document.getElementById('w_last').innerHTML = 'w_db_incr_girl\'s=<?= $w_db_incr_girl?>,w_db_incr_boy\'s=' +<?= $w_db_incr_boy?>;
</script>
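The Baidu scraping loops above reduce to two steps: find every `"englishname":"..."` field in the response text, then keep every third match (the `$tmp % 3 == 2` test). A minimal Python sketch of that extraction step follows. The function name `extract_names` and the sample response string are invented for illustration, and nothing here is a claim about the real API's response layout.

```python
import re

# Same idea as the PHP pattern: the field name followed by a run of word characters.
# re.ASCII keeps \w limited to ASCII word characters.
NAME_FIELD = re.compile(r'"englishname":"(\w*)"', re.ASCII)


def extract_names(response_text, keep_every=3):
    """Return every keep_every-th captured name, mirroring `$tmp % 3 == 2`."""
    names = NAME_FIELD.findall(response_text)
    return [
        name
        for index, name in enumerate(names)
        if index % keep_every == keep_every - 1
    ]


if __name__ == "__main__":
    # Hypothetical response fragment, not real API output.
    sample = (
        '{"englishname":"a"},{"englishname":"b"},{"englishname":"Alice"},'
        '{"englishname":"c"},{"englishname":"d"},{"englishname":"Betty"}'
    )
    print(extract_names(sample))
```

Using a capture group avoids the fixed-offset `substr` arithmetic, since the name is returned without the surrounding field text.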
// isboy codes: 0 = girl, 1 = boy, 2 = both (stored as two rows), 4 = not determined.
$sql_db_check = 'SELECT isboy FROM namelist WHERE given_name="'.$given_name.'"';
$check = db_multiple_rows_link($link, $sql_db_check);
if (count($check) == 2) {
    $isboy = 2;
} elseif (count($check) == 1) {
    $isboy = $check['isboy'];
} elseif (count($check) == 0) {
    // Not in the table yet: ask the remote dictionary, then store the answer.
    $w_arr = w_cross_domian_name_isboy($given_name);
    //var_dump($w_arr);
    $isboy = $w_arr['w_code'];
    $grab_url = $w_arr['w_url'];
    if ($isboy != 4) {
        if ($isboy == 1) {
            $sql_w = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $grab_url . '",1)';
        } elseif ($isboy == 0) {
            $sql_w = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $grab_url . '",0)';
        } elseif ($isboy == 2) {
            $sql_w = 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $grab_url . '",1)' . ';';
            $sql_w .= 'INSERT INTO namelist (given_name,grab_url,isboy) VALUES ("' . $given_name . '","' . $grab_url . '",0)';
        }
        // var_dump($sql_w);
        mysqli_multi_query($link, $sql_w);
    }
}
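The lookup above encodes its result as 0 (girl), 1 (boy) or 2 (both, stored as two rows). The sketch below shows the same decision in Python against an in-memory SQLite table, using parameterized queries instead of string concatenation. SQLite stands in for the MySQL database here, `lookup_isboy` and `store_name` are hypothetical names, and the remote fallback is left out.

```python
import sqlite3


def lookup_isboy(conn, given_name):
    """Return 0 (girl), 1 (boy), 2 (both), or None if the name is not stored."""
    # DISTINCT is an addition in this sketch so repeated rows do not change the result.
    rows = conn.execute(
        "SELECT DISTINCT isboy FROM namelist WHERE given_name = ?", (given_name,)
    ).fetchall()
    values = {row[0] for row in rows}
    if values == {0, 1}:
        return 2
    if len(values) == 1:
        return values.pop()
    return None


def store_name(conn, given_name, grab_url, isboy):
    """Insert one row per gender; code 2 is stored as a boy row plus a girl row."""
    genders = (1, 0) if isboy == 2 else (isboy,)
    conn.executemany(
        "INSERT INTO namelist (given_name, grab_url, isboy) VALUES (?, ?, ?)",
        [(given_name, grab_url, gender) for gender in genders],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE namelist (given_name TEXT, grab_url TEXT, isboy INTEGER)"
    )
    # Placeholder URLs, not real lookups.
    store_name(conn, "Chris", "http://example.invalid/Chris", 2)
    store_name(conn, "Tommy", "http://example.invalid/Tommy", 1)
    print(lookup_isboy(conn, "Chris"), lookup_isboy(conn, "Tommy"))
```

Binding values with `?` placeholders means a name containing a quote character cannot break the statement, which is the main reason to prefer it over building the SQL string by hand.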
// Take the first word of $wstr and normalize it to "Xxxx" capitalization.
function w_given_name($wstr)
{
    $given_name = strstr($wstr, ' ', TRUE);
    if (empty($given_name)) $given_name = ltrim($wstr);
    $given_name = strtoupper(substr($given_name, 0, 1)) . strtolower(substr($given_name, 1));
    return $given_name;
}

//http://dict.youdao.com/w/eng/Tommy/#keyfrom=dict2.index
//http://dict.youdao.com/w/eng/Chris/#keyfrom=dict2.index
//http://dict.youdao.com/w/eng/Billie/#keyfrom=dict2.index
//http://dict.youdao.com/w/eng/Mikhael/#keyfrom=dict2.index
// Look a name up on dict.youdao.com and report w_code:
// 1 = boy, 0 = girl, 2 = both, 4 = not determined.
function w_cross_domian_name_isboy($name)
{
    $url = 'http://dict.youdao.com/w/eng/' . $name . '/#keyfrom=dict2.index';
    $w_file = file_get_contents($url);
    // $wfile = fopen('w.w', 'w');
    //fwrite($wfile, $w_file);
    // "您要找的是不是" is the page's "did you mean" spelling suggestion.
    $partten = '/' . '您要找的是不是' . '/';
    preg_match_all($partten, $w_file, $matches_spell, PREG_SET_ORDER);
    if (!empty($matches_spell)) {
        // The name was not recognized; skip the gender checks.
    } else {
        // "男子名" marks a male given name, "女子名" a female one.
        $partten = '/' . '男子名' . '/';
        preg_match_all($partten, $w_file, $matches_boy, PREG_SET_ORDER);
        $partten = '/' . '女子名' . '/';
        preg_match_all($partten, $w_file, $matches_girl, PREG_SET_ORDER);
    }
    $w = array();
    $w['w_url'] = $url;
    $w['w_code'] = 4;
    if (!empty($matches_spell) || (empty($matches_boy) && empty($matches_girl))) {
        // Leave w_code at 4.
    } elseif (!empty($matches_boy) && !empty($matches_girl)) {
        $w['w_code'] = 2;
    } elseif (!empty($matches_boy)) {
        $w['w_code'] = 1;
    } elseif (!empty($matches_girl)) {
        $w['w_code'] = 0;
    }
    return $w;
}
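`w_cross_domian_name_isboy` decides the code purely from which marker strings appear in the fetched page. The following Python sketch isolates that decision so it can be tried without a network request. `classify_page` is a hypothetical name and the sample page strings are invented.

```python
SPELL_HINT = "您要找的是不是"  # "did you mean" suggestion marker
MALE_MARK = "男子名"  # "male given name"
FEMALE_MARK = "女子名"  # "female given name"


def classify_page(page_text):
    """Return 1 (boy), 0 (girl), 2 (both) or 4 (not determined)."""
    if SPELL_HINT in page_text:
        # A spelling suggestion means the name itself was not recognized.
        return 4
    is_boy = MALE_MARK in page_text
    is_girl = FEMALE_MARK in page_text
    if is_boy and is_girl:
        return 2
    if is_boy:
        return 1
    if is_girl:
        return 0
    return 4


if __name__ == "__main__":
    # Invented page fragments, not real dictionary output.
    print(classify_page("Tommy: 男子名"))
    print(classify_page("Chris: 男子名; 女子名"))
    print(classify_page("您要找的是不是: Michael"))
```

Keeping the decision separate from the HTTP fetch makes the mapping from markers to codes easy to check in isolation.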