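Counting how many times selected words appear in an article

The -file option distributes local files from the client to every machine that runs the MapReduce tasks, where the scripts can open them by relative path from the task's working directory.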

1. Take an article, The_Man_of_Property.txt, with the following content:

He was proud of him! He could not but feel that in similar circumstances he himself would have been tempted to enlarge his replies, but his instinct told him that this taciturnity was the very thing. He sighed with relief, however, when Soames, slowly turning, and without any change of expression, descended from the box.
When it came to the turn of Bosinney’s Counsel to address the Judge, James redoubled his attention, and he searched the Court again and again to see if Bosinney were not somewhere concealed.
Young Chankery began nervously; he was placed by Bosinney’s absence in an awkward position. He therefore did his best to turn that absence to account.
He could not but fear — he said — that his client had met with an accident. He had fully expected him there to give evidence; they had sent round that morning both to Mr. Bosinney’s office and to his rooms (though he knew they were one and the same, he thought it was as well not to say so), but it was not known where he was, and this he considered to be ominous, knowing how anxious Mr. Bosinney had been to give his evidence. He had not, however, been instructed to apply for an adjournment, and in default of such instruction he conceived it his duty to go on. The plea on which he somewhat confidently relied, and which his client, had he not unfortunately been prevented in some way from attending, would have supported by his evidence, was that such an expression as a ‘free hand’ could not be limited, fettered, and rendered unmeaning, by any verbiage which might follow it. He would go further and say that the correspondence showed that whatever he might have said in his evidence, Mr. Forsyte had in fact never contemplated repudiating liability on any of the work ordered or executed by his architect. The defendant had certainly never contemplated such a contingency, or, as was demonstrated by his letters, he would never have proceeded with the work — a work of extreme delicacy, carried out with great care and efficiency, to meet and satisfy the fastidious taste of a connoisseur, a rich man, a man of property. He felt strongly on this point, and feeling strongly he used, perhaps, rather strong words when he said that this action was of a most unjustifiable, unexpected, indeed — unprecedented character. 
If his Lordship had had the opportunity that he himself had made it his duty to take, to go over this very fine house and see the great delicacy and beauty of the decorations executed by his client — an artist in his most honourable profession — he felt convinced that not for one moment would his Lordship tolerate this, he would use no stronger word than daring attempt to evade legitimate responsibility.
Taking the text of Soames’ letters, he lightly touched on ‘Boileau v. The Blasted Cement Company, Limited.’ “It is doubtful,” he said, “what that authority has decided; in any case I would submit that it is just as much in my favour as in my friend’s.” He then argued the ‘nice point’ closely. With all due deference he submitted that Mr. Forsyte’s expression nullified itself. His client not being a rich man, the matter was a serious one for him; he was a very talented architect, whose professional reputation was undoubtedly somewhat at stake. He concluded with a perhaps too personal appeal to the Judge, as a lover of the arts, to show himself the protector of artists, from what was occasionally — he said occasionally — the too iron hand of capital. “What,” he said, “will be the position of the artistic professions, if men of property like this Mr. Forsyte refuse, and are allowed to refuse, to carry out the obligations of the commissions which they have given.” He would now call his client, in case he should at the last moment have found himself able to be present.
The name Philip Baynes Bosinney was called three times by the Ushers, and the sound of the calling echoed with strange melancholy throughout the Court and Galleries.
The crying of this name, to which no answer was returned, had upon James a curious effect: it was like calling for your lost dog about the streets. And the creepy feeling that it gave him, of a man missing, grated on his sense of comfort and security-on his cosiness. Though he could not have said why, it made him feel uneasy.
He looked now at the clock — a quarter to three! It would be all over in a quarter of an hour. Where could the young fellow be?
It was only when Mr. Justice Bentham delivered judgment that he got over the turn he had received.
Behind the wooden erection, by which he was fenced from more ordinary mortals, the learned Judge leaned forward. The electric light, just turned on above his head, fell on his face, and mellowed it to an orange hue beneath the snowy crown of his wig; the amplitude of his robes grew before the eye; his whole figure, facing the comparative dusk of the Court, radiated like some majestic and sacred body. He cleared his throat, took a sip of water, broke the nib of a quill against the desk, and, folding his bony hands before him, began.
A divorce! Thus close, the word was paralyzing, so utterly at variance with all the principles that had hitherto guided his life. Its lack of compromise appalled him; he felt — like the captain of a ship, going to the side of his vessel, and, with his own hands throwing over the most precious of his bales. This jettisoning of his property with his own hand seemed uncanny to Soames. It would injure him in his profession: He would have to get rid of the house at Robin Hill, on which he had spent so much money, so much anticipation — and at a sacrifice. And she! She would no longer belong to him, not even in name! She would pass out of his life, and he — he should never see her again!
He traversed in the cab the length of a street without getting beyond the thought that he should never see her again!
r her hair; and at this scent the burning sickness of his jealousy seized him again.
Struggling into his fur,the watch was a three-cornered note addressed ‘Soames Forsyte,’ in‘Ierceived under the softness and immobility of this figure something desperate and resolved; something not to be turned away, something dangerous. She tore off her hat, and, putting both hands to her brow, pressed back the bronze mass of her hair.

2. Upload the article to the /mapreduce directory on HDFS

hadoop fs -put The_Man_of_Property.txt /mapreduce

3. The whitelist file white_list is shown below. The job counts how many times each word in it appears in The_Man_of_Property.txt

suitable
against
recent
across

4. map.py (map-side code):

#!/usr/bin/python
import sys

def read_local_file(file):
    # Load the whitelist, one word per line, into a set for fast lookup.
    word_set = set()
    with open(file, 'r') as file_in:
        for line in file_in:
            word = line.strip()
            if word:
                word_set.add(word)
    return word_set

def mapper_func(file):
    word_set = read_local_file(file)
    for line in sys.stdin:
        ss = line.strip().split()
        for word in ss:
            # split() already drops surrounding whitespace and empty tokens.
            if word in word_set:
                # Emit a count of 1 so the reducer can parse and sum it.
                print("%s\t%s" % (word, 1))

if __name__ == "__main__":
    # Dispatch to the function named on the command line,
    # e.g. "python map.py mapper_func white_list".
    func = getattr(sys.modules[__name__], sys.argv[1])
    args = sys.argv[2:]
    func(*args)
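
The mapper's filtering step can also be exercised without Hadoop. The following is a minimal sketch rather than the script itself: `load_whitelist` and `map_lines` are hypothetical helper names, and the sketch assumes the mapper should emit an explicit count of 1 per match so that a summing reducer can parse the value.

```python
def load_whitelist(lines):
    """Build the whitelist set from an iterable of lines, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def map_lines(lines, word_set):
    """Yield (word, 1) for every whitespace-separated token in the whitelist."""
    for line in lines:
        for word in line.strip().split():
            if word in word_set:
                yield word, 1


whitelist = load_whitelist(["suitable\n", "against\n", "recent\n", "across\n"])
sample = ["a quill against the desk", "up against, the wall across"]
for word, count in map_lines(sample, whitelist):
    print("%s\t%s" % (word, count))
```

Because tokens are split on whitespace only, a token such as "against," with trailing punctuation does not match the whitelist entry "against". Whether to strip punctuation first is a design choice the original script leaves open.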

5. red.py (reduce-side code):

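#!/usr/bin/python
import sys

def reducer_func():
    # Input arrives sorted by key, so equal words are adjacent.
    word = None
    total = 0
    for line in sys.stdin:
        ss = line.split()
        if len(ss) < 2:
            continue  # skip blank or malformed lines
        cur_word = ss[0]
        cnt = int(ss[1])
        if cur_word != word:
            # A new key starts: flush the previous word's total.
            if word is not None:
                print("%s\t%s" % (word, total))
            word = cur_word
            total = 0
        # Count every record, including the first one for each word.
        total += cnt
    if word is not None:
        print("%s\t%s" % (word, total))

if __name__ == "__main__":
    func = getattr(sys.modules[__name__], sys.argv[1])
    args = sys.argv[2:]
    func(*args)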
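
The reducer depends on its input being sorted by key, so all records for one word arrive next to each other. As an alternative way to express the same grouping, here is a small sketch using `itertools.groupby`. The function name `reduce_sorted` is hypothetical, and the sketch assumes every input line has the form word, tab, integer count.

```python
from itertools import groupby


def reduce_sorted(lines):
    """Sum the counts of adjacent records that share the same word."""
    pairs = (line.split() for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(pair[1]) for pair in group)


sorted_records = ["across\t1\n", "against\t1\n", "against\t1\n"]
for word, total in reduce_sorted(sorted_records):
    print("%s\t%s" % (word, total))
```

Unlike a hand-written loop, this form cannot drop the first record of a group, since every record in the group feeds into the sum.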

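6. The run.sh script is shown below. It requests 2 map tasks and deletes the output path beforehand (MapReduce does not allow the output path to already exist)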

HADOOP="/usr/local/src/hadoop-1.2.1/bin/hadoop"
HADOOP_STREAMING="/usr/local/src/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar"
INPUT_PATH="/mapreduce/The_Man_of_Property.txt"
OUTPUT_PATH="/mapreduce/out"

# Remove any previous output; the job fails if the output path exists.
$HADOOP fs -rmr "$OUTPUT_PATH"

$HADOOP jar "$HADOOP_STREAMING" \
    -input "$INPUT_PATH" \
    -output "$OUTPUT_PATH" \
    -mapper "python map.py mapper_func white_list" \
    -reducer "python red.py reducer_func" \
    -file "./map.py" \
    -file "./red.py" \
    -file "./white_list" \
    -jobconf "mapred.map.tasks=2"

7. Run the shell script

bash run.sh
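
Before submitting the job to the cluster, the map, sort, and reduce flow can be checked in a single local process. The sketch below is an illustration only: `run_local` is a hypothetical name, the in-memory sort stands in for Hadoop's shuffle phase, and the sample lines are invented test input rather than job output.

```python
from itertools import groupby


def run_local(text_lines, whitelist):
    """Simulate map -> sort (shuffle) -> reduce for whitelist word counting."""
    word_set = set(whitelist)
    # Map phase: one (word, 1) record per whitelisted token.
    mapped = [(word, 1)
              for line in text_lines
              for word in line.split()
              if word in word_set]
    # Shuffle phase: sorting by key makes equal words adjacent.
    mapped.sort(key=lambda record: record[0])
    # Reduce phase: sum the counts within each group of equal words.
    return {word: sum(count for _, count in group)
            for word, group in groupby(mapped, key=lambda record: record[0])}


counts = run_local(
    ["a quill against the desk", "against the wall across the Court"],
    ["suitable", "against", "recent", "across"],
)
for word in sorted(counts):
    print("%s\t%s" % (word, counts[word]))
```

Whitelist words that never occur in the input, such as "suitable" here, produce no record at all, which matches how a streaming job emits nothing for keys the mapper never outputs.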
