-cacheArchive也是从hdfs上进分发,但是分发文件是一个压缩包,压缩包内可能会包含多层目录多个文件

1.The_Man_of_Property.txt文件如下(将其上传至hdfs上)

hadoop fs -put The_Man_of_Property.txt  /mapreduce
Preface
“The Forsyte Saga” was the title originally destined for that part of it which is called “The Man of Property”; and to adopt it for the collected chronicles of the Forsyte family has indulged the Forsytean tenacity that is in all of us. The word Saga might be objected to on the ground that it connotes the heroic and that there is little heroism in these pages. But it is used with a suitable irony; and, after all, this long tale, though it may deal with folk in frock coats, furbelows, and a gilt-edged period, is not devoid of the essential heat of conflict. Discounting for the gigantic stature and blood-thirstiness of old days, as they have come down to us in fairy-tale and legend, the folk of the old Sagas were Forsytes, assuredly, in their possessive instincts, and as little proof against the inroads of beauty and passion as Swithin, Soames, or even Young Jolyon. And if heroic figures, in days that never were, seem to startle out from their surroundings in fashion unbecoming to a Forsyte of the Victorian era, we may be sure that tribal instinct was even then the prime force, and that “family” and the sense of home and property counted as they do to this day, for all the recent efforts to “talk them out.”
So many people have written and claimed that their families were the originals of the Forsytes that one has been almost encouraged to believe in the typicality of an imagined species. Manners change and modes evolve, and “Timothy’s on the Bayswater Road” becomes a nest of the unbelievable in all except essentials; we shall not look upon its like again, nor perhaps on such a one as James or Old Jolyon. And yet the figures of Insurance Societies and the utterances of Judges reassure us daily that our earthly paradise is still a rich preserve, where the wild raiders, Beauty and Passion, come stealing in, filching security from beneath our noses. As surely as a dog will bark at a brass band, so will the essential Soames in human nature ever rise up uneasily against the dissolution which hovers round the folds of ownership.
“Let the dead Past bury its dead” would be a better saying if the Past ever died. The persistence of the Past is one of those tragi-comic blessings which each new age denies, coming cocksure on to the stage to mouth its claim to a perfect novelty.
But no Age is so new as that! Human Nature, under its changing pretensions and clothes, is and ever will be very much of a Forsyte, and might, after all, be a much worse animal.
Looking back on the Victorian era, whose ripeness, decline, and ‘fall-of’ is in some sort pictured in “The Forsyte Saga,” we see now that we have but jumped out of a frying-pan into a fire. It would be difficult to substantiate a claim that the case of England was better in than it was in , when the Forsytes assembled at Old Jolyon’s to celebrate the engagement of June to Philip Bosinney. And in , when again the clan gathered to bless the marriage of Fleur with Michael Mont, the state of England is as surely too molten and bankrupt as in the eighties it was too congealed and low-percented. If these chronicles had been a really scientific study of transition one would have dwelt probably on such factors as the invention of bicycle, motor-car, and flying-machine; the arrival of a cheap Press; the decline of country life and increase of the towns; the birth of the Cinema. Men are, in fact, quite unable to control their own inventions; they at best develop adaptability to the new conditions those inventions create.
But this long tale is no scientific study of a period; it is rather an intimate incarnation of the disturbance that Beauty effects in the lives of men.
The figure of Irene, never, as the reader may possibly have observed, present, except through the senses of other characters, is a concretion of disturbing Beauty impinging on a possessive world.
One has noticed that readers, as they wade on through the salt waters of the Saga, are inclined more and more to pity Soames, and to think that in doing so they are in revolt against the mood of his creator. Far from it! He, too, pities Soames, the tragedy of whose life is the very simple, uncontrollable tragedy of being unlovable, without quite a thick enough skin to be thoroughly unconscious of the fact. Not even Fleur loves Soames as he feels he ought to be loved. But in pitying Soames, readers incline, perhaps, to animus against Irene: After all, they think, he wasn’t a bad fellow, it wasn’t his fault; she ought to have forgiven him, and so on!

2.white_list1与white_list2做为白名单,找出白名单文件中单词在The_Man_of_Property.tx中出现的次数(实现将2个文件打包为white.tar.gz,上传至hdfs上)

white_list1如下:

suitable
against
recent
across

white_list2如下:

Age
on

打包并上传至hdfs:

tar czvf white.tar.gz white_list1 white_list2
hadoop fs -put white.tar.gz /mapreduce

map函数代码如下:思路(1.遍历找到所有文件的路径,2.读取white_list文件内容;3.进行过滤)

#!usr/bin/python
import sys
import os
def read_dir_file(file_dir,dir):
fs = os.listdir(dir)
for f1 in fs:
tmp_path=os.path.join(dir,f1)
if not os.path.isdir(tmp_path):
file_dir.append(tmp_path)
else:
read_dir_file(file_dir,tmp_path)
return file_dir
def read_local_file(file_dir):
word_set = set()
for file in file_dir:
file_in = open (file,'r')
for line in file_in:
word = line.strip()
word_set.add(word)
return word_set
def mapper_func(dir):
file_dir=[]
file_dir=read_dir_file(file_dir,dir)
word_set=read_local_file(file_dir)
for line in sys.stdin:
ss=line.strip().split()
for word in ss:
word.strip()
if word != "" and (word in word_set):
print "%s\t%s"%(word,"")
if __name__ == "__main__":
func = getattr(sys.modules[__name__],sys.argv[1])
args = None
if len(sys.argv) > 1:
args = sys.argv[2:]
func(*args)

4.reduce端代码如下:

#!usr/bin/python
import sys
def reducer_func():
word="None"
sum=0
for line in sys.stdin:
ss=line.split()
cur_word=ss[0]
cnt=int(ss[1])
if cur_word!=word:
if word!="None":
print "%s\t%s"%(word,sum)
word=cur_word
sum=0
else:
sum+=cnt
print "%s\t%s"%(word,sum)
if __name__ == "__main__":
func = getattr(sys.modules[__name__],sys.argv[1])
args = None
if len(sys.argv) > 1:
args=sys.argv[2:]
func(*args)

5.运行脚本run.sh如下:

HADOOP="/usr/local/src/hadoop-1.2.1/bin/hadoop"
HADOOP_STREAMING="/usr/local/src/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar"
INPUT_PATH="/mapreduce/The_Man_of_Property.txt"
OUTPUT_PATH="/mapreduce/out"
$HADOOP fs -rmr $OUTPUT_PATH
$HADOOP jar $HADOOP_STREAMING \
-input "$INPUT_PATH" \
-output "$OUTPUT_PATH" \
-mapper "python map.py mapper_func ABC" \
-reducer "python red.py reducer_func" \
-file "./map.py"\
-file "./red.py"\
-cacheArchive "hdfs://master:9000/mapreduce/white.tar.gz#ABC"

大数据python词频统计之hdfs分发-cacheArchive的更多相关文章

  1. 大数据python词频统计之hdfs分发-cacheFile

    -cacheFile 分发,文件事先上传至Hdfs上,分发的是一个文件 1.找一篇文章The_Man_of_Property.txt: He was proud of him! He could no ...

  2. 大数据python词频统计之本地分发-file

    统计某几个词在文章出现的次数 -file参数分发,是从客户端分发到各个执行mapreduce端的机器上 1.找一篇文章The_Man_of_Property.txt如下: He was proud o ...

  3. Python 词频统计

    利用Python做一个词频统计 GitHub地址:FightingBob [Give me a star , thanks.] 词频统计 对纯英语的文本文件[Eg: 瓦尔登湖(英文版).txt]的英文 ...

  4. 大数据Python学习大纲

    最近公司在写一个课程<大数据运维实训课>,分为4个部分,linux实训课.Python开发.hadoop基础知识和项目实战.这门课程主要针对刚从学校毕业的学生去应聘时不会像一个小白菜一样被 ...

  5. python词频统计及其效能分析

    1) 博客开头给出自己的基本信息,格式建议如下: 学号2017****7128 姓名:肖文秀 词频统计及其效能分析仓库:https://gitee.com/aichenxi/word_frequenc ...

  6. 大数据学习(一)-------- HDFS

    需要精通java开发,有一定linux基础. 1.简介 大数据就是对海量数据进行数据挖掘. 已经有了很多框架方便使用,常用的有hadoop,storm,spark,flink等,辅助框架hive,ka ...

  7. 大数据学习之旅1——HDFS版本演化

    最近开始学习大数据,发现大数据有很多很多组件,我现在负责的是HDFS(Hadoop分布式储存系统)的学习,整理了一下HDFS的版本情况.因为HDFS是Hadoop的重要组成部分,所以有关HDFS的版本 ...

  8. 大数据谢列3:Hdfs的HA实现

    在之前的文章:大数据系列:一文初识Hdfs , 大数据系列2:Hdfs的读写操作 中Hdfs的组成.读写有简单的介绍. 在里面介绍Secondary NameNode和Hdfs读写的流程. 并且在文章 ...

  9. 大数据学习(02)——HDFS入门

    Hadoop模块 提到大数据,Hadoop是一个绕不开的话题,我们来看看Hadoop本身包含哪些模块. Common是基础模块,这个是必须用的.剩下常用的就是HDFS和YARN. MapReduce现 ...

随机推荐

  1. H5网页适配 iPhoneX,就是这么简单

    iPhoneX 取消了物理按键,改成底部小黑条,这一改动导致网页出现了比较尴尬的屏幕适配问题.对于网页而言,顶部(刘海部位)的适配问题浏览器已经做了处理,所以我们只需要关注底部与小黑条的适配问题即可( ...

  2. vue之v-for循环的使用

  3. Oracle简单学习笔记

    创建用户 CREATE USER username identified by password;//这是最简单的用户创建SQL语句. CREATE USER username identified ...

  4. HDFS笔记(二)

    fsimage : NameNode启动时,对文件系统的快照 eidt logs : NameNode启动后,对文件系统的改动序列 namenode在全局里就一个进程,所以存在单点问题 DataNod ...

  5. compileSdkVersion,minSdkVersion 和 targetSdkVersion

    compileSdkVersion(Eclipse中叫做build target) 1.在eclipse中位于项目根目录中的project.properties文件中 2.在studio中位于项目中的 ...

  6. python答题辅助

    最近直播答题app很热门,由于之前看过跳一跳的python脚本(非常棒),于是也想写一个答题的脚本. https://github.com/huanmsf/cai 思路: 1.截图 2.文字识别,提取 ...

  7. Android 6.0系统动态请求系统相机权限

    private static final int TAKE_PHOTO_REQUEST_CODE = 1; public static String takePhoto(Context context ...

  8. WindowsPE权威指南-PE文件头中的重定位表

    PE加载的过程 任何一个EXE程序会被分配4GB的内存空间,用户层处理低2G的内存,驱动处理高2G的内存. 1.双击EXE程序,操作系统开辟一个4GB的空间. 2.从ImageBase决定了加载后的基 ...

  9. 【php】下载站系统Simple Down v5.5.1 xss跨站漏洞分析

    author:zzzhhh 一.        跨站漏洞 利用方法1,直接在搜索框处搜索<script>alert(/xss/)</script>//',再次刷新,跨站语句将被 ...

  10. ROI Pool和ROI Align

    这里说一下ROI Pool和ROI Align的区别: 一.ROI Pool层: 参考faster rcnn中的ROI Pool层,功能是将不同size的ROI区域映射到固定大小的feature ma ...