MapReduce Inverted Index

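An inverted index is mainly used for full-text search: it maps each word to the documents (URLs or files) in which the word appears. For example, given three files: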

file1:

MapReduce is simple

file2:

MapReduce is powerful is simple

file3:

Hello MapReduce bye MapReduce

After building the inverted index, the result is:

Hello     file3.txt:1;
MapReduce     file3.txt:2;file1.txt:1;file2.txt:1;
bye     file3.txt:1;
is     file1.txt:1;file2.txt:2;
powerful     file2.txt:1;
simple     file2.txt:1;file1.txt:1;
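To make the target structure concrete, here is a minimal single-process sketch in plain Java (no Hadoop) that builds the same word to file:count mapping for the three sample files. The class name InMemoryInvertedIndex and its methods are made up for illustration and are not part of the original job.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class InMemoryInvertedIndex {
    // Builds word -> (file name -> number of occurrences in that file).
    static Map<String, Map<String, Integer>> build(Map<String, String> files) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // an empty file yields a single empty token
                }
                index.computeIfAbsent(word, w -> new TreeMap<>())
                     .merge(file.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    // The three sample files from this post, held in memory.
    static Map<String, String> sampleFiles() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "MapReduce is simple");
        files.put("file2.txt", "MapReduce is powerful is simple");
        files.put("file3.txt", "Hello MapReduce bye MapReduce");
        return files;
    }

    public static void main(String[] args) {
        build(sampleFiles()).forEach((word, postings) -> {
            StringBuilder line = new StringBuilder(word).append('\t');
            postings.forEach((file, count) ->
                    line.append(file).append(':').append(count).append(';'));
            System.out.println(line);
        });
    }
}
```

Running main prints one line per word in the same word / file:count; shape as the listing above, although the order of the postings within a line may differ from what a real job produces.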

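The approach is to scan the contents of each file and emit key = word + filename with value = 1. The combiner then sums the values that share a key, which gives the number of times each word occurs in one file. In the code below the combiner also splits the filename off the key and moves it into the value, so the reducer only has to join the postings for each word.

There is a small bug, though. A combiner normally performs a local reduce over the output of a single map task, but here I rely on it to count occurrences across a whole file. If a file is large, its contents may be divided among map tasks on different nodes; each combiner then sees only part of the file, and the same word/file pair ends up with several partial counts. Hadoop also treats the combiner as an optional optimization that may run zero or more times, while this job depends on it running exactly once. So this code is only suitable for small files.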
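One way around the problem described above, sketched here in plain Java rather than against the Hadoop API, is to keep the word as the key from the start and carry filename:count in the value, so that the same merge function can serve as both combiner and reducer. Everything in this block (SplitSafeCombineSketch, mapSplit, mergeCounts) is a hypothetical illustration, not the code from this post, and the scenario where file2.txt is cut into two splits is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SplitSafeCombineSketch {
    // "Map" step for one split: emit one "fileName:1" value per occurrence of `word`.
    static List<String> mapSplit(String fileName, String splitText, String word) {
        List<String> values = new ArrayList<>();
        for (String token : splitText.trim().split("\\s+")) {
            if (token.equals(word)) {
                values.add(fileName + ":1");
            }
        }
        return values;
    }

    // Usable as combiner and reducer: sums the counts per file name.
    // Input and output share the "fileName:count" format, so applying it
    // zero, one, or several times before the final call gives the same result.
    static List<String> mergeCounts(List<String> values) {
        Map<String, Integer> perFile = new TreeMap<>();
        for (String value : values) {
            int sep = value.lastIndexOf(':');
            perFile.merge(value.substring(0, sep),
                    Integer.parseInt(value.substring(sep + 1)), Integer::sum);
        }
        List<String> merged = new ArrayList<>();
        perFile.forEach((file, count) -> merged.add(file + ":" + count));
        return merged;
    }

    public static void main(String[] args) {
        // Assumed scenario: file2.txt is large enough to be cut into two splits.
        List<String> splitA = mapSplit("file2.txt", "MapReduce is powerful", "is");
        List<String> splitB = mapSplit("file2.txt", "is simple", "is");

        // Case 1: the combiner ran once on each map task's output.
        List<String> combined = new ArrayList<>(mergeCounts(splitA));
        combined.addAll(mergeCounts(splitB));
        System.out.println(mergeCounts(combined));

        // Case 2: the combiner did not run at all.
        List<String> raw = new ArrayList<>(splitA);
        raw.addAll(splitB);
        System.out.println(mergeCounts(raw));
    }
}
```

Both cases end with the single posting file2.txt:2 for the word "is", because the final merge adds up whatever partial counts arrive. The trade-off is that the reducer now has to parse and sum values instead of simply concatenating them.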

PS: to get the name of the file being processed, use String filename = ((FileSplit) context.getInputSplit()).getPath().getName(); Beyond that there is nothing special.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // Name of the file that the current input split belongs to.
        String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
        StringTokenizer st = new StringTokenizer(ivalue.toString());
        while (st.hasMoreTokens()) {
            // Emit ("word:filename", "1") for every token on the line.
            String key = st.nextToken() + ":" + filename;
            context.write(new Text(key), new Text("1"));
        }
    }
}

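import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Count the "1" values emitted for this "word:filename" key.
        int sum = 0;
        for (Text val : values) {
            sum++;
        }
        // Split on the last ':' so that a word containing ':' stays intact.
        String composite = _key.toString();
        int sep = composite.lastIndexOf(':');
        String key = composite.substring(0, sep);
        String value = composite.substring(sep + 1) + ":" + sum;
        // Re-key by word alone: ("word", "filename:count").
        // Caveat: this only works if the combiner runs exactly once per key.
        context.write(new Text(key), new Text(value));
    }
}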

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Join every "filename:count" posting for this word into one line.
        StringBuilder filelist = new StringBuilder();
        for (Text val : values) {
            filelist.append(val.toString()).append("; ");
        }
        context.write(_key, new Text(filelist.toString()));
    }
}
