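A First Look at Lucene

There are already plenty of introductions to Lucene online, so writing yet another one would be somewhat redundant.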

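Before using Lucene, I had a series of questions:

Why is Lucene faster than a database?
What is an inverted index, and how does it work?
What do Lucene's data structures look like, and what mainly drives its CPU and memory consumption?
What do Lucene's indexing flow and query flow look like?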
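
To make the inverted-index question concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class InvertedIndexSketch and its methods add, search, and scan are hypothetical names chosen for illustration, and Lucene's real on-disk structures are far more elaborate. The sketch only contrasts a term-to-documents lookup with a scan that examines every document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; this is not how Lucene stores its index.
class InvertedIndexSketch
{
	// Maps each term to the sorted set of ids of the documents that contain it.
	private final Map<String, Set<Integer>> postings = new TreeMap<>();
	// Keeps the raw text so the scan below has something to walk over.
	private final List<String> docs = new ArrayList<>();

	// Splits the text into lowercase terms and records the document id under each term.
	public int add(String text)
	{
		int docId = docs.size();
		docs.add(text);
		for (String term : text.toLowerCase().split("[^\\p{L}]+"))
		{
			if (!term.isEmpty())
			{
				postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
			}
		}
		return docId;
	}

	// Inverted-index lookup: a single map access returns the matching document ids.
	public Set<Integer> search(String term)
	{
		return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
	}

	// Scan in the style of a substring match over a table: every document is examined.
	public Set<Integer> scan(String term)
	{
		Set<Integer> hits = new TreeSet<>();
		for (int i = 0; i < docs.size(); i++)
		{
			if (docs.get(i).toLowerCase().contains(term.toLowerCase()))
			{
				hits.add(i);
			}
		}
		return hits;
	}

	public static void main(String[] args)
	{
		InvertedIndexSketch index = new InvertedIndexSketch();
		index.add("Lucene is a search library");
		index.add("A database scans rows");
		index.add("Lucene builds an inverted index");
		System.out.println(index.search("lucene")); // prints [0, 2]
		System.out.println(index.scan("lucene"));   // prints [0, 2]
	}
}
```

Both calls return the same documents here, but search does its work when documents are added, while scan repeats the work on every query. That trade-off is roughly what the "faster than a database" question is about.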

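I recommend two articles for a deeper understanding of Lucene.

In the first, see the section that compares Lucene with databases:

http://www.chedong.com/tech/lucene.html

In the second, parts one and two give a partial understanding of how Lucene works:

http://blog.csdn.net/forfuture1978/article/details/5668956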

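I have read a little of 《Lucene 原理与代码分析》 (Lucene Principles and Code Analysis), but found it somewhat difficult.

I am now working through 《lucene实战》 (Lucene in Action). The Lucene version I use is 4.7, which may differ somewhat from 3.0.

Below is the example from the first section: an Indexer class that builds the index and a Searcher class that queries it.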

package com.mitchz.lucence;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Indexes the .txt files found in a data directory.
 *
 * @author mitchz
 * @version 1.0
 * @since April 30, 2014
 * @category com.mitchz.lucence
 */
public class Indexer
{
	private IndexWriter writer;

	// Opens the index stored in indexDir, creating it if it does not exist yet.
	public Indexer(String indexDir) throws IOException
	{
		Directory dir = FSDirectory.open(new File(indexDir));
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47,
				new SimpleAnalyzer(Version.LUCENE_47));
		writer = new IndexWriter(dir, config);
	}

	// Indexes every readable, non-hidden file in dataDir that passes the filter
	// and returns the number of documents in the index.
	public int index(String dataDir, FileFilter filter) throws Exception
	{
		File[] files = (new File(dataDir)).listFiles();
		if (files == null)
		{
			// listFiles() returns null when dataDir is not a readable directory
			throw new IOException("Cannot list files in " + dataDir);
		}
		for (File file : files)
		{
			if (!file.isDirectory() && !file.isHidden() && file.canRead()
					&& (filter == null || filter.accept(file)))
			{
				indexFile(file);
			}
		}
		return writer.numDocs();
	}

	// Accepts only files whose name ends with .txt (case-insensitive).
	private static class TextFilesFilter implements FileFilter
	{
		@Override
		public boolean accept(File path)
		{
			return path.getName().toLowerCase().endsWith(".txt");
		}
	}

	// Builds one Lucene document per file.
	protected Document getDocument(File file) throws Exception
	{
		Document doc = new Document();
		// File body: analyzed and indexed, but not stored
		doc.add(new TextField("contents", new FileReader(file)));
		// File name and full path: indexed as single tokens and stored
		doc.add(new StringField("filename", file.getName(), Field.Store.YES));
		doc.add(new StringField("fullpath", file.getCanonicalPath(), Field.Store.YES));
		return doc;
	}

	// Adds a single file to the index.
	protected void indexFile(File file) throws Exception
	{
		System.out.println("Indexing " + file.getCanonicalPath());
		Document doc = getDocument(file);
		writer.addDocument(doc);
	}

	// Commits pending changes and releases the index.
	protected void close() throws IOException
	{
		writer.close();
	}

	public static void main(String[] args) throws Exception
	{
		if (args.length != 2)
		{
			throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
					+ " <index dir> <data dir>");
		}
		String indexDir = args[0];
		String dataDir = args[1];
		System.out.println("indexDir:" + indexDir);
		System.out.println("dataDir:" + dataDir);

		long start = System.currentTimeMillis();
		Indexer indexer = new Indexer(indexDir);
		int numIndexed;
		try
		{
			numIndexed = indexer.index(dataDir, new TextFilesFilter());
		}
		finally
		{
			indexer.close();
		}
		long end = System.currentTimeMillis();

		System.out.println("Indexing " + numIndexed + " files took " + (end - start)
				+ " milliseconds");
	}
}

package com.mitchz.lucence;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Searches the index built by Indexer.
 *
 * @author mitchz
 * @version 1.0
 * @since April 30, 2014
 * @category com.mitchz.lucence
 */
public class Searcher
{
	public static void main(String[] args) throws IOException, ParseException
	{
		if (args.length != 2)
		{
			throw new IllegalArgumentException("Usage: java " + Searcher.class.getName()
					+ " <index dir> <query>");
		}
		String indexDir = args[0];
		String q = args[1];
		search(indexDir, q);
	}

	// Parses q against the "contents" field and prints the file names of the top 10 hits.
	public static void search(String indexDir, String q) throws IOException,
			ParseException
	{
		Directory dir = FSDirectory.open(new File(indexDir));
		DirectoryReader dirReader = DirectoryReader.open(dir);
		try
		{
			IndexSearcher is = new IndexSearcher(dirReader);
			// Use the same analyzer as at index time so query terms match indexed terms
			QueryParser parser = new QueryParser(Version.LUCENE_47, "contents",
					new SimpleAnalyzer(Version.LUCENE_47));
			Query query = parser.parse(q);

			long start = System.currentTimeMillis();
			TopDocs hits = is.search(query, 10);
			long end = System.currentTimeMillis();

			System.out.println("Found " + hits.totalHits + " document(s) (in "
					+ (end - start) + " milliseconds) that matched query '" + q + "':");
			for (ScoreDoc scoreDoc : hits.scoreDocs)
			{
				// Load the stored fields of the matching document
				Document doc = is.doc(scoreDoc.doc);
				System.out.println(doc.get("filename"));
			}
		}
		finally
		{
			// Release the index files once the search is done
			dirReader.close();
			dir.close();
		}
	}
}
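
Both classes pass a SimpleAnalyzer, which divides text at non-letter characters and lowercases the resulting tokens. The following is a rough plain-Java approximation of that behavior, not Lucene's code: the class SimpleAnalyzerSketch and its tokenize method are hypothetical names, and the sketch works on individual char values, so it ignores supplementary Unicode characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; it does not call Lucene.
class SimpleAnalyzerSketch
{
	// Splits text at every non-letter character and lowercases each token.
	static List<String> tokenize(String text)
	{
		List<String> tokens = new ArrayList<>();
		StringBuilder current = new StringBuilder();
		for (int i = 0; i < text.length(); i++)
		{
			char c = text.charAt(i);
			if (Character.isLetter(c))
			{
				current.append(Character.toLowerCase(c));
			}
			else if (current.length() > 0)
			{
				// A non-letter ends the current token
				tokens.add(current.toString());
				current.setLength(0);
			}
		}
		if (current.length() > 0)
		{
			tokens.add(current.toString());
		}
		return tokens;
	}

	public static void main(String[] args)
	{
		System.out.println(tokenize("Apache Lucene-4.7 README.txt")); // prints [apache, lucene, readme, txt]
		System.out.println(tokenize("LUCENE"));                       // prints [lucene]
	}
}
```

Because digits and punctuation are dropped and case is folded, a query for LUCENE matches text indexed as Lucene, provided the same analyzer is used on both sides.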

The Maven configuration is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.mitchz</groupId>
	<artifactId>lucence-test</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>lucence-test</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>3.8.1</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-core</artifactId>
			<version>4.7.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-analyzers-common</artifactId>
			<version>4.7.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-queryparser</artifactId>
			<version>4.7.0</version>
		</dependency>
	</dependencies>
</project>
