Solr 6.6: Importing Text Files (txt/json/xml/csv)

The work comes down to three configuration files.

1. Create data-config.xml

It contains the following:

<dataConfig>
  <dataSource name="fileDataSource" type="FileDataSource" />
  <!-- Disabled example: extract a single PDF with TikaEntityProcessor -->
  <!--<document>
        <entity name="tika-test" processor="TikaEntityProcessor"
                url="C:/docs/solr-word.pdf" format="text">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="text"/>
        </entity>
    </document>-->
  <dataSource name="urlDataSource" type="BinURLDataSource" />
  <!--baseDir="D:/work/Solr/solr-6.6.0/ImportDoc" fileName=".*\.(doc)|(pdf)|(docx)|(txt)"-->
  <document>
    <!-- Outer entity: walk baseDir recursively and emit one row per matching file -->
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="D:/work/Solr/solr-6.6.0/ImportDoc" fileName=".*\.(json|txt|csv|xml)"
            onError="skip"
            recursive="true">
      <field column="file" name="id"/>
      <field column="fileAbsolutePath" name="filePath" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <!-- Inner entity: read each listed file as plain text into the "text" field -->
      <entity processor="PlainTextEntityProcessor" name="txtfile" url="${files.fileAbsolutePath}" dataSource="fileDataSource">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
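
The fileName attribute is a regular expression, and the way its alternatives are grouped changes which files get picked up. As a rough illustration, the sketch below uses Python's re module to compare an ungrouped pattern of the form `.*\.(json)|(txt)|(csv)|(xml)` with a grouped one. It assumes the pattern is applied as a substring search rather than a full match, and the file names are made up for the example.

```python
import re

# Ungrouped: parsed as ".*\.(json)" OR "txt" OR "csv" OR "xml".
UNGROUPED = r".*\.(json)|(txt)|(csv)|(xml)"
# Grouped: a dot followed by exactly one of the listed extensions.
GROUPED = r".*\.(json|txt|csv|xml)"


def matches(pattern: str, file_name: str) -> bool:
    """Return True if the pattern is found anywhere in file_name (search semantics)."""
    return re.search(pattern, file_name) is not None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["data.json", "notes.txt", "mytxtnotes.pdf", "report.docx"]:
        print(name, matches(UNGROUPED, name), matches(GROUPED, name))
```

With the ungrouped form, any name that merely contains "txt", "csv" or "xml" is accepted, so grouping the extensions inside one pair of parentheses is usually what is intended.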

2. Edit the managed-schema file

Add the following:

  <!-- mmseg4j fieldType-->
  <fieldType name="text_mmseg4j_complex" class="solr.TextField" positionIncrementGap="100" >
    <analyzer>
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" />
    </analyzer>
  </fieldType>
  <fieldType name="text_mmseg4j_maxword" class="solr.TextField" positionIncrementGap="100" >
    <analyzer>
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" />
    </analyzer>
  </fieldType>
  <fieldType name="text_mmseg4j_simple" class="solr.TextField" positionIncrementGap="100" >
    <analyzer>
      <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" />
    </analyzer>
  </fieldType>

  <field name="text" type="text_mmseg4j_complex" indexed="true" stored="true" omitNorms="true" multiValued="false"/>
  <field name="fileName" type="string" indexed="true" stored="true" />
  <field name="filePath" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="size" type="long" indexed="true" stored="true" />
  <field name="lastModified" type="date" indexed="true" stored="true" />

3. Edit the solrconfig.xml file

Add a lib directive so that Solr loads the jar files placed in the lib directory:

 <lib dir="./lib" regex=".*\.jar"/>

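4. Import the files

Note: make sure the txt files are UTF-8 encoded. On a Chinese-locale Windows system, txt files are typically saved as GBK by default.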
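
One way to satisfy the encoding requirement is to re-encode the files before importing them. The following is a minimal sketch, assuming the source files are GBK; the function name `convert_to_utf8` is hypothetical, and files that already decode cleanly as UTF-8 are left untouched.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="gbk"):
    """Rewrite the file at `path` as UTF-8 if it is not valid UTF-8 already.

    Returns True if the file was rewritten, False if it was left as-is.
    `source_encoding` is an assumption about the original encoding.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)  # raises if the guess is wrong
    p.write_bytes(text.encode("utf-8"))
    return True


if __name__ == "__main__":
    import sys

    # Usage: python convert.py file1.txt file2.txt ...
    for arg in sys.argv[1:]:
        changed = convert_to_utf8(arg)
        print(arg, "converted" if changed else "unchanged")
```

Back up the originals first: if a file is in neither UTF-8 nor the assumed source encoding, decoding fails with an exception and the file is not modified.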

5. Query

After the import succeeds, run a query.
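
The import and the query can also be driven over HTTP instead of the admin UI. The sketch below only builds the request URLs. It assumes a local Solr on the default port 8983, a hypothetical core named `mycore`, and a DataImportHandler registered at `/dataimport`; none of these names come from the configuration shown here, so adjust them to your setup.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local install
CORE = "mycore"                           # hypothetical core name


def full_import_url(base=SOLR_BASE, core=CORE, clean=True):
    """Build the URL that asks the import handler to run a full import."""
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),  # "true" removes existing documents first
        "commit": "true",
    }
    # Assumption: the handler is registered under /dataimport.
    return f"{base}/{core}/dataimport?{urlencode(params)}"


def select_url(query, base=SOLR_BASE, core=CORE, rows=10):
    """Build a /select URL for a simple query returning JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(full_import_url())
    print(select_url("text:solr"))
```

The printed URLs can be opened in a browser or fetched with any HTTP client; querying the `text` field exercises the mmseg4j analyzer configured in the schema.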

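The query results show that PDF and Word files come back garbled. They must be handled by a different processor, such as the TikaEntityProcessor that appears in the commented-out block of data-config.xml.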
