Solution for automatic update of Chinese word segmentation full-text index in NEO4J
- 1. Sample data
- 2. Differences between English and Chinese Full-Text Indexes
- 3. APOC's built-in English full-text index procedure (the index can be updated automatically)
- 4. Custom Chinese word segmentation full-text index plug-in (automatic index update unsuccessful)
- 5. Label Cross-search
- 6. Custom Chinese Word Segmentation Plugin (Updating the Index for Individual Nodes)
  - 1. Add Full-Text Index
  - 2. Add Nodes and Attributes and Update Full-Text Index
  - 3. Add the node created or updated in step 2 to the index
  - 4. Retrieval
- 7. Resolve Transaction Submission Timeout
Automatic updates could not be implemented with the NEO4J index API, so this article takes a different approach: whenever a node is created or updated, the corresponding full-text index is updated synchronously.
1. Sample data
2. Differences between English and Chinese Full-Text Indexes
1. Create NEO4J default index
CALL apoc.index.addAllNodes('Loc', {Loc:["description","cause","year"]})
// The following retrieval was unsuccessful:
CALL apoc.index.search('Loc', 'Loc.description:Chinese~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Chinese*') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:test~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Test Chinese~') YIELD node RETURN node
2. Delete Index
CALL apoc.index.remove('Loc')
3. Create an index that supports Chinese words
CALL zdr.index.addChineseFulltextIndex('Loc', ["description","cause","year"], 'Loc') YIELD message RETURN message
// The following retrieval was successful:
CALL apoc.index.search('Loc', 'description:Chinese~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'description:Chinese*') YIELD node RETURN node
CALL apoc.index.search('Loc', 'description:test~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'description:Test Chinese~') YIELD node RETURN node
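As a rough illustration of why a Chinese-aware analyzer matters, the sketch below contrasts whitespace tokenization with dictionary-based segmentation. It is a toy under stated assumptions: the three-word dictionary and both function names are hypothetical, forward maximum matching is used only because it is easy to read, and nothing here is claimed to be IKAnalyzer's actual algorithm or the analyzer APOC uses by default.

```python
def whitespace_tokens(text):
    """Split on whitespace only, as a tokenizer designed for English might."""
    return text.split()


def forward_max_match(text, dictionary, max_len=4):
    """Toy segmentation: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary: "test", "Chinese", "word segmentation".
DICTIONARY = {"测试", "中文", "分词"}
sentence = "测试中文分词"

print(whitespace_tokens(sentence))              # ['测试中文分词']
print(forward_max_match(sentence, DICTIONARY))  # ['测试', '中文', '分词']
```

With whitespace splitting, the whole sentence becomes one term, so a query for a single word inside it has nothing to match against; after segmentation, each word is its own searchable term.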
3. APOC's built-in English full-text index procedure (the index can be updated automatically)
1. Add Full-Text Index
CALL apoc.index.addAllNodes('Loc', {Loc:["description","cause","year"]},{autoUpdate:true})
2. New Nodes and Attributes
CREATE (n:Loc {name:'V'}) SET n.description='Testing Chinese word segmentation, the final chapter of the duplicate show was very exciting. It is said that knowledge mapping and artificial intelligence technology were applied to that movie!',n.cause='Test the English word breaker, Mobile World Congress, the world\'s largest gathering for the mobile industry, ' RETURN n
3. Retrieval
The index is updated automatically, but retrieval does not work well for Chinese text, as the following tests show:
// Retrieval failed:
CALL apoc.index.search('Loc', 'Loc.cause:Test English word breakers~') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Test Chinese word segmentation~') YIELD node RETURN node
// Retrieved successfully:
CALL apoc.index.search('Loc', 'Loc.cause:Test English word breakers*') YIELD node RETURN node
CALL apoc.index.search('Loc', 'Loc.description:Test Chinese word segmentation*') YIELD node RETURN node
4. Custom Chinese word segmentation full-text index plug-in (automatic index update unsuccessful)
The addChineseFulltextAutoIndex procedure successfully creates a full-text index that supports Chinese, but the index is not updated automatically when nodes are created or their attributes change.
1. Add Full-Text Index
CALL zdr.index.addChineseFulltextAutoIndex('IKAnalyzer',["description","cause","year"],'Loc',{autoUpdate:'true'}) YIELD message RETURN message
2. New Nodes and Attributes
CREATE (n:Loc {name:'V'}) SET n.description='Testing Chinese word segmentation, the final chapter of the duplicate show was very exciting. It is said that knowledge mapping and artificial intelligence technology were applied to that movie!',n.cause='Test the English word breaker, Mobile World Congress, the world\'s largest gathering for the mobile industry, ' RETURN n
3. Retrieval
Content that existed when the full-text index was added can be retrieved:
CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:Acridyl Aminomethane Sulfonymethoxyaniline', 100) YIELD node RETURN node
The newly created node can be retrieved only after the index is rebuilt:
CALL zdr.index.chineseFulltextIndexSearch('IKAnalyzer', 'description:test~', 100) YIELD node RETURN node
5. Label Cross-search
addChineseFulltextAutoIndex and addChineseFulltextIndex both support searching across multiple labels: use the same index name when building the index for each label.
Label: Loc
CALL zdr.index.addChineseFulltextAutoIndex('Loc',["description","cause","name"],'Loc',{autoUpdate:'true'}) YIELD message RETURN message
Label: LocProvince
CALL zdr.index.addChineseFulltextAutoIndex('Loc',["description","cause","name"],'LocProvince',{autoUpdate:'true'}) YIELD message RETURN message
Retrieve node:
CALL apoc.index.search('Loc', 'name:p~') YIELD node RETURN node

6. Custom Chinese Word Segmentation Plugin (Updating the Index for Individual Nodes)
To support index updates for individual nodes, the following procedure was developed. (Because the automatic update scheme described in section 4 fails, the corresponding full-text index is instead updated synchronously whenever a node is created or updated.)
1. Add Full-Text Index
CALL apoc.index.remove('Loc')
CALL zdr.index.addChineseFulltextIndex('Loc',["description","cause","year"],'Loc') YIELD message RETURN message
2. Add Nodes and Attributes and Update Full-Text Index
CREATE (n:Loc {name:'V'}) SET n.description='Testing Chinese word segmentation, the final chapter of the duplicate show was very exciting. It is said that knowledge mapping and artificial intelligence technology were applied to that movie!',n.cause='Test the English word breaker, Mobile World Congress, the world\'s largest gathering for the mobile industry, ' RETURN n
3. Add the node created or updated in step 2 to the index
MATCH (n) WHERE n.name='V' WITH n CALL zdr.index.addNodeChineseFulltextIndex(n, ['description']) RETURN *
4. Retrieval
CALL zdr.index.chineseFulltextIndexSearch('Loc', 'description:Test Chinese~') YIELD node RETURN node
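The write-then-index steps above could be combined so that application code never writes a node without refreshing its index entry. The sketch below only builds the Cypher text and parameters; the function name is hypothetical, the use of MERGE with `SET n += $props` is an assumption made here for illustration, and the call to zdr.index.addNodeChineseFulltextIndex simply mirrors the form shown in step 3 (passing the field list as a parameter is likewise an assumption about that procedure).

```python
def build_upsert_with_index(label, name, properties, indexed_fields):
    """Return (query, params) that write a node and then index it in one statement."""
    # Labels cannot be passed as Cypher parameters, so validate before interpolating.
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    query = (
        f"MERGE (n:{label} {{name: $name}}) "
        "SET n += $props "
        "WITH n "
        "CALL zdr.index.addNodeChineseFulltextIndex(n, $fields) "
        "RETURN *"
    )
    params = {
        "name": name,
        "props": dict(properties),
        "fields": list(indexed_fields),
    }
    return query, params


query, params = build_upsert_with_index(
    "Loc", "V", {"description": "Testing Chinese word segmentation"}, ["description"]
)
print(query)
print(params)
```

The returned pair can be handed to whichever Neo4j client is in use; running the write and the index call in one statement keeps them in the same transaction.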

7. Resolve Transaction Submission Timeout
If a transaction timeout is configured, disable it (comment it out) while building the index:
#********************************************************************
### Neo4j transaction timeout
###******************************************************************
#dbms.transaction.timeout=180s
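Commenting the setting out by hand works; for repeated index builds the same edit could be scripted. This is a minimal sketch that operates on the configuration text only. The function name is hypothetical, and reading and writing the actual neo4j.conf file, as well as restarting the server, are left to the caller.

```python
def comment_out_setting(conf_text, key):
    """Return conf_text with every active 'key=...' line commented out."""
    result = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        # Only touch lines where the key is followed by '='; lines that are
        # already commented start with '#' and are left unchanged.
        if stripped.startswith(key) and stripped[len(key):].lstrip().startswith("="):
            result.append("#" + line)
        else:
            result.append(line)
    return "\n".join(result)


conf = "dbms.memory.heap.max_size=4G\ndbms.transaction.timeout=180s"
print(comment_out_setting(conf, "dbms.transaction.timeout"))
# dbms.memory.heap.max_size=4G
# #dbms.transaction.timeout=180s
```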
Use a background script to execute the indexer:
# index.sh
#!/usr/bin/env bash
nohup /neo4j-community-3.4.9/bin/neo4j-shell -file build.cql >>indexGraph.log 2>&1 &
// build.cql
CALL zdr.index.addChineseFulltextIndex('IKAnalyzer', ['description','fullname','name','lnkurl'], 'LinkedinID') YIELD message RETURN message;
All of the above relies on custom NEO4J stored procedures.
Original article: https://programmer.ink/think/5cd0160be03d2.html
