Last week I was here Natural Language Processing in NZ.

Someone asked a question, is there any existed library or solution which can extract certain information out of a natural language dataset of a specific topic, for example, to find objective facts and subjective sentimental information out of a bunch of customer complaints.

I was doing something similar in this realm, and I think LSI or Word2Vec is the best solution of this problem now. But my poor english speaking could not match either my confidence or LSI's power, my explanation of LSI was so terrible that someone gave me a few hints very politely and cryptically. And I could tell some of you were far from convinced.

I felt sorry for LSI, it was not its fault that I didn't present it in a proper way. Thanks to Alyona and everyone in the meetup, I appreciate the opportunity to meet nice people and interesting ideas there. So, here is my 2 cent.

We have been trying to build different models of natural language. The problem of querying natural language dataset is the problem of how to build a model which could interpret a query into results. The question raised at last meetup was how to build such a model which can extract information from natural language dataset based on specific syntactic structure and some specific semantics.

We have plenty natural language parsers now, so I guess syntactic structure is not the problem here.

As for semantics, there are some rule-based approach, such as, WordNet , FrameNet, however, I found it was difficult to map a word to similar words in semantics because a word usaually has multiple meanings, and there is no way to find the right semantics of a word in a specific context with these approach. And you could end up with different results with different meaning of a word. Furthermore, there is not any threshold which can determine how far you can go in a WordNet graph, at least not in a mathematically decidable way.

LSI is a tool of finding similarity among documents, it stems from SVD. After mapping all documents into a space, we could find similarities between documents in different dimension. And of course we can get a distance between any two words in a specific dimension.

Word2Vec is based on the same idea but the algorithm is different.

With these tools, by training a model with a dataset of some topics, we can determine the distance of any two words in a specific dataset(or topic) with this model. Then we can map a query into a set of sentences with this similarity or distance in respect to the specific topic.

LSI or similar algorithm is actually another piece of the puzzle.

'''Note'''
We still have to deal with the problem of how to model multi meanings of a word

Another attempt about LSI的更多相关文章

  1. yarn关于app max attempt深度解析,针对长服务appmaster平滑重启

    在YARN上开发长服务,需要注意fault-tolerance,本篇文章对appmaster的平滑重启的一个参数做了解析,如何设置可以有助于达到appmaster平滑重启. 在yarn-site.xm ...

  2. ORA-14450: attempt to access a transactional temp table already in use

    在ORACLE数据中修改会话级临时表时,有可能会遇到ORA-14550错误,那么为什么会话级全局临时表会报ORA-14450错误呢,如下所示,我们先从一个小小案例入手: 案例1: SQL> CR ...

  3. Attempt to fetch logical page (...) in database 2 failed. It belongs to allocation unit xxxx not to xxx

    今天一个同事说在一个生产库执行某个存储过程,遇到了错误: Fatal error 605 occurred at jul 29 2014 我试着执行该存储过程,结果出现下面错误,每次执行该存储过程,得 ...

  4. 错误信息:attempt to create saveOrUpdate event with null entity

    错误信息:attempt to create saveOrUpdate event with null entity; 这个错误网上答案比较多,我也不多说了. 我遇到的问题是在前台传过来的参数是nul ...

  5. lua协程一则报错解决“attempt to yield across metamethod/C-call boundary”

    问题 attempt to yield across metamethod/C-call boundary 需求跟如下帖子中描述一致: http://bbs.chinaunix.net/forum.p ...

  6. Warning: Attempt to present on whose view is not in the window hierarchy!

    当我想从一个VC跳转到另一个VC的时候,一般会用 - (void)presentViewController:(UIViewController *)viewControllerToPresent a ...

  7. tomcat提示警告: An attempt was made to authenticate the locked user"tomcat"

    启动tomcat7之后,运行正常,但是运行一段时间就会提示以下警告: 十二月 04, 2013 5:10:15 下午 org.apache.catalina.realm.LockOutRealm au ...

  8. Attempt to present <vc> on <vc> which is already presenting <vc>/(null)

    在给 tableViewCell 添加长按手势弹出一个 popViewController 的时候,遇到的这个变态问题: Warning: Attempt to present <UINavig ...

  9. An attempt was made to load a program with an incorrect format

      用.net调用一个C++ 32位的DLL, 编译的时候选择x86, 在部署到一个64位的机器上的时候报错:"An attempt was made to load a program w ...

随机推荐

  1. netstat和telnet命令在Windows7中的用法(转载)

    在网络方面我们常常会用到如下命令: (1)ping命令:我们常常用来判断2台或2台以上的机器间是否网络连通. ping 192.168.1.88 -t 如果想看任何命令的参数是什么意思,我们只需要:命 ...

  2. Maximum Subarray 解答

    Question Find the contiguous subarray within an array (containing at least one number) which has the ...

  3. 转化率最高的16个WordPress 电子商务主题

    想自己开一个WordPress的电子商务商店?下面我们分享转化率最高的16个WordPress 电子商务主题,它们拥有最棒的用户体验,集成最新的用户体验,慢慢欣赏吧! 原文地址:http://thet ...

  4. iOS https认证 && SSL/TLS证书申请

    1.下面列出截止2016年底市面上常见的免费CA证书: 腾讯云SSL证书管理(赛门铁克TrustAsia DV SSL证书)阿里云云盾证书服务(赛门铁克DV SSL证书)百度云SSL证书服务Let's ...

  5. 采用标准c进行目录文件遍历

    图像处理的时候经常需要对一个目录的所有图像进行处理,遍历文件得c代码: 在windows中需要使用到宽字符. 另外,可以使用opencv封装的目录访问操作,下次给出. // DirTraverse.c ...

  6. 本人的cocos2d-x之路

        大学基本上算是混着过去了- -,说起学到的东西,感觉真的不多.然后吧.在大四这年在大妈的带动下,来到了一家棋牌游戏公司,详细就不说了.刚进去的时候真的是啥也不懂.先是看了项目代码,自己捉摸了1 ...

  7. 在opensips中记录通话记录

    1.为acc表增加额外的字段记录主叫被叫进入mysql,选取opensips的数据库ALTER TABLE acc ADD from_uri VARCHAR(64) DEFAULT '' NOT NU ...

  8. 使用Unicorn-engine 续1

    续上次,在ubuntu server 14.04交叉编译好后,下一步就是在windows上使用了. 在windows上,我主要是用python进行分析程序,因此我最初安装的是官网上的 unicorn- ...

  9. NET基础课--异常处理X

    通常不建议如下的捕获方式  正确的方法是:某一功能函数的入口捕获基本异常即exception,分支方法或片段方法中捕获特定异常 高级: 另附:Fxcop异常监控工具

  10. SpringMVC+JPA+Hibernate配置

    首先,Spring配置文件 <?xml version="1.0" encoding="UTF-8"?><beans xmlns=" ...