A Generative Entity-Mention Model for Linking Entities with Knowledge Base

 

1. Main Method

The paper proposes a generative probabilistic model called the entity-mention model.

Explanation:

In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in the distribution of entities in documents P(e), the distribution of possible names of a specific entity P(s|e), and the distribution of possible contexts of a specific entity P(c|e). To find the referent entity of a name mention, our method combines the evidence from all three distributions P(e), P(s|e) and P(c|e).

P(e), P(s|e) and P(c|e) are called the entity popularity model, the entity name model and the entity context model, respectively.

2. Background

Problem Formulation

Given a set of name mentions M = {m1, m2, …, mk} contained in documents and a knowledge base KB containing a set of entities E = {e1, e2, …, en}, an entity linking system is a function σ : M → E which links these name mentions to their referent entities in KB.

Popularity Knowledge

Popularity knowledge tells us how likely an entity is to appear in a document.

Name Knowledge

Name knowledge tells us the possible names of an entity and how likely a name is to refer to a specific entity.

Context Knowledge

Context knowledge tells us how likely an entity is to appear in a specific context.

3. The Generative Entity-Mention Model for Entity Linking

Explanation

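  1. First, the model chooses the referent entity e of the name mention from the given knowledge base, according to the distribution of entities P(e).
  2. Second, the model outputs the name s of the name mention, according to the distribution of possible names of the referent entity P(s|e).
  3. Finally, the model outputs the context c of the name mention, according to the distribution of possible contexts of the referent entity P(c|e).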
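The three-step generative story can be sketched as a small sampler. This is only an illustration: the entities, names, contexts and every probability below are hypothetical toy values, not figures from the paper.

```python
import random

# Hypothetical toy distributions; all numbers are made up for illustration.
P_E = {"Michael Jeffrey Jordan": 0.7, "Michael I. Jordan": 0.3}
P_S_GIVEN_E = {
    "Michael Jeffrey Jordan": {"Michael Jordan": 0.6, "MJ": 0.3, "Jordan": 0.1},
    "Michael I. Jordan": {"Michael Jordan": 0.5, "Michael I. Jordan": 0.5},
}
P_C_GIVEN_E = {
    "Michael Jeffrey Jordan": {
        "__ wins NBA MVP.": 0.9,
        "__ is a researcher in machine learning": 0.1,
    },
    "Michael I. Jordan": {
        "__ wins NBA MVP.": 0.05,
        "__ is a researcher in machine learning": 0.95,
    },
}


def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_mention(rng):
    """Follow the generative story: entity first, then name, then context."""
    e = draw(P_E, rng)             # step 1: e ~ P(e)
    s = draw(P_S_GIVEN_E[e], rng)  # step 2: s ~ P(s|e)
    c = draw(P_C_GIVEN_E[e], rng)  # step 3: c ~ P(c|e)
    return e, s, c


rng = random.Random(0)
e, s, c = generate_mention(rng)
print(e, "|", s, "|", c)
```

Entity linking runs this story in reverse: the name s and context c are observed, and the entity e that most plausibly generated them is inferred.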

Model

The probability of a name mention m (with context c and name s) referring to a specific entity e can be expressed as the following formula (assuming that s and c are independent given e):

$$P(m, e) = P(s, c, e) = P(e)\,P(s \mid e)\,P(c \mid e)$$

Given a name mention m, to perform entity linking we need to find the entity e that maximizes the probability P(e|m):

$$e = \arg\max_{e} \frac{P(m, e)}{P(m)} = \arg\max_{e} P(e)\,P(s \mid e)\,P(c \mid e)$$
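Since P(m) does not depend on e, maximizing P(e|m) amounts to maximizing the product P(e)P(s|e)P(c|e) over the candidate entities. A minimal sketch of that decision rule follows, working in log space to avoid underflow. The function `link`, its callback parameters and the toy probabilities are all hypothetical names and values chosen for illustration.

```python
import math


def link(name, context, candidates, p_e, p_s_given_e, p_c_given_e):
    """Return (entity, log score) maximizing log P(e) + log P(s|e) + log P(c|e)."""
    best_entity, best_score = None, -math.inf
    for e in candidates:
        probs = (p_e(e), p_s_given_e(name, e), p_c_given_e(context, e))
        if min(probs) <= 0.0:
            continue  # a zero factor makes the whole product zero
        score = sum(math.log(p) for p in probs)
        if score > best_score:
            best_entity, best_score = e, score
    return best_entity, best_score


# Toy probabilities (made up): the name is ambiguous, the context is not.
popularity = {"Michael Jeffrey Jordan": 0.7, "Michael I. Jordan": 0.3}
name_probs = {
    ("Michael Jordan", "Michael Jeffrey Jordan"): 0.6,
    ("Michael Jordan", "Michael I. Jordan"): 0.5,
}
context_probs = {
    ("machine learning", "Michael Jeffrey Jordan"): 0.01,
    ("machine learning", "Michael I. Jordan"): 0.4,
}

best, score = link(
    "Michael Jordan",
    "machine learning",
    list(popularity),
    p_e=lambda e: popularity[e],
    p_s_given_e=lambda s, e: name_probs.get((s, e), 0.0),
    p_c_given_e=lambda c, e: context_probs.get((c, e), 0.0),
)
print(best)
```

In this toy setup the context evidence outweighs the popularity prior, so the less popular entity wins.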

               

Candidate Selection

A name-to-entity dictionary is built from the redirect links, disambiguation pages, and anchor texts of Wikipedia. The candidate entities of a name mention are then selected by looking up its name in this dictionary.
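A rough sketch of such a dictionary, assuming the three Wikipedia sources have already been extracted into (name, entity) pairs. The function names, the toy pairs and the case-insensitive lookup are assumptions made for this example; the notes do not say how names are normalized.

```python
from collections import defaultdict


def build_name_dictionary(redirects, disambiguations, anchors):
    """Merge (name, entity) pairs from the three sources into one lookup table."""
    name_to_entities = defaultdict(set)
    for source in (redirects, disambiguations, anchors):
        for name, entity in source:
            # Lowercasing is an assumption of this sketch, not of the paper.
            name_to_entities[name.lower()].add(entity)
    return name_to_entities


def candidates(name, name_to_entities):
    """Return the candidate entities for a name, or an empty set if unseen."""
    return name_to_entities.get(name.lower(), set())


# Hypothetical toy pairs standing in for real Wikipedia extractions.
dictionary = build_name_dictionary(
    redirects=[("Air Jordan", "Michael Jeffrey Jordan")],
    disambiguations=[
        ("Michael Jordan", "Michael Jeffrey Jordan"),
        ("Michael Jordan", "Michael I. Jordan"),
    ],
    anchors=[("Michael Jordan", "Michael Jeffrey Jordan")],
)
print(sorted(candidates("Michael Jordan", dictionary)))
```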

4. Model Estimation

Entity Popularity Model

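The entity popularity model is first estimated from the annotated name mentions as

$$P(e) = \frac{Count(e)}{|M|}$$

where Count(e) is the count of the name mentions whose referent entity is e, and |M| is the total number of name mentions. To avoid assigning zero probability to entities that never occur in the mention set, the estimate is smoothed as

$$P(e) = \frac{Count(e) + 1}{|M| + N}$$

where N is the total number of entities in the knowledge base.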
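A count-based popularity estimate can be sketched as follows. The sketch uses add-one smoothing, (Count(e) + 1) / (|M| + N) with N the number of entities in the knowledge base, so that entities never seen in the mention set still receive a small probability. The function name and the toy referent list are hypothetical.

```python
from collections import Counter


def popularity_model(referents, num_entities):
    """Build P(e) from the referent entity of each annotated name mention.

    referents: one entity label per name mention (so len(referents) == |M|).
    num_entities: N, the number of entities in the knowledge base.
    """
    counts = Counter(referents)
    total = len(referents)

    def p_e(entity):
        # Add-one smoothed estimate: (Count(e) + 1) / (|M| + N)
        return (counts[entity] + 1) / (total + num_entities)

    return p_e


# Toy knowledge base with three entities, A, B and C; C is never mentioned.
p_e = popularity_model(["A", "A", "B"], num_entities=3)
print(p_e("A"), p_e("B"), p_e("C"))
```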

Entity Name Model

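For example, we want P(Michael Jordan|Michael Jeffrey Jordan) to be high and P(MJ|Michael Jeffrey Jordan) to be high as well, while P(Michael I. Jordan|Michael Jeffrey Jordan) should be 0.

The name model can therefore be estimated by first collecting all (entity, name) pairs from the data set.

Drawback: this estimate cannot correctly handle an unseen entity or an unseen name.

E.g., if "MJ" is never used in Wikipedia to refer to Michael Jeffrey Jordan, the name model cannot recognize that "MJ" is a name of Michael Jeffrey Jordan.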
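The drawback is easy to reproduce with a relative-frequency estimate over collected (entity, name) pairs. The sketch below is illustrative only: the function name and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def mle_name_model(pairs):
    """Estimate P(s|e) as the relative frequency of name s among mentions of e."""
    counts = defaultdict(Counter)
    for entity, name in pairs:
        counts[entity][name] += 1

    def p_s_given_e(name, entity):
        total = sum(counts[entity].values())
        return counts[entity][name] / total if total else 0.0

    return p_s_given_e


# Toy pairs (made up): "MJ" never occurs as a name of this entity.
pairs = [
    ("Michael Jeffrey Jordan", "Michael Jordan"),
    ("Michael Jeffrey Jordan", "Michael Jordan"),
    ("Michael Jeffrey Jordan", "Michael Jordan"),
    ("Michael Jeffrey Jordan", "Jordan"),
]
p_s_given_e = mle_name_model(pairs)
print(p_s_given_e("Michael Jordan", "Michael Jeffrey Jordan"))
print(p_s_given_e("MJ", "Michael Jeffrey Jordan"))
```

The unseen name receives probability zero even though it is a plausible abbreviation, which is the failure case described above.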

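To address this, the name of an entity is modeled as a translation of its full name, where each word of the full name can be transformed in one of four ways:

1) It is retained (translated into itself);

2) It is translated into its acronym;

3) It is omitted (translated into the word NULL);

4) It is translated into another word (misspelling or alias).

The entity name model is then

$$P(s \mid e) = \frac{\varepsilon}{(l_f + 1)^{l_s}} \prod_{i=1}^{l_s} \sum_{j=0}^{l_f} t(s_i \mid f_j)$$

where ε is a normalization factor, f is the full name of entity e, l_f is the length of f, l_s is the length of the name s, s_i is the i-th word of s, f_j is the j-th word of f (with f_0 the NULL word, as in IBM Model 1), and t(s_i|f_j) is the lexical translation probability, i.e. the probability that a word f_j in the full name is written as s_i in the output name.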
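A translation-style name model of this kind can be sketched in the manner of IBM Model 1: each word of the name is explained by summing the lexical translation probabilities t(si|fj) over all words of the full name, and the product is scaled by ε / (lf + 1)^ls. The sketch assumes an extra NULL position is prepended to the full name, as in standard IBM Model 1. The function name and the toy translation table are hypothetical; a real table would have to be learned from (name, full name) pairs.

```python
def name_probability(name_words, full_name_words, t, epsilon=1.0):
    """Score a name against an entity's full name, IBM Model 1 style.

    t maps (name_word, full_name_word) to a lexical translation probability;
    missing pairs count as 0. Position 0 of the source is a NULL word.
    """
    source = ["NULL"] + list(full_name_words)         # l_f + 1 positions
    prob = epsilon / (len(source) ** len(name_words))
    for s_i in name_words:
        prob *= sum(t.get((s_i, f_j), 0.0) for f_j in source)
    return prob


# Toy translation table (made-up values).
t = {
    ("Michael", "Michael"): 0.8,   # word retained
    ("Jordan", "Jordan"): 0.9,     # word retained
    ("M", "Michael"): 0.1,         # word written as its acronym
}
full_name = ["Michael", "Jeffrey", "Jordan"]
print(name_probability(["Michael", "Jordan"], full_name, t))
print(name_probability(["M", "Jordan"], full_name, t))
```

Unlike a pure lookup of observed (entity, name) pairs, a name never seen for this entity can still receive a nonzero score as long as each of its words can be explained by some word of the full name.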

Entity Context Model

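For example:

C1: __wins NBA MVP.

C2: __is a researcher in machine learning

P(C1|Michael Jeffrey Jordan) should be high, because the NBA player Michael Jeffrey Jordan often appears in C1, whereas P(C2|Michael Jeffrey Jordan) should be very low, because he rarely appears in C2.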

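Given a context c containing n terms t1, t2, …, tn (a term may be a word, a named entity, or a Wikipedia concept), the entity context model estimates the probability P(c|e) as

$$P(c \mid e) = P_e(t_1)\,P_e(t_2) \cdots P_e(t_n)$$

where P_e(t) is the probability of term t appearing in the context of entity e. Its maximum likelihood estimate is

$$P_e^{ML}(t) = \frac{Count_e(t)}{\sum_{t'} Count_e(t')}$$

where Count_e(t) is the frequency of occurrences of a term t in the contexts of the name mentions whose referent entity is e. To avoid zero probabilities for unseen terms, this estimate is smoothed with a general language model:

$$P_e(t) = \lambda\,P_e^{ML}(t) + (1 - \lambda)\,P_g(t)$$

where P_g(t) is a general language model estimated from the whole Wikipedia data, and the optimal value of λ is set to 0.2.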
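A smoothed unigram context model can be sketched as below, assuming the entity-specific estimate Counte(t) / Σ Counte(t') is linearly interpolated with the general language model Pg(t), with weight λ = 0.2 on the entity-specific part. The function name, the toy counts and the toy general model are hypothetical, and the general model is assumed to give every context term a positive probability.

```python
import math
from collections import Counter


def context_log_prob(context_terms, entity_counts, general_lm, lam=0.2):
    """Return log P(c|e) under a unigram model smoothed with a general LM."""
    total = sum(entity_counts.values())
    log_p = 0.0
    for term in context_terms:
        p_ml = entity_counts[term] / total if total else 0.0
        p = lam * p_ml + (1.0 - lam) * general_lm[term]
        log_p += math.log(p)
    return log_p


# Toy term counts for one entity and a toy general language model (made up).
jordan_counts = Counter({"NBA": 8, "MVP": 2})
general_lm = {"NBA": 0.01, "MVP": 0.01, "learning": 0.02}

c1 = context_log_prob(["NBA", "MVP"], jordan_counts, general_lm)
c2 = context_log_prob(["learning"], jordan_counts, general_lm)
print(c1, c2)
```

A term the entity has never been seen with, such as "learning" here, falls back to the general model scaled by (1 - λ) instead of zeroing out the whole product.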

The NIL Entity Problem

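Assumption: if a name mention refers to a specific entity, then the probability of that mention being generated by the entity's model should be significantly higher than the probability of it being generated by a general language model.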

1. Add a pseudo entity, the NIL entity, to the knowledge base.

2. If the probability that a name mention is generated by the NIL entity is higher than the probability that it is generated by any other entity in the knowledge base, link the name mention to the NIL entity.
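The NIL decision can be sketched as one more candidate in the argmax. The sketch assumes the NIL entity scores a mention using the general language model alone for both the name words and the context terms, together with some small prior; that is one reading of the assumption above, and the function names, the prior and the toy numbers are hypothetical.

```python
import math


def nil_log_score(name_terms, context_terms, general_lm, log_prior_nil):
    """Log score of the NIL entity: prior times general-LM probability of all terms."""
    terms = list(name_terms) + list(context_terms)
    return log_prior_nil + sum(math.log(general_lm[t]) for t in terms)


def link_with_nil(entity_log_scores, nil_score):
    """Return the best entity, or "NIL" if the NIL entity scores higher."""
    best = max(entity_log_scores, key=entity_log_scores.get, default=None)
    if best is None or nil_score > entity_log_scores[best]:
        return "NIL"
    return best


# Toy log scores (made up) for two candidate entities.
scores = {"Michael Jeffrey Jordan": -5.0, "Michael I. Jordan": -7.0}
print(link_with_nil(scores, nil_score=-6.0))
print(link_with_nil(scores, nil_score=-4.0))
```

A mention with no candidates at all is also linked to NIL in this sketch, since there is no entity score to compare against.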

5. Experiments
