Difference between stem and lemma

First, let's see how Wikipedia defines a stem and a lemma.

Lemma (morphology): In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. In lexicography, this unit is usually also the citation form or headword by which it is indexed. Lemmas have special significance in highly inflected languages such as Turkish and Czech. The process of determining the lemma for a given word is called lemmatisation.

Word stem: In linguistics, a stem is a part of a word. The term is used with slightly different meanings. In one usage, a stem is a form to which affixes can be attached. Thus, in this usage, the English word friendships contains the stem friend, to which the derivational suffix -ship is attached to form a new stem friendship, to which the inflectional suffix -s is attached. In a variant of this usage, the root of the word (in the example, friend) is not counted as a stem. In a slightly different usage, which is adopted in the remainder of this article, a word has a single stem, namely the part of the word that is common to all its inflected variants. Thus, in this usage, all derivational affixes are part of the stem. For example, the stem of friendships is friendship, to which the inflectional suffix -s is attached.

Difference between stem and lemma:

A stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-". This is because there are words such as production. In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology is taken into account, the definition of the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" /prəˈdjuːst/ vs. "production" /prəˈdʌkʃən/. Some lexemes have several stems but one lemma. For instance "to go" (the lemma) has the stems "go" and "went". (The past tense is based on a different verb, "to wend". The "-t" suffix may be considered as equivalent to "-ed".)
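The contrast can be made concrete with a toy sketch in Python. Everything here is hypothetical and for illustration only: the tiny LEMMAS dictionary, the SUFFIXES list, and the two helper functions are not taken from any real library, and a real lemmatizer or stemmer is far more elaborate. The point is only that a lemma is looked up as a dictionary form, while a stem is what remains after endings are stripped.

```python
# Toy illustration only: LEMMAS and SUFFIXES are made-up, minimal data.
LEMMAS = {
    "produced": "produce",
    "produces": "produce",
    "producing": "produce",
    "went": "go",
}
SUFFIXES = ("ing", "ed", "es", "e", "s")


def lemmatize(word):
    """Return the dictionary form if the word is known, else the word itself."""
    return LEMMAS.get(word, word)


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ("produced", "produces", "producing", "went"):
    print(w, "lemma:", lemmatize(w), "stem:", stem(w))
```

With this toy data, the three forms of "produce" share the lemma "produce" and the stem "produc", while "went" maps to the lemma "go" but keeps "went" as its stem, matching the "several stems, one lemma" case described above.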

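From the above we can see that a lemma is the reduction of a word form to its dictionary form, so there is usually a single result, whereas a stem is the word's stem and varies slightly depending on the definition used. Below are the results of analysing a sentence programmatically: the lemmas were produced with Stanford's NLP tools, and the stems with the stem module of the NLTK package (three algorithms: Snowball, Porter, and Lancaster).

Original sentence: This work shows that single and double Ala substitutions of His18 and Phe21 in IL-8 reduced up to 77-fold the binding affinity to IL-8 receptor subtypes A (CXCR1) and B (CXCR2) and to the Duffy antigen.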

Lemma: this work show that single and double alum substitution of his18 and phe21 in il-8 reduce up to 77-fold the binding affinity to il-8 receptor subtype a -lrb- cxcr1 -rrb- and b -lrb- cxcr2 -rrb- and to the duffy antigen .

Snowball stem: this work show that singl and doubl ala substitut of his18 and phe21 in il-8 reduc up to 77-fold the bind affin to il-8 receptor subtyp a ( cxcr1 ) and b ( cxcr2 ) and to the duffi antigen .

Porter stem: Thi work show that singl and doubl Ala substitut of His18 and Phe21 in IL-8 reduc up to 77-fold the bind affin to IL-8 receptor subtyp A ( CXCR1 ) and B ( CXCR2 ) and to the Duffi antigen .

Lancaster stem: this work show that singl and doubl ala substitut of his18 and phe21 in il-8 reduc up to 77-fold the bind affin to il-8 receptor subtyp a ( cxcr1 ) and b ( cxcr2 ) and to the duffi antigen .
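A comparison like the stemmer results above could be run roughly as follows. This is a minimal sketch, assuming the nltk package is installed; the word list is a hand-picked subset of the sentence and the stem_all helper is a hypothetical name, not the script that produced the results above. Exact output may differ between NLTK versions (for example, recent versions of PorterStemmer lowercase their input by default).

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# A few inflected words taken from the example sentence.
words = ["shows", "substitutions", "reduced", "binding", "subtypes"]

# The three NLTK stemmers mentioned in the text.
stemmers = {
    "snowball": SnowballStemmer("english"),
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
}


def stem_all(tokens):
    """Stem every token with every stemmer; returns {stemmer name: [stems]}."""
    return {name: [s.stem(t) for t in tokens] for name, s in stemmers.items()}


results = stem_all(words)
for name, stems in results.items():
    print(name, stems)
```

None of these stemmers needs additional NLTK data downloads, which is why the sketch uses a plain word list instead of a tokenizer.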
