Difference between stem and lemma

Let's start with how Wikipedia defines a stem and a lemma.

Lemma (morphology): In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. In lexicography, this unit is usually also the citation form or headword by which it is indexed. Lemmas have special significance in highly inflected languages such as Turkish and Czech. The process of determining the lemma for a given word is called lemmatisation.

Word stem: In linguistics, a stem is a part of a word. The term is used with slightly different meanings. In one usage, a stem is a form to which affixes can be attached. Thus, in this usage, the English word friendships contains the stem friend, to which the derivational suffix -ship is attached to form a new stem friendship, to which the inflectional suffix -s is attached. In a variant of this usage, the root of the word (in the example, friend) is not counted as a stem. In a slightly different usage, which is adopted in the remainder of this article, a word has a single stem, namely the part of the word that is common to all its inflected variants. Thus, in this usage, all derivational affixes are part of the stem. For example, the stem of friendships is friendship, to which the inflectional suffix -s is attached.

Difference between stem and lemma:

Stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-". This is because there are words such as production. In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology is taken into account, the definition of the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" /prəˈdjuːst/ vs. "production" /prəˈdʌkʃən/. Some lexemes have several stems but one lemma. For instance "to go" (the lemma) has the stems "go" and "went". (The past tense is based on a different verb, "to wend". The "-t" suffix may be considered as equivalent to "-ed".)
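To make the two definitions concrete, here is a minimal sketch using only the Python standard library. It treats the stem as the longest prefix shared by a word's inflected forms (the "common to all its inflected variants" usage quoted above) and the lemma as a dictionary lookup. The `LEXEMES` table and the helper names `common_stem` and `lemma_of` are hypothetical, made up for illustration; real stemmers and lemmatizers work from rules and lexicons rather than a hand-written table.

```python
import os

# Hypothetical lexeme table: each lemma (dictionary form) maps to its inflected forms.
LEXEMES = {
    "produce": ["produce", "produces", "produced", "producing"],
    "friendship": ["friendship", "friendships"],
    "go": ["go", "goes", "went", "gone", "going"],
}


def common_stem(forms):
    """Return the longest prefix shared by every form (character-wise)."""
    return os.path.commonprefix(forms)


def lemma_of(word):
    """Return the lemma whose form list contains `word`, or None if unknown."""
    for lemma, forms in LEXEMES.items():
        if word in forms:
            return lemma
    return None


for lemma, forms in LEXEMES.items():
    print(f"lemma={lemma!r:14} stem={common_stem(forms)!r}")
```

For "produce" the shared prefix is "produc", while the lemma stays "produce". For "go" the shared prefix is empty, which is the suppletive case mentioned above: one lemma, but no single unchanging stem.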

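From the above we can see that a lemma is the result of reducing a word to its dictionary form, which normally yields a single result, whereas a stem is the word's stem, which varies slightly depending on the definition used. Below are the results of analyzing a sentence programmatically: the lemmas come from Stanford's NLP tools, and the stems come from the stem module of the NLTK package (three algorithms: Snowball, Porter and Lancaster).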

Original sentence: This work shows that single and double Ala substitutions of His18 and Phe21 in IL-8 reduced up to 77-fold the binding affinity to IL-8 receptor subtypes A (CXCR1) and B (CXCR2) and to the Duffy antigen.

lemma: this work show that single and double alum substitution of his18 and phe21 in il-8 reduce up to 77-fold the binding affinity to il-8 receptor subtype a -lrb- cxcr1 -rrb- and b -lrb- cxcr2 -rrb- and to the duffy antigen .

snowstem: this work show that singl and doubl ala substitut of his18 and phe21 in il-8 reduc up to 77-fold the bind affin to il-8 receptor subtyp a ( cxcr1 ) and b ( cxcr2 ) and to the duffi antigen .

porterstem: Thi work show that singl and doubl Ala substitut of His18 and Phe21 in IL-8 reduc up to 77-fold the bind affin to IL-8 receptor subtyp A ( CXCR1 ) and B ( CXCR2 ) and to the Duffi antigen .

lancasterstem: this work show that singl and doubl ala substitut of his18 and phe21 in il-8 reduc up to 77-fold the bind affin to il-8 receptor subtyp a ( cxcr1 ) and b ( cxcr2 ) and to the duffi antigen .
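The stemming runs above could be reproduced roughly as follows. This is only a sketch, not the original script: it assumes NLTK is installed, uses a plain whitespace split in place of whatever tokenizer produced the output above, and the helper names `build_stemmers`, `stem_tokens` and `main` are made up for illustration. The Stanford lemmatization step is not covered.

```python
def build_stemmers():
    """Return the three NLTK stemmers named in the text, keyed by label."""
    # Imported lazily so stem_tokens() can be used without NLTK installed.
    from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

    return {
        "snowball": SnowballStemmer("english"),
        "porter": PorterStemmer(),
        "lancaster": LancasterStemmer(),
    }


def stem_tokens(tokens, stemmer):
    """Apply stemmer.stem() to each token and join the results with spaces."""
    return " ".join(stemmer.stem(token) for token in tokens)


def main():
    sentence = (
        "This work shows that single and double Ala substitutions of His18 "
        "and Phe21 in IL-8 reduced up to 77-fold the binding affinity to IL-8 "
        "receptor subtypes A (CXCR1) and B (CXCR2) and to the Duffy antigen."
    )
    # Assumption: whitespace tokenization, so punctuation stays attached to words.
    tokens = sentence.split()
    for label, stemmer in build_stemmers().items():
        print(f"{label}: {stem_tokens(tokens, stemmer)}")
```

Calling `main()` prints one stemmed line per algorithm. Because the tokenization and NLTK version may differ from the original run, the output is not guaranteed to match the lines above character for character.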
