Research Progress in Biomedical Named Entity Recognition (BioNER)
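I recently organized the Biomedical Named Entity Recognition (BioNER) papers I had collected into a project called BioNER Progress and put it on GitHub (https://github.com/lingluodlut/BioNER-Progress). It contains a list of representative papers tracing the progress of BioNER, together with state-of-the-art results and the corresponding papers on each of the main datasets. I hope it is useful to people who are new to the field.
The paper list starts with survey papers and then, following the historical development of BioNER research, gives representative work on dictionary-based, rule-based and machine learning-based methods. The machine learning methods are further divided into those based on traditional machine learning models (SVM, HMM, MEMM and CRF) and the neural network methods that are now mainstream.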
BioNER Papers
A paper list for BioNER
Over the past decades, many automatic BioNER methods have been proposed and used to recognise biomedical entities. They can be categorised into dictionary-based, rule-based and machine learning-based methods. Recently, neural network-based machine learning methods have shown promising results.
Survey Papers
- Overview of BioCreative II gene mention recognition. Smith L, Tanabe L K, nee Ando R J, et al. Genome biology, 2008, 9(2): S2. [paper]
- Biomedical named entity recognition: a survey of machine-learning tools. Campos D, Matos S, Oliveira J L. Theory and Applications for Advanced Text Mining, 2012: 175-195. [paper]
- Chemical named entities recognition: a review on approaches and applications. Eltyeb S, Salim N. Journal of cheminformatics, 2014, 6(1): 17. [paper]
- CHEMDNER: The drugs and chemical names extraction challenge. Krallinger M, Leitner F, Rabal O, et al. Journal of cheminformatics, 2015, 7(1): S1. [paper]
- A comparative study for biomedical named entity recognition. Wang X, Yang C, Guan R. International Journal of Machine Learning and Cybernetics, 2015, 9(3): 373-382. [paper]
Dictionary-based Methods
- Using BLAST for identifying gene and protein names in journal articles. Krauthammer M, Rzhetsky A, Morozov P, et al. Gene, 2000, 259(1-2): 245-252. [paper]
- Boosting precision and recall of dictionary-based protein name recognition. Tsuruoka Y, Tsujii J. Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine-Volume 13, 2003: 41-48. [paper]
- Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature. Yang Z, Lin H, Li Y. Computational Biology and Chemistry, 2008, 32(4): 287-291. [paper]
- A dictionary to identify small molecules and drugs in free text. Hettne K M, Stierum R H, Schuemie M J, et al. Bioinformatics, 2009, 25(22): 2983-2991. [paper] [dictionary]
- LINNAEUS: a species name identification system for biomedical literature. Gerner M, Nenadic G, Bergman C M. BMC bioinformatics, 2010, 11(1): 85. [paper]
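As a toy illustration of the dictionary-based idea behind the papers above, the sketch below tags mentions by looking up names from a lexicon, preferring the longest match. It is not the method of any listed system; the function name `dictionary_tag` and the three-entry lexicon are hypothetical, and real systems add normalization, variant generation and post-filtering.

```python
import re

def dictionary_tag(text, lexicon):
    """Return (start, end, surface, label) for case-insensitive lexicon matches."""
    # Put longer names first so that "breast cancer" wins over "cancer".
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    labels = {name.lower(): label for name, label in lexicon.items()}
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {
    "breast cancer": "Disease",
    "cancer": "Disease",
    "tamoxifen": "Chemical",
}
sentence = "Tamoxifen is widely used to treat breast cancer."
mentions = dictionary_tag(sentence, toy_lexicon)
for mention in mentions:
    print(mention)
```

Exact lookup of this kind tends to be precise for names it knows but misses spelling variants and new names, which is one reason later work moved toward rules and machine learning.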
Rule-based Methods
- Toward information extraction: identifying protein names from biological papers. Fukuda K, Tsunoda T, Tamura A, et al. Pac symp biocomput. 1998, 707(18): 707-718. [paper]
- A biological named entity recognizer. Narayanaswamy M, Ravikumar K E, Vijay-Shanker K. Biocomputing 2003. 2002: 427-438. [paper]
- ProMiner: rule-based protein and gene entity recognition. Hanisch D, Fundel K, Mevissen H T, et al. BMC bioinformatics, 2005, 6(1): S14. [paper]
- MutationFinder: a high-performance system for extracting point mutation mentions from text. Caporaso J G, Baumgartner Jr W A, Randolph D A, et al. Bioinformatics, 2007, 23(14): 1862-1865. [paper] [code]
- Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems. Segura-Bedmar I, Martínez P, Segura-Bedmar M. Drug discovery today, 2008, 13(17-18): 816-823. [paper]
- Investigation of unsupervised pattern learning techniques for bootstrap construction of a medical treatment lexicon. Xu R, Morgan A, Das A K, et al. Proceedings of the workshop on current trends in biomedical natural language processing, 2009: 63-70. [paper]
- Linguistic approach for identification of medication names and related information in clinical narratives. Hamon T, Grabar N. Journal of the American Medical Informatics Association, 2010, 17(5): 549-554. [paper]
- SETH detects and normalizes genetic variants in text. Thomas P, Rocktäschel T, Hakenberg J, et al. Bioinformatics, 2016, 32(18): 2883-2885. [paper] [code]
- PENNER: Pattern-enhanced Nested Named Entity Recognition in Biomedical Literature. Wang X, Zhang Y, Li Q, et al. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2018: 540-547. [paper]
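To make the rule-based idea concrete, here is a minimal sketch of a single hand-written pattern for point mutations written in the compact wild-type/position/mutant form (for example `V600E`). The regular expression and the helper `find_point_mutations` are assumptions for illustration; they are not the actual patterns used by MutationFinder, SETH or any other system listed above.

```python
import re

# One-letter amino acid codes.
AA = "ACDEFGHIKLMNPQRSTVWY"
# Wild-type residue, 1-based position, mutant residue, e.g. "A123T".
POINT_MUTATION = re.compile(rf"\b([{AA}])([1-9]\d*)([{AA}])\b")

def find_point_mutations(text):
    """Return (wild_type, position, mutant, start, end) for each match."""
    hits = []
    for m in POINT_MUTATION.finditer(text):
        wild, position, mutant = m.group(1), int(m.group(2)), m.group(3)
        if wild != mutant:  # a substitution has to change the residue
            hits.append((wild, position, mutant, m.start(), m.end()))
    return hits

example = "The A123T and V600E substitutions were observed, but not T47T."
found = find_point_mutations(example)
for hit in found:
    print(hit)
```

A single pattern like this also fires on look-alike tokens such as the cell line name T47D, so practical rule-based systems combine many patterns with context checks and filters.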
Machine Learning-based Methods
SVM-based Methods
- Tuning support vector machines for biomedical named entity recognition. Kazama J, Makino T, Ohta Y, et al. Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain-Volume 3, 2002: 1-8. [paper]
- Biomedical named entity recognition using two-phase model based on SVMs. Lee K J, Hwang Y S, Kim S, et al. Journal of Biomedical Informatics, 2004, 37(6): 436-447. [paper]
- Exploring deep knowledge resources in biomedical name recognition. GuoDong Z, Jian S. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 2004: 96-99. [paper]
HMM-based Methods
- Named entity recognition in biomedical texts using an HMM model. Zhao S. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 2004: 84-87. [paper]
- Annotation of chemical named entities. Corbett P, Batchelor C, Teufel S. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, 2007: 57-64. [paper]
- Conditional random fields vs. hidden markov models in a biomedical named entity recognition task. Ponomareva N, Rosso P, Pla F, et al. Proc. of Int. Conf. Recent Advances in Natural Language Processing, RANLP. 2007, 479: 483. [paper]
MEMM-based Methods
- Cascaded classifiers for confidence-based chemical named entity recognition. Corbett P, Copestake A. BMC bioinformatics, 2008, 9(11): S4. [paper]
- OSCAR4: a flexible architecture for chemical text-mining. Jessop D M, Adams S E, Willighagen E L, et al. Journal of cheminformatics, 2011, 3(1): 41. [paper]
CRF-based Methods
- ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Settles B. Bioinformatics, 2005, 21(14): 3191-3192. [paper]
- BANNER: an executable survey of advances in biomedical named entity recognition. Leaman R, Gonzalez G. Biocomputing 2008. 2008: 652-663. [paper]
- Detection of IUPAC and IUPAC-like chemical names. Klinger R, Kolářik C, Fluck J, et al. Bioinformatics, 2008, 24(13): i268-i276. [paper]
- Incorporating rich background knowledge for gene named entity classification and recognition. Li Y, Lin H, Yang Z. BMC bioinformatics, 2009, 10(1): 223. [paper]
- A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. Jiang M, Chen Y, Liu M, et al. Journal of the American Medical Informatics Association, 2011, 18(5): 601-606. [paper]
- ChemSpot: a hybrid system for chemical named entity recognition. Rocktäschel T, Weidlich M, Leser U. Bioinformatics, 2012, 28(12): 1633-1640. [paper]
- Gimli: open source and high-performance biomedical name recognition. Campos D, Matos S, Oliveira J L. BMC bioinformatics, 2013, 14(1): 54. [paper]
- tmVar: a text mining approach for extracting sequence variants in biomedical literature. Wei C H, Harris B R, Kao H Y, et al. Bioinformatics, 2013, 29(11): 1433-1439. [paper] [code]
- Evaluating word representation features in biomedical named entity recognition tasks. Tang B, Cao H, Wang X, et al. BioMed research international, 2014, 2014. [paper]
- Drug name recognition in biomedical texts: a machine-learning-based method. He L, Yang Z, Lin H, et al. Drug discovery today, 2014, 19(5): 610-617. [paper]
- tmChem: a high performance approach for chemical named entity recognition and normalization. Leaman R, Wei C H, Lu Z. Journal of cheminformatics, 2015, 7(1): S3. [paper]
- GNormPlus: an integrative approach for tagging genes, gene families, and protein domains. Wei C H, Kao H Y, Lu Z. BioMed research international, 2015, 2015. [paper]
- Mining chemical patents with an ensemble of open systems. Leaman R, Wei C H, Zou C, et al. Database, 2016, 2016. [paper]
- nala: text mining natural language mutation mentions. Cejuela J M, Bojchevski A, Uhlig C, et al. Bioinformatics, 2017, 33(12): 1852-1858. [paper]
Neural Network-based Methods
- Recurrent neural network models for disease name recognition using domain invariant features. Sahu S, Anand A. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 2216-2225. [paper]
- Deep learning with word embeddings improves biomedical named entity recognition. Habibi M, Weber L, Neves M, et al. Bioinformatics, 2017, 33(14): i37-i48. [paper]
- A neural joint model for entity and relation extraction from biomedical text. Li F, Zhang M, Fu G, et al. BMC bioinformatics, 2017, 18(1): 198. [paper]
- A neural network multi-task learning approach to biomedical named entity recognition. Crichton G, Pyysalo S, Chiu B, et al. BMC bioinformatics, 2017, 18(1): 368. [paper] [code]
- Disease named entity recognition from biomedical literature using a novel convolutional neural network. Zhao Z, Yang Z, Luo L, et al. BMC medical genomics, 2017, 10(5): 73. [paper]
- An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Luo L, Yang Z, Yang P, et al. Bioinformatics, 2018, 34(8): 1381-1388. [paper] [code]
- GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text. Zhu Q, Li X, Conesa A, et al. Bioinformatics, 2018, 34(9): 1547-1554. [paper] [code]
- D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information. Dang T H, Le H Q, Nguyen T M, et al. Bioinformatics, 2018, 34(20): 3539-3546. [paper] [code]
- Transfer learning for biomedical named entity recognition with neural networks. Giorgi J M, Bader G D. Bioinformatics, 2018, 34(23): 4087-4094. [paper]
- Label-Aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition. Wang Z, Qu Y, Chen L, et al. NAACL. 2018: 1-15. [paper]
- Recognizing irregular entities in biomedical text via deep neural networks. Li F, Zhang M, Tian B, et al. Pattern Recognition Letters, 2018, 105: 105-113. [paper]
- Cross-type biomedical named entity recognition with deep multi-task learning. Wang X, Zhang Y, Ren X, et al. Bioinformatics, 2019, 35(10): 1745-1752. [paper] [code]
- Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings. Zhai Z, Nguyen D Q, Akhondi S, et al. Proceedings of the 18th BioNLP Workshop and Shared Task. 2019: 328-338. [paper] [code]
- Chinese Clinical Named Entity Recognition Using Residual Dilated Convolutional Neural Network with Conditional Random Field. Qiu J, Zhou Y, Wang Q, et al. IEEE Transactions on NanoBioscience, 2019, 18(3): 306-315. [paper]
- A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization. Zhao S, Liu T, Zhao S, et al. Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33: 817-824. [paper]
- CollaboNet: collaboration of deep neural networks for biomedical named entity recognition. Yoon W, So C H, Lee J, et al. BMC bioinformatics, 2019, 20(10): 249. [paper] [code]
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Lee J, Yoon W, Kim S, et al. Bioinformatics, Advance article, 2019. [paper] [code]
- HUNER: Improving Biomedical NER with Pretraining. Weber L, Münchmeyer J, Rocktäschel T, et al. Bioinformatics, Advance article, 2019. [paper] [code]
Others
- TaggerOne: joint named entity recognition and normalization with semi-Markov Models. Leaman R, Lu Z. Bioinformatics, 2016, 32(18): 2839-2846. [paper] [code]
- A transition-based joint model for disease named entity recognition and normalization. Lou Y, Zhang Y, Qian T, et al. Bioinformatics, 2017, 33(15): 2363-2371. [paper] [code]
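In addition, the project summarizes state-of-the-art results on the main datasets, grouped by entity type: chemical and drug (Chemical), disease (Disease), gene and protein (Gene/Protein), gene mutation (Mutation) and species (Species) entity recognition.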
Chemical NER
CHEMDNER
The CHEMDNER (chemical compound and drug name recognition) task, part of the BioCreative IV challenge, aims to promote the development of systems for the automatic recognition of chemical entities in text. It was divided into two tasks: one covered the indexing of documents with chemicals (chemical document indexing - CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition - CEM task). Here, we only focus on the CEM task.
The CHEMDNER corpus consists of 10,000 PubMed abstracts containing a total of 84,355 chemical entity mentions. The original corpus is divided into a training set (3,500 abstracts), a development set (3,500 abstracts) and a test set (3,000 abstracts).
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| tmChem (Leaman et al., 2015) | 89.09 | 85.75 | 87.39 | tmChem: a high performance approach for chemical named entity recognition and normalization |
| CRF (Lu et al., 2015) | 88.73 | 87.41 | 88.06 | CHEMDNER system with mixed conditional random fields and multi-scale word clustering |
| BiLSTM-CRF (Lample et al., 2016), Luo et al. (2018) rebuilt the model on the dataset | 91.31 | 87.73 | 89.48 | |
| Att-BiLSTM-CRF (Luo et al., 2018) | 92.29 | 90.01 | 91.14 | An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition |
| MTM-CW (Wang et al., 2019) | 91.30 | 87.53 | 89.37 | Cross-type biomedical named entity recognition with deep multi-task learning |
| BioBERT (Lee et al., 2019) | 92.80 | 91.92 | 92.36 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
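For readers new to these leaderboards, the sketch below shows how the F1 column relates to P and R, assuming F1 is the usual harmonic mean of precision and recall. The helper name `f1_score` is our own; the two (P, R) pairs are copied from the CHEMDNER results above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (P, R) pairs as reported for CHEMDNER above.
reported = {
    "tmChem": (89.09, 85.75),
    "BioBERT": (92.80, 91.92),
}
recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}
print(recomputed)
```

Recomputing F1 this way is a cheap consistency check when copying numbers between papers, since a mismatch usually points to a transcription error or a different averaging scheme.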
CDR-Chemical
The CDR (chemical disease relation) task, part of the BioCreative V challenge, aims to automatically extract CDRs from the literature. The CDR corpus consists of 1,500 PubMed abstracts annotated with chemicals, diseases and chemical-disease interactions, and contains a total of 15,933 chemical entity mentions. The original corpus is separated into a training set (500 abstracts), a development set (500 abstracts) and a test set (500 abstracts).
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| TaggerOne (Leaman and Lu, 2016) | 94.20 | 88.80 | 91.40 | TaggerOne: joint named entity recognition and normalization with semi-Markov Models |
| BiLSTM-CRF (Lample et al., 2016), Luo et al. (2018) rebuilt the model on the dataset | 92.82 | 88.52 | 90.62 | |
| Att-BiLSTM-CRF (Luo et al., 2018) | 93.49 | 91.68 | 92.57 | An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition |
| D3NER (Dang et al., 2018) | 93.73 | 92.56 | 93.14 | |
| CollaboNet (Yoon et al., 2018) | 94.26 | 92.38 | 93.31 | CollaboNet: Collaboration of deep neural networks for biomedical named entity recognition |
| ELMo (Peng et al., 2019) | - | - | 91.5 | |
| BERT (Peng et al., 2019) | - | - | 93.5 | |
| BioBERT (Lee et al., 2019) | 93.68 | 93.26 | 93.47 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
Disease NER
NCBI-Disease
The NCBI Disease corpus consists of 793 PubMed abstracts separated into training (593), development (100) and test (100) subsets. It contains a total of 6,892 disease entity mentions.
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| DNorm (Leaman et al., 2015) | 82.2 | 77.5 | 79.8 | DNorm: Disease name normalization with pairwise learning to rank |
| TaggerOne (Leaman and Lu, 2016) | 85.1 | 80.8 | 82.9 | TaggerOne: joint named entity recognition and normalization with semi-Markov Models |
| MCNN (Zhao et al., 2017) | 85.08 | 85.26 | 85.17 | |
| BiLSTM-CRF (Lample et al., 2016), | 86.11 | 85.49 | 85.80 | |
| CollaboNet (Yoon et al., 2018) | 85.61 | 82.61 | 84.08 | CollaboNet: Collaboration of deep neural networks for biomedical named entity recognition |
| GRAM-CNN (Zhu et al., 2018) | 86.46 | 88.07 | 87.26 | |
| MTM-CW (Wang et al., 2019) | 85.86 | 86.42 | 86.14 | Cross-type biomedical named entity recognition with deep multi-task learning |
| Dic-Att-BiLSTM-CRF (Xu et al., | 88.3 | 89.0 | 88.6 | |
| BioBERT (Lee et al., 2019) | 88.22 | 91.25 | 89.71 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
CDR-Disease
The CDR (chemical disease relation) task, part of the BioCreative V challenge, aims to automatically extract CDRs from the literature. The CDR corpus consists of 1,500 PubMed abstracts annotated with chemicals, diseases and chemical-disease interactions, and contains a total of 12,864 disease entity mentions. The original corpus is separated into a training set (500 abstracts), a development set (500 abstracts) and a test set (500 abstracts).
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| HITSZ_CDR (Li et al., 2016) | 88.68 | 85.23 | 86.93 | HITSZ_CDR: an end-to-end chemical and disease relation extraction |
| TaggerOne (Leaman and Lu, 2016) | 85.2 | 80.2 | 82.6 | TaggerOne: joint named entity recognition and normalization with semi-Markov Models |
| BiLSTM-CRF (Lample et al., 2016), | 87.60 | 86.25 | 86.92 | |
| MCNN (Zhao et al., 2017) | 88.20 | 87.46 | 87.83 | |
| Transition-based joint model (Lou et al., 2017) | 89.61 | 83.09 | 86.23 | A Transition-based Joint Model for Disease Named Entity Recognition and Normalization |
| CollaboNet (Yoon et al., 2018) | 85.61 | 82.61 | 84.08 | CollaboNet: Collaboration of deep neural networks for biomedical named entity recognition |
| MTM-CW (Wang et al., 2019) | 89.10 | 88.47 | 88.78 | Cross-type biomedical named entity recognition with deep multi-task learning |
| Dic-Att-BiLSTM-CRF (Xu et al., | 89.1 | 87.5 | 88.3 | |
| BERT (Peng et al., 2019) | - | - | 86.6 | |
| BioBERT (Lee et al., 2019) | 86.47 | 87.84 | 87.15 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
Gene/Protein NER
BC2GM
The Gene Mention Tagging task, part of the BioCreative II challenge, is concerned with the named entity extraction of gene and gene product mentions in text. The BC2GM corpus contains a total of 24,583 gene entity mentions.
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| BiLSTM-CRF (Lample et al., 2016), | 81.57 | 79.48 | 80.51 | |
| CollaboNet (Yoon et al., 2018) | 80.49 | 78.99 | 79.73 | CollaboNet: Collaboration of deep neural networks for biomedical named entity recognition |
| BiLM-NER (Sachan et al., 2018) | 81.81 | 81.57 | 81.69 | |
| MTM-CW (Wang et al., 2019) | 82.10 | 79.42 | 80.74 | Cross-type biomedical named entity recognition with deep multi-task learning |
| BioBERT (Lee et al., 2019) | 84.32 | 85.12 | 84.72 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
JNLPBA
The JNLPBA corpus contains 2,404 abstracts extracted from MEDLINE using the MeSH terms “human”, “blood cell” and “transcription factor”. The manual annotation of these abstracts was based on five classes of the GENIA ontology, namely protein, DNA, RNA, cell line, and cell type. This corpus was used in the Bio-Entity Recognition Task in BioNLP/NLPBA 2004, providing 2,000 abstracts for training and the remaining 404 abstracts for testing. The overall results are shown in the following table.
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| SVM (Zhou and Su, 2004) | 69.42 | 75.99 | 72.55 | Exploring deep knowledge resources in biomedical name recognition |
| CRF_NERBio (Tsai et al., 2006) | 72.01 | 73.98 | 72.98 | |
| BiLSTM-CRF (Lample et al., 2016), | 71.35 | 75.74 | 73.48 | |
| BiLM-NER (Sachan et al., 2018) | 71.39 | 79.06 | 75.03 | |
| CollaboNet (Yoon et al., 2018) | 74.43 | 83.22 | 78.58 | CollaboNet: Collaboration of deep neural networks for biomedical named entity recognition |
| MTM-CW (Wang et al., 2019) | 70.91 | 76.34 | 73.52 | Cross-type biomedical named entity recognition with deep multi-task learning |
| BioBERT (Lee et al., 2019) | 72.24 | 83.56 | 77.49 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
Mutation NER
MutationFinder corpus and tmVar corpus
The MutationFinder corpus was established to guide the construction of the patterns. The development data set is made up of 605 point mutation mentions in 305 abstracts selected randomly from primary citations in PDB. The evaluation data set is made up of 910 point mutation mentions in 508 abstracts annotated by two of the authors who were not involved in the development of the system.
The tmVar corpus comprises 500 manually annotated abstracts, of which 334 were used for training tmVar and the remaining 166 for testing it.
| Method | MF-P | MF-R | MF-F1 | tmVar-P | tmVar-R | tmVar-F1 | Paper |
|---|---|---|---|---|---|---|---|
| MutationFinder (Caporaso et al., 2007) | 98.41 | 81.92 | 89.41 | 89.66 | 69.15 | 78.08 | MutationFinder: A high-performance system for extracting point mutation mentions from text |
| tmVar (Wei et al., 2013) | 98.80 | 89.62 | 93.98 | 91.38 | 91.40 | 91.39 | tmVar: a text mining approach for extracting sequence variants in biomedical literature |
| SETH (Thomas et al., 2016) | 98 | 82 | 89 | 95 | 77 | 85 | |
| Character-based network (Thomas et | - | - | - | 88.1 | 86.6 | 87.4 | Recognition of |
Species NER
LINNAEUS corpus
The LINNAEUS corpus is a set of open access documents in text format, manually annotated for species mention tags. It consists of 100 full-text documents from the PMC OA document set and contains a total of 4,259 species entity mentions.
| Method | P | R | F1 | Paper |
|---|---|---|---|---|
| LINNAEUS (Gerner et al., 2010) | 97.07 | 94.28 | 95.65 | LINNAEUS: A species name identification system for biomedical literature |
| SR4GN (Wei et al., 2012) | 86 | 85 | 86 | SR4GN: A Species Recognition Software Tool for Gene Normalization |
| BiLSTM-CRF (Habibi et al., 2017) | - | - | 94.03 | Deep learning with word embeddings improves biomedical named entity recognition |
| Transfer learning-based model | 92.80 | 94.29 | 93.54 | Transfer learning for biomedical named entity recognition with neural networks |
| BioBERT (Lee et al., 2019) | 93.84 | 86.11 | 89.81 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |