Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper
the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.
hyphenator:
h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]
the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.
语言:葡萄牙语
one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)
3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.
hypotaxis 从属结构
parataxis 并列结构
4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.
名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长
Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章
- DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)
两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...
- Measuring Text Difficulty Using Parse-Tree Frequency
https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...
- OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper
这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...
- hiho一下 第一百零七周 Give My Text Back(微软笔试题)
题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...
- Give My Text Back
Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...
- Text Style Transfer论文笔记
Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...
- Official Program for CVPR 2015
From: http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...
- SAP常用命令及BASIS操作
Pfcg 角色,权限参数文件配置Su53 查看权限对象 st01 跟踪St22 看dump,以分析错误 eg.找到ABAP程序出错的地方,找出fou ...
- SAP ECC FI配置文档
SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...
随机推荐
- 开发环境转Mac FAQ
vs2017 for mac, 默认的源代码管理工具是git, 不是svn, 安装source tree,注册bitbucket(免费1G私有空间),整合的比较好(国内的码云也能支持,不过是用账号密码 ...
- 服务消费和负载(Feign)
Spring Cloud Feign Spring Cloud Feign是一套基于Netflix Feign实现的声明式服务调用客户端.它使得编写Web服务客户端变得更加简单.我们只需要通过创建接口 ...
- xadmin后台导出时gunicorn报错ascii
django + xadmin + nginx + gunicorn部署后,xadmin后台导出model数据报错,gunicorn日志记录为:UnicodeEncodeError: 'ascii' ...
- unity中制作模拟第一人称视角下的指南针
private int zRotation; public GameObject obj; public void Update() { //obj = GameObject.Find("C ...
- ural1517
题解: 后缀数组 求一下最长公共字串 代码: #include<cstdio> #include<cmath> #include<algorithm> #inclu ...
- 《Python》并发编程
手工操作 —— 穿孔卡片 1946年第一台计算机诞生--20世纪50年代中期,计算机工作还在采用手工操作方式.此时还没有操作系统的概念. 程序员将对应于程序和数据的已穿孔的纸带(或卡片)装入输 ...
- python常用内建模块 collections,bs64,struct,hashlib,itertools,contextlib,xml
# 2 collections 是Python内建的一个集合模块,提供了许多有用的集合类. # 2.1 namedtuple #tuple可以表示不变集合,例如,一个点的二维坐标就可以表示成: p ...
- Azulão--青鸟--IPA--巴西葡萄牙语
这是巴西很有名的民谣.
- python学习例子
http://www.runoob.com/python/python-100-examples.html
- Zabbix4.0监控URL
一:新建群组 1.1:web monitor 二:新建模板 2.1:配置-模板-模板 2.3:创建应用集 配置-模板-web monitor-应用集 2.4:创建web场景 2.5:创建场景步骤: 以 ...