The system uses existing Natural Language Processing (NLP) tools, a parser and a hyphenator, and two corpora previously annotated by readability level.

Hyphenator:

h_en.pairs('beautiful')
[['beau', 'tiful'], [u'beauti', 'ful']]

The system extracts 52 features in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.

Language: Portuguese

Two experiments: one based on a five-level scale
(A1, A2, B1, B2, C1), and
a second based on a simplified
three-level scale (A, B and C).
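
The relation between the two scales can be sketched as a label mapping. The mapping below is an assumption (each five-level label collapses to its leading letter); these notes do not state how the three-level scale was derived, and `FIVE_TO_THREE` and `simplify_level` are hypothetical names.

```python
# Assumed mapping: each five-level label collapses to its leading letter.
FIVE_TO_THREE = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}


def simplify_level(level: str) -> str:
    """Map a five-level label (e.g. 'B2') to the simplified three-level scale."""
    return FIVE_TO_THREE[level]


labels = ["A1", "B2", "C1", "A2"]
simplified = [simplify_level(label) for label in labels]
print(simplified)
```

A dictionary lookup fails loudly with a `KeyError` on an unknown label, which is usually preferable to silently guessing a level.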

3 NLP tools
STRING: roughly the Portuguese counterpart of NLTK
The YAH Hyphenator: a rule-based system that applies
various word-division rules.
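
To give a feel for rule-based syllable processing, here is a deliberately naive sketch that counts syllables as contiguous vowel groups. It is not the YAH Hyphenator and does not reproduce its rules; the vowel set and the `count_syllables` name are assumptions made for illustration.

```python
import re

# Simplified vowel set, including common Portuguese accented vowels (an assumption).
VOWEL_GROUP = re.compile(r"[aeiouáàâãéêíóôõúü]+", re.IGNORECASE)


def count_syllables(word: str) -> int:
    """Approximate the syllable count as the number of contiguous vowel groups."""
    return max(1, len(VOWEL_GROUP.findall(word)))


for word in ["casa", "livro", "pão"]:
    print(word, count_syllables(word))
```

The heuristic treats every vowel sequence as one syllable, so words with a hiatus (for example "saída") are undercounted; a real hyphenator needs explicit division rules for such cases.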

hypotaxis: subordination (subordinate structures)
parataxis: coordination (coordinate structures)

4 Features
The set of 52 features extracted by the system consists
of: (i) part-of-speech (POS) tag, chunk, word
and sentence features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.
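
A few of the ratio and average features can be sketched from POS-tagged input. This is a minimal illustration, not the system's feature extractor: the `surface_features` function, the feature names, and the Universal-POS-style tags (`NOUN`, `VERB`, ...) are assumptions and do not reflect the STRING tagset.

```python
from collections import Counter


def surface_features(tagged_sentences):
    """Compute a few ratio/average features from POS-tagged sentences.

    `tagged_sentences` is a non-empty list of sentences, each a non-empty
    list of (token, tag) pairs.
    """
    tags = Counter(tag for sent in tagged_sentences for _, tag in sent)
    n_tokens = sum(tags.values())
    return {
        "noun_ratio": tags["NOUN"] / n_tokens,
        "verb_ratio": tags["VERB"] / n_tokens,
        "avg_sentence_length": n_tokens / len(tagged_sentences),
    }


sample = [
    [("O", "DET"), ("gato", "NOUN"), ("dorme", "VERB")],
    [("Ela", "PRON"), ("lê", "VERB"), ("um", "DET"), ("livro", "NOUN"), ("novo", "ADJ")],
]
features = surface_features(sample)
print(features)
```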

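Nouns and named-entity recognition are important for text comprehension
Syntactic structure: noun phrases, prepositional phrases
Auxiliary verbs can form longer, more complex verb chains
Hypotaxis: subordination
Parataxis: coordination
Word frequency: unigram-based, with Laplace smoothing
Proportions of verbs and nouns, sentence length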
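
The unigram word-frequency feature with Laplace (add-one) smoothing can be sketched as follows. The function name, the toy sentence, and the vocabulary are assumptions for illustration; the notes do not say which reference corpus or vocabulary the system uses.

```python
from collections import Counter


def laplace_unigram_probs(tokens, vocabulary):
    """Return add-one smoothed unigram probabilities over `vocabulary`.

    Assumes every token is in `vocabulary`, so the probabilities sum to 1.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocabulary)  # one pseudo-count per vocabulary word
    return {word: (counts[word] + 1) / total for word in vocabulary}


tokens = "o gato viu o rato".split()
vocab = set(tokens) | {"cão"}  # "cão" never occurs in the toy corpus
probs = laplace_unigram_probs(tokens, vocab)
print(probs["o"], probs["cão"])
```

Smoothing gives the unseen word a small non-zero probability instead of zero, which keeps frequency-based features defined for words missing from the reference counts.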

Paper: Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching
