the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.

hyphenator:

h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]

the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.

语言:葡萄牙语

one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)

3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.

hypotaxis 从属结构
parataxis 并列结构

4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.

名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长

Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章

  1. DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)

    两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...

  2. Measuring Text Difficulty Using Parse-Tree Frequency

    https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...

  3. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper

    这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...

  4. hiho一下 第一百零七周 Give My Text Back(微软笔试题)

    题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...

  5. Give My Text Back

    Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...

  6. Text Style Transfer论文笔记

    Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...

  7. Official Program for CVPR 2015

    From:  http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...

  8. SAP常用命令及BASIS操作

    Pfcg         角色,权限参数文件配置Su53        查看权限对象  st01  跟踪St22         看dump,以分析错误  eg.找到ABAP程序出错的地方,找出fou ...

  9. SAP ECC FI配置文档

    SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...

随机推荐

  1. vue中nextTick和$nextTick的差别

    <ul id="demo">     <li v-for="item in list">{{item}}</div> < ...

  2. angular4,angular6 父组件异步获取数据传值子组件 undefined 问题

    通过输入和输出属性 实现数据在父子组件的交互在子组件内部使用@input接受父组件传入数据,使用@output传出数据到父组件详细标准讲解参考官方文档https://angular.cn/guide/ ...

  3. Oracle如何解决日期格式“01-3月 -18”的显示问题(3种处理方式)

    今天查询表数据还是出现上次那种问题,但是每次都要去调用转化函数,比较麻烦,所以找一下资料,得到几种方式解决oralce的日期数据显示格式 问题描述: 解决方法 1)方法1:调用Oracle函数转化成日 ...

  4. 蓝桥杯—BASIC-21 sine之舞(递归递推)

    题目:最近FJ为他的奶牛们开设了数学分析课,FJ知道若要学好这门课,必须有一个好的三角函数,所以他准备和奶牛们做一个“Sine之舞”的游戏,寓教于乐,提高奶牛们的计算能力. 不妨设 An=sin(1– ...

  5. unity中实现三个Logo图片进行若隐若现的切换并有延时切换图片的效果

    public GameObject canvas; private Transform logoParent; private Transform Logo_logo; //logo一 private ...

  6. js两种打开新窗口

    1.超链接<a href="http://www.jb51.net" title="脚本之家">Welcome</a> 等效于js代码 ...

  7. jvm常用参数

    -Xms512m:初始堆内存 -Xmx512m:最大堆内存 -XX:PermSize=256m:初始永久代内存(方法区,非堆) -XX:MaxPermSize=256m:最大永久代内存(方法区,非堆) ...

  8. 和的java程序

  9. mac下VirtualBox跟linux虚拟机共享文件夹

    1.在VirtualBox中设置好共享目录,设置自动挂载/固定分配 2.安装增强工具,为了避免安装出错需要安装依赖文件 #更新内核. yum update kernel#需要安装相应的kernel-d ...

  10. macOS Sierra 如何打开任何来源

    1.打开应用程序-实用工具-终端: 2.复制以下代码(红色处注意是两个-)到终端中,回车(输入电脑密码): sudo spctl --master-disable 3.打开应用程序-系统偏好设置-安全 ...