the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.

hyphenator:

h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]

the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.

语言:葡萄牙语

one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)

3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.

hypotaxis 从属结构
parataxis 并列结构

4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.

名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长

Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章

  1. DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)

    两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...

  2. Measuring Text Difficulty Using Parse-Tree Frequency

    https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...

  3. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper

    这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...

  4. hiho一下 第一百零七周 Give My Text Back(微软笔试题)

    题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...

  5. Give My Text Back

    Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...

  6. Text Style Transfer论文笔记

    Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...

  7. Official Program for CVPR 2015

    From:  http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...

  8. SAP常用命令及BASIS操作

    Pfcg         角色,权限参数文件配置Su53        查看权限对象  st01  跟踪St22         看dump,以分析错误  eg.找到ABAP程序出错的地方,找出fou ...

  9. SAP ECC FI配置文档

    SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...

随机推荐

  1. Mysql索引引起的死锁

    提到索引,首先想到的是效率提高,查询速度提升,不知不觉都会有一种心理趋向,管它三七二十一,先上个索引提高一下效率..但是索引其实也是暗藏杀机的... 今天压测带优化项目,开着Jmeter高并发访问项目 ...

  2. bzoj3412

    题解: 先把询问排序 然后根据单调性来做 代码: #include<bits/stdc++.h> using namespace std; ],b[],f[],ans[]; int cmp ...

  3. nginx:在linux上进行nginx的安装

    -----1.安装 换源 mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup 下载: wget ...

  4. day35 数据库介绍和初识sql

    今日内容: 1. 代码: 简易版socketsever 2.数据库(mysql)简单介绍和分类介绍 3.mysql root修改密码 4.修改字符集编码 5.初识sql语句 1.简易版socketse ...

  5. vue-router-3-嵌套路由

    <div id="app"> <router-view></router-view> </div> const User = { t ...

  6. 理解JavaScript的运行

    JavaScript可以运行在head和body标签中! HTML的脚本必须放在<script></script>标签中间! 浏览器会解释并执行位于script标签中的脚本! ...

  7. ADO.NET 连接池 Session 状态分析

    ADO.NET 中提供连接池避免 在业务操作中频繁打开,关闭连接. 当客户端释放连接后,连接池并未真正将数据库连接资源释放 , 而是根据连接字符串特征,将资源放到连接池中, 方便下次重用. 因此问题来 ...

  8. transiton,transform,animation,border-image

    animation,transition,transform三者联系与区别: https://www.jianshu.com/p/0e0e1903b80d transform: 使用小技巧: tran ...

  9. angular2的模板语法

    Angular 应用管理着用户之所见和所为,并通过 Component 类的实例(组件)和面向用户的模板来与用户交互. 从使用模型-视图-控制器 (MVC) 或模型-视图-视图模型 (MVVM) 的经 ...

  10. linux程序的常用保护机制

    操作系统提供了许多安全机制来尝试降低或阻止缓冲区溢出攻击带来的安全风险,包括DEP.ASLR等.在编写漏洞利用代码的时候,需要特别注意目标进程是否开启了DEP(Linux下对应NX).ASLR(Lin ...