the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.

hyphenator:

h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]

the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.

语言:葡萄牙语

one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)

3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.

hypotaxis 从属结构
parataxis 并列结构

4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.

名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长

Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章

  1. DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)

    两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...

  2. Measuring Text Difficulty Using Parse-Tree Frequency

    https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...

  3. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper

    这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...

  4. hiho一下 第一百零七周 Give My Text Back(微软笔试题)

    题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...

  5. Give My Text Back

    Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...

  6. Text Style Transfer论文笔记

    Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...

  7. Official Program for CVPR 2015

    From:  http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...

  8. SAP常用命令及BASIS操作

    Pfcg         角色,权限参数文件配置Su53        查看权限对象  st01  跟踪St22         看dump,以分析错误  eg.找到ABAP程序出错的地方,找出fou ...

  9. SAP ECC FI配置文档

    SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...

随机推荐

  1. Linux只下载不安装软件包

    有时我们并不需要安装软件而只要下载软件包. 包格式 命令 命令所属包 命令下载格式 rpm yumdownloader yum-utils yumdownloader package_name deb ...

  2. Windows服务手动关闭教程

    Windows上关闭软件自启动,经常使用360等软件的开机加速功能去优化. 但有时候有些服务开机是启动的但在360中并没有找到那如何手动关闭这些服务的自启动呢? 下边以Autodesk Applica ...

  3. Win10系列:VC++媒体播放控制4

    (7)音量控制 MediaElement控件具有一个Volume属性,通过设置此属性的值可以改变视频音量的大小.接下来介绍如何实现视频的音量控制,首先打开MainPage.xaml文件,并在Grid元 ...

  4. Python之路-python基础二(补充)

    本章内容: 三元运算 八进制,十六进制,十进制与二进制的转换 集合的修改方法 字符串常用方法            三元运算  三元运算简化了if else的语句,将四行代码简化为一行.三元运算的格式 ...

  5. MATLAB 图片折腾4

    重新安排矩阵的x,y,z , 在二维中就相当于把x,y 对换,在三维中相当于可以把三个坐标的位置互换. 比如A = A(:,:,1)=repmat(1,3,3);A(:,:,2)=repmat(2,3 ...

  6. Cracking The Coding Interview 1.4

    //Write a method to decide if two strings are anagrams or not. // // 变位词(anagrams)指的是组成两个单词的字符相同,但位置 ...

  7. EMIF接口与FPGA的互联(转)

    reference: https://blog.csdn.net/ruby97/article/details/7539151 DSP6455的EMIFA模块 之前介绍了DSP6455的GPIO和中断 ...

  8. 全栈框架mk-js

    今天听朋友说,才知道原来还有全栈框架这么一说. 厉害了. meteor EggBorn.js mk-js cordova 记录下,后面研究研究.

  9. activiti 插件安装,以及初始化配置

    1.安装插件 2.添加pom 3.配置activiti.cfg.xml 4.绘制业务流程图 MyProcess.bpmn 5.加载activiti数据表 6.创建流程 1.安装eclipse acti ...

  10. codeforces546D(从一个数中拆分素数)

    D. Soldier and Number Game time limit per test 3 seconds memory limit per test 256 megabytes input s ...