Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper
the system uses existing Natural Language Processing (NLP) tools, a parser and an hyphenator, and two corpora, previously annotated by readability level.
hyphenator:
h_en.pairs('beautiful'
[['beau', 'tiful'], [u'beauti', 'ful']]
the system extracts 52 features, grouped in 7 groups: parts-of-speech (POS), syllables, words, chunks and phrases, averages and frequencies, and some extra features.
语言:葡萄牙语
one based on a five-levels scale
(A1, A2, B1, B2, C1)
a second experiment based in a simplified
three-levels scale (A, B and C)
3 nlp工具
STRING:相当于葡萄牙语的nltk
The YAH Hyphenator:This is a rule-based system that applies
various word processing division rules.
hypotaxis 从属结构
parataxis 并列结构
4 特征
The set of 52 features extracted by the system consists
in: (i) part-of-speech (POS) tags, chunks, words
and sentences features; (ii) verb features and different
metrics involving averages and frequencies; (iii)
several metrics involving syllables; and (iv) extra features.
名词、命名体识别对文本理解很重要
句法结构:名词短语、介词短语
助动词可以形成更长更复杂的动词链
hypotaxis 从属结构
parataxis 并列结构
Word frequency:unigram-based,拉普拉斯平滑
动词、名词比例,句长
Automatic Text Difficulty Classifier Assisting the Selection Of Adequate Reading Materials For European Portuguese Teaching --paper的更多相关文章
- DL4NLP —— seq2seq+attention机制的应用:文档自动摘要(Automatic Text Summarization)
两周以前读了些文档自动摘要的论文,并针对其中两篇( [2] 和 [3] )做了presentation.下面把相关内容简单整理一下. 文本自动摘要(Automatic Text Summarizati ...
- Measuring Text Difficulty Using Parse-Tree Frequency
https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf In p ...
- OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification-paper
这篇论文的related work非常详尽地介绍了各种readability的语料 abstract这个paper描述了onestopengilish这个三个level的文本语料的收集和整理,阐述了再 ...
- hiho一下 第一百零七周 Give My Text Back(微软笔试题)
题目1 : Give My Text Back 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exam Littl ...
- Give My Text Back
Give My Text Back 标签(空格分隔): 算法 时间限制:10000ms 单点时限:1000ms 内存限制:256MB 描述 To prepare for the English exa ...
- Text Style Transfer论文笔记
Text Style Transfer主要是指Non-Parallel Data条件下的,具体的paper list见: https://github.com/fuzhenxin/Style-Tran ...
- Official Program for CVPR 2015
From: http://www.pamitc.org/cvpr15/program.php Official Program for CVPR 2015 Monday, June 8 8:30am ...
- SAP常用命令及BASIS操作
Pfcg 角色,权限参数文件配置Su53 查看权限对象 st01 跟踪St22 看dump,以分析错误 eg.找到ABAP程序出错的地方,找出fou ...
- SAP ECC FI配置文档
SAP ECC 6.0 Configuration Document Financial Accounting (FI) Table of Content TOC \O "1-2" ...
随机推荐
- Struts 2 初步入门(三)
接Struts 2初步入门(二) 若想用多个通配符设定访问: <struts> <package name="default" namespace="/ ...
- vm安装diagram
xxx1234ZZ xxx1234ZZ@
- Java Web(二) Servlet详解
什么是Servlet? Servlet是运行在Web服务器中的Java程序.Servlet通常通过HTTP(超文本传输协议)接收和响应来自Web客户端的请求.Java Web应用程序中所有的请求-响应 ...
- css3 min-content,max-content,fit-content, fill属性
css3里有四个属性,用来实现以内容为主的尺寸计算方式,intrinsic sizing min-content max-content fit-content fill 其中 fill 关键字,需要 ...
- mysql不会使用索引,导致全表扫描情况
不会使用索引,导致全表扫描情况 1.不要使用in操作符,这样数据库会进行全表扫描,推荐方案:在业务密集的SQL当中尽量不采用IN操作符 2.not in 使用not in也不会走索引推荐方案:用not ...
- ubuntu compile openjdk87
0. use oracle JDK,not OpenJDK 1. 遇到错误Error:./gamma: relocation error: /usr/lib/jvm/java-7-openjdk-am ...
- linux:centOs7换源阿里云
备份: mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup 下载: wget -O /etc/y ...
- 给msde加装企业管理器
-=给msde加装企业管理器=- 首先,反对所谓的绿色版,运行那是 相~~~当 不稳定,自动关闭,要你有什么用?还广告飞扬!为了调试,花了我整整一天的时间.给大家节省的时间,也为了让大家少走点弯路. ...
- 手动配置 Windows 时间服务
手动配置 Windows 时间服务 要将内部时间服务器配置为与外部时间源同步,请按照下列步骤操作: 将服务器类型更改为 NTP. 为此,请按照下列步骤操作: 选择 “开始” . “运行”,键入 reg ...
- 每天CSS学习之box-shadow
box-shadow是CSS3的属性,目的是给盒子添加一个或多个阴影.怎么感觉有点像光明使者使用该法术照亮敌人的阴暗面? box-shadow一共有六个属性,请看: box-shadow: h-sha ...