Measuring Text Difficulty Using Parse-Tree Frequency
https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf
In previous work, we conducted a preliminary corpus study of grammar frequency which showed that difficult texts use a wider variety of high-level grammatical structures (Kauchak et al., 2012). However, because of the large number of structural variations possible, no clear indication was found showing specific structures predominantly appearing in either easy or difficult documents.
In this work, we propose a much more fine-grained analysis. We propose a measure of text difficulty based on grammatical frequency and show how it can be used to identify sentences with difficult syntactic structures. In particular, the grammatical difficulty of a sentence is measured based on the frequency of occurrence of the top-level parse tree structure of the sentence in a large corpus
Building on term familiarity, the paper introduces the concept of grammar familiarity:
Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree
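The measure can be sketched in a few lines. This is an illustrative sketch, not the paper's code: trees are plain nested `(label, children...)` tuples rather than real parser output, and `prune` keeps only the top three levels so that sentences differing only below level 3 share a structure key.

```python
# Hypothetical sketch of grammar familiarity: the corpus frequency of a
# sentence's 3rd-level parse tree structure. All names and the toy corpus
# are illustrative, not from the paper.
from collections import Counter

def prune(tree, depth):
    """Keep only the top `depth` levels of a parse tree."""
    label, children = tree[0], tree[1:]
    if depth == 1 or not children:
        return (label,)
    return (label,) + tuple(prune(c, depth - 1) for c in children)

def structure_key(tree, depth=3):
    """Serialize the top-`depth` structure as a hashable key."""
    return str(prune(tree, depth))

# Toy "corpus" of parse trees (in a real setting, millions of parses).
corpus = [
    ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBD",), ("NP", ("NN",)))),
    ("S", ("NP", ("PRP",)), ("VP", ("VBD",), ("NP", ("DT",), ("NN",)))),
    ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBZ",))),
]

freq = Counter(structure_key(t) for t in corpus)

def grammar_familiarity(tree):
    """Frequency of this tree's 3rd-level structure in the corpus."""
    return freq[structure_key(tree)]
```

Because everything below level 3 is pruned away, two sentences whose parses differ only in lexical material (level 4 and deeper) map to the same structure key.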
Experiment:
The Wikipedia sentences were split into 11 bins according to the frequency of their 3rd-level parse tree structure, and the frequency of each bin was computed;
Then 20 sentences were randomly sampled from each bin (controlling for the effects of sentence length and term familiarity), and about thirty recruited participants rated each sentence (5-point scale) and completed a Cloze test;
Result: even among sentences whose Cloze performance showed no visible difference, sentences with more frequent 3rd-level parse trees were rated as easier and took less time.
Hypothesis:
examine how grammatical frequency impacts the difficulty of a sentence and introduce a new measure of sentence-level text difficulty based on the grammatical structure of the sentence.
Sentence difficulty is divided into perceived and actual difficulty.
Our work here makes a step towards better simplification tools by 1) introducing a sentence-level, data-driven approach for measuring the grammatical difficulty of a sentence and 2) specifically measuring the impact of this measure using both how difficult a sentence looks (perceived difficulty) as well as how difficult a sentence is to understand (actual difficulty).
Function words and verbs are more prevalent in simple texts:
simple texts use simpler words, fewer overall words and words that are more general (Coster & Kauchak, 2011; Napoles & Dredze, 2010; Zhu, Bernhard, & Gurevych, 2010). Certain types of words have also been found to be more prevalent in simpler texts including function words and verbs (Kauchak, Leroy, & Coster, 2012; Leroy & Endicott, 2011).
The Role of Syntax in Simplification
The syntax or grammar of a language dictates how words and phrases interact to form sentences.
splitting long sentences has been shown to improve Cloze scores (Kandula, Curtis, & Zeng-Treitler, 2010), and additive and causal connectors (expressing addition, cause, or purpose) were easier to fill in than adversative or sequential connectors (expressing contrast or temporal order) (Goldman & Murray, 1992). It has been suggested that grammatical difficulty is particularly important for L2 learners since they are still trying to learn appropriate grammatical structures for the language (Callan & Eskenazi, 2007; Clahsen & Felser, 2006).
(Note: on logical connectors, see https://staff.washington.edu/marynell/grammar/logicalconnectors.html)
Simple texts use verbs, function words, and adverbs at higher rates, while difficult texts use adjectives and nouns at higher rates and contain longer noun phrases; in medical texts, simple texts are more likely to use the active voice; subject-verb-object versus object-subject-verb ordering also differs.
Some initial success has been achieved by automated simplification systems that perform syntactic transformations, e.g., reducing prepositional phrases and infinitives, or changing verb tense.
Choosing the parse tree level:
We chose to focus on the 3rd level since it represents a compromise between generality and specificity.
45% of sentences in the corpus (2.47M) have unique 4th level parse tree structures, often because the 4th level regularly includes lexical components. Once individual words are included, structures generalize poorly to other sentences.
To remove anomalous data and likely misparses, we ignored any structure that had been seen only once among the 5.4 million sentences. After filtering, this resulted in 139,969 unique 3rd level structures.
Table 1:
two sentences that have the same 3rd level structure, but that have varying frequency, ordered from most frequent to least. Because we focus on the high-level structure, the length of the sentences with the same structure can also vary widely.
Figure 2:
grammatical frequency follows a Zipf-like distribution, with the most common structures occurring very frequently and many structures occurring infrequently.
This approach for measuring the grammatical difficulty of text represents a generalized and data-driven approach that goes beyond specific, theory-based grammatical components of text difficulty (e.g. active vs. passive voice, self-embedded clauses, etc. (Meyer & Rice, 1984)) and provides a generic framework for measuring grammatical difficulty.
Evaluation:
To minimize confounding factors that might influence sentence difficulty, we control for sentence length and term familiarity.
1) We ranked the 139,939 unique 3rd level structures and divided them into 11 frequency bins: the first bin contains the most frequent structures accounting for the top 1% of cumulative frequency, and each of the remaining ten bins accounts for roughly 10%.
2)Each of the 5.4 million Wikipedia sentences can be mapped to one of the 11 frequency bins and we selected a subset of these for our study.
3) Only sentences near the mean length were kept: assuming sentence length is roughly uniformly distributed, the longest sixth and the shortest sixth were removed, leaving two-thirds of the sentences.
4) 20 sentences were randomly selected from each bin, to examine grammar frequency against sentence length and term familiarity:
Controlling for sentence length: the remaining sentences were split into 3 length levels; of each bin's 20 sentences, 10 came from the longest level and 10 from the shortest.
Controlling for term familiarity: using the Google Web corpus, the mean familiarity of each sentence's words was computed and sentences were split into 3 familiarity levels; of each bin's 20 sentences, 10 had the highest familiarity scores and 10 the lowest.
→
This process resulted in a sample of 220 sentences in 11 frequency bins, with each bin containing 5 long sentences with high familiarity, 5 long with low familiarity, 5 short with high familiarity, and 5 short with low familiarity.
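The binning in step 1 can be sketched as follows, assuming the 1%/10% cumulative-frequency cutoffs described above; the function and variable names are mine, not the paper's:

```python
# Hedged sketch: assign each structure to one of 11 frequency bins by
# cumulative frequency mass. Bin 0 holds the most frequent structures
# covering the top 1%; bins 1-10 each cover roughly 10% more.
def assign_bins(freqs, first=0.01, rest=0.10):
    """freqs: dict mapping structure -> count. Returns structure -> bin (0..10)."""
    total = sum(freqs.values())
    # Upper cumulative-frequency boundary of bins 0..9: 0.01, 0.11, ..., 0.91.
    boundaries = [first + rest * i for i in range(10)]
    bins, cum, b = {}, 0.0, 0
    for s, c in sorted(freqs.items(), key=lambda kv: -kv[1]):
        bins[s] = b
        cum += c / total
        while b < 10 and cum >= boundaries[b]:
            b += 1
    return bins

# Toy counts: "a" alone covers 50% of the mass, so it lands in bin 0.
toy = {"a": 50, "b": 30, "c": 15, "d": 4, "e": 1}
toy_bins = assign_bins(toy)
```

Because the distribution is Zipf-like, bin 0 typically contains only a handful of very frequent structures while the last bins contain many rare ones.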
For each of the 220 sentences, we recruited 30 participants for a total of N=6,600 samples. To ensure the quality and accuracy of the data, participants were restricted to be within the United States and to have a previous approval rating of 95%.
Crowdsourcing: MTurk is a crowdsourcing tool where requesters can upload tasks to be accomplished by a set of workers for a fee.
Results:
A paired-samples t–test showed our two control variables to be effective, with length significantly different between short and long sentences (t(10) = -60.47, p < 0.001) and word frequency significantly different between the high and low group (t(10) = -38.47, p < 0.001).
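The manipulation check above is a standard paired-samples t-test; a self-contained sketch with toy per-bin numbers (not the paper's data):

```python
# Hedged sketch of a paired-samples t statistic: t = mean(d) / (sd(d) / sqrt(n)),
# where d are the per-bin differences. Toy numbers; the paper's data is not shown.
import math
import statistics

def paired_t(a, b):
    """Paired-samples t statistic for equal-length paired observations."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical per-bin mean lengths for the "short" vs "long" groups.
short = [8.1, 7.9, 8.4, 8.0]
long_ = [21.3, 20.8, 22.0, 21.1]
t = paired_t(short, long_)  # large negative t: the length control worked
```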
1)实际难度:To measure actual difficulty (first dependent variable) we used a Cloze test. The basic Cloze test involves replacing every nth word in a text with a blank. Participants are then asked to fill in the blanks and are scored based on how many of their answers matched the original text (Taylor, 1953).
We employed a multiple-choice Cloze test. For each sentence, four nouns were randomly selected and replaced with blanks. For each sentence, we created five multiple-choice options containing the four removed words in different random orders, one of which is the correct ordering.
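One way to construct such multiple-choice items, as a hedged sketch (the paper does not publish its generation code; `make_cloze_options` is an invented name):

```python
# Hypothetical sketch: build 5 orderings of the 4 removed nouns - the correct
# order plus 4 distinct random permutations - then shuffle the option list.
import random

def make_cloze_options(removed_nouns, n_options=5):
    """Return (options, correct): n_options distinct orderings including
    the correct one. Assumes enough distinct permutations exist
    (4 distinct nouns give 24)."""
    options = {tuple(removed_nouns)}
    while len(options) < n_options:
        perm = list(removed_nouns)
        random.shuffle(perm)
        options.add(tuple(perm))
    options = list(options)
    random.shuffle(options)  # hide the position of the correct answer
    return options, tuple(removed_nouns)

options, correct = make_cloze_options(["treaty", "border", "army", "capital"])
```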
2)To measure perceived difficulty (second dependent variable), participants were asked to rate the sentences on a 5-point Likert scale with higher numbers representing more difficult sentences.
Each condition (11 x 2 x 2) had 5 sentences, and for each sentence we gathered 30 responses, resulting in a dataset of N=6,600.
An ANOVA shows these differences to be significant (F(10,6556)= 3.453, p < 0.001), for grammar frequency and sentence length, and (F(10,6556)= 1.870, p = 0.044), for grammar frequency and term familiarity. In addition, the interaction between all three variables is also significant (F(10, 6556) = 4.650, p < 0.001)
(Note: ANOVA stands for Analysis of Variance.)
1. An independent-samples t-test generally just compares two groups of data and the significance of their difference, e.g. comparing height or weight between two groups; the two groups are independent and unrelated, and we only test whether they differ statistically.
2. One-way ANOVA examines whether the different levels of a single control variable have a significant effect on the observed variable. In plain terms, it tests the significance of the effect of changes in x on y, so it is used when the variables are assumed to have some influence relationship. Generally, ANOVA designs are matched.
Rarer structures are harder (one-tailed Pearson correlation coefficient):
To complete this analysis and understand the strength of the effect on actual difficulty, we calculated a one-tailed Pearson correlation coefficient between the grammar frequency and the actual difficulty (percentage correct) for both the raw scores and scores aggregated by frequency bin. There was a negative correlation between grammar frequency and the actual difficulty of the sentence (raw scores: N = 6,600, r = -0.053, p < 0.01; bin averages: N = 11, r = -0.596, p < 0.05) indicating that sentences that used less frequent structures were harder to understand.
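Pearson's r over the bin averages can be reproduced in a few lines; the per-bin accuracies below are invented for illustration:

```python
# Hedged sketch of the bin-level correlation analysis. The accuracy values
# are hypothetical; bin 0 holds the most frequent structures, bin 10 the rarest.
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rarity = list(range(11))  # bin index: higher = rarer structures
accuracy = [0.92, 0.91, 0.92, 0.90, 0.89, 0.90, 0.88, 0.87, 0.88, 0.86, 0.85]
r = pearson_r(rarity, accuracy)  # negative: rarer structures, lower accuracy
```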
At mid-range structure frequencies, long and short sentences show similar accuracy:
In contrast to actual difficulty, we also find a main effect of the sentence length on perceived difficulty with longer sentences seen as more difficult (averaged 2.2) than the shorter sentences (averaged 2.0). Surprisingly, there was no effect of the average term frequency on perceived difficulty.
The effect of grammar frequency on perceived difficulty is smaller in shorter sentences and those with lower term frequency
Both high and low frequency sentences show a jump in difficulty, though it occurs earlier (bin 7) for low frequency sentences than for high frequency sentences (bin 8)
we found a significant correlation between how well readers performed on the Cloze test and how difficult they thought a sentence was. Lower accuracy correlated with higher difficulty scores (N = 11, r = -0.574, p < 0.05; N = 6600, r = -0.203, p < 0.01)
Actual and perceived difficulty as measured in our user study for the 220 sentences binned by the Flesch-Kincaid grade level:
Even though the Flesch-Kincaid formula shows little variation in difficulty across the 220 sentences, the variation in perceived difficulty is in fact large.
GRAMMAR FAMILIARITY AS AN ANALYSIS TOOL:
Corpus:
Each of the texts was tokenized and split into sentences using the Stanford CoreNLP toolkit and then parsed using the Berkeley Parser (the same preprocessing as the frequency bins).
Summary:
1. Existing research shows a mismatch between actual readability and perceived difficulty.
2. The frequency of the level-3 parse tree correlates with both actual and perceived difficulty, demonstrating the measure's validity.
3. In short sentences, actual difficulty is barely affected by grammar, because shorter sentences are easy to understand and any effect of grammar is difficult to detect (ceiling effect).
Similarly, in sentences with low term familiarity (i.e. more difficult words) the grammar familiarity doesn’t impact text difficulty since users are struggling with the lexical difficulty
However, in sentences with very familiar terms, which are easier to understand, grammar frequency does have an impact on actual difficulty; only in sentences where the words are more familiar does the grammatical frequency have a strong effect. Interestingly, there was very little impact overall of term frequency on actual difficulty.
Based on these observations, we hypothesize that there is a relation between grammatical frequency and term frequency. Future studies are required to fully validate these hypotheses. Our study has limitations. Text comprehension was measured with individual