4.7.6 Compaction of LR Parsing Tables
4.7.6 Compaction of LR Parsing Tables
A typical programming language grammar with 50 to 100 terminals and 100 productions may have an LALR parsing table with several hundred states. The action function may easily have 20,000 entries, each requiring at least 8 bits to encode. On small devices, a more efficient encoding than a two-dimensional array may be important. We shall mention briefly a few techniques that have been used to compress the ACTION and GOTO fields of an LR parsing table.
One useful technique for compacting the action field is to recognize that usually many rows of the action table are identical. For example, in Fig. 4.42, states 0 and 3 have identical action entries, and so do 2 and 6. We can therefore save considerable space, at little cost in time, if we create a pointer for each state into a one-dimensional array. Pointers for states with the same actions point to the same location. To access information from this array, we assign each terminal a number from zero to one less than the number of terminals, and we use this integer as an offset from the pointer value for each state. In a given state, the parsing action for the ith terminal will be found i locations past the pointer value for that state.
Further space efficiency can be achieved at the expense of a somewhat slower parser by creating a list for the actions of each state. The list consists of (terminal-symbol, action) pairs. The most frequent action for a state can be placed at the end of the list, and in place of a terminal we may use the notation “any,” meaning that if the current input symbol has not been found so far on the list, we should do that action no matter what the input is. Moreover, error entries can safely be replaced by reduce actions, for further uniformity along a row. The errors will be detected later, before a shift move.
Example 4.65: Consider the parsing table of Fig. 4.37. First, note that the actions for states 0, 4, 6, and 7 agree. We can represent them all by the list
|
SYMBOL |
ACTION |
|
id |
s5 |
|
( |
s4 |
|
any |
error |
State 1 has a similar list:
|
SYMBOL |
ACTION |
|
+ |
s6 |
|
$ |
acc |
|
any |
error |
In state 2, we can replace the error entries by r2, so reduction by production 2 will occur on any input but *. Thus the list for state 2 is
|
SYMBOL |
ACTION |
|
* |
s7 |
|
any |
r2 |
State 3 has only error and r4 entries. We can replace the former by the latter, so the list for state 3 consists of only the pair (any, r4). States 5, 10, and 11 can be treated similarly. The list for state 8 is
|
SYMBOL |
ACTION |
|
+ |
s6 |
|
) |
s11 |
|
any |
error |
and for state 9
|
SYMBOL |
ACTION |
|
* |
S7 |
|
any |
R1 |
□
We can also encode the GOTO table by a list, but here it app ears more efficient to make a list of pairs for each nonterminal A. Each pair on the list for A is of the form (currentState, nextState), indicating
GOTO [currentState, A] = nextState
This technique is useful because there tend to be rather few states in any one column of the GOTO table. The reason is that the GOTO on nonterminal A can only be a state derivable from a set of items in which some items have A immediately to the left of a dot. No set has items with X and Y immediately to the left of a dot if X ≠ Y. Thus, each state app ears in at most one GOTO column.
For more space reduction, we note that the error entries in the goto table are never consulted. We can therefore replace each error entry by the most common non-error entry in its column. This entry becomes the default; it is represented in the list for each column by one pair with any in place of currentState.
Example 4.66: Consider Fig. 4.37 again. The column for F has entry 10 for state 7, and all other entries are either 3 or error. We may replace error by 3 and create for column F the list
|
CURRENTSTATE |
NEXTSTATE |
|
7 |
10 |
|
any |
3 |
Similarly, a suitable list for column T is
|
CURRENTSTATE |
NEXTSTATE |
|
6 |
9 |
|
any |
2 |
For column E we may choose either 1 or 8 to be the default; two entries are necessary in either case. For example, we might create for column E the list
|
CURRENTSTATE |
NEXTSTATE |
|
4 |
8 |
|
any |
1 |
□
This space savings in these small examples may be misleading, because the total number of entries in the lists created in this example and the previous one together with the pointers from states to action lists and from nonterminals to next-state lists, result in unimpressive space savings over the matrix implementation of Fig. 4.37. For practical grammars, the space needed for the list representation is typically less than ten percent of that needed for the matrix representation. The table-compression methods for finite automata that were discussed in Section 3.9.8 can also be used to represent LR parsing tables.
4.7.6 Compaction of LR Parsing Tables的更多相关文章
- 4.7.3 Canonical LR(1) Parsing Tables
4.7.3 Canonical LR(1) Parsing Tables We now give the rules for constructing the LR(1) ACTION and GOT ...
- 4.7.5 Efficient Construction of LALR Parsing Tables
4.7.5 Efficient Construction of LALR Parsing Tables There are several modifications we can make to A ...
- 4.7.4 Constructing LALR Parsing Tables
4.7.4 Constructing LALR Parsing Tables We now introduce our last parser construction method, the LAL ...
- 4.4 Top-Down Parsing
4.4 Top-Down Parsing Top-down parsing can be viewed as the problem of constructing a parse tree for ...
- (转)Understanding C parsers generated by GNU Bison
原文链接:https://www.cs.uic.edu/~spopuri/cparser.html Satya Kiran PopuriGraduate StudentUniversity of Il ...
- 基于虎书实现LALR(1)分析并生成GLSL编译器前端代码(C#)
基于虎书实现LALR(1)分析并生成GLSL编译器前端代码(C#) 为了完美解析GLSL源码,获取其中的信息(都有哪些in/out/uniform等),我决定做个GLSL编译器的前端(以后简称编译器或 ...
- Donald Ervin Knuth:最年轻的图灵奖高德纳
高德纳(Donald Ervin Knuth,1938年),美国著名计算机科学家,斯坦福大学电脑系荣誉教授.高德纳教授被誉为现代计算机科学的鼻祖,在计算机科学及数学领域发表了多部 具广泛影响的论文和著 ...
- 4.8 Using Ambiguous Grammars
4.8 Using Ambiguous Grammars It is a fact that every ambiguous grammar fails to be LR and thus is no ...
- Lexer and parser generators (ocamllex, ocamlyacc)
Chapter 12 Lexer and parser generators (ocamllex, ocamlyacc) This chapter describes two program gene ...
随机推荐
- Linux中搭建FTP服务器
FTP工作原理 (1)FTP使用端口 [root@localhost ~]# cat /etc/services | grep ftp ftp-data 20/tcp #数据链路:端口20 ftp 2 ...
- Linux一键安装web环境全攻略phpstudy版
此教程主要是应对阿里云Linux云服务器ecs的web环境安装,理论上不限于阿里云服务器,此教程对所有Linux云服务器都具有参考价值. 写这篇文章的目的:网上有很多关于Linux一键安装web环境全 ...
- php - namespace篇
之前没有系统学习过PHP语言,直接上手TP框架了,所以认为namespace和use是TP框架的一部分,最近学习语言模块的时候遇到了这个问题,所以汇总了一下. PHP中命名空间可以解决两类问题: 用户 ...
- buf.swap16()
buf.swap16() 返回:{Buffer} 将 Buffer 解释执行为一个16位的无符号整数数组并以字节顺序交换到位.如果 Buffer 的长度不是16位的倍数,则抛出一个 RangeErro ...
- 谷歌应用商店chrome扩展程序和APP的发布流程
互联网上有很多大牛,他们再工作中需要一些难题,再找到解决办法后,如果会使用js的话,大多数人就可以自己动手写一个chrome插件,而且非常容易.开发人员都喜欢与大家分享自己的成就!google是一个全 ...
- Django所包含属性
Django包含的属性 定义属性 概述: 1.django根据属性的类型确定以下信息 2.当前选择的数据库支持字段的类型 3.渲染管理表单时使用的默认html空间 4.在管理站点最低限度的验证 注意: ...
- Quartz --quartz.properties
quartz.properties 如果项目中没有该配置文件,则会去jar包中读取自带配置文件 默认的配置如下 # Default Properties file for use by StdSche ...
- poj2325 大数除法+贪心
将输入的大数除以9 无法整除再除以 8,7,6,..2,如果可以整除就将除数记录,将商作为除数继续除9,8,...,3,2. 最后如果商为1 证明可以除尽 将被除过的数从小到大输出即可 #includ ...
- CentOS7 Failed to start LSB: Bring up/down解决方法(真正有效的方法)
刚刚装好的虚拟机突然不能上网了,报错很诡异,具体报错如下: /etc/init.d/network restart Restarting network (via systemctl): Job f ...
- orcad中的快捷键
在画原理图的时候,不能正常的将将要放下的器件与旁边的对其,一种解决办法是按F6,调出大的水平竖直线,在按F6,此线标消失. Ctrl+F8是全屏模式,关闭的方法暂时不知道,退出方式是点击按钮. F10 ...