4.7.6 Compaction of LR Parsing Tables

A typical programming language grammar with 50 to 100 terminals and 100 productions may have an LALR parsing table with several hundred states. The action function may easily have 20,000 entries, each requiring at least 8 bits to encode. On small devices, a more efficient encoding than a two-dimensional array may be important. We shall mention briefly a few techniques that have been used to compress the ACTION and GOTO fields of an LR parsing table.

One useful technique for compacting the action field is to recognize that usually many rows of the action table are identical. For example, in Fig. 4.42, states 0 and 3 have identical action entries, and so do 2 and 6. We can therefore save considerable space, at little cost in time, if we create a pointer for each state into a one-dimensional array. Pointers for states with the same actions point to the same location. To access information from this array, we assign each terminal a number from zero to one less than the number of terminals, and we use this integer as an offset from the pointer value for each state. In a given state, the parsing action for the ith terminal will be found i locations past the pointer value for that state.

Further space efficiency can be achieved at the expense of a somewhat slower parser by creating a list for the actions of each state. The list consists of (terminal-symbol, action) pairs. The most frequent action for a state can be placed at the end of the list, and in place of a terminal we may use the notation “any,” meaning that if the current input symbol has not been found so far on the list, we should do that action no matter what the input is. Moreover, error entries can safely be replaced by reduce actions, for further uniformity along a row. The errors will be detected later, before a shift move.

Example 4.65: Consider the parsing table of Fig. 4.37. First, note that the actions for states 0, 4, 6, and 7 agree. We can represent them all by the list

SYMBOL

ACTION

id

s5

(

s4

any

error

State 1 has a similar list:

SYMBOL

ACTION

+

s6

$

acc

any

error

In state 2, we can replace the error entries by r2, so reduction by production 2 will occur on any input but *. Thus the list for state 2 is

SYMBOL

ACTION

*

s7

any

r2

State 3 has only error and r4 entries. We can replace the former by the latter, so the list for state 3 consists of only the pair (any, r4). States 5, 10, and 11 can be treated similarly. The list for state 8 is

SYMBOL

ACTION

+

s6

)

s11

any

error

and for state 9

SYMBOL

ACTION

*

S7

any

R1

We can also encode the GOTO table by a list, but here it app ears more efficient to make a list of pairs for each nonterminal A. Each pair on the list for A is of the form (currentState, nextState), indicating

GOTO [currentState, A] = nextState

This technique is useful because there tend to be rather few states in any one column of the GOTO table. The reason is that the GOTO on nonterminal A can only be a state derivable from a set of items in which some items have A immediately to the left of a dot. No set has items with X and Y immediately to the left of a dot if X ≠ Y. Thus, each state app ears in at most one GOTO column.

For more space reduction, we note that the error entries in the goto table are never consulted. We can therefore replace each error entry by the most common non-error entry in its column. This entry becomes the default; it is represented in the list for each column by one pair with any in place of currentState.

Example 4.66: Consider Fig. 4.37 again. The column for F has entry 10 for state 7, and all other entries are either 3 or error. We may replace error by 3 and create for column F the list

CURRENTSTATE

NEXTSTATE

7

10

any

3

Similarly, a suitable list for column T is

CURRENTSTATE

NEXTSTATE

6

9

any

2

For column E we may choose either 1 or 8 to be the default; two entries are necessary in either case. For example, we might create for column E the list

CURRENTSTATE

NEXTSTATE

4

8

any

1

This space savings in these small examples may be misleading, because the total number of entries in the lists created in this example and the previous one together with the pointers from states to action lists and from nonterminals to next-state lists, result in unimpressive space savings over the matrix implementation of Fig. 4.37. For practical grammars, the space needed for the list representation is typically less than ten percent of that needed for the matrix representation. The table-compression methods for finite automata that were discussed in Section 3.9.8 can also be used to represent LR parsing tables.

4.7.6 Compaction of LR Parsing Tables的更多相关文章

  1. 4.7.3 Canonical LR(1) Parsing Tables

    4.7.3 Canonical LR(1) Parsing Tables We now give the rules for constructing the LR(1) ACTION and GOT ...

  2. 4.7.5 Efficient Construction of LALR Parsing Tables

    4.7.5 Efficient Construction of LALR Parsing Tables There are several modifications we can make to A ...

  3. 4.7.4 Constructing LALR Parsing Tables

    4.7.4 Constructing LALR Parsing Tables We now introduce our last parser construction method, the LAL ...

  4. 4.4 Top-Down Parsing

    4.4 Top-Down Parsing Top-down parsing can be viewed as the problem of constructing a parse tree for ...

  5. (转)Understanding C parsers generated by GNU Bison

    原文链接:https://www.cs.uic.edu/~spopuri/cparser.html Satya Kiran PopuriGraduate StudentUniversity of Il ...

  6. 基于虎书实现LALR(1)分析并生成GLSL编译器前端代码(C#)

    基于虎书实现LALR(1)分析并生成GLSL编译器前端代码(C#) 为了完美解析GLSL源码,获取其中的信息(都有哪些in/out/uniform等),我决定做个GLSL编译器的前端(以后简称编译器或 ...

  7. Donald Ervin Knuth:最年轻的图灵奖高德纳

    高德纳(Donald Ervin Knuth,1938年),美国著名计算机科学家,斯坦福大学电脑系荣誉教授.高德纳教授被誉为现代计算机科学的鼻祖,在计算机科学及数学领域发表了多部 具广泛影响的论文和著 ...

  8. 4.8 Using Ambiguous Grammars

    4.8 Using Ambiguous Grammars It is a fact that every ambiguous grammar fails to be LR and thus is no ...

  9. Lexer and parser generators (ocamllex, ocamlyacc)

    Chapter 12 Lexer and parser generators (ocamllex, ocamlyacc) This chapter describes two program gene ...

随机推荐

  1. 语法,if,while循环,for循环

    目录 一.语法 二.while循环 三.for循环 一.语法 if: if判断其实是在模拟人做判断.就是说如果这样干什么,如果那样干什么.对于ATM系统而言,则需要判断你的账号密码的正确性. if 条 ...

  2. 88-On Balance Volume 能量潮指标.(2015.7.4)

    On Balance Volume 能量潮指标 ~计算方法: 如果当天的收盘价高于昨天的话,那么:OBV(i) = OBV(i-1)+VOLUME(i) 如果当天的收盘价低于昨天的话,那么:OBV(i ...

  3. LeetCode(60) Permutation Sequence

    题目 The set [1,2,3,-,n] contains a total of n! unique permutations. By listing and labeling all of th ...

  4. Git和SVN共存的方法

    刚工作的时候都是用的cvs和svn,对git不熟悉,随着工作的需要,打分支和版本管理的需要,熟悉起来了git,这一用不可收拾,比svn远远好用,尤其是版本分支管理上,切换分支的方便性,现在这家公司还是 ...

  5. <转> 二分图多重匹配问题

    在二分图最大匹配中,每个点(不管是X方点还是Y方点)最多只能和一条匹配边相关联,然而,我们经常遇到这种问题,即二分图匹配中一个点可以和多条匹配边相关联,但有上限,或者说,Li表示点i最多可以和多少条匹 ...

  6. centOS 7 换ssh端口

    centos 最少安装时缺少semanage的,要这样装补上,因为默认是安装了SElinux的 [root@localhost ~]# sestatus SELinux status: enabled ...

  7. HDU 4465 递推与double的精确性

    题目大意不多说了 这里用dp[i][0] 代表取完第一个盒子后第二个盒子剩 i 个的概率,对应期望就是dp[i][0] *i dp[i][1] 就代表取完第二个盒子后第一个盒子剩 i 个的概率 dp[ ...

  8. POJ 1741 Tree【Tree,点分治】

    给一棵边带权树,问两点之间的距离小于等于K的点对有多少个. 模板题:就是不断找树的重心,然后分开了,分治,至少分成两半,就是上限为log 然后一起统计就ok了 #include<iostream ...

  9. 听dalao讲课 7.26

    XFZ今天讲了些关于多项式求ln和多项式求导以及多项式求积分的东西 作为一个连导数和积分根本就不会的蒟蒻,就像在听天书,所以不得不补点前置知识 1.积分 积分是微积分学与数学分析里的一个核心概念.通常 ...

  10. Jquery EasyUI动态生成Tab

    function addTab(title, url) { if ($('#tt').tabs('exists', title)) { $('#tt').tabs('select', title); ...