4.7.4 Constructing LALR Parsing Tables
4.7.4 Constructing LALR Parsing Tables
We now introduce our last parser construction method, the LALR (lookahead-LR) technique. This method is often used in practice, because the tables obtained by it are considerably smaller than the canonical LR tables, yet most common syntactic constructs of programming languages can be expressed conveniently by an LALR grammar. The same is almost true for SLR grammars, but there are a few constructs that cannot be conveniently handled by SLR techniques (see Example 4.48, for example).
For a comparison of parser size, the SLR and LALR tables for a grammar always have the same number of states, and this number is typically several hundred states for a language like C. The canonical LR table would typically have several thousand states for the same-size language. Thus, it is much easier and more economical to construct SLR and LALR tables than the canonical LR tables.
By way of introduction, let us again consider grammar (4.55), whose sets of LR(1) items were shown in Fig. 4.41. Take a pair of similar looking states, such as I4 and I7. Each of these states has only items with first component C→d@. In I4, the lookaheads are c or d; in I7, $ is the only lookahead.
To see the difference between the roles of I4 and I7 in the parser, note that the grammar generates the regular language c*dc*d. When reading an input cc···cdcc···cd, the parser shifts the first group of c's and their following d onto the stack, entering state 4 after reading the d. The parser then calls for a reduction by C→d, provided the next input symbol is c or d. The requirement that c or d follow makes sense, since these are the symbols that could begin strings in c*d. If $ follows the first d, we have an input like ccd, which is not in the language, and state 4 correctly declares an error if $ is the next input.
The parser enters state 7 after reading the second d. Then, the parser must see $ on the input, or it started with a string not of the form c *dc*d. It thus makes sense that state 7 should reduce by C→d on input $ and declare error on inputs c or d.
Let us now replace I4 and I7 by I47, the union of I4 and I7, consisting of the set of three items represented by [C→d@, c/d/$]. The goto’s on d to I4 or I7 from I0, I2, I3, and I6 now enter I47. The action of state 47 is to reduce on any input. The revised parser behaves essentially like the original, although it might reduce d to C in circumstances where the original would declare error, for example, on input like ccd or cdcdc. The error will eventually be caught; in fact, it will be caught before any more input symbols are shifted.
More generally, we can look for sets of LR(1) items having the same core, that is, set of first components, and we may merge these sets with common cores into one set of items. For example, in Fig. 4.41, I 4 and I 7 form such a pair, with core {C→d@}. Similarly, I3 and I6 form another pair, with core {C→c@C, C→@cC , C→@d}. There is one more pair, I8 and I9, with common core {C→cC@}. Note that, in general, a core is a set of LR(0) items for the grammar at hand, and that an LR(1) grammar may produce more than two sets of items with the same core.
Since the core of GOTO(I, X) depends only on the core of I, the goto’s of merged sets can themselves be merged. Thus, there is no problem revising the goto function as we merge sets of items. The action functions are modified to reflect the non-error actions of all sets of items in the merger.
Suppose we have an LR(1) grammar, that is, one whose sets of LR(1) items produce no parsing-action conflicts. If we replace all states having the same core with their union, it is possible that the resulting union will have a conflict, but it is unlikely for the following reason: Suppose in the union there is a conflict on lookahead a because there is an item [A→α@, a] calling for a reduction by A→α, and there is another item [B→β@aγ, b] calling for a shift. Then some set of items from which the union was formed has item [A→α@, a], and since the cores of all these states are the same, it must have an item [B→β@aγ, c] for some c. But then this state has the same shift/reduce conflict on a, and the grammar was not LR(1) as we assumed. Thus, the merging of states with common cores can never produce a shift/reduce conflict that was not present in one of the original states, because shift actions depend only on the core, not the lookahead.
It is possible, however, that a merger will produce a reduce/reduce conflict, as the following example shows.
Example 4.58: Consider the grammar
S’→S
S→a A d | b B d | a B e | b A e
A→c
B→c
which generates the four strings acd, ace, bcd, and bce. The reader can check that the grammar is LR(1) by constructing the sets of items. Up on doing so, we find the set of items {[A→c@, d]; [B→c@, e]} valid for viable prefix ac and {[A→c@, e]; [B→c@, d]} valid for bc. Neither of these sets has a conflict, and their cores are the same. However, their union, which is
A→c@, d/e
B→c@, d/e
generates a reduce/reduce conflict, since reductions by both A→c and B→c are called for on inputs d and e. □
We are now prepared to give the first of two LALR table-construction algorithms. The general idea is to construct the sets of LR(1) items, and if no conflicts arise, merge sets with common cores. We then construct the parsing table from the collection of merged sets of items. The method we are about to describe serves primarily as a definition of LALR(1) grammars. Constructing the entire collection of LR(1) sets of items requires too much space and time to be useful in practice.
Algorithm 4.59: An easy, but space-consuming LALR table construction.
INPUT: An augmented grammar G’.
OUTPUT: The LALR parsing-table functions ACTION and GOTO for G’.
METHOD:
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items.
2. For each core present among the set of LR(1) items, find all sets having that core, and replace these sets by their union.
3. Let C’ = {J0, J1, …, Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji in the same manner as in Algorithm 4.56. If there is a parsing action conflict, the algorithm fails to produce a parser, and the grammar is said not to be LALR(1).
4. The GOTO table is constructed as follows. If J is the union of one or more sets of LR(1) items, that is, J = I1∪I2∪…∪Ik, then the cores of GOTO(I1, X), GOTO (I2, X), …, GOTO(Ik, X) are the same, since I1, I2, …, Ik all have the same core. Let K be the union of all sets of items having the same core as GOTO(I1, X). Then GOTO(J, X) = K.
□
The table produced by Algorithm 4.59 is called the LALR parsing table for G. If there are no parsing action conflicts, then the given grammar is said to be an LALR(1) grammar. The collection of sets of items constructed in step (3) is called the LALR(1) col lection.
Example 4.60: Again consider grammar (4.55) whose GOTO graph was shown in Fig. 4.41. As we mentioned, there are three pairs of sets of items that can be merged. I3 and I6 are replaced by their union:
|
I36 : |
C→c@C, c/d/$ C→@cC, c/d/$ C→@d, c/d/$ |
I4 and I7 are replaced by their union:
|
I47 : |
C→d@, c/d/$ |
and I8 and I9 are replaced by their union:
|
I89 : |
C→cC@, c/d/$ |
The LALR action and goto functions for the condensed sets of items are shown in Fig. 4.43.
|
STATE |
ACTION |
GOTO |
|||
|
c |
d |
$ |
S |
C |
|
|
0 |
s36 |
s47 |
1 |
2 |
|
|
1 |
acc |
||||
|
2 |
s36 |
s47 |
5 |
||
|
36 |
s36 |
s47 |
89 |
||
|
47 |
r3 |
r3 |
r3 |
||
|
5 |
r1 |
||||
|
89 |
r2 |
r2 |
r2 |
||
Figure 4.43: LALR parsing table for the grammar of Example 4.54
To see how the GOTO's are computed, consider GOTO (I36, C). In the original set of LR(1) items, GOTO(I3, C) = I8 , and I8 is now part of I89 , so we make GOTO(I36, C) be I89. We could have arrived at the same conclusion if we considered I6, the other part of I36. That is, GOTO(I6, C) = I9 , and I9 is now part of I89 . For another example, consider GOTO(I2, c), an entry that is exercised after the shift action of I2 on input c. In the original sets of LR(1) items, GOTO(I2, c) = I6. Since I6 is now part of I36 , GOTO(I2, c) becomes I36. Thus, the entry in Fig. 4.43 for state 2 and input c is made s36, meaning shift and push state 36 onto the stack.
□
When presented with a string from the language c*dc*d, both the LR parser of Fig. 4.42 and the LALR parser of Fig. 4.43 make exactly the same sequence of shifts and reductions, although the names of the states on the stack may differ. For instance, if the LR parser puts I3 or I6 on the stack, the LALR parser will put I36 on the stack. This relationship holds in general for an LALR grammar. The LR and LALR parsers will mimic one another on correct inputs.
When presented with erroneous input, the LALR parser may proceed to do some reductions after the LR parser has declared an error. However, the LALR parser will never shift another symbol after the LR parser declares an error.
For example, on input ccd followed by $, the LR parser of Fig. 4.42 will put
0 3 3 4
on the stack, and in state 4 will discover an error, because $ is the next input symbol and state 4 has action error on $. In contrast, the LALR parser of Fig.4.43 will make the corresponding moves, putting
0 36 36 47
on the stack. But state 47 on input $ has action reduce C→d. The LALR parser will thus change its stack to
0 36 36 89
Now the action of state 89 on input $ is reduce C→cC. The stack becomes
0 36 89
whereupon a similar reduction is called for, obtaining stack
0 2
Finally, state 2 has action error on input $, so the error is now discovered.
4.7.4 Constructing LALR Parsing Tables的更多相关文章
- 4.7.5 Efficient Construction of LALR Parsing Tables
4.7.5 Efficient Construction of LALR Parsing Tables There are several modifications we can make to A ...
- 4.7.6 Compaction of LR Parsing Tables
4.7.6 Compaction of LR Parsing Tables A typical programming language grammar with 50 to 100 terminal ...
- 4.7.3 Canonical LR(1) Parsing Tables
4.7.3 Canonical LR(1) Parsing Tables We now give the rules for constructing the LR(1) ACTION and GOT ...
- 4.4 Top-Down Parsing
4.4 Top-Down Parsing Top-down parsing can be viewed as the problem of constructing a parse tree for ...
- 基于虎书实现LALR(1)分析并生成GLSL编译器前端代码(C#)
基于虎书实现LALR(1)分析并生成GLSL编译器前端代码(C#) 为了完美解析GLSL源码,获取其中的信息(都有哪些in/out/uniform等),我决定做个GLSL编译器的前端(以后简称编译器或 ...
- (转)Understanding C parsers generated by GNU Bison
原文链接:https://www.cs.uic.edu/~spopuri/cparser.html Satya Kiran PopuriGraduate StudentUniversity of Il ...
- 4.8 Using Ambiguous Grammars
4.8 Using Ambiguous Grammars It is a fact that every ambiguous grammar fails to be LR and thus is no ...
- Lexer and parser generators (ocamllex, ocamlyacc)
Chapter 12 Lexer and parser generators (ocamllex, ocamlyacc) This chapter describes two program gene ...
- 4.9 Parser Generators
4.9 Parser Generators This section shows how a parser generator can be used to facilitate the constr ...
随机推荐
- 树莓派 - 通过sysfs操控GPIO
点亮或熄灭LED 硬件上,一个LED灯接在pi的Pin-25. 该引脚为BCM的GPIO26 $ gpio readall +-----+-----+---------+------+---+--- ...
- python视频 神经网络 Tensorflow
python视频 神经网络 Tensorflow 模块 视频教程 (带源码) 所属网站分类: 资源下载 > python视频教程 作者:smile 链接:http://www.pythonhei ...
- Linux 搭建 squid 代理服务器 三种模式
CentOS 6.7 squid 代理服务器 一般有两张或以上网卡,一张链接公网,访问外网资源,一张位于局域网. 代理服务器可以提供文件缓存.复制和地址过滤等服务,充分利用有限的出口带宽,加快内部主机 ...
- 牛客网补题 New Game!(原Wannafly summer camp day2原题)
思路:这个题在秦皇岛的时候好像没有写出来,反正我是没有写出来,题解是听懂了:把直线和圆都看做一个结点,圆和直线用点到直线的距离与半径差求出来,圆和圆之间用点和点之间的距离和半径差表示,最后最短路跑一遍 ...
- **PHP分步表单提交思路(分页表单提交)
Q: 我用php做了3张表单 分布在3个页面 想在最后一页 再插入数据库 并且:在插入数据库之前 3个页面 后退 前进 表单的内容会被保留 以便随时更改能实现吗?想过session 感觉内容太多 给服 ...
- [bzoj1578][Usaco2009 Feb]Stock Market 股票市场_完全背包dp
Stock Market 股票市场 bzoj-1578 Usaco-2009 Feb 题目大意:给定一个$S\times D$的大矩阵$T$,其中$T[i][j]$表示第i支股票第j天的价格.给定初始 ...
- Ubuntu 16.04安装WinRAR/7-Zip(基于CrossOver)
基于CrossOver的WinRAR/7-Zip有如下缺点: 1.不能像Windows那样右键菜单解压 可以解决的问题: 1.可以使用提供的浏览工具进行文件选择再解压,只是在操作上多一步. 2.类似百 ...
- intellij使用tomcat搭建servlet运行时环境
http://suiyu.online/2017/08/01/intellij%E4%BD%BF%E7%94%A8tomcat%E6%90%AD%E5%BB%BAservlet%E8%BF%90%E8 ...
- list.ensureCapacity竟然会变慢
list.ensureCapacity竟然会变慢 jdk1.8 应该是做了优化了: public class Test10 { public static void main(String[] arg ...
- 免费好用的Microsoft iSCSI Software Target 3.3
我们在搭建Windows群集的时候往往会使用到IP-SAN.但是面对昂贵的硬件IP-SAN,我们在学习和实验的环境中更加多的用到的往往是软件模拟的IP-SAN.今天就给大家介绍一个微软出品的免费iSC ...