拥有自己的解析器(C#实现LALR(1)语法解析器和miniDFA词法分析器的生成器)

参考lex和yacc的输入格式,参考虎书《现代编译原理-C语言描述》的算法,不依赖第三方库,大力整合优化,实现了LALR(1)语法解析器和miniDFA词法分析器的C#生成器(暂命名为bitParser)。

可在(https://gitee.com/bitzhuwei/bitParser-demos)下载ANSI C语言、GLGL4.60.8和各类测试用例的解析器完整代码。

https://www.cnblogs.com/bitzhuwei/p/18544785)列示了实现词法分析器和语法分析器的生成器的全部算法。

1234+567+89+0+0的语法树生成过程☟

(1234+567)/(89-0)的语法树生成过程☟

可在(https://www.cnblogs.com/bitzhuwei/p/18679529)查看更多语法树的生成过程。(想看自定义表达式的语法树生成过程的同学请留言)

词法分析器的生成器

  • 分别生成DFA和最小化DFA(以下简称miniDFA)的词法分析器代码(状态转换表、保留字、Token类型等)

  • 支持全Unicode字符、/后缀、前缀<'Vt'>、状态信号<signal1, signal2, ..>,便于识别id = refIdstruct type_name<Comment>[^*\n]*subroutine(type_name_list)这类情况。类似lex,但不完全相同。

  • 无须显式书写Token类型、状态信号、保留字,即无须lex中的%s NUM ID ..%x Comment End Text ..[(] { return ('('); }

  • 注释详尽。每个词法状态的每个条件分支都在注释中说明其正则表达式、导向的Token类型(和signal)等。

  • 生成ε-NFA、NFA、DFA、miniDFA的状态图(mermaid格式)和各类型Token的状态图(mermaid格式),便于学习和调试。

语法分析器的生成器

  • 分别生成LL(1)、LR(0)、SLR(1)、LALR(1)、LR(1)的语法分析器代码(分析表、规则列表、语法树结点类型等)。

  • 支持优先级指令%nonassoc%left%right%prec,自动解决Shift/Reduce、Reduce/Reduce冲突,并在分析表代码的注释中列示之。

  • 注释详尽。在注释中列示:每个语法状态的LR项和lookahead;冲突数量、已解决数量、未解决数量等。

  • 生成LL(1)、LR(0)、SLR(1)、LALR(1)、LR(1)的状态图(mermaid格式)和状态表(markdown格式),便于学习和调试。

  • 生成nullable、FIRST集、FOLLOW集的文档。

其他

  • 无须lex和yacc中的%option%union%define%{%}%parse-param%lex-param%pure-parser%expect%token%type。做成类库,直接按如下方式调用即可:
var compiler = new CompilerXxx();
var sourceCode = File.ReadAllText("input.st");
var tokens = compiler.Analyze(sourceCode);
var syntaxTree = compiler.Parse(tokens);
var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);
// use extractedObj for user-defined business ..
  • 支持多行注释指令%blockComment on/off和单行注释指令%inlineComment on/off。默认格式同C语言的/**///,可自定义其格式,例如:在解析VRML文件时,将单行注释的格式定义为从#到行尾:
%%#[^\r\n]*%% 'inlineComment'
  • 生成遍历语法树提取语义信息的框架,提供适用各种语言的源代码格式化算法。可用于格式化、进一步生成中间代码等后续业务逻辑。

  • 大力优化,例如生成ANSI C语言的全部解析器代码+文档只需3秒,生成GLSL4.60.8的全部解析器代码+文档只需9秒。

点击查看 其他功能
  • 支持Scope范围指令%validScopeChars和全局范围指令%validGlobalChars,默认范围均为[\u0001-\uFFFF](即除'\0'外的全部Unicode字符),可自定义其范围。

  • 支持%omit指令,可指定要忽略的空白符。默认为'空格''\t''\n''\r''\0'

  • 支持%start指定起始语法结点。

举例-Calc.st

输入文件Calc.st

能够处理加减乘除和括号运算的解析器,其文法如下:

// 输入文件Calc.st
Exp : Exp '+' Term
| Exp '-' Term
| Term ;
Term : Term '*' Factor
| Term '/' Factor
| Factor ;
Factor : '(' Exp ')'
| 'number' ; %%[0-9]+%% 'number' // 示例只处理非负整数
//无须书写 %%[+]%% '+' 等

据此文法,我们可以生成下述内容:

生成的词法分析器

生成的ε-NFA的词法分析器的状态图如下:

生成的miniDFA的词法分析器的状态图如下:

点击查看 生成的 终结点Vt和非终结点Vn 代码
// 如不需要,可删除此数组
public static readonly IReadOnlyList<string> stArray = new string[] {
"'¥'", // @终 = 0;
"'+'", // @Plus符 = 1; // '+'
"'-'", // @Dash符 = 2; // '-'
"'*'", // @Asterisk符 = 3; // '*'
"'/'", // @Slash符 = 4; // '/'
"'('", // @LeftParenthesis符 = 5; // '('
"')'", // @RightParenthesis符 = 6; // ')'
"'number'", // @number = 7; // 'number'
// end of 1 + 7 Vt
"Exp", // Exp枝 = 8; // Exp
"Term", // Term枝 = 9; // Term
"Factor", // Factor枝 = 10; // Factor
// end of 3 Vn
};
/// <summary>
/// Vt types are used both for lexical-analyze and syntax-parse.
/// <para>Vn types are only for syntax-parse.</para>
/// <para>Vt is quoted in ''.</para>
/// <para>Vn is not quoted in ''.</para>
/// </summary>
public static class st {
// Vt
/// <summary>
/// Something wrong within the source code.
/// </summary>
public const int Error错 = -1; // "'×'"; /// <summary>
/// end of token list.
/// </summary>
public const int @终 = 0; // "'¥'"; /// <summary>
/// '+'
/// </summary>
public const int @Plus符 = 1; // "'+'"
/// <summary>
/// '-'
/// </summary>
public const int @Dash符 = 2; // "'-'"
/// <summary>
/// '*'
/// </summary>
public const int @Asterisk符 = 3; // "'*'"
/// <summary>
/// '/'
/// </summary>
public const int @Slash符 = 4; // "'/'"
/// <summary>
/// '('
/// </summary>
public const int @LeftParenthesis符 = 5; // "'('"
/// <summary>
/// ')'
/// </summary>
public const int @RightParenthesis符 = 6; // "')'"
/// <summary>
/// 'number'
/// </summary>
public const int @number = 7; // "'number'"
/// <summary>
/// count of ('¥' + user-defined Vt)
/// </summary>
public const int VtCount = 8; // Vn
/// <summary>
/// Exp
/// </summary>
public const int Exp枝 = 8; // "Exp"
/// <summary>
/// Term
/// </summary>
public const int Term枝 = 9; // "Term"
/// <summary>
/// Factor
/// </summary>
public const int Factor枝 = 10; // "Factor"
}
点击查看 生成的 保留字(即一个语言中的keyword) 相关代码
public static class reservedWord {
/// <summary>
/// +
/// </summary>
public const string @Plus符 = "+";
/// <summary>
/// -
/// </summary>
public const string @Dash符 = "-";
/// <summary>
/// *
/// </summary>
public const string @Asterisk符 = "*";
/// <summary>
/// /
/// </summary>
public const string @Slash符 = "/";
/// <summary>
/// (
/// </summary>
public const string @LeftParenthesis符 = "(";
/// <summary>
/// )
/// </summary>
public const string @RightParenthesis符 = ")";
} /// <summary>
/// if <paramref name="token"/> is a reserved word, assign correspond type and return true.
/// <para>otherwise, return false.</para>
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
private static bool CheckReservedWord(AnalyzingToken token) {
bool isReservedWord = true;
switch (token.value) {
case reservedWord.@Plus符: token.type = st.@Plus符; break;
case reservedWord.@Dash符: token.type = st.@Dash符; break;
case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;
case reservedWord.@Slash符: token.type = st.@Slash符; break;
case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;
case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break; default: isReservedWord = false; break;
} return isReservedWord;
}

以下是用8个Action<LexicalContext, char, CurrentStateWrapper>函数委托实现的词法状态转换表

点击查看 生成的 lexi状态0 相关代码
// lexicalState0
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* [0-9] */
else if (/* possible Vt : 'number' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState1;
}
/* \) */
else if (/* possible Vt : ')' */
c == ')'/*'\u0029'(41)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState2;
}
/* \( */
else if (/* possible Vt : '(' */
c == '('/*'\u0028'(40)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState3;
}
/* \/ */
else if (/* possible Vt : '/' */
c == '/'/*'\u002F'(47)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState4;
}
/* \* */
else if (/* possible Vt : '*' */
c == '*'/*'\u002A'(42)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState5;
}
/* - */
else if (/* possible Vt : '-' */
c == '-'/*'\u002D'(45)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState6;
}
/* \+ */
else if (/* possible Vt : '+' */
c == '+'/*'\u002B'(43)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState7;
}
/* deal with everything else. */
else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
wrapper.currentState = lexicalState0; // skip them.
}
else { // unexpected char.
BeginToken(context);
ExtendToken(context);
AcceptToken(st.Error, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态1 相关代码
// lexicalState1
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* [0-9] */
else if (/* possible Vt : 'number' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
ExtendToken(context);
wrapper.currentState = lexicalState1;
}
/* deal with everything else. */
else {
AcceptToken(st.@number, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态2 相关代码
// lexicalState2
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@RightParenthesis符, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态3 相关代码
// lexicalState3
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@LeftParenthesis符, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态4 相关代码
// lexicalState4
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Slash符, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态5 相关代码
// lexicalState5
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Asterisk符, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态6 相关代码
// lexicalState6
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Dash符, context);
wrapper.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态7 相关代码
// lexicalState7
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Plus符, context);
wrapper.currentState = lexicalState0;
}
};

其调用过程如下:

// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
var context = new LexicalContext(sourceCode);
var wrapper = new CurrentStateWrapper(this.initialState);
do {
char currentChar = context.MoveForward();
wrapper.currentState(context, currentChar, wrapper);
// wrapper.currentState will be set to next lexi-state.
} while (!context.EOF); return context.result;
}

下面是用一个二维数组ElseIf[][]实现的词法状态转换表,其占用空间较少,且执行效率也有所提高。

private static readonly ElseIf[][] lexiStates = new ElseIf[8][];
static void InitializeLexiTable() {
lexiStates[0] = new ElseIf[] {
// possible Vt: '('
/*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),
// possible Vt: ')'
/*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),
// possible Vt: '*'
/*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),
// possible Vt: '+'
/*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),
// possible Vt: '-'
/*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),
// possible Vt: '/'
/*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),
// possible Vt: 'number'
/*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),
};
lexiStates[1] = new ElseIf[] {
// possible Vt: 'number'
/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),
// possible Vt: 'number'
/*1*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@number),
};
lexiStates[2] = new ElseIf[] {
// possible Vt: ')'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),
};
lexiStates[3] = new ElseIf[] {
// possible Vt: '('
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),
};
lexiStates[4] = new ElseIf[] {
// possible Vt: '/'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Slash符),
};
lexiStates[5] = new ElseIf[] {
// possible Vt: '*'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Asterisk符),
};
lexiStates[6] = new ElseIf[] {
// possible Vt: '-'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Dash符),
};
lexiStates[7] = new ElseIf[] {
// possible Vt: '+'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Plus符),
};
}

顾名思义,一个ElseIf与函数委托方式中的一个else if ('0' <= c && c <= '9') { .. }的作用相同。这样,就用一个ElseIf[]取代了一个函数委托;而且在调用此表时可以用折半查找方式快速定位ElseIf

点击查看 调用二维数组实现的词法分析器 相关代码
// skip '\0' at lexi-state 0
private static readonly ElseIf skipZero = new(
char.MinValue, char.MaxValue, Acts.None,
nextState: 0);
// construct a error token at lexi-state 0
private static readonly ElseIf unexpectedChar = new(
char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept,
nextState: 0, -1);// -1 means error("'×'");
// construct a error token at other lexi-states
private static readonly ElseIf errorToken = new(
char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept,
nextState: 0, -1);// -1 means error("'×'"); // analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
var context = new LexicalContext(sourceCode);
var currentStateId = 0;
do {
// read current char,
char currentChar = context.MoveForward();
ElseIf[] lexiState = lexiStates[currentStateId];
// binary search for the segment( else if (left <= c && c <= right) { ... } )
var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);
if (segment is null) {
if (currentStateId == 0) { // the initial state of lexical analyze.
segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);
if (segment is null) {
// '\0' must be skipped.
if (currentChar == '\0') { segment = skipZero; }
else { segment = unexpectedChar; }
}
}
else { // token with error type
segment = errorToken;
}
}
// construct the next token,
var scripts = segment.scripts;
if (scripts != 0) { // it is 0 in most cases.
if ((scripts & Acts.Begin) != 0) {
this.beginToken(context);
}
if ((scripts & Acts.Extend) != 0) {
this.extendToken(context);
} if ((scripts & Acts.Accept) != 0) {
this.acceptToken(context, segment.Vt);
}
else if ((scripts & Acts.Accept2) != 0) {
this.acceptToken2(context, segment.ifVts);
}
}
// point to next state.
currentStateId = segment.nextStateId;
} while (!context.EOF); return context.result;
} private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {
var left = 0; var right = lexiState.Length - (specialLast ? 2 : 1);
if (right < 0) { }
else {
var result = -1;
while (left < right) {
var mid = (left + right) / 2;
var current = lexiState[mid];
if (currentChar < current.min) { result = -1; }
else if (current.max < currentChar) { result = 1; }
else { return current; } if (result < 0) { right = mid; }
else { left = mid + 1; }
}
{
var current = lexiState[left];
if (current.min <= currentChar && currentChar <= current.max) {
return current;
}
}
}
if (specialLast) {
var last = lexiState[lexiState.Length - 1];
return last;
// no need to compare, because it's ['\0', '\uFFFF']
//if (last.min <= currentChar && currentChar <= last.max) {
// return last;
//}
} return null;
}

生成的语法分析器

点击查看 nullable、FIRST集、FOLLOW集
nullable:
[0]: nullable( Exp' ) = False
[1]: nullable( Exp ) = False
[2]: nullable( Term ) = False
[3]: nullable( Factor ) = False
[4]: nullable( '¥' ) = False
[5]: nullable( '+' ) = False
[6]: nullable( '-' ) = False
[7]: nullable( '*' ) = False
[8]: nullable( '/' ) = False
[9]: nullable( '(' ) = False
[10]: nullable( ')' ) = False
[11]: nullable( 'number' ) = False FIRST集:
[0]: FIRST( Exp' ) = { '(' 'number' }
[1]: FIRST( Exp ) = { '(' 'number' }
[2]: FIRST( Term ) = { '(' 'number' }
[3]: FIRST( Factor ) = { '(' 'number' }
[4]: FIRST( '¥' ) = { '¥' }
[5]: FIRST( '+' ) = { '+' }
[6]: FIRST( '-' ) = { '-' }
[7]: FIRST( '*' ) = { '*' }
[8]: FIRST( '/' ) = { '/' }
[9]: FIRST( '(' ) = { '(' }
[10]: FIRST( ')' ) = { ')' }
[11]: FIRST( 'number' ) = { 'number' }
[12]: FIRST( Exp '+' Term ) = { '(' 'number' }
[13]: FIRST( Exp '-' Term ) = { '(' 'number' }
[14]: FIRST( Term '*' Factor ) = { '(' 'number' }
[15]: FIRST( Term '/' Factor ) = { '(' 'number' }
[16]: FIRST( '(' Exp ')' ) = { '(' } FOLLOW集:
[0]: FOLLOW( Exp' ) = { '¥' }
[1]: FOLLOW( Exp ) = { '-' ')' '+' '¥' }
[2]: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }
[3]: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }
点击查看 生成的 规约规则 代码
public static IReadOnlyList<Regulation> Regulations => regulations;
private static readonly Regulation[] regulations = new Regulation[] {
// [0]=Exp : Exp '+' Term ;
new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝),
// [1]=Exp : Exp '-' Term ;
new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝),
// [2]=Exp : Term ;
new(2, st.Exp枝, st.Term枝),
// [3]=Term : Term '*' Factor ;
new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝),
// [4]=Term : Term '/' Factor ;
new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝),
// [5]=Term : Factor ;
new(5, st.Term枝, st.Factor枝),
// [6]=Factor : '(' Exp ')' ;
new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符),
// [7]=Factor : 'number' ;
new(7, st.Factor枝, st.@number),
};
点击查看 生成的 LALR(1)语法分析表 代码
const int syntaxStateCount = 16;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 78 actions
// conflicts(0)=not sovled(0)+solved(0)(0 warnings)
#region create objects of syntax states
states[0] = new(capacity: 5);
states[1] = new(capacity: 3);
states[2] = new(capacity: 6);
states[3] = new(capacity: 6);
states[4] = new(capacity: 5);
states[5] = new(capacity: 6);
states[6] = new(capacity: 4);
states[7] = new(capacity: 4);
states[8] = new(capacity: 3);
states[9] = new(capacity: 3);
states[10] = new(capacity: 3);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
states[14] = new(capacity: 6);
states[15] = new(capacity: 6);
#endregion create objects of syntax states #region re-used actions
LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);// refered 2 times
LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);// refered 4 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 2 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 2 times
LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// refered 3 times
LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// refered 3 times
LRParseAction aReduce2 = new(regulations[2]);// refered 4 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce7 = new(regulations[7]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 4 times
LRParseAction aReduce1 = new(regulations[1]);// refered 4 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
LRParseAction aReduce6 = new(regulations[6]);// refered 6 times
#endregion re-used actions // 78 actions
// conflicts(0)=not sovled(0)+solved(0)(0 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Term ; '-' '+' '¥'
// [1] Exp : Exp '-' Term ; '-' '+' '¥'
// [2] Exp : Term ; '-' '+' '¥'
// [3] Term : Term '*' Factor ; '-' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' '*' '/' '+' '¥'
// [5] Term : Factor ; '-' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' '*' '/' '+' '¥'
/*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));
/*1*/states[0].Add(st.Term枝, aGoto2);
/*2*/states[0].Add(st.Factor枝, aGoto3);
/*3*/states[0].Add(st.@LeftParenthesis符, aShift4);
/*4*/states[0].Add(st.@number, aShift5);
// syntaxStates[1]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Term ; '-' '+' '¥'
// [1] Exp : Exp '-' Term ; '-' '+' '¥'
/*5*/states[1].Add(st.@Plus符, aShift6);
/*6*/states[1].Add(st.@Dash符, aShift7);
/*7*/states[1].Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [2] Exp : Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*8*/states[2].Add(st.@Asterisk符, aShift8);
/*9*/states[2].Add(st.@Slash符, aShift9);
/*10*/states[2].Add(st.@Dash符, aReduce2);
/*11*/states[2].Add(st.@RightParenthesis符, aReduce2);
/*12*/states[2].Add(st.@Plus符, aReduce2);
/*13*/states[2].Add(st.@终, aReduce2);
// syntaxStates[3]:
// [5] Term : Factor ; '-' ')' '*' '/' '+' '¥'
/*14*/states[3].Add(st.@Dash符, aReduce5);
/*15*/states[3].Add(st.@RightParenthesis符, aReduce5);
/*16*/states[3].Add(st.@Asterisk符, aReduce5);
/*17*/states[3].Add(st.@Slash符, aReduce5);
/*18*/states[3].Add(st.@Plus符, aReduce5);
/*19*/states[3].Add(st.@终, aReduce5);
// syntaxStates[4]:
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Term ; '-' ')' '+'
// [1] Exp : Exp '-' Term ; '-' ')' '+'
// [2] Exp : Term ; '-' ')' '+'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+'
// [5] Term : Factor ; '-' ')' '*' '/' '+'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+'
/*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));
/*21*/states[4].Add(st.Term枝, aGoto2);
/*22*/states[4].Add(st.Factor枝, aGoto3);
/*23*/states[4].Add(st.@LeftParenthesis符, aShift4);
/*24*/states[4].Add(st.@number, aShift5);
// syntaxStates[5]:
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*25*/states[5].Add(st.@Dash符, aReduce7);
/*26*/states[5].Add(st.@RightParenthesis符, aReduce7);
/*27*/states[5].Add(st.@Asterisk符, aReduce7);
/*28*/states[5].Add(st.@Slash符, aReduce7);
/*29*/states[5].Add(st.@Plus符, aReduce7);
/*30*/states[5].Add(st.@终, aReduce7);
// syntaxStates[6]:
// [0] Exp : Exp '+' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
// [5] Term : Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));
/*32*/states[6].Add(st.Factor枝, aGoto3);
/*33*/states[6].Add(st.@LeftParenthesis符, aShift4);
/*34*/states[6].Add(st.@number, aShift5);
// syntaxStates[7]:
// [1] Exp : Exp '-' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
// [5] Term : Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));
/*36*/states[7].Add(st.Factor枝, aGoto3);
/*37*/states[7].Add(st.@LeftParenthesis符, aShift4);
/*38*/states[7].Add(st.@number, aShift5);
// syntaxStates[8]:
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));
/*40*/states[8].Add(st.@LeftParenthesis符, aShift4);
/*41*/states[8].Add(st.@number, aShift5);
// syntaxStates[9]:
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));
/*43*/states[9].Add(st.@LeftParenthesis符, aShift4);
/*44*/states[9].Add(st.@number, aShift5);
// syntaxStates[10]:
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Term ; '-' ')' '+'
// [1] Exp : Exp '-' Term ; '-' ')' '+'
/*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));
/*46*/states[10].Add(st.@Plus符, aShift6);
/*47*/states[10].Add(st.@Dash符, aShift7);
// syntaxStates[11]:
// [0] Exp : Exp '+' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*48*/states[11].Add(st.@Asterisk符, aShift8);
/*49*/states[11].Add(st.@Slash符, aShift9);
/*50*/states[11].Add(st.@Dash符, aReduce0);
/*51*/states[11].Add(st.@RightParenthesis符, aReduce0);
/*52*/states[11].Add(st.@Plus符, aReduce0);
/*53*/states[11].Add(st.@终, aReduce0);
// syntaxStates[12]:
// [1] Exp : Exp '-' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*54*/states[12].Add(st.@Asterisk符, aShift8);
/*55*/states[12].Add(st.@Slash符, aShift9);
/*56*/states[12].Add(st.@Dash符, aReduce1);
/*57*/states[12].Add(st.@RightParenthesis符, aReduce1);
/*58*/states[12].Add(st.@Plus符, aReduce1);
/*59*/states[12].Add(st.@终, aReduce1);
// syntaxStates[13]:
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
/*60*/states[13].Add(st.@Dash符, aReduce3);
/*61*/states[13].Add(st.@RightParenthesis符, aReduce3);
/*62*/states[13].Add(st.@Asterisk符, aReduce3);
/*63*/states[13].Add(st.@Slash符, aReduce3);
/*64*/states[13].Add(st.@Plus符, aReduce3);
/*65*/states[13].Add(st.@终, aReduce3);
// syntaxStates[14]:
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*66*/states[14].Add(st.@Dash符, aReduce4);
/*67*/states[14].Add(st.@RightParenthesis符, aReduce4);
/*68*/states[14].Add(st.@Asterisk符, aReduce4);
/*69*/states[14].Add(st.@Slash符, aReduce4);
/*70*/states[14].Add(st.@Plus符, aReduce4);
/*71*/states[14].Add(st.@终, aReduce4);
// syntaxStates[15]:
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
/*72*/states[15].Add(st.@Dash符, aReduce6);
/*73*/states[15].Add(st.@RightParenthesis符, aReduce6);
/*74*/states[15].Add(st.@Asterisk符, aReduce6);
/*75*/states[15].Add(st.@Slash符, aReduce6);
/*76*/states[15].Add(st.@Plus符, aReduce6);
/*77*/states[15].Add(st.@终, aReduce6);
#endregion init actions of syntax states
}

生成的LALR(1)语法分析器的状态图和状态表如下:

由于mermaid支持的字符数量有限,往往不能完全显示一个语言(例如C语言)的全部语法状态及其LR项+lookahead,我在生成状态图时默认只显示每个语法状态的前3个LR项+lookahead。完整的LR项+lookahead可以在生成的语法分析表代码中找到。

状态 '+' '-' '*' '/' '(' ')' 'number' '¥' Exp Term Factor
0 S4 S5 G1 G2 G3
1 S6 S7
2 R[2] R[2] S8 S9 R[2] R[2]
3 R[5] R[5] R[5] R[5] R[5] R[5]
4 S4 S5 G10 G2 G3
5 R[7] R[7] R[7] R[7] R[7] R[7]
6 S4 S5 G11 G3
7 S4 S5 G12 G3
8 S4 S5 G13
9 S4 S5 G14
10 S6 S7 S15
11 R[0] R[0] S8 S9 R[0] R[0]
12 R[1] R[1] S8 S9 R[1] R[1]
13 R[3] R[3] R[3] R[3] R[3] R[3]
14 R[4] R[4] R[4] R[4] R[4] R[4]
15 R[6] R[6] R[6] R[6] R[6] R[6]

上表中:

  • S6表示Shift并进入状态6

  • R[2]表示用regulations[2]=Exp : Term ;规约

  • G1表示进入状态1

  • 表示Accept,即分析完毕,语法树生成成功;

  • 空白的地方表示遇到语法错误

点击查看 调用语法分析表 代码
public SyntaxTree Parse(TokenList tokenList) {
var context = LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);
var accept = false;
do {
Token token = context.CurrentToken;
int nodeType = token.type;// auto-convert from string to string.
while (nodeType == blockComment || nodeType == inlineComment) {
// skip comment token
context.cursor++;
token = context.CurrentToken;
nodeType = token.type;// auto-convert from string to string.
} Dictionary<int/*Node.type*/, LRParseAction> currentState =
context.stateStack.Peek();
if (currentState.TryGetValue(nodeType, out var parseAction)) {
parseAction.Execute(context);
accept = parseAction.kind == LRParseAction.Kind.Accept;
}
else { // syntax error happened.
return new SyntaxTree(context);
}
} while (!accept); var root = context.root;
Debug.Assert(root != null);
return new SyntaxTree(root);
} public class LRParseAction {
public enum Kind {
Error,
Shift,
Reduce,
Goto,
Accept,
} public readonly Kind kind;
[StructLayout(LayoutKind.Explicit)]
struct Union {
[FieldOffset(0)] public Dictionary<int/*Node.type*/, LRParseAction> nextState;
[FieldOffset(0)] public Regulation regulation;
}
private Union union; // 执行分析动作。
public void Execute(LRSyntaxContext context) {
switch (this.kind) {
case Kind.Error: { throw new NotImplementedException(); }
//break;
case Kind.Shift: {
var token = context.CurrentToken;
var leaf = new LRNode(token);
context.nodeStack.Push(leaf);
var nextState = this.union.nextState;
context.stateStack.Push(nextState);
// prepare for next loop.
context.cursor++;
}
break;
case Kind.Reduce: {
var regulation = this.union.regulation;
int count = regulation.right.Length;
var children = new LRNode[count];
var start = Token.empty; LRNode? lastNode = null;
var first = true;
for (int i = 0; i < count; i++) {
var state = context.stateStack.Pop();//只弹出,不再使用。
var node = context.nodeStack.Pop();
children[count - i - 1] = node;
if (node.start.index >= 0) { // this node includes token
if (first) { lastNode = node; first = false; }
start = node.start;
}
}
int tokenCount = 0;
if (lastNode is not null) {
tokenCount =
lastNode.start.index // comment tokens inside of parent are included.
- start.index // comment tokens before parent are excluded.
+ lastNode.tokenCount; // comment tokens after parent are excluded.
}
var parent = new LRNode(regulation, start, tokenCount, children);
for (var i = 0; i < count; i++) { children[i].parent = parent; }
context.nodeStack.Push(parent);
// goto next syntax-state
Dictionary<int/*Node.type*/, LRParseAction> currentState =
context.stateStack.Peek();
var nodeType = regulation.left;
if (currentState.TryGetValue(nodeType, out var parseAction)) {
parseAction.Execute(context); // parseAction is supposed to be a Goto action
}
Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);
}
break;
case Kind.Goto: {
var nextState = this.union.nextState;
context.stateStack.Push(nextState);
}
break;
case Kind.Accept: {
var state = context.stateStack.Pop();
context.root = context.nodeStack.Pop();
context.Finish(context.root, 40, context.stArray);
}
break;
default: { throw new NotImplementedException(); }
}
}
}

生成的语义内容提取器

举例来说,对于1234+567+89+0+0这个输入,Calc.st经过语法分析得到的语法树如下所示:(语法树的生成过程见本文开头)

R[0]=Exp : Exp '+' Term ;T[0->8]
├─R[0]=Exp : Exp '+' Term ;T[0->6]
│ ├─R[0]=Exp : Exp '+' Term ;T[0->4]
│ │ ├─R[0]=Exp : Exp '+' Term ;T[0->2]
│ │ │ ├─R[2]=Exp : Term ;T[0]
│ │ │ │ └─R[5]=Term : Factor ;T[0]
│ │ │ │ └─R[7]=Factor : 'number' ;T[0]
│ │ │ │ └─T[0]='number' 1234
│ │ │ ├─T[1]='+' +
│ │ │ └─R[5]=Term : Factor ;T[2]
│ │ │ └─R[7]=Factor : 'number' ;T[2]
│ │ │ └─T[2]='number' 567
│ │ ├─T[3]='+' +
│ │ └─R[5]=Term : Factor ;T[4]
│ │ └─R[7]=Factor : 'number' ;T[4]
│ │ └─T[4]='number' 89
│ ├─T[5]='+' +
│ └─R[5]=Term : Factor ;T[6]
│ └─R[7]=Factor : 'number' ;T[6]
│ └─T[6]='number' 0
├─T[7]='+' +
└─R[5]=Term : Factor ;T[8]
└─R[7]=Factor : 'number' ;T[8]
└─T[8]='number' 0

从左上到右下,连续4个R[0]显著增加了树的深度。

bitParser自动生成的语义内容提取器,会后序遍历此语法树,提取结点信息。

点击查看 通用的 遍历语法树 相关代码
// Extract some data structure from syntax tree.
// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>
public T? Extract(LRNode root, TokenList tokens, string sourceCode) {
var context = new TContext<T>(root, tokens, sourceCode); var nodeStack = new Stack<LRNode>(); var indexStack = new Stack<int>();
// init stack.
{
// push nextLeft and its next pending children.
var nextLeft = root; var index = 0;
nodeStack.Push(nextLeft); indexStack.Push(index);
while (nextLeft.children.Count > 0) {
nextLeft = nextLeft.children[0];
nodeStack.Push(nextLeft); indexStack.Push(0);
}
} while (nodeStack.Count > 0) {
var current = nodeStack.Pop(); var index = indexStack.Pop() + 1;
if (index < current.children.Count) {
// push this node back again.
nodeStack.Push(current); indexStack.Push(index); // push nextLeft and its next pending children.
var nextLeft = current.children[index];
nodeStack.Push(nextLeft); indexStack.Push(0);
while (nextLeft.children.Count > 0) {
nextLeft = nextLeft.children[0];
nodeStack.Push(nextLeft); indexStack.Push(0);
}
}
else {
// Visit(current);
if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
action(current, context);
}
}
} {
var current = this.EOT;
// Visit(current);
if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
action(current, context);
}
} return context.result;
}
点击查看 生成的 在遍历语法树时提取结点信息 相关代码
private static readonly Dictionary<int/*LRNode.type*/,
Action<LRNode, TContext<Exp>>> @expExtractorDict = new(); private static readonly Action<LRNode, TContext<Exp>> VtHandler =
(node, context) => {
var token = node.start;
context.objStack.Push(token);
}; // initialize dict for extractor.
private static void InitializeExtractorDict() {
var extractorDict = @expExtractorDict;
extractorDict.Add(st.@Plus符, VtHandler);
extractorDict.Add(st.@Dash符, VtHandler);
extractorDict.Add(st.@Asterisk符, VtHandler);
extractorDict.Add(st.@Slash符, VtHandler);
extractorDict.Add(st.@LeftParenthesis符, VtHandler);
extractorDict.Add(st.@RightParenthesis符, VtHandler);
extractorDict.Add(st.@number, VtHandler);
extractorDict.Add(st.@终,
static (node, context) => {
// [-1]=Exp' : Exp ;
// dumped by ExternalExtractor
var @final = (VnExp?)context.objStack.Pop();
var left = new Exp(@final);
context.result = left; // final step, no need to push into stack.
}); // end of extractorDict.Add(st.@终, (node, context) => { ... });
extractorDict.Add(st.Exp枝,
static (node, context) => {
switch (node.regulation.index) {
case 0: { // [0]=Exp : Exp '+' Term ;
// dumped by ListExtractor 2
var r0 = (VnTerm?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnExp?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 1: { // [1]=Exp : Exp '-' Term ;
// dumped by ListExtractor 2
var r0 = (VnTerm?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnExp?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 2: { // [2]=Exp : Term ;
// dumped by ListExtractor 1
var r0 = (VnTerm?)context.objStack.Pop();
var left = new VnExp(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });
extractorDict.Add(st.Term枝,
static (node, context) => {
switch (node.regulation.index) {
case 3: { // [3]=Term : Term '*' Factor ;
// dumped by ListExtractor 2
var r0 = (VnFactor?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnTerm?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 4: { // [4]=Term : Term '/' Factor ;
// dumped by ListExtractor 2
var r0 = (VnFactor?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnTerm?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 5: { // [5]=Term : Factor ;
// dumped by ListExtractor 1
var r0 = (VnFactor?)context.objStack.Pop();
var left = new VnTerm(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });
extractorDict.Add(st.Factor枝,
static (node, context) => {
switch (node.regulation.index) {
case 6: { // [6]=Factor : '(' Exp ')' ;
// dumped by DefaultExtractor
var r0 = (Token?)context.objStack.Pop();
var r1 = (VnExp?)context.objStack.Pop();
var r2 = (Token?)context.objStack.Pop();
var left = new VnFactor(r2, r1, r0);
context.objStack.Push(left);
}
break;
case 7: { // [7]=Factor : 'number' ;
// dumped by DefaultExtractor
var r0 = (Token?)context.objStack.Pop();
var left = new VnFactor(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });
}

提取到的结点信息,展示出来如下:

VnExpT[0->8]
├─VnTermT[0]
│ └─VnFactorT[0]
│ └─T[0]='number' 1234
└─List T[1->8]
├─Item T[1->2]
│ ├─T[1]='+' +
│ └─VnTermT[2]
│ └─VnFactorT[2]
│ └─T[2]='number' 567
├─Item T[3->4]
│ ├─T[3]='+' +
│ └─VnTermT[4]
│ └─VnFactorT[4]
│ └─T[4]='number' 89
├─Item T[5->6]
│ ├─T[5]='+' +
│ └─VnTermT[6]
│ └─VnFactorT[6]
│ └─T[6]='number' 0
└─Item T[7->8]
├─T[7]='+' +
└─VnTermT[8]
└─VnFactorT[8]
└─T[8]='number' 0

类似Exp : Exp '+' Term ;的规约规则,会导致语法树深度过大。这一步的语义提取,作用是将此类树结构“压平”。根据压平了的语义信息,很容易将源代码格式化。

点击查看 VnExp结点的格式化 代码
/// <summary>
/// Correspond to the Vn node Exp in the grammar(Exp).
/// </summary>
internal partial class VnExp : IFullFormatter {
// [0]=Exp : Exp '+' Term ;
// [1]=Exp : Exp '-' Term ;
// [2]=Exp : Term ; private readonly VnTerm first0;
public class PostItem : IFullFormatter {
private readonly Token r1;
private readonly VnTerm r0; public PostItem(Token r1, VnTerm r0) {
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r1, r0);
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r1, preConfig, writer);
// '+'或'-'与其左右两边的Token间隔1个空格
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class PostItemList : IFullFormatter {
private readonly List<PostItem> list = new();
public PostItemList(PostItem item) {
this.list.Add(item);
this._scope = new TokenRange(item);
}
public void Add(PostItem item) {
this.list.Add(item);
this._scope.end = item.Scope.end;
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// '+'或'-'与其左右两边的Token间隔1个空格
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
for (int i = 0; i < list.Count; i++) {
if (i == 0) {
context.PrintBlanksAnd(list[i], preConfig, writer);
}
else {
context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
context.PrintBlanksAnd(list[i], config, writer);
}
}
}
}
private PostItemList? list; private readonly TokenRange _scope;
public TokenRange Scope => _scope; internal VnExp(VnTerm first0) {
this.first0 = first0;
this._scope = new TokenRange(first0);
} // [0]=Exp : Exp '+' Term ;
// [1]=Exp : Exp '-' Term ;
internal void Add(Token r1, VnTerm r0) {
if (this.list == null) {
this.list = new PostItemList(new(r1, r0));
}
else {
this.list.Add(new(r1, r0));
}
this._scope.end = this.list.Scope.end;
} public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.first0, preConfig, writer);
if (this.list != null) {
// '+'或'-'与其左右两边的Token间隔1个空格
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.first0, this.list, config, writer);
context.PrintBlanksAnd(this.list, config, writer);
}
}
}
点击查看 VnTerm结点的格式化 代码
/// <summary>
/// Correspond to the Vn node Term in the grammar(Exp).
/// </summary>
internal partial class VnTerm : IFullFormatter {
// [3]=Term : Term '*' Factor ;
// [4]=Term : Term '/' Factor ;
// [5]=Term : Factor ; private readonly VnFactor first0;
public class PostItem : IFullFormatter {
private readonly Token r1;
private readonly VnFactor r0; public PostItem(Token r1, VnFactor r0) {
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r1, r0);
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r1, preConfig, writer);
// '+'或'-'与其左右两边的Token间隔1个空格
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class PostItemList : IFullFormatter {
private readonly List<PostItem> list = new();
public PostItemList(PostItem item) {
this.list.Add(item);
this._scope = new TokenRange(item);
}
public void Add(PostItem item) {
this.list.Add(item);
this._scope.end = item.Scope.end;
} private readonly TokenRange _scope;
public TokenRange Scope => _scope; public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// '*'或'/'与其左右两边的Token间隔1个空格
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
for (int i = 0; i < list.Count; i++) {
if (i == 0) {
context.PrintBlanksAnd(list[i], preConfig, writer);
}
else {
context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
context.PrintBlanksAnd(list[i], config, writer);
}
}
}
}
private PostItemList? list; private readonly TokenRange _scope;
public TokenRange Scope => _scope; // [5]=Term : Factor ;
internal VnTerm(VnFactor first0) {
this.first0 = first0;
this._scope = new TokenRange(first0);
} // [3]=Term : Term '*' Factor ;
// [4]=Term : Term '/' Factor ;
internal void Add(Token r1, VnFactor r0) {
if (this.list == null) {
this.list = new PostItemList(new(r1, r0));
}
else {
this.list.Add(new(r1, r0));
}
this._scope.end = this.list.Scope.end;
} public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.first0, preConfig, writer);
if (this.list != null) {
// '*'或'/'与其左右两边的Token间隔1个空格
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.first0, this.list, config, writer);
context.PrintBlanksAnd(this.list, config, writer);
}
}
}
点击查看 VnFactor结点的格式化 代码
/// <summary>
/// Correspond to the Vn node Factor in the grammar(Exp).
/// </summary>
internal abstract partial class VnFactor : IFullFormatter {
// [6]=Factor : '(' Exp ')' ;
// [7]=Factor : 'number' ; public class C0 : VnFactor {
// [6]=Factor : '(' Exp ')' ;
public C0(Token r2, VnExp r1, Token r0) {
this.r2 = r2;
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r2, r0);
}
private readonly Token r2;
private readonly VnExp r1;
private readonly Token r0; private readonly TokenRange _scope;
public override TokenRange Scope => _scope; public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r2, preConfig, writer);
// ( Exp )之间不留空格
var config = new BlankConfig(inlineBlank: 0, forceNewline: false);
context.PrintCommentsBetween(this.r2, this.r1, config, writer);
context.PrintBlanksAnd(this.r1, config, writer);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class C1 : VnFactor {
// [7]=Factor : 'number' ;
public C1(Token r0) {
this.r0 = r0;
this._scope = new TokenRange(r0);
}
private readonly Token r0;
private readonly TokenRange _scope;
public override TokenRange Scope => _scope; public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// 根据上级设置的preConfig输出自己唯一的token
context.PrintBlanksAnd(this.r0, preConfig, writer);
}
} public abstract TokenRange Scope { get; }
public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);
}

最终,1234+567+89+0+0被格式化后的样子如下(在运算符两侧各加入1个空格):

1234 + 567 + 89 + 0 + 0

下面是更多示例:

// 格式化 1-2*3
1 - 2 * 3 // 格式化 (1+2)/3
(1 + 2) / 3 // 格式化 (1+2)*(3-4)
(1 + 2) * (3 - 4)
点击查看 GLSL代码格式化的示例1
// 示例1 格式化前
void main() {
int a=0;
a++++;
++++a;
}
// 示例1 格式化后
void main() {
int a = 0;
a++ ++;
++ ++a;
}
点击查看 GLSL代码格式化的示例2
// 示例2 格式化前
in vec3 passNormal;
in vec2 passTexCoord; uniform sampler2D textureMap;
uniform vec3 lihtDirection=vec3(1,1,1);
uniform vec3 diffuseColor;
uniform bool transparent=false; out vec4 outColor; void main( ){
if (transparent) {
if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;} if (passTexCoord==vec2(-1,-1)){ // when texture coordinate not exists..
float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0);
outColor = vec4(diffuseColor * diffuse, 1.0);
}
else { outColor = texture(textureMap, passTexCoord);}
}
// 示例2 格式化后
in vec3 passNormal;
in vec2 passTexCoord; uniform sampler2D textureMap;
uniform vec3 lihtDirection = vec3(1, 1, 1);
uniform vec3 diffuseColor;
uniform bool transparent = false; out vec4 outColor; void main() {
if (transparent) {
if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;
} if (passTexCoord == vec2(-1, -1)) { // when texture coordinate not exists..
float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);
outColor = vec4(diffuseColor * diffuse, 1.0);
}
else { outColor = texture(textureMap, passTexCoord); }
}

关于这个格式化算法的详细介绍,可参考我的另一篇文章(GLSL Shader的格式化算法(LALR解析器))。

举例-自动解决Shift/Reduce冲突

C语言和GLSL中都有if-else悬挂问题,它是由下述产生式引起的:

selection_statement :
'if' '(' expression ')' selection_rest_statement
;
selection_rest_statement :
statement 'else' statement
| statement
;

这样,在语法分析器读到'else'这个Token时,它是应当Shift这个'else'呢,还是应当按照selection_rest_statement : statement ;进行规约呢?这就产生了Shift/Reduce冲突。

bitParser会自动选择按照Shift处理,将按照Ruduce方式处理的那一项注释掉,如下代码所示:

const int syntaxStateCount = 477;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerGLSL.syntaxStates;
...
// 30814 actions
// conflicts(1)=not sovled(0)+solved(1)(1 warnings)
#region init actions of syntax states
...
// syntaxStates[454]:
// [324] selection_rest_statement : statement 'else' statement ; '--' '-' ';' '!' '(' '{' '}' '+' ..
// [325] selection_rest_statement : statement ; '--' '-' ';' '!' '(' '{' '}' '+' ..
// 'else' repeated 2 times
states[454]/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states[466]));
// ⚔ PreferShiftToReduce states[454]/*28146*/.Add(st.@else, new(regulations[325]));
states[454]/*28147*/.Add(st.@Dash符Dash符, new(regulations[325]));
...
#endregion init actions of syntax states
}

举例-优先级指令%nonassoc、%left、%right、%prec

如果我们按最直观的方式书写Calc.st,可能是这样的:

Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '(' Exp ')'
| 'number' ; %%[0-9]+%% 'number'
点击查看 其LALR(1)语法分析器的状态转换表代码
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 80 actions
// conflicts(16)=not sovled(0)+solved(16)(16 warnings)
#region create objects of syntax states
states[0] = new(capacity: 3);
states[1] = new(capacity: 5);
states[2] = new(capacity: 3);
states[3] = new(capacity: 6);
states[4] = new(capacity: 3);
states[5] = new(capacity: 3);
states[6] = new(capacity: 3);
states[7] = new(capacity: 3);
states[8] = new(capacity: 5);
states[9] = new(capacity: 6);
states[10] = new(capacity: 6);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
#endregion create objects of syntax states #region re-used actions
LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
#endregion re-used actions // 80 actions
// conflicts(16)=not sovled(0)+solved(16)(16 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' '*' '/' '+' '¥'
states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
states[0]/*2*/.Add(st.@number, aShift3);
// syntaxStates[1]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
states[1]/*3*/.Add(st.@Plus符, aShift4);
states[1]/*4*/.Add(st.@Dash符, aShift5);
states[1]/*5*/.Add(st.@Asterisk符, aShift6);
states[1]/*6*/.Add(st.@Slash符, aShift7);
states[1]/*7*/.Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+'
states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
states[2]/*10*/.Add(st.@number, aShift3);
// syntaxStates[3]:
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[3]/*11*/.Add(st.@Dash符, aReduce5);
states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
states[3]/*14*/.Add(st.@Slash符, aReduce5);
states[3]/*15*/.Add(st.@Plus符, aReduce5);
states[3]/*16*/.Add(st.@终, aReduce5);
// syntaxStates[4]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
states[4]/*19*/.Add(st.@number, aShift3);
// syntaxStates[5]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
states[5]/*22*/.Add(st.@number, aShift3);
// syntaxStates[6]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
states[6]/*25*/.Add(st.@number, aShift3);
// syntaxStates[7]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
states[7]/*28*/.Add(st.@number, aShift3);
// syntaxStates[8]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
states[8]/*30*/.Add(st.@Plus符, aShift4);
states[8]/*31*/.Add(st.@Dash符, aShift5);
states[8]/*32*/.Add(st.@Asterisk符, aShift6);
states[8]/*33*/.Add(st.@Slash符, aShift7);
// syntaxStates[9]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[9]/*34*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[9]/*35*/.Add(st.@Plus符, aReduce0);
// '-' repeated 2 times
states[9]/*36*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[9]/*37*/.Add(st.@Dash符, aReduce0);
// '*' repeated 2 times
states[9]/*38*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
// '/' repeated 2 times
states[9]/*40*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[9]/*41*/.Add(st.@Slash符, aReduce0);
states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
states[9]/*43*/.Add(st.@终, aReduce0);
// syntaxStates[10]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[10]/*44*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[10]/*45*/.Add(st.@Plus符, aReduce1);
// '-' repeated 2 times
states[10]/*46*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[10]/*47*/.Add(st.@Dash符, aReduce1);
// '*' repeated 2 times
states[10]/*48*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
// '/' repeated 2 times
states[10]/*50*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[10]/*51*/.Add(st.@Slash符, aReduce1);
states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
states[10]/*53*/.Add(st.@终, aReduce1);
// syntaxStates[11]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[11]/*54*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[11]/*55*/.Add(st.@Plus符, aReduce2);
// '-' repeated 2 times
states[11]/*56*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[11]/*57*/.Add(st.@Dash符, aReduce2);
// '*' repeated 2 times
states[11]/*58*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
// '/' repeated 2 times
states[11]/*60*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[11]/*61*/.Add(st.@Slash符, aReduce2);
states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
states[11]/*63*/.Add(st.@终, aReduce2);
// syntaxStates[12]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[12]/*64*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[12]/*65*/.Add(st.@Plus符, aReduce3);
// '-' repeated 2 times
states[12]/*66*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[12]/*67*/.Add(st.@Dash符, aReduce3);
// '*' repeated 2 times
states[12]/*68*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
// '/' repeated 2 times
states[12]/*70*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[12]/*71*/.Add(st.@Slash符, aReduce3);
states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
states[12]/*73*/.Add(st.@终, aReduce3);
// syntaxStates[13]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
states[13]/*74*/.Add(st.@Dash符, aReduce4);
states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
states[13]/*77*/.Add(st.@Slash符, aReduce4);
states[13]/*78*/.Add(st.@Plus符, aReduce4);
states[13]/*79*/.Add(st.@终, aReduce4);
#endregion init actions of syntax states
}

当处于syntaxStates[9]时,如果遇到'+'这个Token,那么本应当Reduce,但是bitParser根据默认的“Shift优先于Reduce”的原则,选择了Shift。显然,这无法正确处理加减运算和乘除运算的优先关系。

如果不想将文法改写为本文最初的样式,这里可以用优先级指令声明加减乘除运算的优先级,从而得到正确的语法分析表。

Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '(' Exp ')'
| 'number' ; %%[0-9]+%% 'number' %left '+' '-' // '+' '-' 的优先级相同,且偏向于Reduce
%left '*' '/' // '*' '/' 的优先级相同,且高于'+' '-',且偏向于Reduce
点击查看 使用了优先级指令的LALR(1)语法分析器的状态转换表 代码
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 80 actions
// conflicts(16)=not sovled(0)+solved(16)(0 warnings)
#region create objects of syntax states
states[0] = new(capacity: 3);
states[1] = new(capacity: 5);
states[2] = new(capacity: 3);
states[3] = new(capacity: 6);
states[4] = new(capacity: 3);
states[5] = new(capacity: 3);
states[6] = new(capacity: 3);
states[7] = new(capacity: 3);
states[8] = new(capacity: 5);
states[9] = new(capacity: 6);
states[10] = new(capacity: 6);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
#endregion create objects of syntax states #region re-used actions
LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
#endregion re-used actions // 80 actions
// conflicts(16)=not sovled(0)+solved(16)(0 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' '*' '/' '+' '¥'
states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
states[0]/*2*/.Add(st.@number, aShift3);
// syntaxStates[1]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
states[1]/*3*/.Add(st.@Plus符, aShift4);
states[1]/*4*/.Add(st.@Dash符, aShift5);
states[1]/*5*/.Add(st.@Asterisk符, aShift6);
states[1]/*6*/.Add(st.@Slash符, aShift7);
states[1]/*7*/.Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+'
states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
states[2]/*10*/.Add(st.@number, aShift3);
// syntaxStates[3]:
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[3]/*11*/.Add(st.@Dash符, aReduce5);
states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
states[3]/*14*/.Add(st.@Slash符, aReduce5);
states[3]/*15*/.Add(st.@Plus符, aReduce5);
states[3]/*16*/.Add(st.@终, aReduce5);
// syntaxStates[4]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
states[4]/*19*/.Add(st.@number, aShift3);
// syntaxStates[5]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
states[5]/*22*/.Add(st.@number, aShift3);
// syntaxStates[6]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
states[6]/*25*/.Add(st.@number, aShift3);
// syntaxStates[7]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
states[7]/*28*/.Add(st.@number, aShift3);
// syntaxStates[8]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
states[8]/*30*/.Add(st.@Plus符, aShift4);
states[8]/*31*/.Add(st.@Dash符, aShift5);
states[8]/*32*/.Add(st.@Asterisk符, aShift6);
states[8]/*33*/.Add(st.@Slash符, aShift7);
// syntaxStates[9]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LeftShouldReduce states[9]/*34*/.Add(st.@Plus符, aShift4);
states[9]/*35*/.Add(st.@Plus符, aReduce0);
// '-' repeated 2 times
// ⚔ LeftShouldReduce states[9]/*36*/.Add(st.@Dash符, aShift5);
states[9]/*37*/.Add(st.@Dash符, aReduce0);
// '*' repeated 2 times
states[9]/*38*/.Add(st.@Asterisk符, aShift6);
// ⚔ LowPrecedence states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
// '/' repeated 2 times
states[9]/*40*/.Add(st.@Slash符, aShift7);
// ⚔ LowPrecedence states[9]/*41*/.Add(st.@Slash符, aReduce0);
states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
states[9]/*43*/.Add(st.@终, aReduce0);
// syntaxStates[10]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LeftShouldReduce states[10]/*44*/.Add(st.@Plus符, aShift4);
states[10]/*45*/.Add(st.@Plus符, aReduce1);
// '-' repeated 2 times
// ⚔ LeftShouldReduce states[10]/*46*/.Add(st.@Dash符, aShift5);
states[10]/*47*/.Add(st.@Dash符, aReduce1);
// '*' repeated 2 times
states[10]/*48*/.Add(st.@Asterisk符, aShift6);
// ⚔ LowPrecedence states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
// '/' repeated 2 times
states[10]/*50*/.Add(st.@Slash符, aShift7);
// ⚔ LowPrecedence states[10]/*51*/.Add(st.@Slash符, aReduce1);
states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
states[10]/*53*/.Add(st.@终, aReduce1);
// syntaxStates[11]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LowPrecedence states[11]/*54*/.Add(st.@Plus符, aShift4);
states[11]/*55*/.Add(st.@Plus符, aReduce2);
// '-' repeated 2 times
// ⚔ LowPrecedence states[11]/*56*/.Add(st.@Dash符, aShift5);
states[11]/*57*/.Add(st.@Dash符, aReduce2);
// '*' repeated 2 times
// ⚔ LeftShouldReduce states[11]/*58*/.Add(st.@Asterisk符, aShift6);
states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
// '/' repeated 2 times
// ⚔ LeftShouldReduce states[11]/*60*/.Add(st.@Slash符, aShift7);
states[11]/*61*/.Add(st.@Slash符, aReduce2);
states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
states[11]/*63*/.Add(st.@终, aReduce2);
// syntaxStates[12]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LowPrecedence states[12]/*64*/.Add(st.@Plus符, aShift4);
states[12]/*65*/.Add(st.@Plus符, aReduce3);
// '-' repeated 2 times
// ⚔ LowPrecedence states[12]/*66*/.Add(st.@Dash符, aShift5);
states[12]/*67*/.Add(st.@Dash符, aReduce3);
// '*' repeated 2 times
// ⚔ LeftShouldReduce states[12]/*68*/.Add(st.@Asterisk符, aShift6);
states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
// '/' repeated 2 times
// ⚔ LeftShouldReduce states[12]/*70*/.Add(st.@Slash符, aShift7);
states[12]/*71*/.Add(st.@Slash符, aReduce3);
states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
states[12]/*73*/.Add(st.@终, aReduce3);
// syntaxStates[13]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
states[13]/*74*/.Add(st.@Dash符, aReduce4);
states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
states[13]/*77*/.Add(st.@Slash符, aReduce4);
states[13]/*78*/.Add(st.@Plus符, aReduce4);
states[13]/*79*/.Add(st.@终, aReduce4);
#endregion init actions of syntax states }

bitParser中的优先级指令%nonassoc%left%right%prec,与yacc相同:

  • 书写顺序靠后的,优先级更高;

  • %left偏向于Reduce;

  • %right偏向于Shift;

  • %nonassoc表示有语法错误;

  • %prec可以特别指定一个Token类型,以使用其优先级,而不是采用默认的(规约规则最右边Vt)的优先级。而且可以指定文法中不存在的Vt(即此Vt纯粹是个占位符)。

举例-/后缀

在Step格式的文件里,1=2中的1应当被认为是一个'entityId',而2应当被认为是一个'refEntity',即对“entity 2”的引用。

如何区别两者呢?当某个数值后面有=时,它就是'entityId',否则就是'refEntity'。此时就需要用/后缀功能,如下所示:

// postfix.st
// regulations:
Items : Items Item | Item ;
Item : 'entityId' '=' 'refEntity' ;
// lexi statements:
%%[0-9]+/[ \t]*=%% 'entityId' // 数字后跟随着“ =”,那么这些数字就是'entityId'
%%[0-9]+%% 'refEntity'
点击查看 生成的 lexi状态0 相关代码
// lexicalState0
private static readonly Action<LexicalContext, char> lexicalState0 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : '=' */
c == '='/*'\u003D'(61)*/) {
BeginToken(context);
ExtendToken(context);
context.currentState = lexicalState2;
}
/* [0-9] */
else if (/* possible Vt : 'entityId' 'refEntity' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
BeginToken(context);
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
context.currentState = lexicalState0; // skip them.
}
else { // unexpected char.
BeginToken(context);
ExtendToken(context);
AcceptToken(st.Error, context);
context.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态1 相关代码
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : 'entityId' */
c == '='/*'\u003D'(61)*/) {
context.currentState = lexicalState4;
}
/* [ \t] */
else if (/* possible Vt : 'entityId' */
(c == ' '/*'\u0020'(32)*/)
|| (c == '\t'/*'\u0009'(9)*/)) {
context.currentState = lexicalState1;
}
/* deal with everything else. */
else { // token with error type
ExtendToken(context);
AcceptToken(st.Error, context);
context.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态2 相关代码
// lexicalState2
private static readonly Action<LexicalContext, char> lexicalState2 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Equal符, context);
context.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态3 相关代码
// lexicalState3
private static readonly Action<LexicalContext, char> lexicalState3 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : 'entityId' */
c == '='/*'\u003D'(61)*/) {
context.currentState = lexicalState4;
}
/* [ \t] */
else if (/* possible Vt : 'entityId' */
(c == ' '/*'\u0020'(32)*/)
|| (c == '\t'/*'\u0009'(9)*/)) {
context.currentState = lexicalState1;
}
/* [0-9] */
else if (/* possible Vt : 'entityId' 'refEntity' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else {
AcceptToken(st.@refEntity, context);
context.currentState = lexicalState0;
}
};
点击查看 生成的 lexi状态4 相关代码
// lexicalState4
private static readonly Action<LexicalContext, char> lexicalState4 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@entityId, context);
context.currentState = lexicalState0;
}
};

下面是二维数组ElseIf[][]形式的状态转换表,它复用了一些ElseIf对象,这进一步减少了空间占用。

点击查看 生成的 二维数组状态转换表 代码
private static readonly ElseIf[][] lexiStates = new ElseIf[5][];
static void InitializeLexiTable() {
ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1);//refered 2 times
ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1);//refered 2 times
ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4);//refered 2 times
lexiStates[0] = new ElseIf[] {
// possible Vt: 'entityId' 'refEntity'
/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),
// possible Vt: '='
/*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),
};
lexiStates[1] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*/ //new('\t'/*'\u0009'(9)*/, Acts.None, 1),
/*0*/s9_9_0_1,
// possible Vt: 'entityId'
/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
/*1*/s32_32_0_1,
// possible Vt: 'entityId'
/*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),
/*2*/s61_61_0_4,
};
lexiStates[2] = new ElseIf[] {
// possible Vt: '='
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Equal符),
};
lexiStates[3] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
/*0*/s9_9_0_1,
// possible Vt: 'entityId'
/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
/*1*/s32_32_0_1,
// possible Vt: 'entityId' 'refEntity'
/*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),
// possible Vt: 'entityId'
/*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),
/*3*/s61_61_0_4,
// possible Vt: 'refEntity'
/*4*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@refEntity),
};
lexiStates[4] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@entityId),
};
}

其miniDFA的状态图如下:

结合代码和状态图容易发现:

  • 当处于miniDFA3状态时,如果遇到空格\t=则可断定下一个Token类型是'entityId'(或错误类型'Error错');否则就是'refEntity'(并返回miniDFA0状态)。这符合我们的预期。

  • 当处于miniDFA0状态(即初始状态)时,如果遇到[0-9],那么下一个Token类型既可能是'entityId',又可能是'refEntity'。此时尚且无法唯一断定之。

也就是说,这个例子也包含了将NFA转换为DFA的问题,因为它的两种Token'entityId''refEntity'的开头[0-9]+是相同的。

我们来对比它的NFA和DFA:

下面是它的NFA状态图:

下面是它的DFA状态图:

可见:

  • 在NFA状态图中,NFA0-0状态在遇到[0-9]时,有两个选择:NFA2-1状态和NFA3-1状态。在对应的DFA状态图中,NFA2-1状态和NFA3-1状态就被合并到了一个DFA状态(DFA2)中。这是通过子集构造法实现的。

这里顺便展示一个构造miniDFA的例子:

// regulations:
Left : 'min' ;
// lexical statements
%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' //标识符后跟',',则为'min'

这个'min'的DFA如下:

这个'min'的miniDFA如下:

可见,miniDFA合并了DFA中的DFA1状态和DFA3状态。它们两个能够合并,是因为它们两个在读到相同的char时均会跳转到相同的下一个状态。这是通过Hopcroft算法实现的。

举例-前缀<'Vt'>

在GLSL中,为了方便语法分析,我需要将struct Point { float x; float y; }中的Point识别为一个“type_name”类型的Token,而不是“identifier”类型的Token。这可以通过前缀来实现。

// ..
struct_specifier :
'struct' 'type_name' '{' struct_declaration_list '}' ;
// .. // lexical statements
%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // 跟在struct之后的Token应当被设定为“type_name”类型
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // 平时应当被设定为“identifier”类型

添加前缀<'struct'>不会影响词法分析器的状态构成,只会在设置Token类型时看一看上一个Token是不是“struct”类型:若是,则设置为“type_name”类型,否则,设置为“identifier”类型。

另外,还需要新增一个数组,用于记录已识别出的全部“type_name”类型,以便再次遇到它时,也能够将其设置为“type_name”类型。

点击查看 状态机受前缀影响的部分 代码
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* [a-zA-Z0-9_] */
else if (/* possible Vt : 'type_name' 'identifier' */
('a' <= c && c <= 'z')
|| ('A' <= c && c <= 'Z')
|| ('0' <= c && c <= '9')
|| (c == '_')) {
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else {
AcceptToken2(context
// 如果上一个Token是struct,那么新Token是type_name
, new(/*<'Vt'>*/st.@struct, st.@type_name)
// 否则,新Token是identifier
, new(/*default preVt*/string.Empty, st.@identifier));
context.currentState = lexicalState0;
}
};

与之配套的AcceptToken2(context, ifVts);函数也就复杂些:

点击查看 void AcceptToken2(LexicalContext context, params IfVt[] ifVts); 代码
struct IfVt {
public readonly string preVt;
public readonly string Vt;
public IfVt(string preVt, string Vt) {
this.preVt = preVt;
this.Vt = Vt;
}
} private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {
var startIndex = context.analyzingToken.start.index;
var endIndex = context.analyzingToken.end.index;
context.analyzingToken.value = context.sourceCode.Substring(
startIndex, endIndex - startIndex + 1);
var typeSet = false;
const string key = "type_name"; var hadThisTypeName = false;
if (!typeSet) {
if (context.tagDict.TryGetValue(key, out var type_nameList)) {
var list = type_nameList as List<string>;
if (list.Contains(context.analyzingToken.value)) {
// 如果是已识别出的type_name
context.analyzingToken.type = st.type_name;
typeSet = true;
hadThisTypeName = true;
}
}
}
if (!typeSet) {
int lastType = 0;
if (context.lastSyntaxValidToken != null) {
lastType = context.lastSyntaxValidToken.type;
}
for (var i = 0; i < ifVts.Length; i++) {
var ifVt = ifVts[i];
if (ifVt.preVt == 0 // 默认没有前缀
|| ifVt.preVt == lastType) { // 匹配到了前缀<'Vt'>
context.analyzingToken.type = ifVt.Vt;
typeSet = true;
break;
}
}
}
if (!typeSet) {
// we failed to assign type according to lexi statements.
// this indicates token error in source code or inappropriate lexi statements.
context.analyzingToken.type = st.Error错;
context.signalCondition = LexicalContext.defaultSignal;
} // cancel forward steps for post-regex
var backStep = context.cursor.index - context.analyzingToken.end.index;
if (backStep > 0) { context.MoveBack(backStep); }
// next operation: context.MoveForward(); var token = context.analyzingToken.Dump();
context.result.Add(token);
// 跳过注释
if (context.analyzingToken.type != st.blockComment
&& context.analyzingToken.type != st.inlineComment) {
context.lastSyntaxValidToken = token;
} if (!hadThisTypeName && context.analyzingToken.type == st.type_name) {
// 将新识别出的type_name加入list
if (!context.tagDict.TryGetValue(key, out var type_nameList)) {
type_nameList = new List<string>();
context.tagDict.Add(key, type_nameList);
}
var list = type_nameList as List<string>;
list.Add(context.analyzingToken.value);
}
}

注意,语法分析不需要blockCommentinlineComment(类型为“注释”的Token),因而在记录上一个Token类型时,我们要跳过注释。

举例-状态信号<signal1, signal2, ..>

在GLSL中,为了方便语法分析,我需要将subroutine ( r1, r2 )中的r1r2都识别为“type_name”类型的Token,而不是“identifier”类型的Token。这无法通过前缀实现,但可以通过状态信号LexicalContext.signal实现。

状态信号是这样起作用的:

在读到一个“subroutine”类型的Token时,将LexicalContext.signal设置为subroutine0

在读到一个“(”类型的Token时,如果LexicalContext.signalsubroutine0,就将LexicalContext.signal设置为subroutine1

在读到一个符合[a-zA-Z_][a-zA-Z0-9_]*的标识符时,如果LexicalContext.signalsubroutine1(这说明词法分析器刚刚连续读到了'subroutine' '('),就将它识别为“type_name”类型的Token,否则识别为“identifier”类型的Token;

在读到一个“)”类型的Token时,如果LexicalContext.signalsubroutine1,就将LexicalContext.signal设置为default(默认状态),即不再理会状态信号。

storage_qualifier :
| 'subroutine' '(' type_name_list ')' ; // lexical statements
%%subroutine%% 'subroutine' subroutine0
<subroutine0>%%[(]%% '(' subroutine1
<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'
<subroutine1>%%[,]%% ','
<subroutine1>%%[)]%% ')' default

状态信号,是在词法分析器这个状态机的基础上又附加了一个状态机,因而其应用比较复杂,容易出错,应当尽量少用。

举例-中文字符

如果我想识别出一个文本文件中的全部汉字(假设汉字字符位于\u4E00\u9FFF之间),可以这样:

Items : Items Item | Item ;
Item : 'chineseChar' | 'other' ; %%[\u4E00-u9FFF]%% 'chineseChar'
%%[^\u4E00-u9FFF]%% 'other'

End

拥有自己的解析器(C#实现LALR(1)语法解析器和miniDFA词法分析器的生成器)的更多相关文章

  1. 语法解析器续:case..when..语法解析计算

    之前写过一篇博客,是关于如何解析类似sql之类的解析器实现参考:https://www.cnblogs.com/yougewe/p/13774289.html 之前的解析器,更多的是是做语言的翻译转换 ...

  2. Boost学习之语法解析器--Spirit

    Boost.Spirit能使我们轻松地编写出一个简单脚本的语法解析器,它巧妙利用了元编程并重载了大量的C++操作符使得我们能够在C++里直接使用类似EBNF的语法构造出一个完整的语法解析器(同时也把C ...

  3. 在.NET Core中使用Irony实现自己的查询语言语法解析器

    在之前<在ASP.NET Core中使用Apworks快速开发数据服务>一文的评论部分,.NET大神张善友为我提了个建议,可以使用Compile As a Service的Roslyn为语 ...

  4. 深入理解Java类加载器(1):Java类加载原理解析

    1 基本信息 每个开发人员对Java.lang.ClassNotFoundExcetpion这个异常肯定都不陌生,这背后就涉及到了java技术体系中的类加载.Java的类加载机制是技术体系中比较核心的 ...

  5. 手写token解析器、语法解析器、LLVM IR生成器(GO语言)

    最近开始尝试用go写点东西,正好在看LLVM的资料,就写了点相关的内容 - 前端解析器+中间代码生成(本地代码的汇编.执行则靠LLVM工具链完成) https://github.com/daibinh ...

  6. 用java实现编译器-算术表达式及其语法解析器的实现

    大家在参考本节时,请先阅读以下博文,进行预热: http://blog.csdn.net/tyler_download/article/details/50708807 本节代码下载地址: http: ...

  7. 使用 java 实现一个简单的 markdown 语法解析器

    1. 什么是 markdown Markdown 是一种轻量级的「标记语言」,它的优点很多,目前也被越来越多的写作爱好者,撰稿者广泛使用.看到这里请不要被「标记」.「语言」所迷惑,Markdown 的 ...

  8. Anrlr4 生成C++版本的语法解析器

    一. 写在前面 我最早是在2005年,首次在实际开发中实现语法解析器,当时调研了Yacc&Lex,觉得风格不是太好,关键当时yacc对多线程也支持的不太好,接着就又学习了Bison&F ...

  9. 深入理解Java类加载器(一):Java类加载原理解析

    摘要: 每个开发人员对java.lang.ClassNotFoundExcetpion这个异常肯定都不陌生,这个异常背后涉及到的是Java技术体系中的类加载机制.本文简述了JVM三种预定义类加载器,即 ...

  10. Kafka设计解析(四)Kafka Consumer设计解析

    转载自 技术世界,原文链接 Kafka设计解析(四)- Kafka Consumer设计解析 目录 一.High Level Consumer 1. Consumer Group 2. High Le ...

随机推荐

  1. 70 个群都来问我的 AI 日报,是这么做的。

    最近我给 FastGPT 用户交流群里接入了 AI 日报,每天早上 10 点会自动向群里推送 AI 日报,让群里的小伙伴们第一时间了解到昨天 AI 领域都发生了哪些大事. 效果大概是这个样子的: 如果 ...

  2. ArrayList removeRange方法分析

    <ArrayList原码分析>一文中提到了"为什么removeRange(int fromIndex,int toIndex)是protected的?" 先给出remo ...

  3. k8s之资源篇

    在线k8s环境: https://killercoda.com/playgrounds/scenario/kubernetes 相关文档: https://github.com/bregman-ari ...

  4. 性能检测工具之Lighthouse

    转载:https://mp.weixin.qq.com/s?src=11&timestamp=1618929340&ver=3020&signature=oXyx*RDLXjN ...

  5. 使用IDEA一键发布应用

    1.编辑Dockerfile from java:8 WORKDIR /usr/local ADD ./target/jpaas-bpm.jar . CMD ["java",&qu ...

  6. Python 学习记录(1)

    前言 简单说明: 简单情况:主要记录学习Python的简单情况,包括代码与结果,以及关键注释 工具与来源: 以下代码与结果都可在JupyterLab上实现,更多情况可看Github 使用 NumPy ...

  7. SFE人才需要具备哪些能力

    SFE(销售队伍效力)人才在企业中扮演着至关重要的角色,他们需要具备一系列的能力来确保销售队伍的高效运作和业绩提升.关于SFE的角色和能力,可以从业务理解.数据洞察.向上管理以及效率提升等几个方面来通 ...

  8. IO介绍-下

    中断 由外部设备引起的中断,称为外中断. 由内部错误引起的中断,称为内中断,或者是陷入.例如:非法指令,地址越界,电源故障等.   中断向量表 中断优先级 多中断源的处理方式 屏蔽中断 嵌套中断 根据 ...

  9. VTK 平面裁剪

    有些时候需要显示零件内部情况,所有会对零件显示进行平面裁剪,这里用到了vtkPlane和vtkClipPolyData. vtkPlane是定义一个平面,vtkClipPolyData使用vtkPla ...

  10. Qt在linux下实现程序编译后版本号自增的脚本

    #! /bin/bash rm -rf temp.cpp num=0 while read line do if [ $num -eq 3 ];then array=(`echo $line | tr ...