Having your own parser (a C# generator for LALR(1) syntax parsers and miniDFA lexical analyzers)

Modeled on the input formats of lex and yacc and on the algorithms of the Tiger Book (Modern Compiler Implementation in C), without depending on any third-party library, and with heavy consolidation and optimization, I implemented a C# generator for LALR(1) syntax parsers and miniDFA lexical analyzers (tentatively named bitParser).

The complete generated parser code for ANSI C, GLSL 4.60.8, and various test cases can be downloaded at (https://gitee.com/bitzhuwei/bitParser-demos).

(https://www.cnblogs.com/bitzhuwei/p/18544785) lists all the algorithms used to implement the lexer generator and the parser generator.

How the syntax tree of 1234+567+89+0+0 is built, step by step ☟

How the syntax tree of (1234+567)/(89-0) is built, step by step ☟

More syntax-tree construction processes can be viewed at (https://www.cnblogs.com/bitzhuwei/p/18679529). (If you would like to see the construction process for an expression of your own, please leave a comment.)

The lexer generator

  • Generates lexer code (state-transition tables, reserved words, token types, and so on) for both the DFA and the minimized DFA (hereafter miniDFA).

  • Supports the full Unicode character range, the / trailing-context suffix, the <'Vt'> prefix, and state signals <signal1, signal2, ..>, which make it easy to recognize cases such as id = refId, struct type_name, <Comment>[^*\n]*, and subroutine(type_name_list). Similar to lex, but not identical.

  • No need to explicitly write out token types, state signals, or reserved words, i.e. no need for lex-style declarations such as %s NUM ID .., %x Comment End Text .., or rules like [(] { return ('('); }.

  • Thoroughly commented. Every conditional branch of every lexical state is annotated with its regular expression, the token type (and signal) it leads to, and so on.

  • Generates state diagrams (mermaid format) for the ε-NFA, NFA, DFA, and miniDFA, plus a state diagram (mermaid format) for each token type, which is handy for learning and debugging.

The parser generator

  • Generates parser code (parse tables, production lists, syntax-tree node types, and so on) for LL(1), LR(0), SLR(1), LALR(1), and LR(1).

  • Supports the precedence directives %nonassoc, %left, %right, and %prec, resolves Shift/Reduce and Reduce/Reduce conflicts automatically, and lists the resolutions in comments in the parse-table code.

  • Thoroughly commented. The comments list the LR items and lookaheads of every syntax state, plus the number of conflicts, how many were resolved, and how many remain unresolved.

  • Generates state diagrams (mermaid format) and state tables (markdown format) for LL(1), LR(0), SLR(1), LALR(1), and LR(1), which is handy for learning and debugging.

  • Generates documentation for the nullable flags, the FIRST sets, and the FOLLOW sets.

Other features

  • No need for the %option, %union, %define, %{ %}, %parse-param, %lex-param, %pure-parser, %expect, %token, or %type constructs of lex and yacc. The output is a class library that is called directly, as follows:
var compiler = new CompilerXxx();
var sourceCode = File.ReadAllText("input.st");
var tokens = compiler.Analyze(sourceCode);
var syntaxTree = compiler.Parse(tokens);
var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);
// use extractedObj for user-defined business ..
  • Supports the block-comment directive %blockComment on/off and the inline-comment directive %inlineComment on/off. The defaults are the C-style /* */ and //, and both formats can be customized. For example, when parsing VRML files, the inline comment can be defined as running from # to the end of the line:
%%#[^\r\n]*%% 'inlineComment'
  • Generates a framework that traverses the syntax tree and extracts semantic information, and provides a source-code formatting algorithm applicable to all kinds of languages. This can be used for formatting, generating intermediate code, and other downstream business logic.

  • Heavily optimized: generating all the parser code plus documentation for ANSI C takes only 3 seconds, and for GLSL 4.60.8 only 9 seconds.

Click to view: other features
  • Supports the scope-range directive %validScopeChars and the global-range directive %validGlobalChars. Both default to [\u0001-\uFFFF] (i.e. all Unicode characters except '\0') and can be customized.

  • Supports the %omit directive for specifying which whitespace characters to skip. The defaults are the space character, '\t', '\n', '\r', and '\0'.

  • Supports %start for specifying the start symbol of the grammar.

Example: Calc.st

The input file Calc.st

The grammar of a parser that can handle addition, subtraction, multiplication, division, and parentheses is as follows:

// the input file Calc.st
Exp : Exp '+' Term
| Exp '-' Term
| Term ;
Term : Term '*' Factor
| Term '/' Factor
| Factor ;
Factor : '(' Exp ')'
| 'number' ; %%[0-9]+%% 'number' // this example only handles non-negative integers
//no need to write %%[+]%% '+' and the like
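Following the calling pattern shown at the beginning of this post (the generated compiler class for this grammar appears as CompilerExp in the generated code below), using the generated parser might look like this sketch:

var compiler = new CompilerExp();
var sourceCode = "(1234+567)/(89-0)";
var tokens = compiler.Analyze(sourceCode);        // lexical analysis
var syntaxTree = compiler.Parse(tokens);          // LALR(1) parsing
var exp = compiler.Extract(syntaxTree, tokens, sourceCode); // the flattened Exp object described later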

From this grammar, the following artifacts are generated.

The generated lexer

The state diagram of the generated ε-NFA lexer is shown below:

The state diagram of the generated miniDFA lexer is shown below:

Click to view: generated code for the terminals (Vt) and non-terminals (Vn)
// this array can be deleted if it is not needed
public static readonly IReadOnlyList<string> stArray = new string[] {
"'¥'", // @终 = 0;
"'+'", // @Plus符 = 1; // '+'
"'-'", // @Dash符 = 2; // '-'
"'*'", // @Asterisk符 = 3; // '*'
"'/'", // @Slash符 = 4; // '/'
"'('", // @LeftParenthesis符 = 5; // '('
"')'", // @RightParenthesis符 = 6; // ')'
"'number'", // @number = 7; // 'number'
// end of 1 + 7 Vt
"Exp", // Exp枝 = 8; // Exp
"Term", // Term枝 = 9; // Term
"Factor", // Factor枝 = 10; // Factor
// end of 3 Vn
};
/// <summary>
/// Vt types are used both for lexical-analyze and syntax-parse.
/// <para>Vn types are only for syntax-parse.</para>
/// <para>Vt is quoted in ''.</para>
/// <para>Vn is not quoted in ''.</para>
/// </summary>
public static class st {
// Vt
/// <summary>
/// Something wrong within the source code.
/// </summary>
public const int Error错 = -1; // "'×'"
/// <summary>
/// end of token list.
/// </summary>
public const int @终 = 0; // "'¥'"
/// <summary>
/// '+'
/// </summary>
public const int @Plus符 = 1; // "'+'"
/// <summary>
/// '-'
/// </summary>
public const int @Dash符 = 2; // "'-'"
/// <summary>
/// '*'
/// </summary>
public const int @Asterisk符 = 3; // "'*'"
/// <summary>
/// '/'
/// </summary>
public const int @Slash符 = 4; // "'/'"
/// <summary>
/// '('
/// </summary>
public const int @LeftParenthesis符 = 5; // "'('"
/// <summary>
/// ')'
/// </summary>
public const int @RightParenthesis符 = 6; // "')'"
/// <summary>
/// 'number'
/// </summary>
public const int @number = 7; // "'number'"
/// <summary>
/// count of ('¥' + user-defined Vt)
/// </summary>
public const int VtCount = 8;
// Vn
/// <summary>
/// Exp
/// </summary>
public const int Exp枝 = 8; // "Exp"
/// <summary>
/// Term
/// </summary>
public const int Term枝 = 9; // "Term"
/// <summary>
/// Factor
/// </summary>
public const int Factor枝 = 10; // "Factor"
}
Click to view: generated code for the reserved words (i.e. a language's keywords)
public static class reservedWord {
/// <summary>
/// +
/// </summary>
public const string @Plus符 = "+";
/// <summary>
/// -
/// </summary>
public const string @Dash符 = "-";
/// <summary>
/// *
/// </summary>
public const string @Asterisk符 = "*";
/// <summary>
/// /
/// </summary>
public const string @Slash符 = "/";
/// <summary>
/// (
/// </summary>
public const string @LeftParenthesis符 = "(";
/// <summary>
/// )
/// </summary>
public const string @RightParenthesis符 = ")";
}
/// <summary>
/// if <paramref name="token"/> is a reserved word, assign correspond type and return true.
/// <para>otherwise, return false.</para>
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
private static bool CheckReservedWord(AnalyzingToken token) {
bool isReservedWord = true;
switch (token.value) {
case reservedWord.@Plus符: token.type = st.@Plus符; break;
case reservedWord.@Dash符: token.type = st.@Dash符; break;
case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;
case reservedWord.@Slash符: token.type = st.@Slash符; break;
case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;
case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break;
default: isReservedWord = false; break;
}
return isReservedWord;
}

Below is the lexical state-transition table, implemented with 8 Action<LexicalContext, char, CurrentStateWrapper> delegates.

Click to view: generated code for lexical state 0
// lexicalState0
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* [0-9] */
else if (/* possible Vt : 'number' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState1;
}
/* \) */
else if (/* possible Vt : ')' */
c == ')'/*'\u0029'(41)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState2;
}
/* \( */
else if (/* possible Vt : '(' */
c == '('/*'\u0028'(40)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState3;
}
/* \/ */
else if (/* possible Vt : '/' */
c == '/'/*'\u002F'(47)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState4;
}
/* \* */
else if (/* possible Vt : '*' */
c == '*'/*'\u002A'(42)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState5;
}
/* - */
else if (/* possible Vt : '-' */
c == '-'/*'\u002D'(45)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState6;
}
/* \+ */
else if (/* possible Vt : '+' */
c == '+'/*'\u002B'(43)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState7;
}
/* deal with everything else. */
else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
wrapper.currentState = lexicalState0; // skip them.
}
else { // unexpected char.
BeginToken(context);
ExtendToken(context);
AcceptToken(st.Error, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 1
// lexicalState1
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* [0-9] */
else if (/* possible Vt : 'number' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
ExtendToken(context);
wrapper.currentState = lexicalState1;
}
/* deal with everything else. */
else {
AcceptToken(st.@number, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 2
// lexicalState2
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@RightParenthesis符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 3
// lexicalState3
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@LeftParenthesis符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 4
// lexicalState4
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Slash符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 5
// lexicalState5
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Asterisk符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 6
// lexicalState6
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Dash符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view: generated code for lexical state 7
// lexicalState7
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Plus符, context);
wrapper.currentState = lexicalState0;
}
};

It is invoked as follows:

// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
var context = new LexicalContext(sourceCode);
var wrapper = new CurrentStateWrapper(this.initialState);
do {
char currentChar = context.MoveForward();
wrapper.currentState(context, currentChar, wrapper);
// wrapper.currentState will be set to next lexi-state.
} while (!context.EOF);
return context.result;
}
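As a small illustration (a sketch only; the Token fields themselves are not shown here), analyzing the input (12+3)*4 with this lexer produces tokens of the types
// '('  'number'  '+'  'number'  ')'  '*'  'number'
with whitespace skipped in lexicalState0; the '¥' type (st.@终) then marks the end of the token list for the parser.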

Below is the same lexical state-transition table implemented as a two-dimensional array ElseIf[][]; it takes up less space and also runs somewhat faster.

private static readonly ElseIf[][] lexiStates = new ElseIf[8][];
static void InitializeLexiTable() {
lexiStates[0] = new ElseIf[] {
// possible Vt: '('
/*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),
// possible Vt: ')'
/*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),
// possible Vt: '*'
/*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),
// possible Vt: '+'
/*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),
// possible Vt: '-'
/*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),
// possible Vt: '/'
/*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),
// possible Vt: 'number'
/*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),
};
lexiStates[1] = new ElseIf[] {
// possible Vt: 'number'
/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),
// possible Vt: 'number'
/*1*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@number),
};
lexiStates[2] = new ElseIf[] {
// possible Vt: ')'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),
};
lexiStates[3] = new ElseIf[] {
// possible Vt: '('
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),
};
lexiStates[4] = new ElseIf[] {
// possible Vt: '/'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Slash符),
};
lexiStates[5] = new ElseIf[] {
// possible Vt: '*'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Asterisk符),
};
lexiStates[6] = new ElseIf[] {
// possible Vt: '-'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Dash符),
};
lexiStates[7] = new ElseIf[] {
// possible Vt: '+'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Plus符),
};
}

As the name suggests, one ElseIf plays the same role as one else if ('0' <= c && c <= '9') { .. } branch in the delegate version. In this way an ElseIf[] replaces an entire delegate, and when the table is consulted, binary search can be used to locate the matching ElseIf quickly. For example, when the character '5' arrives in state 0, the search lands on the ['0'-'9'] entry, which begins and extends a token and moves the lexer to state 1.
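The ElseIf type itself is not listed in this post; judging from how it is constructed and used above, its shape is roughly as follows (a sketch only, with inferred member names and assumed flag values; the ifVts array used with Acts.Accept2 is omitted, and the generated code may differ):

[Flags] public enum Acts { None = 0, Begin = 1, Extend = 2, Accept = 4, Accept2 = 8 } // assumed values
public sealed class ElseIf {
    public readonly char min;          // lower bound of the character range
    public readonly char max;          // upper bound of the character range
    public readonly Acts scripts;      // which of Begin/Extend/Accept to run
    public readonly int nextStateId;   // index of the next lexical state
    public readonly int Vt;            // token type to accept, e.g. st.@number
    public ElseIf(char single, Acts scripts, int nextState, int vt = -1)
        : this(single, single, scripts, nextState, vt) { }
    public ElseIf(char min, char max, Acts scripts, int nextState, int vt = -1) {
        this.min = min; this.max = max; this.scripts = scripts;
        this.nextStateId = nextState; this.Vt = vt;
    }
}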

Click to view: the code that drives the array-based lexer
// skip '\0' at lexi-state 0
private static readonly ElseIf skipZero = new(
char.MinValue, char.MaxValue, Acts.None,
nextState: 0);
// construct a error token at lexi-state 0
private static readonly ElseIf unexpectedChar = new(
char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept,
nextState: 0, -1);// -1 means error("'×'");
// construct a error token at other lexi-states
private static readonly ElseIf errorToken = new(
char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept,
nextState: 0, -1);// -1 means error("'×'");
// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
var context = new LexicalContext(sourceCode);
var currentStateId = 0;
do {
// read current char,
char currentChar = context.MoveForward();
ElseIf[] lexiState = lexiStates[currentStateId];
// binary search for the segment( else if (left <= c && c <= right) { ... } )
var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);
if (segment is null) {
if (currentStateId == 0) { // the initial state of lexical analyze.
segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);
if (segment is null) {
// '\0' must be skipped.
if (currentChar == '\0') { segment = skipZero; }
else { segment = unexpectedChar; }
}
}
else { // token with error type
segment = errorToken;
}
}
// construct the next token,
var scripts = segment.scripts;
if (scripts != 0) { // it is 0 in most cases.
if ((scripts & Acts.Begin) != 0) {
this.beginToken(context);
}
if ((scripts & Acts.Extend) != 0) {
this.extendToken(context);
}
if ((scripts & Acts.Accept) != 0) {
this.acceptToken(context, segment.Vt);
}
else if ((scripts & Acts.Accept2) != 0) {
this.acceptToken2(context, segment.ifVts);
}
}
// point to next state.
currentStateId = segment.nextStateId;
} while (!context.EOF);
return context.result;
}
private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {
var left = 0; var right = lexiState.Length - (specialLast ? 2 : 1);
if (right < 0) { }
else {
var result = -1;
while (left < right) {
var mid = (left + right) / 2;
var current = lexiState[mid];
if (currentChar < current.min) { result = -1; }
else if (current.max < currentChar) { result = 1; }
else { return current; }
if (result < 0) { right = mid; }
else { left = mid + 1; }
}
{
var current = lexiState[left];
if (current.min <= currentChar && currentChar <= current.max) {
return current;
}
}
}
if (specialLast) {
var last = lexiState[lexiState.Length - 1];
return last;
// no need to compare, because it's ['\0', '\uFFFF']
//if (last.min <= currentChar && currentChar <= last.max) {
// return last;
//}
}
return null;
}

The generated parser

Click to view: nullable, FIRST sets, FOLLOW sets
nullable:
[0]: nullable( Exp' ) = False
[1]: nullable( Exp ) = False
[2]: nullable( Term ) = False
[3]: nullable( Factor ) = False
[4]: nullable( '¥' ) = False
[5]: nullable( '+' ) = False
[6]: nullable( '-' ) = False
[7]: nullable( '*' ) = False
[8]: nullable( '/' ) = False
[9]: nullable( '(' ) = False
[10]: nullable( ')' ) = False
[11]: nullable( 'number' ) = False
FIRST sets:
[0]: FIRST( Exp' ) = { '(' 'number' }
[1]: FIRST( Exp ) = { '(' 'number' }
[2]: FIRST( Term ) = { '(' 'number' }
[3]: FIRST( Factor ) = { '(' 'number' }
[4]: FIRST( '¥' ) = { '¥' }
[5]: FIRST( '+' ) = { '+' }
[6]: FIRST( '-' ) = { '-' }
[7]: FIRST( '*' ) = { '*' }
[8]: FIRST( '/' ) = { '/' }
[9]: FIRST( '(' ) = { '(' }
[10]: FIRST( ')' ) = { ')' }
[11]: FIRST( 'number' ) = { 'number' }
[12]: FIRST( Exp '+' Term ) = { '(' 'number' }
[13]: FIRST( Exp '-' Term ) = { '(' 'number' }
[14]: FIRST( Term '*' Factor ) = { '(' 'number' }
[15]: FIRST( Term '/' Factor ) = { '(' 'number' }
[16]: FIRST( '(' Exp ')' ) = { '(' }
FOLLOW sets:
[0]: FOLLOW( Exp' ) = { '¥' }
[1]: FOLLOW( Exp ) = { '-' ')' '+' '¥' }
[2]: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }
[3]: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }
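As a quick check of these sets against the grammar (using the standard FOLLOW construction rules): '+' and '-' enter FOLLOW( Exp ) from Exp : Exp '+' Term and Exp : Exp '-' Term, ')' enters it from Factor : '(' Exp ')', and '¥' comes from the augmented start rule Exp' : Exp. FOLLOW( Term ) additionally contains '*' and '/' (from Term : Term '*' Factor and Term : Term '/' Factor) plus everything in FOLLOW( Exp ), because a Term can end an Exp; FOLLOW( Factor ) equals FOLLOW( Term ) for the same reason.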
Click to view: generated code for the productions (reduction rules)
public static IReadOnlyList<Regulation> Regulations => regulations;
private static readonly Regulation[] regulations = new Regulation[] {
// [0]=Exp : Exp '+' Term ;
new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝),
// [1]=Exp : Exp '-' Term ;
new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝),
// [2]=Exp : Term ;
new(2, st.Exp枝, st.Term枝),
// [3]=Term : Term '*' Factor ;
new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝),
// [4]=Term : Term '/' Factor ;
new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝),
// [5]=Term : Factor ;
new(5, st.Term枝, st.Factor枝),
// [6]=Factor : '(' Exp ')' ;
new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符),
// [7]=Factor : 'number' ;
new(7, st.Factor枝, st.@number),
};
Click to view: generated code for the LALR(1) parse table
const int syntaxStateCount = 16;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 78 actions
// conflicts(0)=not sovled(0)+solved(0)(0 warnings)
#region create objects of syntax states
states[0] = new(capacity: 5);
states[1] = new(capacity: 3);
states[2] = new(capacity: 6);
states[3] = new(capacity: 6);
states[4] = new(capacity: 5);
states[5] = new(capacity: 6);
states[6] = new(capacity: 4);
states[7] = new(capacity: 4);
states[8] = new(capacity: 3);
states[9] = new(capacity: 3);
states[10] = new(capacity: 3);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
states[14] = new(capacity: 6);
states[15] = new(capacity: 6);
#endregion create objects of syntax states
#region re-used actions
LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);// refered 2 times
LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);// refered 4 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 2 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 2 times
LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// refered 3 times
LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// refered 3 times
LRParseAction aReduce2 = new(regulations[2]);// refered 4 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce7 = new(regulations[7]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 4 times
LRParseAction aReduce1 = new(regulations[1]);// refered 4 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
LRParseAction aReduce6 = new(regulations[6]);// refered 6 times
#endregion re-used actions
// 78 actions
// conflicts(0)=not sovled(0)+solved(0)(0 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Term ; '-' '+' '¥'
// [1] Exp : Exp '-' Term ; '-' '+' '¥'
// [2] Exp : Term ; '-' '+' '¥'
// [3] Term : Term '*' Factor ; '-' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' '*' '/' '+' '¥'
// [5] Term : Factor ; '-' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' '*' '/' '+' '¥'
/*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));
/*1*/states[0].Add(st.Term枝, aGoto2);
/*2*/states[0].Add(st.Factor枝, aGoto3);
/*3*/states[0].Add(st.@LeftParenthesis符, aShift4);
/*4*/states[0].Add(st.@number, aShift5);
// syntaxStates[1]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Term ; '-' '+' '¥'
// [1] Exp : Exp '-' Term ; '-' '+' '¥'
/*5*/states[1].Add(st.@Plus符, aShift6);
/*6*/states[1].Add(st.@Dash符, aShift7);
/*7*/states[1].Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [2] Exp : Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*8*/states[2].Add(st.@Asterisk符, aShift8);
/*9*/states[2].Add(st.@Slash符, aShift9);
/*10*/states[2].Add(st.@Dash符, aReduce2);
/*11*/states[2].Add(st.@RightParenthesis符, aReduce2);
/*12*/states[2].Add(st.@Plus符, aReduce2);
/*13*/states[2].Add(st.@终, aReduce2);
// syntaxStates[3]:
// [5] Term : Factor ; '-' ')' '*' '/' '+' '¥'
/*14*/states[3].Add(st.@Dash符, aReduce5);
/*15*/states[3].Add(st.@RightParenthesis符, aReduce5);
/*16*/states[3].Add(st.@Asterisk符, aReduce5);
/*17*/states[3].Add(st.@Slash符, aReduce5);
/*18*/states[3].Add(st.@Plus符, aReduce5);
/*19*/states[3].Add(st.@终, aReduce5);
// syntaxStates[4]:
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Term ; '-' ')' '+'
// [1] Exp : Exp '-' Term ; '-' ')' '+'
// [2] Exp : Term ; '-' ')' '+'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+'
// [5] Term : Factor ; '-' ')' '*' '/' '+'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+'
/*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));
/*21*/states[4].Add(st.Term枝, aGoto2);
/*22*/states[4].Add(st.Factor枝, aGoto3);
/*23*/states[4].Add(st.@LeftParenthesis符, aShift4);
/*24*/states[4].Add(st.@number, aShift5);
// syntaxStates[5]:
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*25*/states[5].Add(st.@Dash符, aReduce7);
/*26*/states[5].Add(st.@RightParenthesis符, aReduce7);
/*27*/states[5].Add(st.@Asterisk符, aReduce7);
/*28*/states[5].Add(st.@Slash符, aReduce7);
/*29*/states[5].Add(st.@Plus符, aReduce7);
/*30*/states[5].Add(st.@终, aReduce7);
// syntaxStates[6]:
// [0] Exp : Exp '+' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
// [5] Term : Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));
/*32*/states[6].Add(st.Factor枝, aGoto3);
/*33*/states[6].Add(st.@LeftParenthesis符, aShift4);
/*34*/states[6].Add(st.@number, aShift5);
// syntaxStates[7]:
// [1] Exp : Exp '-' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
// [5] Term : Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));
/*36*/states[7].Add(st.Factor枝, aGoto3);
/*37*/states[7].Add(st.@LeftParenthesis符, aShift4);
/*38*/states[7].Add(st.@number, aShift5);
// syntaxStates[8]:
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));
/*40*/states[8].Add(st.@LeftParenthesis符, aShift4);
/*41*/states[8].Add(st.@number, aShift5);
// syntaxStates[9]:
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [7] Factor : 'number' ; '-' ')' '*' '/' '+' '¥'
/*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));
/*43*/states[9].Add(st.@LeftParenthesis符, aShift4);
/*44*/states[9].Add(st.@number, aShift5);
// syntaxStates[10]:
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Term ; '-' ')' '+'
// [1] Exp : Exp '-' Term ; '-' ')' '+'
/*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));
/*46*/states[10].Add(st.@Plus符, aShift6);
/*47*/states[10].Add(st.@Dash符, aShift7);
// syntaxStates[11]:
// [0] Exp : Exp '+' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*48*/states[11].Add(st.@Asterisk符, aShift8);
/*49*/states[11].Add(st.@Slash符, aShift9);
/*50*/states[11].Add(st.@Dash符, aReduce0);
/*51*/states[11].Add(st.@RightParenthesis符, aReduce0);
/*52*/states[11].Add(st.@Plus符, aReduce0);
/*53*/states[11].Add(st.@终, aReduce0);
// syntaxStates[12]:
// [1] Exp : Exp '-' Term ; '-' ')' '+' '¥'
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*54*/states[12].Add(st.@Asterisk符, aShift8);
/*55*/states[12].Add(st.@Slash符, aShift9);
/*56*/states[12].Add(st.@Dash符, aReduce1);
/*57*/states[12].Add(st.@RightParenthesis符, aReduce1);
/*58*/states[12].Add(st.@Plus符, aReduce1);
/*59*/states[12].Add(st.@终, aReduce1);
// syntaxStates[13]:
// [3] Term : Term '*' Factor ; '-' ')' '*' '/' '+' '¥'
/*60*/states[13].Add(st.@Dash符, aReduce3);
/*61*/states[13].Add(st.@RightParenthesis符, aReduce3);
/*62*/states[13].Add(st.@Asterisk符, aReduce3);
/*63*/states[13].Add(st.@Slash符, aReduce3);
/*64*/states[13].Add(st.@Plus符, aReduce3);
/*65*/states[13].Add(st.@终, aReduce3);
// syntaxStates[14]:
// [4] Term : Term '/' Factor ; '-' ')' '*' '/' '+' '¥'
/*66*/states[14].Add(st.@Dash符, aReduce4);
/*67*/states[14].Add(st.@RightParenthesis符, aReduce4);
/*68*/states[14].Add(st.@Asterisk符, aReduce4);
/*69*/states[14].Add(st.@Slash符, aReduce4);
/*70*/states[14].Add(st.@Plus符, aReduce4);
/*71*/states[14].Add(st.@终, aReduce4);
// syntaxStates[15]:
// [6] Factor : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
/*72*/states[15].Add(st.@Dash符, aReduce6);
/*73*/states[15].Add(st.@RightParenthesis符, aReduce6);
/*74*/states[15].Add(st.@Asterisk符, aReduce6);
/*75*/states[15].Add(st.@Slash符, aReduce6);
/*76*/states[15].Add(st.@Plus符, aReduce6);
/*77*/states[15].Add(st.@终, aReduce6);
#endregion init actions of syntax states
}

The state diagram and the state table of the generated LALR(1) parser are shown below:

Because mermaid can only handle a limited number of characters, it often cannot display all the syntax states of a language (C, for example) together with their LR items + lookaheads, so by default the generated state diagram shows only the first 3 LR items + lookaheads of each syntax state. The complete LR items + lookaheads can be found in the generated parse-table code.

| State | '+' | '-' | '*' | '/' | '(' | ')' | 'number' | '¥' | Exp | Term | Factor |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 |  |  |  |  | S4 |  | S5 |  | G1 | G2 | G3 |
| 1 | S6 | S7 |  |  |  |  |  | Accept |  |  |  |
| 2 | R[2] | R[2] | S8 | S9 |  | R[2] |  | R[2] |  |  |  |
| 3 | R[5] | R[5] | R[5] | R[5] |  | R[5] |  | R[5] |  |  |  |
| 4 |  |  |  |  | S4 |  | S5 |  | G10 | G2 | G3 |
| 5 | R[7] | R[7] | R[7] | R[7] |  | R[7] |  | R[7] |  |  |  |
| 6 |  |  |  |  | S4 |  | S5 |  |  | G11 | G3 |
| 7 |  |  |  |  | S4 |  | S5 |  |  | G12 | G3 |
| 8 |  |  |  |  | S4 |  | S5 |  |  |  | G13 |
| 9 |  |  |  |  | S4 |  | S5 |  |  |  | G14 |
| 10 | S6 | S7 |  |  |  | S15 |  |  |  |  |  |
| 11 | R[0] | R[0] | S8 | S9 |  | R[0] |  | R[0] |  |  |  |
| 12 | R[1] | R[1] | S8 | S9 |  | R[1] |  | R[1] |  |  |  |
| 13 | R[3] | R[3] | R[3] | R[3] |  | R[3] |  | R[3] |  |  |  |
| 14 | R[4] | R[4] | R[4] | R[4] |  | R[4] |  | R[4] |  |  |  |
| 15 | R[6] | R[6] | R[6] | R[6] |  | R[6] |  | R[6] |  |  |  |

In the table above:

  • S6 means Shift and go to state 6;

  • R[2] means reduce by regulations[2]=Exp : Term ;

  • G1 means Goto state 1;

  • Accept means the whole input has been parsed and the syntax tree was built successfully;

  • a blank cell means a syntax error has been encountered.
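As a worked example read off the table above, this is how 1+2*3 is parsed (the state stack on the left, the lookahead and the chosen action on the right):

[0]             'number'(1) → S5
[0 5]           '+' → R[7] Factor : 'number' ;, then Goto G3
[0 3]           '+' → R[5] Term : Factor ;, then Goto G2
[0 2]           '+' → R[2] Exp : Term ;, then Goto G1
[0 1]           '+' → S6
[0 1 6]         'number'(2) → S5
[0 1 6 5]       '*' → R[7] Factor : 'number' ;, then Goto G3
[0 1 6 3]       '*' → R[5] Term : Factor ;, then Goto G11
[0 1 6 11]      '*' → S8
[0 1 6 11 8]    'number'(3) → S5
[0 1 6 11 8 5]  '¥' → R[7] Factor : 'number' ;, then Goto G13
[0 1 6 11 8 13] '¥' → R[3] Term : Term '*' Factor ;, then Goto G11
[0 1 6 11]      '¥' → R[0] Exp : Exp '+' Term ;, then Goto G1
[0 1]           '¥' → Accept

Note that 2*3 is reduced before the '+' production, i.e. the precedence of '*' over '+' is already encoded in this grammar; the precedence-directive example near the end of this post shows what happens when the grammar does not encode it.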

Click to view: code that drives the parse table
public SyntaxTree Parse(TokenList tokenList) {
var context = new LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);
var accept = false;
do {
Token token = context.CurrentToken;
int nodeType = token.type;// auto-convert from string to string.
while (nodeType == blockComment || nodeType == inlineComment) {
// skip comment token
context.cursor++;
token = context.CurrentToken;
nodeType = token.type;// auto-convert from string to string.
}
Dictionary<int/*Node.type*/, LRParseAction> currentState =
context.stateStack.Peek();
if (currentState.TryGetValue(nodeType, out var parseAction)) {
parseAction.Execute(context);
accept = parseAction.kind == LRParseAction.Kind.Accept;
}
else { // syntax error happened.
return new SyntaxTree(context);
}
} while (!accept);
var root = context.root;
Debug.Assert(root != null);
return new SyntaxTree(root);
}
public class LRParseAction {
public enum Kind {
Error,
Shift,
Reduce,
Goto,
Accept,
}
public readonly Kind kind;
[StructLayout(LayoutKind.Explicit)]
struct Union {
[FieldOffset(0)] public Dictionary<int/*Node.type*/, LRParseAction> nextState;
[FieldOffset(0)] public Regulation regulation;
}
private Union union;
// execute this parse action.
public void Execute(LRSyntaxContext context) {
switch (this.kind) {
case Kind.Error: { throw new NotImplementedException(); }
//break;
case Kind.Shift: {
var token = context.CurrentToken;
var leaf = new LRNode(token);
context.nodeStack.Push(leaf);
var nextState = this.union.nextState;
context.stateStack.Push(nextState);
// prepare for next loop.
context.cursor++;
}
break;
case Kind.Reduce: {
var regulation = this.union.regulation;
int count = regulation.right.Length;
var children = new LRNode[count];
var start = Token.empty; LRNode? lastNode = null;
var first = true;
for (int i = 0; i < count; i++) {
var state = context.stateStack.Pop();// popped only; not used afterwards.
var node = context.nodeStack.Pop();
children[count - i - 1] = node;
if (node.start.index >= 0) { // this node includes token
if (first) { lastNode = node; first = false; }
start = node.start;
}
}
int tokenCount = 0;
if (lastNode is not null) {
tokenCount =
lastNode.start.index // comment tokens inside of parent are included.
- start.index // comment tokens before parent are excluded.
+ lastNode.tokenCount; // comment tokens after parent are excluded.
}
var parent = new LRNode(regulation, start, tokenCount, children);
for (var i = 0; i < count; i++) { children[i].parent = parent; }
context.nodeStack.Push(parent);
// goto next syntax-state
Dictionary<int/*Node.type*/, LRParseAction> currentState =
context.stateStack.Peek();
var nodeType = regulation.left;
if (currentState.TryGetValue(nodeType, out var parseAction)) {
parseAction.Execute(context); // parseAction is supposed to be a Goto action
}
Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);
}
break;
case Kind.Goto: {
var nextState = this.union.nextState;
context.stateStack.Push(nextState);
}
break;
case Kind.Accept: {
var state = context.stateStack.Pop();
context.root = context.nodeStack.Pop();
context.Finish(context.root, 40, context.stArray);
}
break;
default: { throw new NotImplementedException(); }
}
}
}

The generated semantic extractor

For example, for the input 1234+567+89+0+0, the syntax tree produced by the Calc.st parser is shown below (see the beginning of this post for how the tree is built step by step):

R[0]=Exp : Exp '+' Term ;T[0->8]
├─R[0]=Exp : Exp '+' Term ;T[0->6]
│ ├─R[0]=Exp : Exp '+' Term ;T[0->4]
│ │ ├─R[0]=Exp : Exp '+' Term ;T[0->2]
│ │ │ ├─R[2]=Exp : Term ;T[0]
│ │ │ │ └─R[5]=Term : Factor ;T[0]
│ │ │ │ └─R[7]=Factor : 'number' ;T[0]
│ │ │ │ └─T[0]='number' 1234
│ │ │ ├─T[1]='+' +
│ │ │ └─R[5]=Term : Factor ;T[2]
│ │ │ └─R[7]=Factor : 'number' ;T[2]
│ │ │ └─T[2]='number' 567
│ │ ├─T[3]='+' +
│ │ └─R[5]=Term : Factor ;T[4]
│ │ └─R[7]=Factor : 'number' ;T[4]
│ │ └─T[4]='number' 89
│ ├─T[5]='+' +
│ └─R[5]=Term : Factor ;T[6]
│ └─R[7]=Factor : 'number' ;T[6]
│ └─T[6]='number' 0
├─T[7]='+' +
└─R[5]=Term : Factor ;T[8]
└─R[7]=Factor : 'number' ;T[8]
└─T[8]='number' 0

From the top left to the bottom right, the 4 consecutive R[0] nodes significantly increase the depth of the tree.

The semantic extractor generated by bitParser traverses this syntax tree in post-order and extracts information from its nodes.

Click to view: the generic code for traversing the syntax tree
// Extract some data structure from syntax tree.
// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>
public T? Extract(LRNode root, TokenList tokens, string sourceCode) {
var context = new TContext<T>(root, tokens, sourceCode);
var nodeStack = new Stack<LRNode>(); var indexStack = new Stack<int>();
// init stack.
{
// push nextLeft and its next pending children.
var nextLeft = root; var index = 0;
nodeStack.Push(nextLeft); indexStack.Push(index);
while (nextLeft.children.Count > 0) {
nextLeft = nextLeft.children[0];
nodeStack.Push(nextLeft); indexStack.Push(0);
}
}
while (nodeStack.Count > 0) {
var current = nodeStack.Pop(); var index = indexStack.Pop() + 1;
if (index < current.children.Count) {
// push this node back again.
nodeStack.Push(current); indexStack.Push(index);
// push nextLeft and its next pending children.
var nextLeft = current.children[index];
nodeStack.Push(nextLeft); indexStack.Push(0);
while (nextLeft.children.Count > 0) {
nextLeft = nextLeft.children[0];
nodeStack.Push(nextLeft); indexStack.Push(0);
}
}
else {
// Visit(current);
if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
action(current, context);
}
}
}
{
var current = this.EOT;
// Visit(current);
if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
action(current, context);
}
}
return context.result;
}
Click to view: generated code that extracts node information while traversing the syntax tree
private static readonly Dictionary<int/*LRNode.type*/,
Action<LRNode, TContext<Exp>>> @expExtractorDict = new();
private static readonly Action<LRNode, TContext<Exp>> VtHandler =
(node, context) => {
var token = node.start;
context.objStack.Push(token);
};
// initialize dict for extractor.
private static void InitializeExtractorDict() {
var extractorDict = @expExtractorDict;
extractorDict.Add(st.@Plus符, VtHandler);
extractorDict.Add(st.@Dash符, VtHandler);
extractorDict.Add(st.@Asterisk符, VtHandler);
extractorDict.Add(st.@Slash符, VtHandler);
extractorDict.Add(st.@LeftParenthesis符, VtHandler);
extractorDict.Add(st.@RightParenthesis符, VtHandler);
extractorDict.Add(st.@number, VtHandler);
extractorDict.Add(st.@终,
static (node, context) => {
// [-1]=Exp' : Exp ;
// dumped by ExternalExtractor
var @final = (VnExp?)context.objStack.Pop();
var left = new Exp(@final);
context.result = left; // final step, no need to push into stack.
}); // end of extractorDict.Add(st.@终, (node, context) => { ... });
extractorDict.Add(st.Exp枝,
static (node, context) => {
switch (node.regulation.index) {
case 0: { // [0]=Exp : Exp '+' Term ;
// dumped by ListExtractor 2
var r0 = (VnTerm?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnExp?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 1: { // [1]=Exp : Exp '-' Term ;
// dumped by ListExtractor 2
var r0 = (VnTerm?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnExp?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 2: { // [2]=Exp : Term ;
// dumped by ListExtractor 1
var r0 = (VnTerm?)context.objStack.Pop();
var left = new VnExp(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });
extractorDict.Add(st.Term枝,
static (node, context) => {
switch (node.regulation.index) {
case 3: { // [3]=Term : Term '*' Factor ;
// dumped by ListExtractor 2
var r0 = (VnFactor?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnTerm?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 4: { // [4]=Term : Term '/' Factor ;
// dumped by ListExtractor 2
var r0 = (VnFactor?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnTerm?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 5: { // [5]=Term : Factor ;
// dumped by ListExtractor 1
var r0 = (VnFactor?)context.objStack.Pop();
var left = new VnTerm(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });
extractorDict.Add(st.Factor枝,
static (node, context) => {
switch (node.regulation.index) {
case 6: { // [6]=Factor : '(' Exp ')' ;
// dumped by DefaultExtractor
var r0 = (Token?)context.objStack.Pop();
var r1 = (VnExp?)context.objStack.Pop();
var r2 = (Token?)context.objStack.Pop();
var left = new VnFactor(r2, r1, r0);
context.objStack.Push(left);
}
break;
case 7: { // [7]=Factor : 'number' ;
// dumped by DefaultExtractor
var r0 = (Token?)context.objStack.Pop();
var left = new VnFactor(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });
}

The extracted node information, displayed as a tree, looks like this:

VnExpT[0->8]
├─VnTermT[0]
│ └─VnFactorT[0]
│ └─T[0]='number' 1234
└─List T[1->8]
├─Item T[1->2]
│ ├─T[1]='+' +
│ └─VnTermT[2]
│ └─VnFactorT[2]
│ └─T[2]='number' 567
├─Item T[3->4]
│ ├─T[3]='+' +
│ └─VnTermT[4]
│ └─VnFactorT[4]
│ └─T[4]='number' 89
├─Item T[5->6]
│ ├─T[5]='+' +
│ └─VnTermT[6]
│ └─VnFactorT[6]
│ └─T[6]='number' 0
└─Item T[7->8]
├─T[7]='+' +
└─VnTermT[8]
└─VnFactorT[8]
└─T[8]='number' 0

Production rules like Exp : Exp '+' Term ; make the syntax tree excessively deep. The purpose of this semantic-extraction step is to "flatten" such tree structures. With the flattened semantic information, formatting the source code becomes easy.
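To illustrate why flattening helps (a hypothetical helper, not part of the generated code): once an Exp has been reduced to "one first Term plus a list of (operator, Term) items", downstream logic such as evaluation or intermediate-code generation becomes a simple loop whose depth does not grow with the length of the input:

// hypothetical sketch: handle the '+'/'-' level of a flattened Exp with a loop
static int EvaluateExp(int firstTerm, IEnumerable<(char op, int term)> items) {
    int result = firstTerm;
    foreach (var (op, term) in items) {
        result = (op == '+') ? result + term : result - term;
    }
    return result; // the '*'/'/' level (Term) would be handled the same way
}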

Click to view: formatting code for the VnExp node
/// <summary>
/// Correspond to the Vn node Exp in the grammar(Exp).
/// </summary>
internal partial class VnExp : IFullFormatter {
// [0]=Exp : Exp '+' Term ;
// [1]=Exp : Exp '-' Term ;
// [2]=Exp : Term ;
private readonly VnTerm first0;
public class PostItem : IFullFormatter {
private readonly Token r1;
private readonly VnTerm r0;
public PostItem(Token r1, VnTerm r0) {
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r1, r0);
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r1, preConfig, writer);
// '+' or '-' is separated from the tokens on either side by 1 space
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class PostItemList : IFullFormatter {
private readonly List<PostItem> list = new();
public PostItemList(PostItem item) {
this.list.Add(item);
this._scope = new TokenRange(item);
}
public void Add(PostItem item) {
this.list.Add(item);
this._scope.end = item.Scope.end;
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// '+' or '-' is separated from the tokens on either side by 1 space
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
for (int i = 0; i < list.Count; i++) {
if (i == 0) {
context.PrintBlanksAnd(list[i], preConfig, writer);
}
else {
context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
context.PrintBlanksAnd(list[i], config, writer);
}
}
}
}
private PostItemList? list;
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
internal VnExp(VnTerm first0) {
this.first0 = first0;
this._scope = new TokenRange(first0);
}
// [0]=Exp : Exp '+' Term ;
// [1]=Exp : Exp '-' Term ;
internal void Add(Token r1, VnTerm r0) {
if (this.list == null) {
this.list = new PostItemList(new(r1, r0));
}
else {
this.list.Add(new(r1, r0));
}
this._scope.end = this.list.Scope.end;
}
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.first0, preConfig, writer);
if (this.list != null) {
// '+' or '-' is separated from the tokens on either side by 1 space
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.first0, this.list, config, writer);
context.PrintBlanksAnd(this.list, config, writer);
}
}
}
Click to view: formatting code for the VnTerm node
/// <summary>
/// Correspond to the Vn node Term in the grammar(Exp).
/// </summary>
internal partial class VnTerm : IFullFormatter {
// [3]=Term : Term '*' Factor ;
// [4]=Term : Term '/' Factor ;
// [5]=Term : Factor ;
private readonly VnFactor first0;
public class PostItem : IFullFormatter {
private readonly Token r1;
private readonly VnFactor r0;
public PostItem(Token r1, VnFactor r0) {
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r1, r0);
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r1, preConfig, writer);
// '+' or '-' is separated from the tokens on either side by 1 space
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class PostItemList : IFullFormatter {
private readonly List<PostItem> list = new();
public PostItemList(PostItem item) {
this.list.Add(item);
this._scope = new TokenRange(item);
}
public void Add(PostItem item) {
this.list.Add(item);
this._scope.end = item.Scope.end;
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// '*' or '/' is separated from the tokens on either side by 1 space
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
for (int i = 0; i < list.Count; i++) {
if (i == 0) {
context.PrintBlanksAnd(list[i], preConfig, writer);
}
else {
context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
context.PrintBlanksAnd(list[i], config, writer);
}
}
}
}
private PostItemList? list;
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
// [5]=Term : Factor ;
internal VnTerm(VnFactor first0) {
this.first0 = first0;
this._scope = new TokenRange(first0);
}
// [3]=Term : Term '*' Factor ;
// [4]=Term : Term '/' Factor ;
internal void Add(Token r1, VnFactor r0) {
if (this.list == null) {
this.list = new PostItemList(new(r1, r0));
}
else {
this.list.Add(new(r1, r0));
}
this._scope.end = this.list.Scope.end;
}
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.first0, preConfig, writer);
if (this.list != null) {
// '*' or '/' is separated from the tokens on either side by 1 space
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.first0, this.list, config, writer);
context.PrintBlanksAnd(this.list, config, writer);
}
}
}
Click to view: formatting code for the VnFactor node
/// <summary>
/// Correspond to the Vn node Factor in the grammar(Exp).
/// </summary>
internal abstract partial class VnFactor : IFullFormatter {
// [6]=Factor : '(' Exp ')' ;
// [7]=Factor : 'number' ;
public class C0 : VnFactor {
// [6]=Factor : '(' Exp ')' ;
public C0(Token r2, VnExp r1, Token r0) {
this.r2 = r2;
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r2, r0);
}
private readonly Token r2;
private readonly VnExp r1;
private readonly Token r0;
private readonly TokenRange _scope;
public override TokenRange Scope => _scope;
public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r2, preConfig, writer);
// no spaces inside '(' Exp ')'
var config = new BlankConfig(inlineBlank: 0, forceNewline: false);
context.PrintCommentsBetween(this.r2, this.r1, config, writer);
context.PrintBlanksAnd(this.r1, config, writer);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class C1 : VnFactor {
// [7]=Factor : 'number' ;
public C1(Token r0) {
this.r0 = r0;
this._scope = new TokenRange(r0);
}
private readonly Token r0;
private readonly TokenRange _scope;
public override TokenRange Scope => _scope;
public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// print its single token according to the preConfig passed down by the parent
context.PrintBlanksAnd(this.r0, preConfig, writer);
}
}
public abstract TokenRange Scope { get; }
public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);
}

Finally, 1234+567+89+0+0 formatted looks like this (1 space is inserted on each side of every operator):

1234 + 567 + 89 + 0 + 0

Here are more examples:

// formatting 1-2*3
1 - 2 * 3
// formatting (1+2)/3
(1 + 2) / 3
// formatting (1+2)*(3-4)
(1 + 2) * (3 - 4)
Click to view: GLSL formatting example 1
// example 1, before formatting
void main() {
int a=0;
a++++;
++++a;
}
// example 1, after formatting
void main() {
int a = 0;
a++ ++;
++ ++a;
}
Click to view: GLSL formatting example 2
// example 2, before formatting
in vec3 passNormal;
in vec2 passTexCoord; uniform sampler2D textureMap;
uniform vec3 lihtDirection=vec3(1,1,1);
uniform vec3 diffuseColor;
uniform bool transparent=false; out vec4 outColor; void main( ){
if (transparent) {
if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;} if (passTexCoord==vec2(-1,-1)){ // when texture coordinate not exists..
float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0);
outColor = vec4(diffuseColor * diffuse, 1.0);
}
else { outColor = texture(textureMap, passTexCoord);}
}
// example 2, after formatting
in vec3 passNormal;
in vec2 passTexCoord;
uniform sampler2D textureMap;
uniform vec3 lihtDirection = vec3(1, 1, 1);
uniform vec3 diffuseColor;
uniform bool transparent = false;
out vec4 outColor;
void main() {
if (transparent) {
if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;
}
if (passTexCoord == vec2(-1, -1)) { // when texture coordinate not exists..
float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);
outColor = vec4(diffuseColor * diffuse, 1.0);
}
else { outColor = texture(textureMap, passTexCoord); }
}

For a detailed introduction to this formatting algorithm, see my other post, "GLSL Shader的格式化算法(LALR解析器)" (the formatting algorithm for GLSL shaders, based on an LALR parser).

Example: automatically resolving Shift/Reduce conflicts

Both C and GLSL have the dangling-else problem, which is caused by the following productions:

selection_statement :
'if' '(' expression ')' selection_rest_statement
;
selection_rest_statement :
statement 'else' statement
| statement
;

So when the parser has just read the token 'else', should it Shift the 'else', or should it Reduce by selection_rest_statement : statement ;? This is a Shift/Reduce conflict.
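Preferring Shift here gives the behaviour that C and GLSL actually specify: the 'else' binds to the nearest 'if'. For example:

// if (a) if (b) x = 1; else x = 2;
// is parsed, with Shift preferred, as:
// if (a) { if (b) x = 1; else x = 2; }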

bitParser automatically chooses Shift and comments out the Reduce alternative, as the following code shows:

const int syntaxStateCount = 477;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerGLSL.syntaxStates;
...
// 30814 actions
// conflicts(1)=not sovled(0)+solved(1)(1 warnings)
#region init actions of syntax states
...
// syntaxStates[454]:
// [324] selection_rest_statement : statement 'else' statement ; '--' '-' ';' '!' '(' '{' '}' '+' ..
// [325] selection_rest_statement : statement ; '--' '-' ';' '!' '(' '{' '}' '+' ..
// 'else' repeated 2 times
states[454]/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states[466]));
// ⚔ PreferShiftToReduce states[454]/*28146*/.Add(st.@else, new(regulations[325]));
states[454]/*28147*/.Add(st.@Dash符Dash符, new(regulations[325]));
...
#endregion init actions of syntax states
}

Example: the precedence directives %nonassoc, %left, %right, %prec

If we wrote Calc.st in the most intuitive way, it might look like this:

Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '(' Exp ')'
| 'number' ; %%[0-9]+%% 'number'
Click to view: the LALR(1) parse-table code for this grammar
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 80 actions
// conflicts(16)=not sovled(0)+solved(16)(16 warnings)
#region create objects of syntax states
states[0] = new(capacity: 3);
states[1] = new(capacity: 5);
states[2] = new(capacity: 3);
states[3] = new(capacity: 6);
states[4] = new(capacity: 3);
states[5] = new(capacity: 3);
states[6] = new(capacity: 3);
states[7] = new(capacity: 3);
states[8] = new(capacity: 5);
states[9] = new(capacity: 6);
states[10] = new(capacity: 6);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
#endregion create objects of syntax states
#region re-used actions
LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
#endregion re-used actions
// 80 actions
// conflicts(16)=not sovled(0)+solved(16)(16 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' '*' '/' '+' '¥'
states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
states[0]/*2*/.Add(st.@number, aShift3);
// syntaxStates[1]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
states[1]/*3*/.Add(st.@Plus符, aShift4);
states[1]/*4*/.Add(st.@Dash符, aShift5);
states[1]/*5*/.Add(st.@Asterisk符, aShift6);
states[1]/*6*/.Add(st.@Slash符, aShift7);
states[1]/*7*/.Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+'
states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
states[2]/*10*/.Add(st.@number, aShift3);
// syntaxStates[3]:
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[3]/*11*/.Add(st.@Dash符, aReduce5);
states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
states[3]/*14*/.Add(st.@Slash符, aReduce5);
states[3]/*15*/.Add(st.@Plus符, aReduce5);
states[3]/*16*/.Add(st.@终, aReduce5);
// syntaxStates[4]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
states[4]/*19*/.Add(st.@number, aShift3);
// syntaxStates[5]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
states[5]/*22*/.Add(st.@number, aShift3);
// syntaxStates[6]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
states[6]/*25*/.Add(st.@number, aShift3);
// syntaxStates[7]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
states[7]/*28*/.Add(st.@number, aShift3);
// syntaxStates[8]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
states[8]/*30*/.Add(st.@Plus符, aShift4);
states[8]/*31*/.Add(st.@Dash符, aShift5);
states[8]/*32*/.Add(st.@Asterisk符, aShift6);
states[8]/*33*/.Add(st.@Slash符, aShift7);
// syntaxStates[9]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[9]/*34*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[9]/*35*/.Add(st.@Plus符, aReduce0);
// '-' repeated 2 times
states[9]/*36*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[9]/*37*/.Add(st.@Dash符, aReduce0);
// '*' repeated 2 times
states[9]/*38*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
// '/' repeated 2 times
states[9]/*40*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[9]/*41*/.Add(st.@Slash符, aReduce0);
states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
states[9]/*43*/.Add(st.@终, aReduce0);
// syntaxStates[10]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[10]/*44*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[10]/*45*/.Add(st.@Plus符, aReduce1);
// '-' repeated 2 times
states[10]/*46*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[10]/*47*/.Add(st.@Dash符, aReduce1);
// '*' repeated 2 times
states[10]/*48*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
// '/' repeated 2 times
states[10]/*50*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[10]/*51*/.Add(st.@Slash符, aReduce1);
states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
states[10]/*53*/.Add(st.@终, aReduce1);
// syntaxStates[11]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[11]/*54*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[11]/*55*/.Add(st.@Plus符, aReduce2);
// '-' repeated 2 times
states[11]/*56*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[11]/*57*/.Add(st.@Dash符, aReduce2);
// '*' repeated 2 times
states[11]/*58*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
// '/' repeated 2 times
states[11]/*60*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[11]/*61*/.Add(st.@Slash符, aReduce2);
states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
states[11]/*63*/.Add(st.@终, aReduce2);
// syntaxStates[12]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[12]/*64*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[12]/*65*/.Add(st.@Plus符, aReduce3);
// '-' repeated 2 times
states[12]/*66*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[12]/*67*/.Add(st.@Dash符, aReduce3);
// '*' repeated 2 times
states[12]/*68*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
// '/' repeated 2 times
states[12]/*70*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[12]/*71*/.Add(st.@Slash符, aReduce3);
states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
states[12]/*73*/.Add(st.@终, aReduce3);
// syntaxStates[13]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
states[13]/*74*/.Add(st.@Dash符, aReduce4);
states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
states[13]/*77*/.Add(st.@Slash符, aReduce4);
states[13]/*78*/.Add(st.@Plus符, aReduce4);
states[13]/*79*/.Add(st.@终, aReduce4);
#endregion init actions of syntax states
}

When the parser is in syntaxStates[9] and the next token is '+', it ought to Reduce, but under the default "prefer Shift over Reduce" rule bitParser chooses Shift. Clearly this cannot handle the precedence between addition/subtraction and multiplication/division correctly: for example, 1*2+3 would be parsed as 1*(2+3), and 1-2-3 as 1-(2-3).

If we do not want to rewrite the grammar into the form shown at the beginning of this post, we can declare the precedence of the four operators with precedence directives and thereby obtain a correct parse table.

Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '(' Exp ')'
| 'number' ; %%[0-9]+%% 'number'
%left '+' '-' // '+' and '-' have the same precedence and prefer Reduce
%left '*' '/' // '*' and '/' have the same precedence, higher than '+' '-', and prefer Reduce
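With these directives, when the lookahead operator has the same precedence as the production being reduced, %left makes the parser Reduce (which yields left associativity), and when the lookahead has higher precedence it Shifts. For example, 1-2-3 now reduces 1-2 before shifting the second '-', giving (1-2)-3, while 1+2*3 shifts the '*' and reduces 2*3 first, giving 1+(2*3).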
Click to view: the parse-table code of the LALR(1) parser that uses the precedence directives
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 80 actions
// conflicts(16)=not solved(0)+solved(16)(0 warnings)
#region create objects of syntax states
states[0] = new(capacity: 3);
states[1] = new(capacity: 5);
states[2] = new(capacity: 3);
states[3] = new(capacity: 6);
states[4] = new(capacity: 3);
states[5] = new(capacity: 3);
states[6] = new(capacity: 3);
states[7] = new(capacity: 3);
states[8] = new(capacity: 5);
states[9] = new(capacity: 6);
states[10] = new(capacity: 6);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
#endregion create objects of syntax states
#region re-used actions
LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// referred 6 times
LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// referred 6 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 6 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 6 times
LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
LRParseAction aReduce0 = new(regulations[0]);// referred 6 times
LRParseAction aReduce1 = new(regulations[1]);// referred 6 times
LRParseAction aReduce2 = new(regulations[2]);// referred 6 times
LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
#endregion re-used actions
// 80 actions
// conflicts(16)=not solved(0)+solved(16)(0 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' '*' '/' '+' '¥'
states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
states[0]/*2*/.Add(st.@number, aShift3);
// syntaxStates[1]:
// [-1] Exp' : Exp ; '¥'
// [0] Exp : Exp '+' Exp ; '-' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' '*' '/' '+' '¥'
states[1]/*3*/.Add(st.@Plus符, aShift4);
states[1]/*4*/.Add(st.@Dash符, aShift5);
states[1]/*5*/.Add(st.@Asterisk符, aShift6);
states[1]/*6*/.Add(st.@Slash符, aShift7);
states[1]/*7*/.Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+'
states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
states[2]/*10*/.Add(st.@number, aShift3);
// syntaxStates[3]:
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[3]/*11*/.Add(st.@Dash符, aReduce5);
states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
states[3]/*14*/.Add(st.@Slash符, aReduce5);
states[3]/*15*/.Add(st.@Plus符, aReduce5);
states[3]/*16*/.Add(st.@终, aReduce5);
// syntaxStates[4]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
states[4]/*19*/.Add(st.@number, aShift3);
// syntaxStates[5]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
states[5]/*22*/.Add(st.@number, aShift3);
// syntaxStates[6]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
states[6]/*25*/.Add(st.@number, aShift3);
// syntaxStates[7]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [5] Exp : 'number' ; '-' ')' '*' '/' '+' '¥'
states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
states[7]/*28*/.Add(st.@number, aShift3);
// syntaxStates[8]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+'
states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
states[8]/*30*/.Add(st.@Plus符, aShift4);
states[8]/*31*/.Add(st.@Dash符, aShift5);
states[8]/*32*/.Add(st.@Asterisk符, aShift6);
states[8]/*33*/.Add(st.@Slash符, aShift7);
// syntaxStates[9]:
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LeftShouldReduce states[9]/*34*/.Add(st.@Plus符, aShift4);
states[9]/*35*/.Add(st.@Plus符, aReduce0);
// '-' repeated 2 times
// ⚔ LeftShouldReduce states[9]/*36*/.Add(st.@Dash符, aShift5);
states[9]/*37*/.Add(st.@Dash符, aReduce0);
// '*' repeated 2 times
states[9]/*38*/.Add(st.@Asterisk符, aShift6);
// ⚔ LowPrecedence states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
// '/' repeated 2 times
states[9]/*40*/.Add(st.@Slash符, aShift7);
// ⚔ LowPrecedence states[9]/*41*/.Add(st.@Slash符, aReduce0);
states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
states[9]/*43*/.Add(st.@终, aReduce0);
// syntaxStates[10]:
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LeftShouldReduce states[10]/*44*/.Add(st.@Plus符, aShift4);
states[10]/*45*/.Add(st.@Plus符, aReduce1);
// '-' repeated 2 times
// ⚔ LeftShouldReduce states[10]/*46*/.Add(st.@Dash符, aShift5);
states[10]/*47*/.Add(st.@Dash符, aReduce1);
// '*' repeated 2 times
states[10]/*48*/.Add(st.@Asterisk符, aShift6);
// ⚔ LowPrecedence states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
// '/' repeated 2 times
states[10]/*50*/.Add(st.@Slash符, aShift7);
// ⚔ LowPrecedence states[10]/*51*/.Add(st.@Slash符, aReduce1);
states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
states[10]/*53*/.Add(st.@终, aReduce1);
// syntaxStates[11]:
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LowPrecedence states[11]/*54*/.Add(st.@Plus符, aShift4);
states[11]/*55*/.Add(st.@Plus符, aReduce2);
// '-' repeated 2 times
// ⚔ LowPrecedence states[11]/*56*/.Add(st.@Dash符, aShift5);
states[11]/*57*/.Add(st.@Dash符, aReduce2);
// '*' repeated 2 times
// ⚔ LeftShouldReduce states[11]/*58*/.Add(st.@Asterisk符, aShift6);
states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
// '/' repeated 2 times
// ⚔ LeftShouldReduce states[11]/*60*/.Add(st.@Slash符, aShift7);
states[11]/*61*/.Add(st.@Slash符, aReduce2);
states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
states[11]/*63*/.Add(st.@终, aReduce2);
// syntaxStates[12]:
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LowPrecedence states[12]/*64*/.Add(st.@Plus符, aShift4);
states[12]/*65*/.Add(st.@Plus符, aReduce3);
// '-' repeated 2 times
// ⚔ LowPrecedence states[12]/*66*/.Add(st.@Dash符, aShift5);
states[12]/*67*/.Add(st.@Dash符, aReduce3);
// '*' repeated 2 times
// ⚔ LeftShouldReduce states[12]/*68*/.Add(st.@Asterisk符, aShift6);
states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
// '/' repeated 2 times
// ⚔ LeftShouldReduce states[12]/*70*/.Add(st.@Slash符, aShift7);
states[12]/*71*/.Add(st.@Slash符, aReduce3);
states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
states[12]/*73*/.Add(st.@终, aReduce3);
// syntaxStates[13]:
// [4] Exp : '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'
states[13]/*74*/.Add(st.@Dash符, aReduce4);
states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
states[13]/*77*/.Add(st.@Slash符, aReduce4);
states[13]/*78*/.Add(st.@Plus符, aReduce4);
states[13]/*79*/.Add(st.@终, aReduce4);
#endregion init actions of syntax states
}
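The lines marked ⚔ above are the conflicting actions that the generator discarded, annotated with the reason (LowPrecedence, LeftShouldReduce, ...). Conceptually, each Shift/Reduce conflict is settled by comparing the precedence of the lookahead Vt against the precedence of the rule to be reduced (by default that of the rule's rightmost Vt). The snippet below is only a rough sketch of that decision with made-up type names; it is not bitParser's actual implementation:

// sketch only: how a precedence-based Shift/Reduce decision can be made
enum Assoc { Left, Right, NonAssoc }
enum Decision { Shift, Reduce, SyntaxError }

static class PrecedenceResolver {
    // lookaheadPrec / lookaheadAssoc: precedence and associativity of the lookahead Vt (the Shift side)
    // rulePrec: precedence of the rule to be reduced (its rightmost Vt, unless %prec overrides it)
    public static Decision Resolve(int lookaheadPrec, Assoc lookaheadAssoc, int rulePrec) {
        if (lookaheadPrec > rulePrec) return Decision.Shift;   // the dropped Reduce is marked "LowPrecedence" above
        if (lookaheadPrec < rulePrec) return Decision.Reduce;  // the dropped Shift is marked "LowPrecedence" above
        return lookaheadAssoc switch {
            Assoc.Left => Decision.Reduce,   // equal precedence + %left: the dropped Shift is marked "LeftShouldReduce"
            Assoc.Right => Decision.Shift,   // equal precedence + %right: prefer Shift
            _ => Decision.SyntaxError        // %nonassoc: report a syntax error
        };
    }
}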

bitParser's precedence directives %nonassoc, %left, %right and %prec behave the same as in yacc:

  • a directive written later has higher precedence;

  • %left prefers Reduce;

  • %right prefers Shift;

  • %nonassoc marks the conflict as a syntax error;

  • %prec gives a rule the precedence of a specified Token type, instead of the default one (the precedence of the rightmost Vt in the rule being reduced). The specified Vt may even be one that does not occur in the grammar at all, i.e. a pure placeholder; see the sketch below.
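For instance, here is a hedged sketch (not taken from the demo repository) of the conventional unary-minus pattern, where 'UMINUS' is a placeholder Vt that appears only in the precedence directives:

// sketch only: give unary minus a higher precedence via %prec
Exp : Exp '+' Exp
| Exp '-' Exp
| '-' Exp %prec 'UMINUS' // this rule borrows the precedence of 'UMINUS', not that of '-'
| 'number' ; %%[0-9]+%% 'number'
%left '+' '-'
%right 'UMINUS' // written later, therefore higher than '+' and '-'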

Example: the / postfix

In a Step-format file, the 1 in 1=2 should be treated as an 'entityId', while the 2 should be treated as a 'refEntity', i.e. a reference to entity 2.

How do we tell them apart? When a number is followed by =, it is an 'entityId'; otherwise it is a 'refEntity'. This calls for the / postfix feature, as shown below:

// postfix.st
// regulations:
Items : Items Item | Item ;
Item : 'entityId' '=' 'refEntity' ;
// lexi statements:
%%[0-9]+/[ \t]*=%% 'entityId' // digits followed by " =" form an 'entityId'
%%[0-9]+%% 'refEntity'
Click to view: the generated code for lexi state 0
// lexicalState0
private static readonly Action<LexicalContext, char> lexicalState0 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : '=' */
c == '='/*'\u003D'(61)*/) {
BeginToken(context);
ExtendToken(context);
context.currentState = lexicalState2;
}
/* [0-9] */
else if (/* possible Vt : 'entityId' 'refEntity' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
BeginToken(context);
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
context.currentState = lexicalState0; // skip them.
}
else { // unexpected char.
BeginToken(context);
ExtendToken(context);
AcceptToken(st.Error, context);
context.currentState = lexicalState0;
}
};
Click to view: the generated code for lexi state 1
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : 'entityId' */
c == '='/*'\u003D'(61)*/) {
context.currentState = lexicalState4;
}
/* [ \t] */
else if (/* possible Vt : 'entityId' */
(c == ' '/*'\u0020'(32)*/)
|| (c == '\t'/*'\u0009'(9)*/)) {
context.currentState = lexicalState1;
}
/* deal with everything else. */
else { // token with error type
ExtendToken(context);
AcceptToken(st.Error, context);
context.currentState = lexicalState0;
}
};
Click to view: the generated code for lexi state 2
// lexicalState2
private static readonly Action<LexicalContext, char> lexicalState2 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Equal符, context);
context.currentState = lexicalState0;
}
};
Click to view: the generated code for lexi state 3
// lexicalState3
private static readonly Action<LexicalContext, char> lexicalState3 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : 'entityId' */
c == '='/*'\u003D'(61)*/) {
context.currentState = lexicalState4;
}
/* [ \t] */
else if (/* possible Vt : 'entityId' */
(c == ' '/*'\u0020'(32)*/)
|| (c == '\t'/*'\u0009'(9)*/)) {
context.currentState = lexicalState1;
}
/* [0-9] */
else if (/* possible Vt : 'entityId' 'refEntity' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else {
AcceptToken(st.@refEntity, context);
context.currentState = lexicalState0;
}
};
Click to view: the generated code for lexi state 4
// lexicalState4
private static readonly Action<LexicalContext, char> lexicalState4 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@entityId, context);
context.currentState = lexicalState0;
}
};

Below is the state-transition table in the form of a two-dimensional array ElseIf[][]; it reuses some ElseIf objects, which further reduces the memory footprint.

Click to view: the generated two-dimensional-array state-transition table code
private static readonly ElseIf[][] lexiStates = new ElseIf[5][];
static void InitializeLexiTable() {
ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1);//refered 2 times
ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1);//refered 2 times
ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4);//refered 2 times
lexiStates[0] = new ElseIf[] {
// possible Vt: 'entityId' 'refEntity'
/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),
// possible Vt: '='
/*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),
};
lexiStates[1] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*/ //new('\t'/*'\u0009'(9)*/, Acts.None, 1),
/*0*/s9_9_0_1,
// possible Vt: 'entityId'
/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
/*1*/s32_32_0_1,
// possible Vt: 'entityId'
/*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),
/*2*/s61_61_0_4,
};
lexiStates[2] = new ElseIf[] {
// possible Vt: '='
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Equal符),
};
lexiStates[3] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
/*0*/s9_9_0_1,
// possible Vt: 'entityId'
/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
/*1*/s32_32_0_1,
// possible Vt: 'entityId' 'refEntity'
/*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),
// possible Vt: 'entityId'
/*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),
/*3*/s61_61_0_4,
// possible Vt: 'refEntity'
/*4*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@refEntity),
};
lexiStates[4] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@entityId),
};
}

Its miniDFA state diagram is as follows:

Combining the code with the state diagram, it is easy to see:

  • In state miniDFA3, if the next char is a space, \t or =, we can already conclude that the Token being recognized is an 'entityId' (or the error type 'Error错'); otherwise it is a 'refEntity' (and the machine returns to state miniDFA0). This matches our expectation.

  • In state miniDFA0 (the initial state), on [0-9] the upcoming Token may be either an 'entityId' or a 'refEntity'; at that point it cannot yet be uniquely determined.

In other words, this example also involves the NFA-to-DFA conversion problem, because its two Token types 'entityId' and 'refEntity' start with the same pattern [0-9]+.

Let us compare its NFA and DFA:

Below is its NFA state diagram:

Below is its DFA state diagram:

We can see:

  • In the NFA state diagram, state NFA0-0 has two choices on [0-9]: state NFA2-1 and state NFA3-1. In the corresponding DFA state diagram, NFA2-1 and NFA3-1 are merged into one DFA state (DFA2). This is achieved by the subset construction; a minimal sketch follows.
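Below is a minimal, self-contained sketch of the subset construction against a tiny hypothetical NFA representation (bitParser's real data structures differ); it only shows the core idea of turning ε-closures of NFA state sets into DFA states:

// sketch only: subset construction (NFA -> DFA) over a hypothetical Nfa type
using System.Collections.Generic;

class Nfa {
    public const char Epsilon = '\0';
    // transitions[state] maps a char (Epsilon for ε) to its target states
    public List<Dictionary<char, List<int>>> transitions = new();
    public int start;
}

static class SubsetConstruction {
    // ε-closure of a set of NFA states
    static SortedSet<int> Closure(Nfa nfa, IEnumerable<int> states) {
        var closure = new SortedSet<int>(states);
        var stack = new Stack<int>(closure);
        while (stack.Count > 0) {
            var s = stack.Pop();
            if (nfa.transitions[s].TryGetValue(Nfa.Epsilon, out var targets))
                foreach (var t in targets)
                    if (closure.Add(t)) stack.Push(t);
        }
        return closure;
    }

    // NFA states reachable from 'states' on char c (closure not yet applied)
    static SortedSet<int> Move(Nfa nfa, SortedSet<int> states, char c) {
        var result = new SortedSet<int>();
        foreach (var s in states)
            if (nfa.transitions[s].TryGetValue(c, out var targets))
                foreach (var t in targets) result.Add(t);
        return result;
    }

    // returns the DFA transition table: dfaState -> (char -> dfaState)
    public static List<Dictionary<char, int>> Build(Nfa nfa, IEnumerable<char> alphabet) {
        var dfaStates = new List<SortedSet<int>>();   // each DFA state is a set of NFA states
        var indexOf = new Dictionary<string, int>();
        var table = new List<Dictionary<char, int>>();
        static string Key(SortedSet<int> s) => string.Join(",", s);

        var start = Closure(nfa, new[] { nfa.start });
        dfaStates.Add(start); indexOf[Key(start)] = 0; table.Add(new());
        for (var i = 0; i < dfaStates.Count; i++) {   // dfaStates grows while we iterate
            foreach (var c in alphabet) {
                var next = Closure(nfa, Move(nfa, dfaStates[i], c));
                if (next.Count == 0) continue;
                if (!indexOf.TryGetValue(Key(next), out var j)) {
                    j = dfaStates.Count;
                    dfaStates.Add(next); indexOf[Key(next)] = j; table.Add(new());
                }
                table[i][c] = j;   // here NFA2-1 and NFA3-1 land in one and the same DFA state
            }
        }
        return table;
    }
}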

Incidentally, here is an example of constructing a miniDFA:

// regulations:
Left : 'min' ;
// lexical statements
%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' // an identifier followed by ',' is a 'min'

The DFA for this 'min' is as follows:

The miniDFA for this 'min' is as follows:

As we can see, the miniDFA merges states DFA1 and DFA3 of the DFA. They can be merged because, on any given char, both of them transition to the same next state. This is achieved by Hopcroft's algorithm; a simplified sketch of the underlying partition refinement follows.
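To show the idea of that merging (this is the simple Moore-style refinement, not Hopcroft's worklist optimization), here is a hedged, self-contained sketch with hypothetical inputs; two states end up in the same block exactly when, on every char, they move to the same successor block:

// sketch only: DFA minimization by partition refinement over a hypothetical table
using System.Collections.Generic;
using System.Linq;

static class DfaMinimizationSketch {
    // dfa[state][symbol] = next state (a complete DFA over symbolCount symbols).
    // Returns a block id per state; states with equal ids can be merged (e.g. DFA1 and DFA3 above).
    public static int[] Refine(int[][] dfa, bool[] accepting, int symbolCount) {
        int n = dfa.Length;
        var block = new int[n];                 // initial partition: accepting vs non-accepting
        for (int s = 0; s < n; s++) block[s] = accepting[s] ? 1 : 0;
        while (true) {
            var signatureId = new Dictionary<string, int>();
            var next = new int[n];
            for (int s = 0; s < n; s++) {
                // signature = own block + the block of every successor
                var key = block[s] + "|" + string.Join(",",
                    Enumerable.Range(0, symbolCount).Select(a => block[dfa[s][a]]));
                if (!signatureId.TryGetValue(key, out var id)) { id = signatureId.Count; signatureId[key] = id; }
                next[s] = id;
            }
            if (signatureId.Count == block.Distinct().Count()) return next; // nothing split: fixed point reached
            block = next;                       // the partition was refined; iterate again
        }
    }
}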

Example: the <'Vt'> prefix

In GLSL, to make syntax parsing easier, I need the Point in struct Point { float x; float y; } to be recognized as a Token of type "type_name" rather than of type "identifier". This can be achieved with a prefix.

// ..
struct_specifier :
'struct' 'type_name' '{' struct_declaration_list '}' ;
// ..
// lexical statements
%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // a Token that follows 'struct' gets the type "type_name"
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // otherwise it gets the usual type "identifier"

Adding the prefix <'struct'> does not change the lexer's state structure; it only means that, when assigning the Token type, the lexer checks whether the previous Token has the type "struct": if so, the new Token gets the type "type_name", otherwise "identifier".

In addition, an extra list is needed to record every "type_name" recognized so far, so that when one of them is encountered again it is also assigned the type "type_name".

Click to view: the part of the state machine affected by the prefix
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* [a-zA-Z0-9_] */
else if (/* possible Vt : 'type_name' 'identifier' */
('a' <= c && c <= 'z')
|| ('A' <= c && c <= 'Z')
|| ('0' <= c && c <= '9')
|| (c == '_')) {
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else {
AcceptToken2(context
// if the previous Token is a 'struct', the new Token is a type_name
, new(/*<'Vt'>*/st.@struct, st.@type_name)
// otherwise the new Token is an identifier
, new(/*default preVt*/0, st.@identifier));
context.currentState = lexicalState0;
}
};

The companion function AcceptToken2(context, ifVts); is accordingly a bit more complex:

Click to view: the code of void AcceptToken2(LexicalContext context, params IfVt[] ifVts);
struct IfVt {
public readonly int preVt; // 0 means "no prefix required"
public readonly int Vt;
public IfVt(int preVt, int Vt) {
this.preVt = preVt;
this.Vt = Vt;
}
}
private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {
var startIndex = context.analyzingToken.start.index;
var endIndex = context.analyzingToken.end.index;
context.analyzingToken.value = context.sourceCode.Substring(
startIndex, endIndex - startIndex + 1);
var typeSet = false;
const string key = "type_name";
var hadThisTypeName = false;
if (!typeSet) {
if (context.tagDict.TryGetValue(key, out var type_nameList)) {
var list = type_nameList as List<string>;
if (list.Contains(context.analyzingToken.value)) {
// this value is a type_name that has already been recognized
context.analyzingToken.type = st.@type_name;
typeSet = true;
hadThisTypeName = true;
}
}
}
if (!typeSet) {
int lastType = 0;
if (context.lastSyntaxValidToken != null) {
lastType = context.lastSyntaxValidToken.type;
}
for (var i = 0; i < ifVts.Length; i++) {
var ifVt = ifVts[i];
if (ifVt.preVt == 0 // no prefix required by default
|| ifVt.preVt == lastType) { // the <'Vt'> prefix matched
context.analyzingToken.type = ifVt.Vt;
typeSet = true;
break;
}
}
}
if (!typeSet) {
// we failed to assign type according to lexi statements.
// this indicates token error in source code or inappropriate lexi statements.
context.analyzingToken.type = st.Error错;
context.signalCondition = LexicalContext.defaultSignal;
}
// cancel forward steps for post-regex
var backStep = context.cursor.index - context.analyzingToken.end.index;
if (backStep > 0) { context.MoveBack(backStep); }
// next operation: context.MoveForward();
var token = context.analyzingToken.Dump();
context.result.Add(token);
// skip comments: the syntax parser does not receive them
if (context.analyzingToken.type != st.blockComment
&& context.analyzingToken.type != st.inlineComment) {
context.lastSyntaxValidToken = token;
}
if (!hadThisTypeName && context.analyzingToken.type == st.@type_name) {
// record the newly recognized type_name in the list
if (!context.tagDict.TryGetValue(key, out var type_nameList)) {
type_nameList = new List<string>();
context.tagDict.Add(key, type_nameList);
}
var list = type_nameList as List<string>;
list.Add(context.analyzingToken.value);
}
}

Note that syntax parsing does not need blockComment and inlineComment Tokens (Tokens of the "comment" kind), so when recording the previous Token's type we must skip comments.

Example: state signals <signal1, signal2, ..>

In GLSL, to make syntax parsing easier, I need the r1 and r2 in subroutine ( r1, r2 ) to be recognized as Tokens of type "type_name" rather than of type "identifier". This cannot be done with a prefix, but it can be done with the state signal LexicalContext.signal.

The state signal works as follows:

On reading a Token of type "subroutine", set LexicalContext.signal to subroutine0;

On reading a Token of type "(", if LexicalContext.signal is subroutine0, set it to subroutine1;

On reading an identifier matching [a-zA-Z_][a-zA-Z0-9_]*, if LexicalContext.signal is subroutine1 (which means the lexer has just read 'subroutine' '(' in sequence), recognize it as a Token of type "type_name"; otherwise recognize it as "identifier";

On reading a Token of type ")", if LexicalContext.signal is subroutine1, set it back to default (the default state), i.e. stop paying attention to the signal.

storage_qualifier :
| 'subroutine' '(' type_name_list ')' ;
// lexical statements
%%subroutine%% 'subroutine' subroutine0
<subroutine0>%%[(]%% '(' subroutine1
<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'
<subroutine1>%%[,]%% ','
<subroutine1>%%[)]%% ')' default
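To make the mechanism concrete, the following is only a rough, hand-written sketch of how the accept step could branch on the signal. It reuses the LexicalContext.signalCondition field and the AcceptToken helper shown in the listings above; the function name is made up, and the real generated code may represent signals differently (e.g. as ints rather than strings):

// sketch only: choose the Token type according to the current signal
private static void AcceptIdentifierOrTypeName(LexicalContext context) {
    if (context.signalCondition == "subroutine1") {
        // the lexer has just read 'subroutine' '(' in sequence, so this identifier is a type_name
        AcceptToken(st.@type_name, context);
    } else {
        AcceptToken(st.@identifier, context);
    }
    // reading ')' elsewhere resets the signal:
    // context.signalCondition = LexicalContext.defaultSignal;
}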

A state signal effectively adds a second state machine on top of the lexer's own state machine, so it is relatively complex to apply and error-prone; use it as sparingly as possible.

Example: Chinese characters

If I want to recognize every Chinese character in a text file (assuming Chinese characters lie between \u4E00 and \u9FFF), I can write:

Items : Items Item | Item ;
Item : 'chineseChar' | 'other' ; %%[\u4E00-\u9FFF]%% 'chineseChar'
%%[^\u4E00-\u9FFF]%% 'other'
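For a quick sanity check of that range, here is a one-line helper equivalent to [\u4E00-\u9FFF] (note this covers only the CJK Unified Ideographs block; the full set of Han characters spans additional ranges):

// sketch only: the same range test the generated 'chineseChar' branch performs
static bool IsChineseChar(char c) => '\u4E00' <= c && c <= '\u9FFF';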

End
