拥有自己的解析器（C#实现LALR(1)语法解析器和miniDFA词法分析器的生成器）

参考lex和yacc的输入格式，参考虎书《现代编译原理-C语言描述》的算法，不依赖第三方库，大力整合优化，实现了LALR(1)语法解析器和miniDFA词法分析器的C#生成器（暂命名为bitParser）。

可在（https://gitee.com/bitzhuwei/bitParser-demos）下载ANSI C语言、GLGL4.60.8和各类测试用例的解析器完整代码。

（https://www.cnblogs.com/bitzhuwei/p/18544785）列示了实现词法分析器和语法分析器的生成器的全部算法。

1234+567+89+0+0的语法树生成过程☟

(1234+567)/(89-0)的语法树生成过程☟

可在（https://www.cnblogs.com/bitzhuwei/p/18679529）查看更多语法树的生成过程。（想看自定义表达式的语法树生成过程的同学请留言）

词法分析器的生成器

分别生成DFA和最小化DFA（以下简称miniDFA）的词法分析器代码（状态转换表、保留字、Token类型等）
支持全Unicode字符、/后缀、前缀<'Vt'>、状态信号<signal1, signal2, ..>，便于识别id = refId、struct type_name、<Comment>[^*\n]*、subroutine(type_name_list)这类情况。类似lex，但不完全相同。
无须显式书写Token类型、状态信号、保留字，即无须lex中的%s NUM ID ..、%x Comment End Text ..、[(] { return ('('); }。
注释详尽。每个词法状态的每个条件分支都在注释中说明其正则表达式、导向的Token类型（和signal）等。
生成ε-NFA、NFA、DFA、miniDFA的状态图（mermaid格式）和各类型Token的状态图（mermaid格式），便于学习和调试。

语法分析器的生成器

分别生成LL(1)、LR(0)、SLR(1)、LALR(1)、LR(1)的语法分析器代码（分析表、规则列表、语法树结点类型等）。
支持优先级指令%nonassoc、%left、%right、%prec，自动解决Shift/Reduce、Reduce/Reduce冲突，并在分析表代码的注释中列示之。
注释详尽。在注释中列示：每个语法状态的LR项和lookahead；冲突数量、已解决数量、未解决数量等。
生成LL(1)、LR(0)、SLR(1)、LALR(1)、LR(1)的状态图（mermaid格式）和状态表（markdown格式），便于学习和调试。
生成nullable、FIRST集、FOLLOW集的文档。

其他

无须lex和yacc中的%option、%union、%define、%{、%}、%parse-param、%lex-param、%pure-parser、%expect、%token、%type。做成类库，直接按如下方式调用即可：

var compiler = new CompilerXxx();

var sourceCode = File.ReadAllText("input.st");

var tokens = compiler.Analyze(sourceCode);

var syntaxTree = compiler.Parse(tokens);

var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);

// use extractedObj for user-defined business ..

支持多行注释指令%blockComment on/off和单行注释指令%inlineComment on/off。默认格式同C语言的/**/和//，可自定义其格式，例如：在解析VRML文件时，将单行注释的格式定义为从#到行尾：

%%#[^\r\n]*%% 'inlineComment'

生成遍历语法树提取语义信息的框架，提供适用各种语言的源代码格式化算法。可用于格式化、进一步生成中间代码等后续业务逻辑。
大力优化，例如生成ANSI C语言的全部解析器代码+文档只需3秒，生成GLSL4.60.8的全部解析器代码+文档只需9秒。

点击查看其他功能

支持Scope范围指令%validScopeChars和全局范围指令%validGlobalChars，默认范围均为[\u0001-\uFFFF]（即除'\0'外的全部Unicode字符），可自定义其范围。
支持%omit指令，可指定要忽略的空白符。默认为'空格'、'\t'、'\n'、'\r'、'\0'。
支持%start指定起始语法结点。

举例-Calc.st

输入文件Calc.st

能够处理加减乘除和括号运算的解析器，其文法如下：

// 输入文件Calc.st

Exp    : Exp '+' Term

       | Exp '-' Term

       | Term ;

Term   : Term '*' Factor

       | Term '/' Factor

       | Factor ;

Factor : '(' Exp ')'

       | 'number' ;

%%[0-9]+%% 'number' // 示例只处理非负整数

//无须书写 %%[+]%% '+' 等

据此文法，我们可以生成下述内容：

生成的词法分析器

生成的ε-NFA的词法分析器的状态图如下：

生成的miniDFA的词法分析器的状态图如下：

点击查看生成的终结点Vt和非终结点Vn 代码

// 如不需要，可删除此数组

public static readonly IReadOnlyList<string> stArray = new string[] {

    "'¥'", // @终 = 0;

    "'+'", // @Plus符 = 1; // '+'

    "'-'", // @Dash符 = 2; // '-'

    "'*'", // @Asterisk符 = 3; // '*'

    "'/'", // @Slash符 = 4; // '/'

    "'('", // @LeftParenthesis符 = 5; // '('

    "')'", // @RightParenthesis符 = 6; // ')'

    "'number'", // @number = 7; // 'number'

    // end of 1 + 7 Vt

    "Exp", // Exp枝 = 8; // Exp

    "Term", // Term枝 = 9; // Term

    "Factor", // Factor枝 = 10; // Factor

    // end of 3 Vn

};

/// <summary>

/// Vt types are used both for lexical-analyze and syntax-parse.

/// <para>Vn types are only for syntax-parse.</para>

/// <para>Vt is quoted in ''.</para>

/// <para>Vn is not quoted in ''.</para>

/// </summary>

public static class st {

    // Vt

    /// <summary>

    /// Something wrong within the source code.

    /// </summary>

    public const int Error错 = -1; // "'×'";

    /// <summary>

    /// end of token list.

    /// </summary>

    public const int @终 = 0; // "'¥'";

    /// <summary>

    /// '+'

    /// </summary>

    public const int @Plus符 = 1; // "'+'"

    /// <summary>

    /// '-'

    /// </summary>

    public const int @Dash符 = 2; // "'-'"

    /// <summary>

    /// '*'

    /// </summary>

    public const int @Asterisk符 = 3; // "'*'"

    /// <summary>

    /// '/'

    /// </summary>

    public const int @Slash符 = 4; // "'/'"

    /// <summary>

    /// '('

    /// </summary>

    public const int @LeftParenthesis符 = 5; // "'('"

    /// <summary>

    /// ')'

    /// </summary>

    public const int @RightParenthesis符 = 6; // "')'"

    /// <summary>

    /// 'number'

    /// </summary>

    public const int @number = 7; // "'number'"

    /// <summary>

    /// count of ('¥' + user-defined Vt)

    /// </summary>

    public const int VtCount = 8;

    // Vn

    /// <summary>

    /// Exp

    /// </summary>

    public const int Exp枝 = 8; // "Exp"

    /// <summary>

    /// Term

    /// </summary>

    public const int Term枝 = 9; // "Term"

    /// <summary>

    /// Factor

    /// </summary>

    public const int Factor枝 = 10; // "Factor"

}

点击查看生成的保留字(即一个语言中的keyword) 相关代码

public static class reservedWord {

    /// <summary>

    /// +

    /// </summary>

    public const string @Plus符 = "+";

    /// <summary>

    /// -

    /// </summary>

    public const string @Dash符 = "-";

    /// <summary>

    /// *

    /// </summary>

    public const string @Asterisk符 = "*";

    /// <summary>

    /// /

    /// </summary>

    public const string @Slash符 = "/";

    /// <summary>

    /// (

    /// </summary>

    public const string @LeftParenthesis符 = "(";

    /// <summary>

    /// )

    /// </summary>

    public const string @RightParenthesis符 = ")";

}

/// <summary>

/// if <paramref name="token"/> is a reserved word, assign correspond type and return true.

/// <para>otherwise, return false.</para>

/// </summary>

/// <param name="token"></param>

/// <returns></returns>

private static bool CheckReservedWord(AnalyzingToken token) {

    bool isReservedWord = true;

    switch (token.value) {

    case reservedWord.@Plus符: token.type = st.@Plus符; break;

    case reservedWord.@Dash符: token.type = st.@Dash符; break;

    case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;

    case reservedWord.@Slash符: token.type = st.@Slash符; break;

    case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;

    case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break;

    default: isReservedWord = false; break;

    }

    return isReservedWord;

}

以下是用8个Action<LexicalContext, char, CurrentStateWrapper>函数委托实现的词法状态转换表。

点击查看生成的 lexi状态0 相关代码

// lexicalState0

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* [0-9] */

	else if (/* possible Vt : 'number' */

	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState1;

	}

	/* \) */

	else if (/* possible Vt : ')' */

	c == ')'/*'\u0029'(41)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState2;

	}

	/* \( */

	else if (/* possible Vt : '(' */

	c == '('/*'\u0028'(40)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState3;

	}

	/* \/ */

	else if (/* possible Vt : '/' */

	c == '/'/*'\u002F'(47)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState4;

	}

	/* \* */

	else if (/* possible Vt : '*' */

	c == '*'/*'\u002A'(42)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState5;

	}

	/* - */

	else if (/* possible Vt : '-' */

	c == '-'/*'\u002D'(45)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState6;

	}

	/* \+ */

	else if (/* possible Vt : '+' */

	c == '+'/*'\u002B'(43)*/) {

		BeginToken(context);

		ExtendToken(context);

		wrapper.currentState = lexicalState7;

	}

	/* deal with everything else. */

	else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {

		wrapper.currentState = lexicalState0; // skip them.

	}

	else { // unexpected char.

		BeginToken(context);

		ExtendToken(context);

		AcceptToken(st.Error, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态1 相关代码

// lexicalState1

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* [0-9] */

	else if (/* possible Vt : 'number' */

	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {

		ExtendToken(context);

		wrapper.currentState = lexicalState1;

	}

	/* deal with everything else. */

	else {

		AcceptToken(st.@number, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态2 相关代码

// lexicalState2

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@RightParenthesis符, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态3 相关代码

// lexicalState3

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@LeftParenthesis符, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态4 相关代码

// lexicalState4

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@Slash符, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态5 相关代码

// lexicalState5

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@Asterisk符, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态6 相关代码

// lexicalState6

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@Dash符, context);

		wrapper.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态7 相关代码

// lexicalState7

private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =

static (context, c, wrapper) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@Plus符, context);

		wrapper.currentState = lexicalState0;

	}

};

其调用过程如下：

// analyze the specified sourceCode and return a list of Token.

public TokenList Analyze(string sourceCode) {

    var context = new LexicalContext(sourceCode);

    var wrapper = new CurrentStateWrapper(this.initialState);

    do {

		char currentChar = context.MoveForward();

        wrapper.currentState(context, currentChar, wrapper);

        // wrapper.currentState will be set to next lexi-state.

    } while (!context.EOF);

    return context.result;

}

下面是用一个二维数组ElseIf[][]实现的词法状态转换表，其占用空间较少，且执行效率也有所提高。

private static readonly ElseIf[][] lexiStates = new ElseIf[8][];

static void InitializeLexiTable() {

	lexiStates[0] = new ElseIf[] {

	// possible Vt: '('

	/*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),

	// possible Vt: ')'

	/*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),

	// possible Vt: '*'

	/*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),

	// possible Vt: '+'

	/*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),

	// possible Vt: '-'

	/*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),

	// possible Vt: '/'

	/*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),

	// possible Vt: 'number'

	/*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),

	};

	lexiStates[1] = new ElseIf[] {

	// possible Vt: 'number'

	/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),

	// possible Vt: 'number'

	/*1*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@number),

	};

	lexiStates[2] = new ElseIf[] {

	// possible Vt: ')'

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),

	};

	lexiStates[3] = new ElseIf[] {

	// possible Vt: '('

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),

	};

	lexiStates[4] = new ElseIf[] {

	// possible Vt: '/'

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Slash符),

	};

	lexiStates[5] = new ElseIf[] {

	// possible Vt: '*'

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Asterisk符),

	};

	lexiStates[6] = new ElseIf[] {

	// possible Vt: '-'

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Dash符),

	};

	lexiStates[7] = new ElseIf[] {

	// possible Vt: '+'

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Plus符),

	};

}

顾名思义，一个ElseIf与函数委托方式中的一个else if ('0' <= c && c <= '9') { .. }的作用相同。这样，就用一个ElseIf[]取代了一个函数委托；而且在调用此表时可以用折半查找方式快速定位ElseIf。

点击查看调用二维数组实现的词法分析器相关代码

// skip '\0' at lexi-state 0

private static readonly ElseIf skipZero = new(

    char.MinValue, char.MaxValue, Acts.None,

    nextState: 0);

// construct a error token at lexi-state 0

private static readonly ElseIf unexpectedChar = new(

    char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept,

    nextState: 0, -1);// -1 means error("'×'");

// construct a error token at other lexi-states

private static readonly ElseIf errorToken = new(

    char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept,

    nextState: 0, -1);// -1 means error("'×'");

// analyze the specified sourceCode and return a list of Token.

public TokenList Analyze(string sourceCode) {

    var context = new LexicalContext(sourceCode);

    var currentStateId = 0;

    do {

        // read current char,

        char currentChar = context.MoveForward();

        ElseIf[] lexiState = lexiStates[currentStateId];

        // binary search for the segment( else if (left <= c && c <= right) { ... } )

        var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);

        if (segment is null) {

            if (currentStateId == 0) { // the initial state of lexical analyze.

                segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);

                if (segment is null) {

                    // '\0' must be skipped.

                    if (currentChar == '\0') { segment = skipZero; }

                    else { segment = unexpectedChar; }

                }

            }

            else { // token with error type

                segment = errorToken;

            }

        }

        // construct the next token,

        var scripts = segment.scripts;

        if (scripts != 0) { // it is 0 in most cases.

            if ((scripts & Acts.Begin) != 0) {

                this.beginToken(context);

            }

            if ((scripts & Acts.Extend) != 0) {

                this.extendToken(context);

            }

            if ((scripts & Acts.Accept) != 0) {

                this.acceptToken(context, segment.Vt);

            }

            else if ((scripts & Acts.Accept2) != 0) {

                this.acceptToken2(context, segment.ifVts);

            }

        }

        // point to next state.

        currentStateId = segment.nextStateId;

    } while (!context.EOF);

    return context.result;

}

private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {

    var left = 0; var right = lexiState.Length - (specialLast ? 2 : 1);

    if (right < 0) { }

    else {

        var result = -1;

        while (left < right) {

            var mid = (left + right) / 2;

            var current = lexiState[mid];

            if (currentChar < current.min) { result = -1; }

            else if (current.max < currentChar) { result = 1; }

            else { return current; }

            if (result < 0) { right = mid; }

            else { left = mid + 1; }

        }

        {

            var current = lexiState[left];

            if (current.min <= currentChar && currentChar <= current.max) {

                return current;

            }

        }

    }

    if (specialLast) {

        var last = lexiState[lexiState.Length - 1];

        return last;

        // no need to compare, because it's ['\0', '\uFFFF']

        //if (last.min <= currentChar && currentChar <= last.max) {

        //    return last;

        //}

    }

    return null;

}

生成的语法分析器

点击查看 nullable、FIRST集、FOLLOW集

nullable：

[0]: nullable( Exp' ) = False

[1]: nullable( Exp ) = False

[2]: nullable( Term ) = False

[3]: nullable( Factor ) = False

[4]: nullable( '¥' ) = False

[5]: nullable( '+' ) = False

[6]: nullable( '-' ) = False

[7]: nullable( '*' ) = False

[8]: nullable( '/' ) = False

[9]: nullable( '(' ) = False

[10]: nullable( ')' ) = False

[11]: nullable( 'number' ) = False

FIRST集：

[0]: FIRST( Exp' ) = { '(' 'number' }

[1]: FIRST( Exp ) = { '(' 'number' }

[2]: FIRST( Term ) = { '(' 'number' }

[3]: FIRST( Factor ) = { '(' 'number' }

[4]: FIRST( '¥' ) = { '¥' }

[5]: FIRST( '+' ) = { '+' }

[6]: FIRST( '-' ) = { '-' }

[7]: FIRST( '*' ) = { '*' }

[8]: FIRST( '/' ) = { '/' }

[9]: FIRST( '(' ) = { '(' }

[10]: FIRST( ')' ) = { ')' }

[11]: FIRST( 'number' ) = { 'number' }

[12]: FIRST( Exp '+' Term ) = { '(' 'number' }

[13]: FIRST( Exp '-' Term ) = { '(' 'number' }

[14]: FIRST( Term '*' Factor ) = { '(' 'number' }

[15]: FIRST( Term '/' Factor ) = { '(' 'number' }

[16]: FIRST( '(' Exp ')' ) = { '(' }

FOLLOW集：

[0]: FOLLOW( Exp' ) = { '¥' }

[1]: FOLLOW( Exp ) = { '-' ')' '+' '¥' }

[2]: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }

[3]: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }

点击查看生成的规约规则代码

public static IReadOnlyList<Regulation> Regulations => regulations;

private static readonly Regulation[] regulations = new Regulation[] {

	// [0]=Exp : Exp '+' Term ;

	new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝),

	// [1]=Exp : Exp '-' Term ;

	new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝),

	// [2]=Exp : Term ;

	new(2, st.Exp枝, st.Term枝),

	// [3]=Term : Term '*' Factor ;

	new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝),

	// [4]=Term : Term '/' Factor ;

	new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝),

	// [5]=Term : Factor ;

	new(5, st.Term枝, st.Factor枝),

	// [6]=Factor : '(' Exp ')' ;

	new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符),

	// [7]=Factor : 'number' ;

	new(7, st.Factor枝, st.@number),

};

点击查看生成的 LALR(1)语法分析表代码

const int syntaxStateCount = 16;

// LALR(1) syntax parse table

private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]

    syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];

private static void InitializeSyntaxStates() {

	var states = CompilerExp.syntaxStates;

	// 78 actions

	// conflicts(0)=not sovled(0)+solved(0)(0 warnings)

	#region create objects of syntax states

	states[0] = new(capacity: 5);

	states[1] = new(capacity: 3);

	states[2] = new(capacity: 6);

	states[3] = new(capacity: 6);

	states[4] = new(capacity: 5);

	states[5] = new(capacity: 6);

	states[6] = new(capacity: 4);

	states[7] = new(capacity: 4);

	states[8] = new(capacity: 3);

	states[9] = new(capacity: 3);

	states[10] = new(capacity: 3);

	states[11] = new(capacity: 6);

	states[12] = new(capacity: 6);

	states[13] = new(capacity: 6);

	states[14] = new(capacity: 6);

	states[15] = new(capacity: 6);

	#endregion create objects of syntax states

	#region re-used actions

	LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);// refered 2 times

	LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);// refered 4 times

	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times

	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times

	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 2 times

	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 2 times

	LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// refered 3 times

	LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// refered 3 times

	LRParseAction aReduce2 = new(regulations[2]);// refered 4 times

	LRParseAction aReduce5 = new(regulations[5]);// refered 6 times

	LRParseAction aReduce7 = new(regulations[7]);// refered 6 times

	LRParseAction aReduce0 = new(regulations[0]);// refered 4 times

	LRParseAction aReduce1 = new(regulations[1]);// refered 4 times

	LRParseAction aReduce3 = new(regulations[3]);// refered 6 times

	LRParseAction aReduce4 = new(regulations[4]);// refered 6 times

	LRParseAction aReduce6 = new(regulations[6]);// refered 6 times

	#endregion re-used actions

	// 78 actions

	// conflicts(0)=not sovled(0)+solved(0)(0 warnings)

	#region init actions of syntax states

	// syntaxStates[0]:

	// [-1] Exp' :  Exp ; '¥'

	// [0] Exp :  Exp '+' Term ; '-' '+' '¥'

	// [1] Exp :  Exp '-' Term ; '-' '+' '¥'

	// [2] Exp :  Term ; '-' '+' '¥'

	// [3] Term :  Term '*' Factor ; '-' '*' '/' '+' '¥'

	// [4] Term :  Term '/' Factor ; '-' '*' '/' '+' '¥'

	// [5] Term :  Factor ; '-' '*' '/' '+' '¥'

	// [6] Factor :  '(' Exp ')' ; '-' '*' '/' '+' '¥'

	// [7] Factor :  'number' ; '-' '*' '/' '+' '¥'

	/*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));

	/*1*/states[0].Add(st.Term枝, aGoto2);

	/*2*/states[0].Add(st.Factor枝, aGoto3);

	/*3*/states[0].Add(st.@LeftParenthesis符, aShift4);

	/*4*/states[0].Add(st.@number, aShift5);

	// syntaxStates[1]:

	// [-1] Exp' : Exp  ; '¥'

	// [0] Exp : Exp  '+' Term ; '-' '+' '¥'

	// [1] Exp : Exp  '-' Term ; '-' '+' '¥'

	/*5*/states[1].Add(st.@Plus符, aShift6);

	/*6*/states[1].Add(st.@Dash符, aShift7);

	/*7*/states[1].Add(st.@终, LRParseAction.accept);

	// syntaxStates[2]:

	// [2] Exp : Term  ; '-' ')' '+' '¥'

	// [3] Term : Term  '*' Factor ; '-' ')' '*' '/' '+' '¥'

	// [4] Term : Term  '/' Factor ; '-' ')' '*' '/' '+' '¥'

	/*8*/states[2].Add(st.@Asterisk符, aShift8);

	/*9*/states[2].Add(st.@Slash符, aShift9);

	/*10*/states[2].Add(st.@Dash符, aReduce2);

	/*11*/states[2].Add(st.@RightParenthesis符, aReduce2);

	/*12*/states[2].Add(st.@Plus符, aReduce2);

	/*13*/states[2].Add(st.@终, aReduce2);

	// syntaxStates[3]:

	// [5] Term : Factor  ; '-' ')' '*' '/' '+' '¥'

	/*14*/states[3].Add(st.@Dash符, aReduce5);

	/*15*/states[3].Add(st.@RightParenthesis符, aReduce5);

	/*16*/states[3].Add(st.@Asterisk符, aReduce5);

	/*17*/states[3].Add(st.@Slash符, aReduce5);

	/*18*/states[3].Add(st.@Plus符, aReduce5);

	/*19*/states[3].Add(st.@终, aReduce5);

	// syntaxStates[4]:

	// [6] Factor : '('  Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Term ; '-' ')' '+'

	// [1] Exp :  Exp '-' Term ; '-' ')' '+'

	// [2] Exp :  Term ; '-' ')' '+'

	// [3] Term :  Term '*' Factor ; '-' ')' '*' '/' '+'

	// [4] Term :  Term '/' Factor ; '-' ')' '*' '/' '+'

	// [5] Term :  Factor ; '-' ')' '*' '/' '+'

	// [6] Factor :  '(' Exp ')' ; '-' ')' '*' '/' '+'

	// [7] Factor :  'number' ; '-' ')' '*' '/' '+'

	/*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));

	/*21*/states[4].Add(st.Term枝, aGoto2);

	/*22*/states[4].Add(st.Factor枝, aGoto3);

	/*23*/states[4].Add(st.@LeftParenthesis符, aShift4);

	/*24*/states[4].Add(st.@number, aShift5);

	// syntaxStates[5]:

	// [7] Factor : 'number'  ; '-' ')' '*' '/' '+' '¥'

	/*25*/states[5].Add(st.@Dash符, aReduce7);

	/*26*/states[5].Add(st.@RightParenthesis符, aReduce7);

	/*27*/states[5].Add(st.@Asterisk符, aReduce7);

	/*28*/states[5].Add(st.@Slash符, aReduce7);

	/*29*/states[5].Add(st.@Plus符, aReduce7);

	/*30*/states[5].Add(st.@终, aReduce7);

	// syntaxStates[6]:

	// [0] Exp : Exp '+'  Term ; '-' ')' '+' '¥'

	// [3] Term :  Term '*' Factor ; '-' ')' '*' '/' '+' '¥'

	// [4] Term :  Term '/' Factor ; '-' ')' '*' '/' '+' '¥'

	// [5] Term :  Factor ; '-' ')' '*' '/' '+' '¥'

	// [6] Factor :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [7] Factor :  'number' ; '-' ')' '*' '/' '+' '¥'

	/*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));

	/*32*/states[6].Add(st.Factor枝, aGoto3);

	/*33*/states[6].Add(st.@LeftParenthesis符, aShift4);

	/*34*/states[6].Add(st.@number, aShift5);

	// syntaxStates[7]:

	// [1] Exp : Exp '-'  Term ; '-' ')' '+' '¥'

	// [3] Term :  Term '*' Factor ; '-' ')' '*' '/' '+' '¥'

	// [4] Term :  Term '/' Factor ; '-' ')' '*' '/' '+' '¥'

	// [5] Term :  Factor ; '-' ')' '*' '/' '+' '¥'

	// [6] Factor :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [7] Factor :  'number' ; '-' ')' '*' '/' '+' '¥'

	/*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));

	/*36*/states[7].Add(st.Factor枝, aGoto3);

	/*37*/states[7].Add(st.@LeftParenthesis符, aShift4);

	/*38*/states[7].Add(st.@number, aShift5);

	// syntaxStates[8]:

	// [3] Term : Term '*'  Factor ; '-' ')' '*' '/' '+' '¥'

	// [6] Factor :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [7] Factor :  'number' ; '-' ')' '*' '/' '+' '¥'

	/*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));

	/*40*/states[8].Add(st.@LeftParenthesis符, aShift4);

	/*41*/states[8].Add(st.@number, aShift5);

	// syntaxStates[9]:

	// [4] Term : Term '/'  Factor ; '-' ')' '*' '/' '+' '¥'

	// [6] Factor :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [7] Factor :  'number' ; '-' ')' '*' '/' '+' '¥'

	/*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));

	/*43*/states[9].Add(st.@LeftParenthesis符, aShift4);

	/*44*/states[9].Add(st.@number, aShift5);

	// syntaxStates[10]:

	// [6] Factor : '(' Exp  ')' ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Term ; '-' ')' '+'

	// [1] Exp : Exp  '-' Term ; '-' ')' '+'

	/*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));

	/*46*/states[10].Add(st.@Plus符, aShift6);

	/*47*/states[10].Add(st.@Dash符, aShift7);

	// syntaxStates[11]:

	// [0] Exp : Exp '+' Term  ; '-' ')' '+' '¥'

	// [3] Term : Term  '*' Factor ; '-' ')' '*' '/' '+' '¥'

	// [4] Term : Term  '/' Factor ; '-' ')' '*' '/' '+' '¥'

	/*48*/states[11].Add(st.@Asterisk符, aShift8);

	/*49*/states[11].Add(st.@Slash符, aShift9);

	/*50*/states[11].Add(st.@Dash符, aReduce0);

	/*51*/states[11].Add(st.@RightParenthesis符, aReduce0);

	/*52*/states[11].Add(st.@Plus符, aReduce0);

	/*53*/states[11].Add(st.@终, aReduce0);

	// syntaxStates[12]:

	// [1] Exp : Exp '-' Term  ; '-' ')' '+' '¥'

	// [3] Term : Term  '*' Factor ; '-' ')' '*' '/' '+' '¥'

	// [4] Term : Term  '/' Factor ; '-' ')' '*' '/' '+' '¥'

	/*54*/states[12].Add(st.@Asterisk符, aShift8);

	/*55*/states[12].Add(st.@Slash符, aShift9);

	/*56*/states[12].Add(st.@Dash符, aReduce1);

	/*57*/states[12].Add(st.@RightParenthesis符, aReduce1);

	/*58*/states[12].Add(st.@Plus符, aReduce1);

	/*59*/states[12].Add(st.@终, aReduce1);

	// syntaxStates[13]:

	// [3] Term : Term '*' Factor  ; '-' ')' '*' '/' '+' '¥'

	/*60*/states[13].Add(st.@Dash符, aReduce3);

	/*61*/states[13].Add(st.@RightParenthesis符, aReduce3);

	/*62*/states[13].Add(st.@Asterisk符, aReduce3);

	/*63*/states[13].Add(st.@Slash符, aReduce3);

	/*64*/states[13].Add(st.@Plus符, aReduce3);

	/*65*/states[13].Add(st.@终, aReduce3);

	// syntaxStates[14]:

	// [4] Term : Term '/' Factor  ; '-' ')' '*' '/' '+' '¥'

	/*66*/states[14].Add(st.@Dash符, aReduce4);

	/*67*/states[14].Add(st.@RightParenthesis符, aReduce4);

	/*68*/states[14].Add(st.@Asterisk符, aReduce4);

	/*69*/states[14].Add(st.@Slash符, aReduce4);

	/*70*/states[14].Add(st.@Plus符, aReduce4);

	/*71*/states[14].Add(st.@终, aReduce4);

	// syntaxStates[15]:

	// [6] Factor : '(' Exp ')'  ; '-' ')' '*' '/' '+' '¥'

	/*72*/states[15].Add(st.@Dash符, aReduce6);

	/*73*/states[15].Add(st.@RightParenthesis符, aReduce6);

	/*74*/states[15].Add(st.@Asterisk符, aReduce6);

	/*75*/states[15].Add(st.@Slash符, aReduce6);

	/*76*/states[15].Add(st.@Plus符, aReduce6);

	/*77*/states[15].Add(st.@终, aReduce6);

	#endregion init actions of syntax states

}

生成的LALR(1)语法分析器的状态图和状态表如下：

由于mermaid支持的字符数量有限，往往不能完全显示一个语言（例如C语言）的全部语法状态及其LR项+lookahead，我在生成状态图时默认只显示每个语法状态的前3个LR项+lookahead。完整的LR项+lookahead可以在生成的语法分析表代码中找到。

状态	'+'	'-'	'*'	'/'	'('	')'	'number'	'¥'	Exp	Term	Factor
0					S4		S5		G1	G2	G3
1	S6	S7
2	R[2]	R[2]	S8	S9		R[2]		R[2]
3	R[5]	R[5]	R[5]	R[5]		R[5]		R[5]
4					S4		S5		G10	G2	G3
5	R[7]	R[7]	R[7]	R[7]		R[7]		R[7]
6					S4		S5			G11	G3
7					S4		S5			G12	G3
8					S4		S5				G13
9					S4		S5				G14
10	S6	S7				S15
11	R[0]	R[0]	S8	S9		R[0]		R[0]
12	R[1]	R[1]	S8	S9		R[1]		R[1]
13	R[3]	R[3]	R[3]	R[3]		R[3]		R[3]
14	R[4]	R[4]	R[4]	R[4]		R[4]		R[4]
15	R[6]	R[6]	R[6]	R[6]		R[6]		R[6]

上表中：

S6表示Shift并进入状态6；
R[2]表示用regulations[2]=Exp : Term ;规约；
G1表示进入状态1；
表示Accept，即分析完毕，语法树生成成功；
空白的地方表示遇到语法错误。

点击查看调用语法分析表代码

public SyntaxTree Parse(TokenList tokenList) {

    var context = LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);

    var accept = false;

    do {

        Token token = context.CurrentToken;

        int nodeType = token.type;// auto-convert from string to string.

        while (nodeType == blockComment || nodeType == inlineComment) {

            // skip comment token

            context.cursor++;

            token = context.CurrentToken;

            nodeType = token.type;// auto-convert from string to string.

        }

        Dictionary<int/*Node.type*/, LRParseAction> currentState =

            context.stateStack.Peek();

        if (currentState.TryGetValue(nodeType, out var parseAction)) {

            parseAction.Execute(context);

            accept = parseAction.kind == LRParseAction.Kind.Accept;

        }

        else { // syntax error happened.

            return new SyntaxTree(context);

        }

    } while (!accept);

    var root = context.root;

    Debug.Assert(root != null);

    return new SyntaxTree(root);

}

public class LRParseAction {

    public enum Kind {

        Error,

        Shift,

        Reduce,

        Goto,

        Accept,

    }

    public readonly Kind kind;

    [StructLayout(LayoutKind.Explicit)]

    struct Union {

        [FieldOffset(0)] public Dictionary<int/*Node.type*/, LRParseAction> nextState;

        [FieldOffset(0)] public Regulation regulation;

    }

    private Union union;

    // 执行分析动作。

    public void Execute(LRSyntaxContext context) {

        switch (this.kind) {

        case Kind.Error: { throw new NotImplementedException(); }

        //break;

        case Kind.Shift: {

            var token = context.CurrentToken;

            var leaf = new LRNode(token);

            context.nodeStack.Push(leaf);

            var nextState = this.union.nextState;

            context.stateStack.Push(nextState);

            // prepare for next loop.

            context.cursor++;

        }

        break;

        case Kind.Reduce: {

            var regulation = this.union.regulation;

            int count = regulation.right.Length;

            var children = new LRNode[count];

            var start = Token.empty; LRNode? lastNode = null;

            var first = true;

            for (int i = 0; i < count; i++) {

                var state = context.stateStack.Pop();//只弹出，不再使用。

                var node = context.nodeStack.Pop();

                children[count - i - 1] = node;

                if (node.start.index >= 0) { // this node includes token

                    if (first) { lastNode = node; first = false; }

                    start = node.start;

                }

            }

            int tokenCount = 0;

            if (lastNode is not null) {

                tokenCount =

                    lastNode.start.index   // comment tokens inside of parent are included.

                    - start.index          // comment tokens before parent are excluded.

                    + lastNode.tokenCount; // comment tokens after parent are excluded.

            }

            var parent = new LRNode(regulation, start, tokenCount, children);

            for (var i = 0; i < count; i++) { children[i].parent = parent; }

            context.nodeStack.Push(parent);

            // goto next syntax-state

            Dictionary<int/*Node.type*/, LRParseAction> currentState =

               context.stateStack.Peek();

            var nodeType = regulation.left;

            if (currentState.TryGetValue(nodeType, out var parseAction)) {

                parseAction.Execute(context); // parseAction is supposed to be a Goto action

            }

            Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);

        }

        break;

        case Kind.Goto: {

            var nextState = this.union.nextState;

            context.stateStack.Push(nextState);

        }

        break;

        case Kind.Accept: {

            var state = context.stateStack.Pop();

            context.root = context.nodeStack.Pop();

            context.Finish(context.root, 40, context.stArray);

        }

        break;

        default: { throw new NotImplementedException(); }

        }

    }

}

生成的语义内容提取器

举例来说，对于1234+567+89+0+0这个输入，Calc.st经过语法分析得到的语法树如下所示：（语法树的生成过程见本文开头）

R[0]=Exp : Exp '+' Term ;T[0->8]

 ├─R[0]=Exp : Exp '+' Term ;T[0->6]

 │  ├─R[0]=Exp : Exp '+' Term ;T[0->4]

 │  │  ├─R[0]=Exp : Exp '+' Term ;T[0->2]

 │  │  │  ├─R[2]=Exp : Term ;T[0]

 │  │  │  │  └─R[5]=Term : Factor ;T[0]

 │  │  │  │     └─R[7]=Factor : 'number' ;T[0]

 │  │  │  │        └─T[0]='number' 1234

 │  │  │  ├─T[1]='+' +

 │  │  │  └─R[5]=Term : Factor ;T[2]

 │  │  │     └─R[7]=Factor : 'number' ;T[2]

 │  │  │        └─T[2]='number' 567

 │  │  ├─T[3]='+' +

 │  │  └─R[5]=Term : Factor ;T[4]

 │  │     └─R[7]=Factor : 'number' ;T[4]

 │  │        └─T[4]='number' 89

 │  ├─T[5]='+' +

 │  └─R[5]=Term : Factor ;T[6]

 │     └─R[7]=Factor : 'number' ;T[6]

 │        └─T[6]='number' 0

 ├─T[7]='+' +

 └─R[5]=Term : Factor ;T[8]

    └─R[7]=Factor : 'number' ;T[8]

       └─T[8]='number' 0

从左上到右下，连续4个R[0]显著增加了树的深度。

bitParser自动生成的语义内容提取器，会后序遍历此语法树，提取结点信息。

点击查看通用的遍历语法树相关代码

// Extract some data structure from syntax tree.

// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>

public T? Extract(LRNode root, TokenList tokens, string sourceCode) {

	var context = new TContext<T>(root, tokens, sourceCode);

	var nodeStack = new Stack<LRNode>(); var indexStack = new Stack<int>();

	// init stack.

	{

		// push nextLeft and its next pending children.

		var nextLeft = root; var index = 0;

		nodeStack.Push(nextLeft); indexStack.Push(index);

		while (nextLeft.children.Count > 0) {

			nextLeft = nextLeft.children[0];

			nodeStack.Push(nextLeft); indexStack.Push(0);

		}

	}

	while (nodeStack.Count > 0) {

		var current = nodeStack.Pop(); var index = indexStack.Pop() + 1;

		if (index < current.children.Count) {

			// push this node back again.

			nodeStack.Push(current); indexStack.Push(index);

			// push nextLeft and its next pending children.

			var nextLeft = current.children[index];

			nodeStack.Push(nextLeft); indexStack.Push(0);

			while (nextLeft.children.Count > 0) {

				nextLeft = nextLeft.children[0];

				nodeStack.Push(nextLeft); indexStack.Push(0);

			}

		}

		else {

			// Visit(current);

			if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {

				action(current, context);

			}

		}

	}

	{

		var current = this.EOT;

		// Visit(current);

		if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {

			action(current, context);

		}

	}

	return context.result;

}

点击查看生成的在遍历语法树时提取结点信息相关代码

private static readonly Dictionary<int/*LRNode.type*/,

    Action<LRNode, TContext<Exp>>> @expExtractorDict = new();

private static readonly Action<LRNode, TContext<Exp>> VtHandler =

(node, context) => {

    var token = node.start;

    context.objStack.Push(token);

};

// initialize dict for extractor.

private static void InitializeExtractorDict() {

    var extractorDict = @expExtractorDict;

    extractorDict.Add(st.@Plus符, VtHandler);

    extractorDict.Add(st.@Dash符, VtHandler);

    extractorDict.Add(st.@Asterisk符, VtHandler);

    extractorDict.Add(st.@Slash符, VtHandler);

    extractorDict.Add(st.@LeftParenthesis符, VtHandler);

    extractorDict.Add(st.@RightParenthesis符, VtHandler);

    extractorDict.Add(st.@number, VtHandler);

    extractorDict.Add(st.@终,

    static (node, context) => {

        // [-1]=Exp' : Exp ;

        // dumped by ExternalExtractor

        var @final = (VnExp?)context.objStack.Pop();

        var left = new Exp(@final);

        context.result = left; // final step, no need to push into stack.

    }); // end of extractorDict.Add(st.@终, (node, context) => { ... });

    extractorDict.Add(st.Exp枝,

    static (node, context) => {

        switch (node.regulation.index) {

        case 0: { // [0]=Exp : Exp '+' Term ;

            // dumped by ListExtractor 2

            var r0 = (VnTerm?)context.objStack.Pop();

            var r1 = (Token?)context.objStack.Pop();

            var r2 = (VnExp?)context.objStack.Pop();

            var left = r2;

            left.Add(r1, r0);

            context.objStack.Push(left);

        }

        break;

        case 1: { // [1]=Exp : Exp '-' Term ;

            // dumped by ListExtractor 2

            var r0 = (VnTerm?)context.objStack.Pop();

            var r1 = (Token?)context.objStack.Pop();

            var r2 = (VnExp?)context.objStack.Pop();

            var left = r2;

            left.Add(r1, r0);

            context.objStack.Push(left);

        }

        break;

        case 2: { // [2]=Exp : Term ;

            // dumped by ListExtractor 1

            var r0 = (VnTerm?)context.objStack.Pop();

            var left = new VnExp(r0);

            context.objStack.Push(left);

        }

        break;

        default: throw new NotImplementedException();

        }

    }); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });

    extractorDict.Add(st.Term枝,

    static (node, context) => {

        switch (node.regulation.index) {

        case 3: { // [3]=Term : Term '*' Factor ;

            // dumped by ListExtractor 2

            var r0 = (VnFactor?)context.objStack.Pop();

            var r1 = (Token?)context.objStack.Pop();

            var r2 = (VnTerm?)context.objStack.Pop();

            var left = r2;

            left.Add(r1, r0);

            context.objStack.Push(left);

        }

        break;

        case 4: { // [4]=Term : Term '/' Factor ;

            // dumped by ListExtractor 2

            var r0 = (VnFactor?)context.objStack.Pop();

            var r1 = (Token?)context.objStack.Pop();

            var r2 = (VnTerm?)context.objStack.Pop();

            var left = r2;

            left.Add(r1, r0);

            context.objStack.Push(left);

        }

        break;

        case 5: { // [5]=Term : Factor ;

            // dumped by ListExtractor 1

            var r0 = (VnFactor?)context.objStack.Pop();

            var left = new VnTerm(r0);

            context.objStack.Push(left);

        }

        break;

        default: throw new NotImplementedException();

        }

    }); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });

    extractorDict.Add(st.Factor枝,

    static (node, context) => {

        switch (node.regulation.index) {

        case 6: { // [6]=Factor : '(' Exp ')' ;

            // dumped by DefaultExtractor

            var r0 = (Token?)context.objStack.Pop();

            var r1 = (VnExp?)context.objStack.Pop();

            var r2 = (Token?)context.objStack.Pop();

            var left = new VnFactor(r2, r1, r0);

            context.objStack.Push(left);

        }

        break;

        case 7: { // [7]=Factor : 'number' ;

            // dumped by DefaultExtractor

            var r0 = (Token?)context.objStack.Pop();

            var left = new VnFactor(r0);

            context.objStack.Push(left);

        }

        break;

        default: throw new NotImplementedException();

        }

    }); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });

}

提取到的结点信息，展示出来如下：

VnExpT[0->8]

 ├─VnTermT[0]

 │  └─VnFactorT[0]

 │     └─T[0]='number' 1234

 └─List T[1->8]

    ├─Item T[1->2]

    │  ├─T[1]='+' +

    │  └─VnTermT[2]

    │     └─VnFactorT[2]

    │        └─T[2]='number' 567

    ├─Item T[3->4]

    │  ├─T[3]='+' +

    │  └─VnTermT[4]

    │     └─VnFactorT[4]

    │        └─T[4]='number' 89

    ├─Item T[5->6]

    │  ├─T[5]='+' +

    │  └─VnTermT[6]

    │     └─VnFactorT[6]

    │        └─T[6]='number' 0

    └─Item T[7->8]

       ├─T[7]='+' +

       └─VnTermT[8]

          └─VnFactorT[8]

             └─T[8]='number' 0

类似Exp : Exp '+' Term ;的规约规则，会导致语法树深度过大。这一步的语义提取，作用是将此类树结构“压平”。根据压平了的语义信息，很容易将源代码格式化。

点击查看 VnExp结点的格式化代码

/// <summary>

/// Correspond to the Vn node Exp in the grammar(Exp).

/// </summary>

internal partial class VnExp : IFullFormatter {

	// [0]=Exp : Exp '+' Term ;

	// [1]=Exp : Exp '-' Term ;

	// [2]=Exp : Term ;

	private readonly VnTerm first0;

	public class PostItem : IFullFormatter {

		private readonly Token r1;

		private readonly VnTerm r0;

		public PostItem(Token r1, VnTerm r0) {

			this.r1 = r1;

			this.r0 = r0;

			this._scope = new TokenRange(r1, r0);

		}

		private readonly TokenRange _scope;

		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

			context.PrintBlanksAnd(this.r1, preConfig, writer);

			// '+'或'-'与其左右两边的Token间隔1个空格

			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);

            context.PrintCommentsBetween(this.r1, this.r0, config, writer);

			context.PrintBlanksAnd(this.r0, config, writer);

		}

	}

	public class PostItemList : IFullFormatter {

		private readonly List<PostItem> list = new();

		public PostItemList(PostItem item) {

			this.list.Add(item);

			this._scope = new TokenRange(item);

		}

		public void Add(PostItem item) {

			this.list.Add(item);

			this._scope.end = item.Scope.end;

		}

		private readonly TokenRange _scope;

		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

			// '+'或'-'与其左右两边的Token间隔1个空格

			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);

			for (int i = 0; i < list.Count; i++) {

				if (i == 0) {

					context.PrintBlanksAnd(list[i], preConfig, writer);

				}

				else {

					context.PrintCommentsBetween(list[i - 1], list[i], config, writer);

					context.PrintBlanksAnd(list[i], config, writer);

				}

			}

		}

	}

	private PostItemList? list;

	private readonly TokenRange _scope;

	public TokenRange Scope => _scope;

	internal VnExp(VnTerm first0) {

		this.first0 = first0;

		this._scope = new TokenRange(first0);

	}

	// [0]=Exp : Exp '+' Term ;

	// [1]=Exp : Exp '-' Term ;

	internal void Add(Token r1, VnTerm r0) {

		if (this.list == null) {

			this.list = new PostItemList(new(r1, r0));

		}

		else {

			this.list.Add(new(r1, r0));

		}

		this._scope.end = this.list.Scope.end;

	}

	public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

		context.PrintBlanksAnd(this.first0, preConfig, writer);

		if (this.list != null) {

			// '+'或'-'与其左右两边的Token间隔1个空格

			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);

			context.PrintCommentsBetween(this.first0, this.list, config, writer);

			context.PrintBlanksAnd(this.list, config, writer);

		}

	}

}

点击查看 VnTerm结点的格式化代码

/// <summary>

/// Correspond to the Vn node Term in the grammar(Exp).

/// </summary>

internal partial class VnTerm : IFullFormatter {

	// [3]=Term : Term '*' Factor ;

	// [4]=Term : Term '/' Factor ;

	// [5]=Term : Factor ;

	private readonly VnFactor first0;

	public class PostItem : IFullFormatter {

		private readonly Token r1;

		private readonly VnFactor r0;

		public PostItem(Token r1, VnFactor r0) {

			this.r1 = r1;

			this.r0 = r0;

			this._scope = new TokenRange(r1, r0);

		}

		private readonly TokenRange _scope;

		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

			context.PrintBlanksAnd(this.r1, preConfig, writer);

			// '+'或'-'与其左右两边的Token间隔1个空格

			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);

            context.PrintCommentsBetween(this.r1, this.r0, config, writer);

			context.PrintBlanksAnd(this.r0, config, writer);

		}

	}

	public class PostItemList : IFullFormatter {

		private readonly List<PostItem> list = new();

		public PostItemList(PostItem item) {

			this.list.Add(item);

			this._scope = new TokenRange(item);

		}

		public void Add(PostItem item) {

			this.list.Add(item);

			this._scope.end = item.Scope.end;

		}

		private readonly TokenRange _scope;

		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

			// '*'或'/'与其左右两边的Token间隔1个空格

			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);

			for (int i = 0; i < list.Count; i++) {

				if (i == 0) {

					context.PrintBlanksAnd(list[i], preConfig, writer);

				}

				else {

					context.PrintCommentsBetween(list[i - 1], list[i], config, writer);

					context.PrintBlanksAnd(list[i], config, writer);

				}

			}

		}

	}

	private PostItemList? list;

	private readonly TokenRange _scope;

	public TokenRange Scope => _scope;

	// [5]=Term : Factor ;

	internal VnTerm(VnFactor first0) {

		this.first0 = first0;

		this._scope = new TokenRange(first0);

	}

	// [3]=Term : Term '*' Factor ;

	// [4]=Term : Term '/' Factor ;

	internal void Add(Token r1, VnFactor r0) {

		if (this.list == null) {

			this.list = new PostItemList(new(r1, r0));

		}

		else {

			this.list.Add(new(r1, r0));

		}

		this._scope.end = this.list.Scope.end;

	}

	public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

		context.PrintBlanksAnd(this.first0, preConfig, writer);

		if (this.list != null) {

			// '*'或'/'与其左右两边的Token间隔1个空格

			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);

			context.PrintCommentsBetween(this.first0, this.list, config, writer);

			context.PrintBlanksAnd(this.list, config, writer);

		}

	}

}

点击查看 VnFactor结点的格式化代码

/// <summary>

/// Correspond to the Vn node Factor in the grammar(Exp).

/// </summary>

internal abstract partial class VnFactor : IFullFormatter {

	// [6]=Factor : '(' Exp ')' ;

	// [7]=Factor : 'number' ;

	public class C0 : VnFactor {

		// [6]=Factor : '(' Exp ')' ;

		public C0(Token r2, VnExp r1, Token r0) {

			this.r2 = r2;

			this.r1 = r1;

			this.r0 = r0;

			this._scope = new TokenRange(r2, r0);

		}

		private readonly Token r2;

		private readonly VnExp r1;

		private readonly Token r0;

		private readonly TokenRange _scope;

		public override TokenRange Scope => _scope;

		public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

			context.PrintBlanksAnd(this.r2, preConfig, writer);

			// ( Exp )之间不留空格

			var config = new BlankConfig(inlineBlank: 0, forceNewline: false);

			context.PrintCommentsBetween(this.r2, this.r1, config, writer);

			context.PrintBlanksAnd(this.r1, config, writer);

			context.PrintCommentsBetween(this.r1, this.r0, config, writer);

			context.PrintBlanksAnd(this.r0, config, writer);

		}

	}

	public class C1 : VnFactor {

		// [7]=Factor : 'number' ;

		public C1(Token r0) {

			this.r0 = r0;

			this._scope = new TokenRange(r0);

		}

		private readonly Token r0;

		private readonly TokenRange _scope;

		public override TokenRange Scope => _scope;

		public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {

            // 根据上级设置的preConfig输出自己唯一的token

			context.PrintBlanksAnd(this.r0, preConfig, writer);

		}

	}

	public abstract TokenRange Scope { get; }

	public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);

}

最终，1234+567+89+0+0被格式化后的样子如下（在运算符两侧各加入1个空格）：

1234 + 567 + 89 + 0 + 0

下面是更多示例：

// 格式化 1-2*3

1 - 2 * 3

// 格式化 (1+2)/3

(1 + 2) / 3

// 格式化 (1+2)*(3-4)

(1 + 2) * (3 - 4)

点击查看 GLSL代码格式化的示例1

// 示例1 格式化前

void main() {

    int a=0;

    a++++;

    ++++a;

}

// 示例1 格式化后

void main() {

    int a = 0;

    a++ ++;

    ++ ++a;

}

点击查看 GLSL代码格式化的示例2

// 示例2 格式化前

in vec3  passNormal;

in vec2  passTexCoord;

uniform sampler2D textureMap;

uniform   vec3      lihtDirection=vec3(1,1,1);

uniform   vec3       diffuseColor;

uniform bool     transparent=false;

out vec4 outColor;

void main(  ){

    if (transparent) {

        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;}

    if (passTexCoord==vec2(-1,-1)){   // when texture coordinate not exists..

    float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0);

     outColor = vec4(diffuseColor * diffuse, 1.0);

    }

    else {     outColor = texture(textureMap, passTexCoord);}

}

// 示例2 格式化后

in vec3 passNormal;

in vec2 passTexCoord;

uniform sampler2D textureMap;

uniform vec3 lihtDirection = vec3(1, 1, 1);

uniform vec3 diffuseColor;

uniform bool transparent = false;

out vec4 outColor;

void main() {

    if (transparent) {

        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;

    }

    if (passTexCoord == vec2(-1, -1)) {   // when texture coordinate not exists..

        float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);

        outColor = vec4(diffuseColor * diffuse, 1.0);

    }

    else { outColor = texture(textureMap, passTexCoord); }

}

关于这个格式化算法的详细介绍，可参考我的另一篇文章（GLSL Shader的格式化算法(LALR解析器)）。

举例-自动解决Shift/Reduce冲突

C语言和GLSL中都有if-else悬挂问题，它是由下述产生式引起的：

selection_statement :

      'if' '(' expression ')' selection_rest_statement

 ;

selection_rest_statement :

      statement 'else' statement

    | statement

 ;

这样，在语法分析器读到'else'这个Token时，它是应当Shift这个'else'呢，还是应当按照selection_rest_statement : statement ;进行规约呢？这就产生了Shift/Reduce冲突。

bitParser会自动选择按照Shift处理，将按照Ruduce方式处理的那一项注释掉，如下代码所示：

const int syntaxStateCount = 477;

// LALR(1) syntax parse table

private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]

    syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];

private static void InitializeSyntaxStates() {

    var states = CompilerGLSL.syntaxStates;

    ...

    // 30814 actions

    // conflicts(1)=not sovled(0)+solved(1)(1 warnings)

    #region init actions of syntax states

    ...

    // syntaxStates[454]:

    // [324] selection_rest_statement : statement  'else' statement ; '--' '-' ';' '!' '(' '{' '}' '+' ..

    // [325] selection_rest_statement : statement  ; '--' '-' ';' '!' '(' '{' '}' '+' ..

    // 'else' repeated 2 times

    states[454]/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states[466]));

    // ⚔ PreferShiftToReduce states[454]/*28146*/.Add(st.@else, new(regulations[325]));

    states[454]/*28147*/.Add(st.@Dash符Dash符, new(regulations[325]));

	...

    #endregion init actions of syntax states

}

举例-优先级指令`%nonassoc、%left、%right、%prec`

如果我们按最直观的方式书写Calc.st，可能是这样的：

Exp : Exp '+' Exp

    | Exp '-' Exp

    | Exp '*' Exp

    | Exp '/' Exp

    | '(' Exp ')'

    | 'number' ;

%%[0-9]+%% 'number'

点击查看其LALR(1)语法分析器的状态转换表代码

const int syntaxStateCount = 14;

/// <summary>

/// LALR(1) syntax parse table

/// </summary>

private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =

	new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];

private static void InitializeSyntaxStates() {

	var states = CompilerExp.syntaxStates;

	// 80 actions

	// conflicts(16)=not sovled(0)+solved(16)(16 warnings)

	#region create objects of syntax states

	states[0] = new(capacity: 3);

	states[1] = new(capacity: 5);

	states[2] = new(capacity: 3);

	states[3] = new(capacity: 6);

	states[4] = new(capacity: 3);

	states[5] = new(capacity: 3);

	states[6] = new(capacity: 3);

	states[7] = new(capacity: 3);

	states[8] = new(capacity: 5);

	states[9] = new(capacity: 6);

	states[10] = new(capacity: 6);

	states[11] = new(capacity: 6);

	states[12] = new(capacity: 6);

	states[13] = new(capacity: 6);

	#endregion create objects of syntax states

	#region re-used actions

	LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times

	LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times

	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times

	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times

	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times

	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times

	LRParseAction aReduce5 = new(regulations[5]);// refered 6 times

	LRParseAction aReduce0 = new(regulations[0]);// refered 6 times

	LRParseAction aReduce1 = new(regulations[1]);// refered 6 times

	LRParseAction aReduce2 = new(regulations[2]);// refered 6 times

	LRParseAction aReduce3 = new(regulations[3]);// refered 6 times

	LRParseAction aReduce4 = new(regulations[4]);// refered 6 times

	#endregion re-used actions

	// 80 actions

	// conflicts(16)=not sovled(0)+solved(16)(16 warnings)

	#region init actions of syntax states

	// syntaxStates[0]:

	// [-1] Exp' :  Exp ; '¥'

	// [0] Exp :  Exp '+' Exp ; '-' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' '*' '/' '+' '¥'

	states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));

	states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);

	states[0]/*2*/.Add(st.@number, aShift3);

	// syntaxStates[1]:

	// [-1] Exp' : Exp  ; '¥'

	// [0] Exp : Exp  '+' Exp ; '-' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' '*' '/' '+' '¥'

	states[1]/*3*/.Add(st.@Plus符, aShift4);

	states[1]/*4*/.Add(st.@Dash符, aShift5);

	states[1]/*5*/.Add(st.@Asterisk符, aShift6);

	states[1]/*6*/.Add(st.@Slash符, aShift7);

	states[1]/*7*/.Add(st.@终, LRParseAction.accept);

	// syntaxStates[2]:

	// [4] Exp : '('  Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+'

	states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));

	states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);

	states[2]/*10*/.Add(st.@number, aShift3);

	// syntaxStates[3]:

	// [5] Exp : 'number'  ; '-' ')' '*' '/' '+' '¥'

	states[3]/*11*/.Add(st.@Dash符, aReduce5);

	states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);

	states[3]/*13*/.Add(st.@Asterisk符, aReduce5);

	states[3]/*14*/.Add(st.@Slash符, aReduce5);

	states[3]/*15*/.Add(st.@Plus符, aReduce5);

	states[3]/*16*/.Add(st.@终, aReduce5);

	// syntaxStates[4]:

	// [0] Exp : Exp '+'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));

	states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);

	states[4]/*19*/.Add(st.@number, aShift3);

	// syntaxStates[5]:

	// [1] Exp : Exp '-'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));

	states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);

	states[5]/*22*/.Add(st.@number, aShift3);

	// syntaxStates[6]:

	// [2] Exp : Exp '*'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));

	states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);

	states[6]/*25*/.Add(st.@number, aShift3);

	// syntaxStates[7]:

	// [3] Exp : Exp '/'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));

	states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);

	states[7]/*28*/.Add(st.@number, aShift3);

	// syntaxStates[8]:

	// [4] Exp : '(' Exp  ')' ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+'

	states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));

	states[8]/*30*/.Add(st.@Plus符, aShift4);

	states[8]/*31*/.Add(st.@Dash符, aShift5);

	states[8]/*32*/.Add(st.@Asterisk符, aShift6);

	states[8]/*33*/.Add(st.@Slash符, aShift7);

	// syntaxStates[9]:

	// [0] Exp : Exp '+' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	states[9]/*34*/.Add(st.@Plus符, aShift4);

	// ⚔ PreferShiftToReduce states[9]/*35*/.Add(st.@Plus符, aReduce0);

	// '-' repeated 2 times

	states[9]/*36*/.Add(st.@Dash符, aShift5);

	// ⚔ PreferShiftToReduce states[9]/*37*/.Add(st.@Dash符, aReduce0);

	// '*' repeated 2 times

	states[9]/*38*/.Add(st.@Asterisk符, aShift6);

	// ⚔ PreferShiftToReduce states[9]/*39*/.Add(st.@Asterisk符, aReduce0);

	// '/' repeated 2 times

	states[9]/*40*/.Add(st.@Slash符, aShift7);

	// ⚔ PreferShiftToReduce states[9]/*41*/.Add(st.@Slash符, aReduce0);

	states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);

	states[9]/*43*/.Add(st.@终, aReduce0);

	// syntaxStates[10]:

	// [1] Exp : Exp '-' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	states[10]/*44*/.Add(st.@Plus符, aShift4);

	// ⚔ PreferShiftToReduce states[10]/*45*/.Add(st.@Plus符, aReduce1);

	// '-' repeated 2 times

	states[10]/*46*/.Add(st.@Dash符, aShift5);

	// ⚔ PreferShiftToReduce states[10]/*47*/.Add(st.@Dash符, aReduce1);

	// '*' repeated 2 times

	states[10]/*48*/.Add(st.@Asterisk符, aShift6);

	// ⚔ PreferShiftToReduce states[10]/*49*/.Add(st.@Asterisk符, aReduce1);

	// '/' repeated 2 times

	states[10]/*50*/.Add(st.@Slash符, aShift7);

	// ⚔ PreferShiftToReduce states[10]/*51*/.Add(st.@Slash符, aReduce1);

	states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);

	states[10]/*53*/.Add(st.@终, aReduce1);

	// syntaxStates[11]:

	// [2] Exp : Exp '*' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	states[11]/*54*/.Add(st.@Plus符, aShift4);

	// ⚔ PreferShiftToReduce states[11]/*55*/.Add(st.@Plus符, aReduce2);

	// '-' repeated 2 times

	states[11]/*56*/.Add(st.@Dash符, aShift5);

	// ⚔ PreferShiftToReduce states[11]/*57*/.Add(st.@Dash符, aReduce2);

	// '*' repeated 2 times

	states[11]/*58*/.Add(st.@Asterisk符, aShift6);

	// ⚔ PreferShiftToReduce states[11]/*59*/.Add(st.@Asterisk符, aReduce2);

	// '/' repeated 2 times

	states[11]/*60*/.Add(st.@Slash符, aShift7);

	// ⚔ PreferShiftToReduce states[11]/*61*/.Add(st.@Slash符, aReduce2);

	states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);

	states[11]/*63*/.Add(st.@终, aReduce2);

	// syntaxStates[12]:

	// [3] Exp : Exp '/' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	states[12]/*64*/.Add(st.@Plus符, aShift4);

	// ⚔ PreferShiftToReduce states[12]/*65*/.Add(st.@Plus符, aReduce3);

	// '-' repeated 2 times

	states[12]/*66*/.Add(st.@Dash符, aShift5);

	// ⚔ PreferShiftToReduce states[12]/*67*/.Add(st.@Dash符, aReduce3);

	// '*' repeated 2 times

	states[12]/*68*/.Add(st.@Asterisk符, aShift6);

	// ⚔ PreferShiftToReduce states[12]/*69*/.Add(st.@Asterisk符, aReduce3);

	// '/' repeated 2 times

	states[12]/*70*/.Add(st.@Slash符, aShift7);

	// ⚔ PreferShiftToReduce states[12]/*71*/.Add(st.@Slash符, aReduce3);

	states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);

	states[12]/*73*/.Add(st.@终, aReduce3);

	// syntaxStates[13]:

	// [4] Exp : '(' Exp ')'  ; '-' ')' '*' '/' '+' '¥'

	states[13]/*74*/.Add(st.@Dash符, aReduce4);

	states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);

	states[13]/*76*/.Add(st.@Asterisk符, aReduce4);

	states[13]/*77*/.Add(st.@Slash符, aReduce4);

	states[13]/*78*/.Add(st.@Plus符, aReduce4);

	states[13]/*79*/.Add(st.@终, aReduce4);

	#endregion init actions of syntax states

}

当处于syntaxStates[9]时，如果遇到'+'这个Token，那么本应当Reduce，但是bitParser根据默认的“Shift优先于Reduce”的原则，选择了Shift。显然，这无法正确处理加减运算和乘除运算的优先关系。

如果不想将文法改写为本文最初的样式，这里可以用优先级指令声明加减乘除运算的优先级，从而得到正确的语法分析表。

Exp : Exp '+' Exp

    | Exp '-' Exp

    | Exp '*' Exp

    | Exp '/' Exp

    | '(' Exp ')'

    | 'number' ;

%%[0-9]+%% 'number'

%left '+' '-' // '+' '-' 的优先级相同，且偏向于Reduce

%left '*' '/' // '*' '/' 的优先级相同，且高于'+' '-'，且偏向于Reduce

点击查看使用了优先级指令的LALR(1)语法分析器的状态转换表代码

const int syntaxStateCount = 14;

/// <summary>

/// LALR(1) syntax parse table

/// </summary>

private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =

	new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];

private static void InitializeSyntaxStates() {

	var states = CompilerExp.syntaxStates;

	// 80 actions

	// conflicts(16)=not sovled(0)+solved(16)(0 warnings)

	#region create objects of syntax states

	states[0] = new(capacity: 3);

	states[1] = new(capacity: 5);

	states[2] = new(capacity: 3);

	states[3] = new(capacity: 6);

	states[4] = new(capacity: 3);

	states[5] = new(capacity: 3);

	states[6] = new(capacity: 3);

	states[7] = new(capacity: 3);

	states[8] = new(capacity: 5);

	states[9] = new(capacity: 6);

	states[10] = new(capacity: 6);

	states[11] = new(capacity: 6);

	states[12] = new(capacity: 6);

	states[13] = new(capacity: 6);

	#endregion create objects of syntax states

	#region re-used actions

	LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times

	LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times

	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times

	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times

	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times

	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times

	LRParseAction aReduce5 = new(regulations[5]);// refered 6 times

	LRParseAction aReduce0 = new(regulations[0]);// refered 6 times

	LRParseAction aReduce1 = new(regulations[1]);// refered 6 times

	LRParseAction aReduce2 = new(regulations[2]);// refered 6 times

	LRParseAction aReduce3 = new(regulations[3]);// refered 6 times

	LRParseAction aReduce4 = new(regulations[4]);// refered 6 times

	#endregion re-used actions

	// 80 actions

	// conflicts(16)=not sovled(0)+solved(16)(0 warnings)

	#region init actions of syntax states

	// syntaxStates[0]:

	// [-1] Exp' :  Exp ; '¥'

	// [0] Exp :  Exp '+' Exp ; '-' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' '*' '/' '+' '¥'

	states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));

	states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);

	states[0]/*2*/.Add(st.@number, aShift3);

	// syntaxStates[1]:

	// [-1] Exp' : Exp  ; '¥'

	// [0] Exp : Exp  '+' Exp ; '-' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' '*' '/' '+' '¥'

	states[1]/*3*/.Add(st.@Plus符, aShift4);

	states[1]/*4*/.Add(st.@Dash符, aShift5);

	states[1]/*5*/.Add(st.@Asterisk符, aShift6);

	states[1]/*6*/.Add(st.@Slash符, aShift7);

	states[1]/*7*/.Add(st.@终, LRParseAction.accept);

	// syntaxStates[2]:

	// [4] Exp : '('  Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+'

	states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));

	states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);

	states[2]/*10*/.Add(st.@number, aShift3);

	// syntaxStates[3]:

	// [5] Exp : 'number'  ; '-' ')' '*' '/' '+' '¥'

	states[3]/*11*/.Add(st.@Dash符, aReduce5);

	states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);

	states[3]/*13*/.Add(st.@Asterisk符, aReduce5);

	states[3]/*14*/.Add(st.@Slash符, aReduce5);

	states[3]/*15*/.Add(st.@Plus符, aReduce5);

	states[3]/*16*/.Add(st.@终, aReduce5);

	// syntaxStates[4]:

	// [0] Exp : Exp '+'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));

	states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);

	states[4]/*19*/.Add(st.@number, aShift3);

	// syntaxStates[5]:

	// [1] Exp : Exp '-'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));

	states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);

	states[5]/*22*/.Add(st.@number, aShift3);

	// syntaxStates[6]:

	// [2] Exp : Exp '*'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));

	states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);

	states[6]/*25*/.Add(st.@number, aShift3);

	// syntaxStates[7]:

	// [3] Exp : Exp '/'  Exp ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp :  Exp '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp :  Exp '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp :  Exp '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp :  Exp '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// [4] Exp :  '(' Exp ')' ; '-' ')' '*' '/' '+' '¥'

	// [5] Exp :  'number' ; '-' ')' '*' '/' '+' '¥'

	states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));

	states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);

	states[7]/*28*/.Add(st.@number, aShift3);

	// syntaxStates[8]:

	// [4] Exp : '(' Exp  ')' ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+'

	states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));

	states[8]/*30*/.Add(st.@Plus符, aShift4);

	states[8]/*31*/.Add(st.@Dash符, aShift5);

	states[8]/*32*/.Add(st.@Asterisk符, aShift6);

	states[8]/*33*/.Add(st.@Slash符, aShift7);

	// syntaxStates[9]:

	// [0] Exp : Exp '+' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	// ⚔ LeftShouldReduce states[9]/*34*/.Add(st.@Plus符, aShift4);

	states[9]/*35*/.Add(st.@Plus符, aReduce0);

	// '-' repeated 2 times

	// ⚔ LeftShouldReduce states[9]/*36*/.Add(st.@Dash符, aShift5);

	states[9]/*37*/.Add(st.@Dash符, aReduce0);

	// '*' repeated 2 times

	states[9]/*38*/.Add(st.@Asterisk符, aShift6);

	// ⚔ LowPrecedence states[9]/*39*/.Add(st.@Asterisk符, aReduce0);

	// '/' repeated 2 times

	states[9]/*40*/.Add(st.@Slash符, aShift7);

	// ⚔ LowPrecedence states[9]/*41*/.Add(st.@Slash符, aReduce0);

	states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);

	states[9]/*43*/.Add(st.@终, aReduce0);

	// syntaxStates[10]:

	// [1] Exp : Exp '-' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	// ⚔ LeftShouldReduce states[10]/*44*/.Add(st.@Plus符, aShift4);

	states[10]/*45*/.Add(st.@Plus符, aReduce1);

	// '-' repeated 2 times

	// ⚔ LeftShouldReduce states[10]/*46*/.Add(st.@Dash符, aShift5);

	states[10]/*47*/.Add(st.@Dash符, aReduce1);

	// '*' repeated 2 times

	states[10]/*48*/.Add(st.@Asterisk符, aShift6);

	// ⚔ LowPrecedence states[10]/*49*/.Add(st.@Asterisk符, aReduce1);

	// '/' repeated 2 times

	states[10]/*50*/.Add(st.@Slash符, aShift7);

	// ⚔ LowPrecedence states[10]/*51*/.Add(st.@Slash符, aReduce1);

	states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);

	states[10]/*53*/.Add(st.@终, aReduce1);

	// syntaxStates[11]:

	// [2] Exp : Exp '*' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	// ⚔ LowPrecedence states[11]/*54*/.Add(st.@Plus符, aShift4);

	states[11]/*55*/.Add(st.@Plus符, aReduce2);

	// '-' repeated 2 times

	// ⚔ LowPrecedence states[11]/*56*/.Add(st.@Dash符, aShift5);

	states[11]/*57*/.Add(st.@Dash符, aReduce2);

	// '*' repeated 2 times

	// ⚔ LeftShouldReduce states[11]/*58*/.Add(st.@Asterisk符, aShift6);

	states[11]/*59*/.Add(st.@Asterisk符, aReduce2);

	// '/' repeated 2 times

	// ⚔ LeftShouldReduce states[11]/*60*/.Add(st.@Slash符, aShift7);

	states[11]/*61*/.Add(st.@Slash符, aReduce2);

	states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);

	states[11]/*63*/.Add(st.@终, aReduce2);

	// syntaxStates[12]:

	// [3] Exp : Exp '/' Exp  ; '-' ')' '*' '/' '+' '¥'

	// [0] Exp : Exp  '+' Exp ; '-' ')' '*' '/' '+' '¥'

	// [1] Exp : Exp  '-' Exp ; '-' ')' '*' '/' '+' '¥'

	// [2] Exp : Exp  '*' Exp ; '-' ')' '*' '/' '+' '¥'

	// [3] Exp : Exp  '/' Exp ; '-' ')' '*' '/' '+' '¥'

	// '+' repeated 2 times

	// ⚔ LowPrecedence states[12]/*64*/.Add(st.@Plus符, aShift4);

	states[12]/*65*/.Add(st.@Plus符, aReduce3);

	// '-' repeated 2 times

	// ⚔ LowPrecedence states[12]/*66*/.Add(st.@Dash符, aShift5);

	states[12]/*67*/.Add(st.@Dash符, aReduce3);

	// '*' repeated 2 times

	// ⚔ LeftShouldReduce states[12]/*68*/.Add(st.@Asterisk符, aShift6);

	states[12]/*69*/.Add(st.@Asterisk符, aReduce3);

	// '/' repeated 2 times

	// ⚔ LeftShouldReduce states[12]/*70*/.Add(st.@Slash符, aShift7);

	states[12]/*71*/.Add(st.@Slash符, aReduce3);

	states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);

	states[12]/*73*/.Add(st.@终, aReduce3);

	// syntaxStates[13]:

	// [4] Exp : '(' Exp ')'  ; '-' ')' '*' '/' '+' '¥'

	states[13]/*74*/.Add(st.@Dash符, aReduce4);

	states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);

	states[13]/*76*/.Add(st.@Asterisk符, aReduce4);

	states[13]/*77*/.Add(st.@Slash符, aReduce4);

	states[13]/*78*/.Add(st.@Plus符, aReduce4);

	states[13]/*79*/.Add(st.@终, aReduce4);

	#endregion init actions of syntax states

}

bitParser中的优先级指令%nonassoc、%left、%right、%prec，与yacc相同：

书写顺序靠后的，优先级更高；
%left偏向于Reduce；
%right偏向于Shift；
%nonassoc表示有语法错误；
%prec可以特别指定一个Token类型，以使用其优先级，而不是采用默认的（规约规则最右边Vt）的优先级。而且可以指定文法中不存在的Vt（即此Vt纯粹是个占位符）。

举例-`/`后缀

在Step格式的文件里，1=2中的1应当被认为是一个'entityId'，而2应当被认为是一个'refEntity'，即对“entity 2”的引用。

如何区别两者呢？当某个数值后面有=时，它就是'entityId'，否则就是'refEntity'。此时就需要用/后缀功能，如下所示：

// postfix.st

// regulations:

Items : Items Item | Item ;

Item : 'entityId' '=' 'refEntity' ;

// lexi statements:

%%[0-9]+/[ \t]*=%% 'entityId' // 数字后跟随着“ =”，那么这些数字就是'entityId'

%%[0-9]+%% 'refEntity'

点击查看生成的 lexi状态0 相关代码

// lexicalState0

private static readonly Action<LexicalContext, char> lexicalState0 =

static (context, c) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* = */

	else if (/* possible Vt : '=' */

	c == '='/*'\u003D'(61)*/) {

		BeginToken(context);

		ExtendToken(context);

		context.currentState = lexicalState2;

	}

	/* [0-9] */

	else if (/* possible Vt : 'entityId' 'refEntity' */

	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {

		BeginToken(context);

		ExtendToken(context);

		context.currentState = lexicalState3;

	}

	/* deal with everything else. */

	else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {

		context.currentState = lexicalState0; // skip them.

	}

	else { // unexpected char.

		BeginToken(context);

		ExtendToken(context);

		AcceptToken(st.Error, context);

		context.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态1 相关代码

// lexicalState1

private static readonly Action<LexicalContext, char> lexicalState1 =

static (context, c) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* = */

	else if (/* possible Vt : 'entityId' */

	c == '='/*'\u003D'(61)*/) {

		context.currentState = lexicalState4;

	}

	/* [ \t] */

	else if (/* possible Vt : 'entityId' */

	(c == ' '/*'\u0020'(32)*/)

	|| (c == '\t'/*'\u0009'(9)*/)) {

		context.currentState = lexicalState1;

	}

	/* deal with everything else. */

	else { // token with error type

		ExtendToken(context);

		AcceptToken(st.Error, context);

		context.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态2 相关代码

// lexicalState2

private static readonly Action<LexicalContext, char> lexicalState2 =

static (context, c) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@Equal符, context);

		context.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态3 相关代码

// lexicalState3

private static readonly Action<LexicalContext, char> lexicalState3 =

static (context, c) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* = */

	else if (/* possible Vt : 'entityId' */

	c == '='/*'\u003D'(61)*/) {

		context.currentState = lexicalState4;

	}

	/* [ \t] */

	else if (/* possible Vt : 'entityId' */

	(c == ' '/*'\u0020'(32)*/)

	|| (c == '\t'/*'\u0009'(9)*/)) {

		context.currentState = lexicalState1;

	}

	/* [0-9] */

	else if (/* possible Vt : 'entityId' 'refEntity' */

	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {

		ExtendToken(context);

		context.currentState = lexicalState3;

	}

	/* deal with everything else. */

	else {

		AcceptToken(st.@refEntity, context);

		context.currentState = lexicalState0;

	}

};

点击查看生成的 lexi状态4 相关代码

// lexicalState4

private static readonly Action<LexicalContext, char> lexicalState4 =

static (context, c) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* deal with everything else. */

	else {

		AcceptToken(st.@entityId, context);

		context.currentState = lexicalState0;

	}

};

下面是二维数组ElseIf[][]形式的状态转换表，它复用了一些ElseIf对象，这进一步减少了空间占用。

点击查看生成的二维数组状态转换表代码

private static readonly ElseIf[][] lexiStates = new ElseIf[5][];

static void InitializeLexiTable() {

	ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1);//refered 2 times

	ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1);//refered 2 times

	ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4);//refered 2 times

	lexiStates[0] = new ElseIf[] {

	// possible Vt: 'entityId' 'refEntity'

	/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),

	// possible Vt: '='

	/*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),

	};

	lexiStates[1] = new ElseIf[] {

	// possible Vt: 'entityId'

	/*0*/ //new('\t'/*'\u0009'(9)*/, Acts.None, 1),

	/*0*/s9_9_0_1,

	// possible Vt: 'entityId'

	/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),

	/*1*/s32_32_0_1,

	// possible Vt: 'entityId'

	/*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),

	/*2*/s61_61_0_4,

	};

	lexiStates[2] = new ElseIf[] {

	// possible Vt: '='

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Equal符),

	};

	lexiStates[3] = new ElseIf[] {

	// possible Vt: 'entityId'

	/*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),

	/*0*/s9_9_0_1,

	// possible Vt: 'entityId'

	/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),

	/*1*/s32_32_0_1,

	// possible Vt: 'entityId' 'refEntity'

	/*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),

	// possible Vt: 'entityId'

	/*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),

	/*3*/s61_61_0_4,

	// possible Vt: 'refEntity'

	/*4*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@refEntity),

	};

	lexiStates[4] = new ElseIf[] {

	// possible Vt: 'entityId'

	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@entityId),

	};

}

其miniDFA的状态图如下：

结合代码和状态图容易发现：

当处于miniDFA3状态时，如果遇到空格、\t、=则可断定下一个Token类型是'entityId'（或错误类型'Error错'）；否则就是'refEntity'（并返回miniDFA0状态）。这符合我们的预期。
当处于miniDFA0状态（即初始状态）时，如果遇到[0-9]，那么下一个Token类型既可能是'entityId'，又可能是'refEntity'。此时尚且无法唯一断定之。

也就是说，这个例子也包含了将NFA转换为DFA的问题，因为它的两种Token'entityId'和'refEntity'的开头[0-9]+是相同的。

我们来对比它的NFA和DFA：

下面是它的NFA状态图：

下面是它的DFA状态图：

可见：

在NFA状态图中，NFA0-0状态在遇到[0-9]时，有两个选择：NFA2-1状态和NFA3-1状态。在对应的DFA状态图中，NFA2-1状态和NFA3-1状态就被合并到了一个DFA状态（DFA2）中。这是通过子集构造法实现的。

这里顺便展示一个构造miniDFA的例子：

// regulations:

Left : 'min' ;

// lexical statements

%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' //标识符后跟','，则为'min'

这个'min'的DFA如下：

这个'min'的miniDFA如下：

可见，miniDFA合并了DFA中的DFA1状态和DFA3状态。它们两个能够合并，是因为它们两个在读到相同的char时均会跳转到相同的下一个状态。这是通过Hopcroft算法实现的。

举例-前缀`<'Vt'>`

在GLSL中，为了方便语法分析，我需要将struct Point { float x; float y; }中的Point识别为一个“type_name”类型的Token，而不是“identifier”类型的Token。这可以通过前缀来实现。

// ..

struct_specifier :

      'struct' 'type_name' '{' struct_declaration_list '}' ;

// ..

// lexical statements

%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // 跟在struct之后的Token应当被设定为“type_name”类型

%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // 平时应当被设定为“identifier”类型

添加前缀<'struct'>不会影响词法分析器的状态构成，只会在设置Token类型时看一看上一个Token是不是“struct”类型：若是，则设置为“type_name”类型，否则，设置为“identifier”类型。

另外，还需要新增一个数组，用于记录已识别出的全部“type_name”类型，以便再次遇到它时，也能够将其设置为“type_name”类型。

点击查看状态机受前缀影响的部分代码

// lexicalState1

private static readonly Action<LexicalContext, char> lexicalState1 =

static (context, c) => {

	if (false) { /* for simpler code generation purpose. */ }

	/* [a-zA-Z0-9_] */

	else if (/* possible Vt : 'type_name' 'identifier' */

	('a' <= c && c <= 'z')

	|| ('A' <= c && c <= 'Z')

	|| ('0' <= c && c <= '9')

	|| (c == '_')) {

		ExtendToken(context);

		context.currentState = lexicalState3;

	}

	/* deal with everything else. */

	else {

		AcceptToken2(context

		// 如果上一个Token是struct，那么新Token是type_name

		, new(/*<'Vt'>*/st.@struct, st.@type_name)

		// 否则，新Token是identifier

		, new(/*default preVt*/string.Empty, st.@identifier));

		context.currentState = lexicalState0;

	}

};

与之配套的AcceptToken2(context, ifVts);函数也就复杂些：

点击查看 void AcceptToken2(LexicalContext context, params IfVt[] ifVts); 代码

struct IfVt {

	public readonly string preVt;

	public readonly string Vt;

	public IfVt(string preVt, string Vt) {

		this.preVt = preVt;

		this.Vt = Vt;

	}

}

private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {

	var startIndex = context.analyzingToken.start.index;

	var endIndex = context.analyzingToken.end.index;

	context.analyzingToken.value = context.sourceCode.Substring(

		startIndex, endIndex - startIndex + 1);

	var typeSet = false;

	const string key = "type_name"; var hadThisTypeName = false;

	if (!typeSet) {

		if (context.tagDict.TryGetValue(key, out var type_nameList)) {

			var list = type_nameList as List<string>;

			if (list.Contains(context.analyzingToken.value)) {

			    // 如果是已识别出的type_name

				context.analyzingToken.type = st.type_name;

				typeSet = true;

				hadThisTypeName = true;

			}

		}

	}

	if (!typeSet) {

	    int lastType = 0;

		if (context.lastSyntaxValidToken != null) {

			lastType = context.lastSyntaxValidToken.type;

		}

		for (var i = 0; i < ifVts.Length; i++) {

			var ifVt = ifVts[i];

			if (ifVt.preVt == 0 // 默认没有前缀

			  || ifVt.preVt == lastType) { // 匹配到了前缀<'Vt'>

				context.analyzingToken.type = ifVt.Vt;

				typeSet = true;

				break;

			}

		}

	}

	if (!typeSet) {

		// we failed to assign type according to lexi statements.

		// this indicates token error in source code or inappropriate lexi statements.

		context.analyzingToken.type = st.Error错;

		context.signalCondition = LexicalContext.defaultSignal;

	}

    // cancel forward steps for post-regex

	var backStep = context.cursor.index - context.analyzingToken.end.index;

	if (backStep > 0) { context.MoveBack(backStep); }

	// next operation: context.MoveForward();

	var token = context.analyzingToken.Dump();

	context.result.Add(token);

	// 跳过注释

	if (context.analyzingToken.type != st.blockComment

	 && context.analyzingToken.type != st.inlineComment) {

		context.lastSyntaxValidToken = token;

	}

	if (!hadThisTypeName && context.analyzingToken.type == st.type_name) {

	    // 将新识别出的type_name加入list

		if (!context.tagDict.TryGetValue(key, out var type_nameList)) {

			type_nameList = new List<string>();

			context.tagDict.Add(key, type_nameList);

		}

		var list = type_nameList as List<string>;

		list.Add(context.analyzingToken.value);

	}

}

注意，语法分析不需要blockComment和inlineComment（类型为“注释”的Token），因而在记录上一个Token类型时，我们要跳过注释。

举例-状态信号<signal1, signal2, ..>

在GLSL中，为了方便语法分析，我需要将subroutine ( r1, r2 )中的r1、r2都识别为“type_name”类型的Token，而不是“identifier”类型的Token。这无法通过前缀实现，但可以通过状态信号LexicalContext.signal实现。

状态信号是这样起作用的：

在读到一个“subroutine”类型的Token时，将LexicalContext.signal设置为subroutine0；

在读到一个“(”类型的Token时，如果LexicalContext.signal是subroutine0，就将LexicalContext.signal设置为subroutine1；

在读到一个符合[a-zA-Z_][a-zA-Z0-9_]*的标识符时，如果LexicalContext.signal是subroutine1（这说明词法分析器刚刚连续读到了'subroutine' '('），就将它识别为“type_name”类型的Token，否则识别为“identifier”类型的Token；

在读到一个“)”类型的Token时，如果LexicalContext.signal是subroutine1，就将LexicalContext.signal设置为default（默认状态），即不再理会状态信号。

storage_qualifier :

    | 'subroutine' '(' type_name_list ')' ;

// lexical statements

%%subroutine%%             'subroutine' subroutine0

<subroutine0>%%[(]%%       '('          subroutine1

<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'

<subroutine1>%%[,]%%       ','

<subroutine1>%%[)]%%       ')'          default

状态信号，是在词法分析器这个状态机的基础上又附加了一个状态机，因而其应用比较复杂，容易出错，应当尽量少用。

举例-中文字符

如果我想识别出一个文本文件中的全部汉字（假设汉字字符位于\u4E00和\u9FFF之间），可以这样：

Items : Items Item | Item ;

Item : 'chineseChar' | 'other' ;

%%[\u4E00-u9FFF]%% 'chineseChar'

%%[^\u4E00-u9FFF]%% 'other'

End