An expression evaluator

 Download source code (17 kb)

Two weeks ago, I saw an article on CodeProject that very nicely solves an old and well-known problem. It is nice because it is short, simple, sequential
and, last but not least, elegant.

In the meantime, I needed an expression evaluator for a product I am building, and I needed not only to extend the principles of evaluation but also to add a few features.

This article describes what I have done to write a general-purpose expression evaluator that goes beyond evaluating the four basic operations. It can be regarded as a complement to the article mentioned above for three reasons :

  • it's a complete rewrite
  • it uses an object model when it comes to variables and functions
  • it describes how the parsing and evaluation works

The principles of expression parsing

It's an old problem because, although separating operators from numbers or other tokens is an easy task, operators may or may not be infix and have algebraic precedence relations between them (called prevalence in the rest of this article), which forces the parser to pay particular attention to
the whole parsing process. Quite naturally, this involves maintaining a separate stack of operators, and stacking and unstacking operators according to a set of predefined rules.

The good news is that, when it comes to usual mathematical expressions, there is only a tiny set of rules. Here they are :

  • * and / are prevalent over + and -
  • functions are prevalent over * and /
  • parentheses are always prevalent over operators

Prevalent means that if operator a is prevalent over operator b, then a is performed before b. For instance, multiplications must be performed before additions. Of course, parentheses help resolve prevalence
issues.

Beyond prevalence, we need to build a tree of operations to perform in order to get a result out of the evaluation. Typical trees are binary trees where operands are leaves and operators are internal nodes. Because operators can be chained
arbitrarily, the tree can be fairly deep and significantly unbalanced (by the way, re-balancing trees is an interesting topic). Below are example evaluation trees for the following expressions : = 5 + 3 and = 5 + 3 * 8.


Sample evaluation trees

When an evaluation tree is built, evaluation can be done by traversing the nodes from the root down to the leaves. Doing the evaluation is a matter of knowing, for each operator, how many arguments are expected, and retrieving them recursively going down the tree.
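The recursive traversal described above can be sketched as follows. This is only an illustration: the Node type and the leaf, binary and evaluate helpers are hypothetical names that do not appear in the article's source code, and only the four binary operators are handled.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical node type for illustration; not part of the article's source.
struct Node {
    char op = 0;                  // '+', '-', '*', '/' for operators, 0 for a leaf
    double value = 0.0;           // used only when the node is a leaf
    std::unique_ptr<Node> left;   // first operand (operators only)
    std::unique_ptr<Node> right;  // second operand (operators only)
};

// Builds a leaf holding a number.
std::unique_ptr<Node> leaf(double v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

// Builds an operator node with two children.
std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Evaluates from the root down: an operator first evaluates its two children.
double evaluate(const Node& n) {
    if (n.op == 0) return n.value;
    double a = evaluate(*n.left);
    double b = evaluate(*n.right);
    switch (n.op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        case '/': return a / b;
    }
    throw std::runtime_error("unknown operator");
}
```

For = 5 + 3 * 8, the tree is binary('+', leaf(5), binary('*', leaf(3), leaf(8))), and evaluate returns 29 for it.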

That being said, and that's the main point of RPN (Reverse Polish Notation), an actual tree need not be created. Having an expression in which operators are suffixes of their arguments is enough to have the equivalent of the tree and, as a result, enough
to do the evaluation.

For instance, the parsing phase not only distinguishes operator arguments, better known as operands, from the operators themselves, it also rearranges the expression into RPN order. For the sample expressions, = 5 + 3 and = 5 + 3 * 8, this leads to : {5}{3}{+} and {5}{3}{8}{*}{+}.

RPN makes evaluation straightforward

RPN is useful because, given this order, a really simple algorithm can read tokens from left to right, stack all operands, and unstack them whenever an operator is read. After the operation is performed, the resulting
token is stacked so that it behaves like a normal operand for the remainder of the RPN expression. Below is a step-by-step evaluation of the expression = 5 + 3 * 8 :

  • Tokens are from this list : {5}{3}{8}{*}{+}
  • next token is {5}, an operand, stack it (separate stack), stack now contains {5}
  • next token is {3}, an operand, stack it, stack now contains {5} {3}
  • next token is {8}, an operand, stack it, stack now contains {5} {3} {8}
  • next token is {*}, operator, 2 expected operands, unstack two operands ({3} {8}), perform the operation, stack the result, stack now contains {5} {24}
  • next token is {+}, operator, 2 expected operands, unstack two operands ({5} {24}), perform the operation, stack the result, stack now contains {29}
  • the token list is empty
  • {29} is the result of evaluation!
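The timeline above maps almost line for line onto a small function. The sketch below is an assumption-laden illustration, not the evaluator from the provided source: it uses the standard library instead of the wz classes, the name evalRpn is made up, and only + - * / are supported.

```cpp
#include <cstdlib>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a token list that is already in RPN order,
// e.g. {"5", "3", "8", "*", "+"}. Any token that is not one of the
// four operators is read as a number with strtod.
double evalRpn(const std::vector<std::string>& tokens) {
    std::stack<double> operands;
    for (const std::string& tok : tokens) {
        bool isOperator = tok.size() == 1 &&
                          std::string("+-*/").find(tok[0]) != std::string::npos;
        if (!isOperator) {
            operands.push(std::strtod(tok.c_str(), nullptr));  // stack the operand
            continue;
        }
        if (operands.size() < 2)
            throw std::runtime_error("missing operand for " + tok);
        // The top of the stack is the right-hand operand.
        double right = operands.top(); operands.pop();
        double left = operands.top(); operands.pop();
        double result = 0.0;
        switch (tok[0]) {
            case '+': result = left + right; break;
            case '-': result = left - right; break;
            case '*': result = left * right; break;
            case '/': result = left / right; break;
        }
        operands.push(result);  // the result behaves like a normal operand
    }
    if (operands.size() != 1)
        throw std::runtime_error("malformed expression");
    return operands.top();
}
```

Calling evalRpn({"5", "3", "8", "*", "+"}) goes through the same stack states as the list above and returns 29.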

Please note that, during evaluation, the number of arguments an operator expects can be checked against the number of operands available on the operand stack. A mismatch is a typical execution error, which the user is expected to fix by correcting the faulty
arguments. To report a cursor position for that error, each token must be associated with its position in the original expression.
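One possible way to carry the cursor position is to store it in each token. The Token record and findArityError function below are hypothetical and only track the depth of the operand stack; the provided source reports errors through getLastError() and may organise this differently.

```cpp
#include <string>
#include <vector>

// Hypothetical token record: the text plus its offset in the original expression.
struct Token {
    std::string text;
    bool isOperator;
    int nbParams;   // expected operands, meaningful for operators only
    int position;   // 0-based offset in the source expression
};

// Walks an RPN token list and tracks only the depth of the operand stack.
// Returns -1 when every operator finds its operands, otherwise the source
// position of the first operator that does not.
int findArityError(const std::vector<Token>& rpn) {
    int depth = 0;
    for (const Token& t : rpn) {
        if (!t.isOperator) {
            ++depth;               // an operand is stacked
            continue;
        }
        if (depth < t.nbParams)
            return t.position;     // not enough operands for this operator
        depth -= t.nbParams;       // operands are unstacked
        ++depth;                   // the result is stacked
    }
    return -1;
}
```

For the incomplete expression "5 +", whose RPN form is {5}{+} with + at offset 2, the function returns 2, which is where a cursor could be placed.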

A typical algorithm for expression parsing is as follows :

for each char of the expression
    decide if the char is part of an operand or of an operator
    if the char is part of an operand,
        append it to the list of operands
    else if the char is part of an operator,
        look up the operator
        match it with supported operators
        compare operator with the preceding operators
        if this operator is prevalent,
            store the operator in a stack
        otherwise,
            unstack the preceding operator,
            append the preceding operator to the list of operands
            stack the new operator
        end if
    end if
end for

This is a general algorithm and it's easy to figure out that a typical implementation remains under 200 lines of source code.
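As a rough illustration of the algorithm, here is a minimal sketch limited to unsigned numbers and the four basic operators. It is not the parser from the provided source: the names priorityOf and toRpn are assumptions, the priorities 10 and 20 are simply borrowed from the operators table shown later, and the inner loop unstacks every operator of equal or higher priority (the usual variant for left-associative operators) rather than a single one.

```cpp
#include <cctype>
#include <stack>
#include <string>
#include <vector>

// Priority of the four basic operators; a higher value is performed first.
int priorityOf(char op) { return (op == '*' || op == '/') ? 20 : 10; }

// Converts an infix expression such as "= 5 + 3 * 8" to a list of RPN tokens.
// No parentheses, functions or error checks in this sketch.
std::vector<std::string> toRpn(const std::string& expr) {
    std::vector<std::string> output;   // the list of tokens, in RPN order
    std::stack<char> operators;        // the separate operator stack
    std::size_t i = 0;
    while (i < expr.size()) {
        char c = expr[i];
        if (std::isdigit(static_cast<unsigned char>(c)) || c == '.') {
            // Read a whole number and append it as an operand.
            std::size_t start = i;
            while (i < expr.size() &&
                   (std::isdigit(static_cast<unsigned char>(expr[i])) || expr[i] == '.'))
                ++i;
            output.push_back(expr.substr(start, i - start));
            continue;
        }
        if (c == '+' || c == '-' || c == '*' || c == '/') {
            // Unstack operators that are not of lower priority than the new one.
            while (!operators.empty() && priorityOf(operators.top()) >= priorityOf(c)) {
                output.push_back(std::string(1, operators.top()));
                operators.pop();
            }
            operators.push(c);
        }
        ++i;  // skips spaces, a leading '=', and the operator just handled
    }
    // Flush the operators still on the stack.
    while (!operators.empty()) {
        output.push_back(std::string(1, operators.top()));
        operators.pop();
    }
    return output;
}
```

With this sketch, toRpn("= 5 + 3 * 8") yields {5}{3}{8}{*}{+} and toRpn("= 5 * 3 + 8") yields {5}{3}{*}{8}{+}.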

The following blocks depict how the parsing works (captions of the original figures, in order) :

  • initial structures
  • retrieving an operand
  • retrieving an operator
  • retrieving an operand
  • retrieving an operator
  • retrieving an operand
  • unstacking the operator from the top of the stack
  • unstacking the operator from the top of the stack
  • RPN style of the resulting structure

If the expression were = 5 * 3 + 8 instead of = 5 + 3 * 8, then when the parser retrieves the + operator, it unstacks the * operator and appends it to the list of tokens before the + operator is stacked.

Since parentheses have maximum prevalence, they have to be handled as such. While open and closed parentheses behave like normal operators, they receive special attention : parentheses are not appended to the list of tokens. All
they do is impose an arbitrary prevalence on top of the existing prevalence order between stacked operators. When parsing the expression, any time we reach a closed parenthesis, we unstack all operators until we reach the matching open parenthesis.
This is how maximum prevalence is enforced.
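The handling of parentheses can be sketched as below. To keep it short, this illustration assumes single-character operands and returns the RPN form as a plain string. It treats the open parenthesis as a marker on the operator stack, whereas the provided source gives parentheses a priority of their own in the operators table. The names rankOf and toRpnWithParens are hypothetical.

```cpp
#include <cctype>
#include <stack>
#include <string>

// Priority of the four basic operators; a higher value is performed first.
int rankOf(char op) { return (op == '*' || op == '/') ? 20 : 10; }

// Converts an infix expression with single-character operands to RPN.
std::string toRpnWithParens(const std::string& expr) {
    std::string output;
    std::stack<char> operators;
    for (char c : expr) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            output += c;                          // operand
        } else if (c == '(') {
            operators.push(c);                    // marks where the group starts
        } else if (c == ')') {
            // Unstack everything back to the matching open parenthesis.
            while (!operators.empty() && operators.top() != '(') {
                output += operators.top();
                operators.pop();
            }
            if (!operators.empty()) operators.pop();  // discard '(' itself
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            // Never unstack past an open parenthesis.
            while (!operators.empty() && operators.top() != '(' &&
                   rankOf(operators.top()) >= rankOf(c)) {
                output += operators.top();
                operators.pop();
            }
            operators.push(c);
        }
        // Anything else (spaces, a leading '=') is ignored.
    }
    while (!operators.empty()) {
        if (operators.top() != '(') output += operators.top();
        operators.pop();
    }
    return output;
}
```

Here toRpnWithParens("(5+3)*8") gives "53+8*", while toRpnWithParens("5+3*8") gives "538*+". The parentheses never reach the output; they only change the order in which operators are unstacked.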

In the implementation provided in this article, a more granular level of object manipulation was considered. If we limit ourselves to what has been said above, then this expression evaluator is limited to simple operators, and supports neither functions
nor variables. What good would an expression evaluator be if it supported neither of these?

Implementing the list of tokens is a matter of having a base class, wzelement, whose derived classes hold numbers, strings, operators, or whatever the client application requires. As a result, the list of tokens is an array of wzelement pointers :

typedef enum {_operator, _litteral} elementtype;

wzarray<wzelement*> m_arrelements; // list of tokens

Just to show how this is brought together, below are the declarations for those classes :

class wzelement
{
protected:
    elementtype m_type;

public:
    wzelement();
    virtual ~wzelement();

    void setType(elementtype t);
    elementtype getType();
};

class wzoperator : public wzelement
{
protected:
    long m_nID;
    long m_nPriority;
    long m_nbParams;
    BOOL m_bIsAfunction;

public:
    void setID(long n);
    long getID();

    void setPriority(long lvl);
    long getPriority();

    void setNbParams(long nb);
    long getNbParams();

    void setIsAFunction(BOOL bIsAFunction);
    BOOL getIsAFunction();

    BOOL isHigherPriorityThan(wzoperator* src); // TRUE if this is of higher priority than src
    BOOL isParenthesis();
};

// wzstring //////////////////////////////////////////////
//
// simple string storage implementation
//
class wzstring : public wzelement
{
// Members
protected:
    LPSTR m_pstr;
    long m_nLength;

// Construction
public:
    wzstring();
    virtual ~wzstring(); // frees the buffer

    void init();

    BOOL isEmpty();
    void empty();

    LPSTR setString(LPSTR pString, long nLength); // allocates a buffer
    LPSTR setString(wzstring* pString); // allocates a buffer
    LPSTR getString();
    long getLength();

    BOOL isANumber(); // TRUE if the number is an integer
    BOOL isADouble(); // less restricting than isANumber()
    long getNumber();
    double getDouble();

    void fromNumber(long n);
    void fromDouble(double d);

    //void fromNumberOrDouble(double d, BOOL bArg1IsANumber, BOOL bArg2IsANumber);
    void fromNumberOrDouble(double d, ...); // var args implementation
};

Function support

Supporting functions is a matter of :

  • adding operator tokens to the list of supported operators
  • adding a special treatment whenever the parser finds an argument separator.

Adding support for functions is a good opportunity to declare operators openly in a table rather than hardcode them in the parser. Among the important flags required by either the parser or the evaluator are :

  • the number of arguments expected by the operator. This can be 0, 1, 2, ...
  • whether the operator is a function or not. This helps tell parentheses used to manage prevalence apart from parentheses enclosing function arguments.
  • the prevalence of that operator
  • last but not least, its ID

Below is a sample table showing exactly that :

_structoperator operators[] = {
{ "+"/*label*/, 50/*id*/, 10/*priority*/, 2/*nbparams*/, FALSE/*is a function*/ },
{ "-", 51, 10, 2, FALSE },
{ "*", 52, 20, 2, FALSE },
{ "/", 53, 20, 2, FALSE },
{ "(", 40, 100, 1, FALSE },
{ ")", 41, 100, 1, FALSE },
{ "SIN", 60, 30, 1, TRUE },
{ "COS", 61, 30, 1, TRUE },
{ NULL, 0, 0, 0, 0 }
};
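With such a table, matching a token against the supported operators becomes a simple scan up to the NULL sentinel. The sketch below is only an illustration: the OperatorEntry layout is inferred from the comments in the table, the real _structoperator in parser.cpp may differ, and findOperator is a made-up name. The table is shortened to three entries.

```cpp
#include <cstring>

// Field layout inferred from the comments in the article's table;
// the real _structoperator may differ.
struct OperatorEntry {
    const char* label;
    long id;
    long priority;
    long nbParams;
    bool isFunction;
};

// Shortened copy of the table, ended by a sentinel whose label is null.
static const OperatorEntry kOperators[] = {
    { "+",   50, 10, 2, false },
    { "*",   52, 20, 2, false },
    { "SIN", 60, 30, 1, true  },
    { nullptr, 0, 0, 0, false }
};

// Returns the entry whose label matches exactly, or nullptr if the
// operator is not supported.
const OperatorEntry* findOperator(const char* label) {
    for (const OperatorEntry* e = kOperators; e->label != nullptr; ++e) {
        if (std::strcmp(e->label, label) == 0) return e;
    }
    return nullptr;
}
```

Adding an operator or a function then only requires a new row in the table, and the parser can compare the priority fields of two entries to decide which one is prevalent.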

Although the table above (and, by the way, the source code provided) implements functions that work with numbers, this need not be the case. As a matter of fact, arguments can be strings, nested functions or operators, etc. For instance, this source code
is used to create Excel formulas like this : =IF(B2 > B3 ; IF(B3 > B4 ; "TRUE" ; "FALSE" ) ; "FALSE").

Last but not least, the argument separator, ; by default, can be specified using the parser API. Unlike typical comma-separated arguments in C code, functions tend to use semicolons whenever arguments can be numbers, because
commas can be thousands or decimal separators depending on the culture. It is possible to allow comma-separated arguments as long as tokens are enclosed in quotes, but this is unnecessary overhead, hurts the user experience, and makes the expression
look odd.

Variables support

Supporting variables is only a matter of replacing literal tokens that are not numbers, strings or other literals with actual numbers, strings, or whatever is suited to performing operations.

To call the evaluator more than once, either the list of tokens must be saved somewhere and then restored, or the variables that were replaced with their values must be given back their original names during evaluation clean-up. In the provided source code,
we clean up after evaluation by putting the original variable names back in the list of tokens.
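The substitute-then-restore idea can be sketched with standard containers. This is an assumption-based illustration rather than the article's implementation, which attaches each wzvariable to its literal token; the names substituteVariables and restoreVariables are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Pairs of (token index, original variable name), used to undo a substitution.
using Replacements = std::vector<std::pair<std::size_t, std::string>>;

// Replaces variable names with their values, in place, and records
// what was replaced so that the token list can be restored later.
Replacements substituteVariables(std::vector<std::string>& tokens,
                                 const std::map<std::string, std::string>& variables) {
    Replacements replaced;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        auto it = variables.find(tokens[i]);
        if (it == variables.end()) continue;   // not a known variable
        replaced.emplace_back(i, tokens[i]);   // remember the name
        tokens[i] = it->second;                // put the value in its place
    }
    return replaced;
}

// Clean-up step: puts the original variable names back.
void restoreVariables(std::vector<std::string>& tokens, const Replacements& replaced) {
    for (const auto& r : replaced) tokens[r.first] = r.second;
}
```

A caller would substitute, evaluate, then restore, so that the same token list can be evaluated again with x set to another value.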

The variable class is declared as follows :

class wzvariable : public wzelement
{
// Members
protected:
    wzstring m_varname, m_varvalue;
    wzstring* m_attachedLitteral;

public:
    void setVar(LPSTR name, LPSTR value);
    void setVarname(LPSTR name);
    void setVarvalue(LPSTR value);

    wzstring* getVarname();
    wzstring* getVarvalue();

    BOOL isNameMatching(wzstring* litteral);

    void attachLitteral(wzstring* litteral);
    wzstring* getLitteral();
};

Code samples

1) sample code

This sample code demonstrates the minimum code involved in parsing an expression. Variables are not used.

#include "util.h"
#include "parser.h"

wzparser* p = new wzparser();

p->parse("=5+3+SIN(1.236)");
p->dump(); // for debug purpose only

wzarray<wzvariable*> arrVariables;

wzstring result;
if ( p->eval(arrVariables, result) )
{
    OutputDebugString( "result=" );
    OutputDebugString( result.getString() );
    OutputDebugString( "\r\n" );
}

delete p;

2) another sample code

This sample code uses variables. Evaluation is done twice to show how to repeat the process.

#include "util.h"
#include "parser.h"

wzparser* p = new wzparser();

p->parse("=5+3+SIN(x)");
p->dump(); // for debug purpose only

wzarray<wzvariable*> arrVariables;

wzvariable* x = new wzvariable();
x->setVar("x","13");
arrVariables.Add( x );

wzstring result;
if ( p->eval(arrVariables, result) )
{
    OutputDebugString( "result=" );
    OutputDebugString( result.getString() );
    OutputDebugString( "\r\n" );
}

x->setVarvalue("15");
if ( p->eval(arrVariables, result) )
{
    OutputDebugString( "result=" );
    OutputDebugString( result.getString() );
    OutputDebugString( "\r\n" );
}

// delete variables
long nbVariables = arrVariables.GetSize();
for (long iVars = 0; iVars < nbVariables; iVars++)
    delete arrVariables.GetAt(iVars);

delete p;

3) what you need to reuse this code

You need the following files :

  • util.h, util.cpp : element classes
  • parser.h, parser.cpp : parser and evaluator
  • it's pure C++ code. No dependency on MFC.

In parser.cpp, the operators table defines the supported operators. When either the parser or the evaluator fails, the getLastError() method returns the error.

History

  • November 18, 2003 : first release. Support for basic functions.
  • October 17, 2005 : second release. Much better parsing and eval routines. Support for variable length functions.

Stéphane Rodriguez

October 17, 2005.
