Unicode support in rapidxml

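To speed up how duilib builds its layout controls, the LuaDui project replaces duilib's built-in XML parser with rapidxml.

duilib is compiled as a Unicode build, so rapidxml has to parse Unicode (wide-character) XML strings.

Parsing a Unicode string with rapidxml is simple: just set rapidxml's character template parameter to TCHAR. The following typedefs make that convenient to use.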

#include <tchar.h>                  // defines TCHAR (wchar_t in a Unicode build)
#include <rapidxml/rapidxml.hpp>

typedef rapidxml::xml_document<TCHAR> XmlDoc;
typedef rapidxml::xml_node<TCHAR> XmlNode;
typedef rapidxml::xml_attribute<TCHAR> XmlAttr;

In use, a bug showed up when parsing Chinese characters in XML: parsing the following document fails and throws an exception.

<?xml version="1.0" encoding="UTF-8"?>
<Window caption="0,0,0,30" sizebox="5,5,5,5" mininfo="480,360" defaultfontcolor="#ff010000" width="600" height="480">
    <Font name="微软雅黑" size="12" bold="false"/>
    <VerticalLayout bkcolor="#ff019bd0" inset="1,1,1,1" bordersize="1" bordercolor="#FF010000">
        <HorizontalLayout height="30" inset="5,0,0,0">
            <Label name="标题" text="调试窗口" textcolor="#FFFFFFFF"></Label>
            <Control />
            <Button name="minbtn" width="40" height="22" text="最小化" bkcolor="#ff3fd536">
                <Event click="DebugUIEvent.minBtnClick" />
            </Button>
            <Button name="closebtn" width="47" height="22" text="关闭" bkcolor="#ffef2f4d">
                <Event click="DebugUIEvent.closeBtnClick" />
            </Button>
        </HorizontalLayout>
        <VerticalLayout bkcolor="#66ffffff">
        </VerticalLayout>
    </VerticalLayout>
</Window>

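Stepping through with breakpoints showed that the problem occurs while parsing the text="最小化" attribute: when reading the value of text, the parser treats everything after it as part of the attribute value and cannot continue parsing.

Eventually I tracked down the cause. To speed up parsing, rapidxml defines the following tables: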

        template<int Dummy>
        struct lookup_tables
        {
            static const unsigned char lookup_whitespace[256];              // Whitespace table
            static const unsigned char lookup_node_name[256];               // Node name table
            static const unsigned char lookup_text[256];                    // Text table
            static const unsigned char lookup_text_pure_no_ws[256];         // Text table
            static const unsigned char lookup_text_pure_with_ws[256];       // Text table
            static const unsigned char lookup_attribute_name[256];          // Attribute name table
            static const unsigned char lookup_attribute_data_1[256];        // Attribute data table with single quote
            static const unsigned char lookup_attribute_data_1_pure[256];   // Attribute data table with single quote
            static const unsigned char lookup_attribute_data_2[256];        // Attribute data table with double quotes
            static const unsigned char lookup_attribute_data_2_pure[256];   // Attribute data table with double quotes
            static const unsigned char lookup_digits[256];                  // Digits
            static const unsigned char lookup_upcase[256];                  // To uppercase conversion table for ASCII characters
        };

rapidxml uses these tables to recognize the marker characters of XML. A lookup indexes straight into the array, like this:

internal::lookup_tables<0>::lookup_text_pure_no_ws[static_cast<unsigned char>(ch)];
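To see what that narrowing cast does to a wide character, here is a minimal sketch. The helper names `attr_data_entry` and `unguarded_test` are hypothetical and not part of rapidxml; the one-line table stand-in assumes that the double-quoted attribute data table marks only the characters `\0` and `"` as terminators.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 256-entry attribute data table (assumption:
// 0 for the terminators '\0' and '"', 1 for every other byte value).
inline unsigned char attr_data_entry(std::size_t index)
{
    assert(index < 256);  // the table covers a single byte only
    return (index == 0x00 || index == 0x22) ? 0 : 1;
}

// Mirrors the unguarded lookup: the wide character is narrowed to its
// low byte before it is used as the table index.
inline unsigned char unguarded_test(wchar_t ch)
{
    return attr_data_entry(static_cast<unsigned char>(ch));
}
```

Under these assumptions, `unguarded_test(L'\u6700')` returns 0, because 最 is U+6700 and its low byte is 0x00, the same index as the string terminator. Any character whose low byte is 0x22, such as U+4E22, would likewise be mistaken for a closing double quote.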

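In a Unicode build, however, the ch in static_cast<unsigned char>(ch) is a wchar_t, which occupies two bytes on Windows. Casting it straight to unsigned char discards the high byte, so the table lookup classifies the character incorrectly. To parse Unicode with rapidxml, the rapidxml code therefore has to be modified as follows: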

        // Detect whitespace character
        struct whitespace_pred
        {
            static unsigned char test(Ch ch)
            {
                // The tables only cover 0..255; characters above that are never whitespace
                if (ch <= 255)
                    return internal::lookup_tables<0>::lookup_whitespace[static_cast<unsigned char>(ch)];
                else
                    return 0;
            }
        };

        // Detect node name character
        struct node_name_pred
        {
            static unsigned char test(Ch ch)
            {
                // Characters above 255 are always accepted as part of a name
                if (ch <= 255)
                    return internal::lookup_tables<0>::lookup_node_name[static_cast<unsigned char>(ch)];
                else
                    return 1;
            }
        };

        // Detect attribute name character
        struct attribute_name_pred
        {
            static unsigned char test(Ch ch)
            {
                if (ch <= 255)
                    return internal::lookup_tables<0>::lookup_attribute_name[static_cast<unsigned char>(ch)];
                else
                    return 1;
            }
        };

        // Detect text character (PCDATA)
        struct text_pred
        {
            static unsigned char test(Ch ch)
            {
                if (ch <= 255)
                    return internal::lookup_tables<0>::lookup_text[static_cast<unsigned char>(ch)];
                else
                    return 1;
            }
        };

        // Detect text character (PCDATA) that does not require processing
        struct text_pure_no_ws_pred
        {
            static unsigned char test(Ch ch)
            {
                if (ch <= 255)
                    return internal::lookup_tables<0>::lookup_text_pure_no_ws[static_cast<unsigned char>(ch)];
                else
                    return 1;
            }
        };

        // Detect text character (PCDATA) that does not require processing
        struct text_pure_with_ws_pred
        {
            static unsigned char test(Ch ch)
            {
                if (ch <= 255)
                    return internal::lookup_tables<0>::lookup_text_pure_with_ws[static_cast<unsigned char>(ch)];
                else
                    return 1;
            }
        };

        // Detect attribute value character
        template<Ch Quote>
        struct attribute_value_pred
        {
            static unsigned char test(Ch ch)
            {
                // Braces make explicit which "if" each "else" belongs to
                if (Quote == Ch('\''))
                {
                    if (ch <= 255)
                        return internal::lookup_tables<0>::lookup_attribute_data_1[static_cast<unsigned char>(ch)];
                    else
                        return 1;
                }
                if (Quote == Ch('\"'))
                {
                    if (ch <= 255)
                        return internal::lookup_tables<0>::lookup_attribute_data_2[static_cast<unsigned char>(ch)];
                    else
                        return 1;
                }
                return 0;       // Should never be executed, to avoid warnings on Comeau
            }
        };

        // Detect attribute value character
        template<Ch Quote>
        struct attribute_value_pure_pred
        {
            static unsigned char test(Ch ch)
            {
                if (Quote == Ch('\''))
                {
                    if (ch <= 255)
                        return internal::lookup_tables<0>::lookup_attribute_data_1_pure[static_cast<unsigned char>(ch)];
                    else
                        return 1;
                }
                if (Quote == Ch('\"'))
                {
                    if (ch <= 255)
                        return internal::lookup_tables<0>::lookup_attribute_data_2_pure[static_cast<unsigned char>(ch)];
                    else
                        return 1;
                }
                return 0;       // Should never be executed, to avoid warnings on Comeau
            }
        };
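
The effect of the range check can be tried outside rapidxml with a small model. This is only a sketch: `table_entry`, `unguarded_pred`, `guarded_pred` and `scan_value` are hypothetical names, and the table is reduced to a one-line stand-in that assumes only `\0` and `"` end a double-quoted attribute value. The guard goes through `unsigned long` instead of comparing `ch <= 255` directly. That is a small variation on the patch above, so that a negative value on platforms where wchar_t is signed is also kept out of the table.

```cpp
#include <cstddef>

// Hypothetical miniature of a double-quoted attribute data table:
// 0 for '\0' and '"', 1 for every other byte value (an assumption).
inline unsigned char table_entry(unsigned char index)
{
    return (index == 0x00 || index == 0x22) ? 0 : 1;
}

// Unguarded behaviour: narrow first, then look up.
struct unguarded_pred
{
    static unsigned char test(wchar_t ch)
    {
        return table_entry(static_cast<unsigned char>(ch));
    }
};

// Guarded behaviour: consult the table only for 0..255 and accept
// everything above that range as ordinary attribute data.
struct guarded_pred
{
    static unsigned char test(wchar_t ch)
    {
        if (static_cast<unsigned long>(ch) <= 255)
            return table_entry(static_cast<unsigned char>(ch));
        return 1;
    }
};

// Counts how many characters the predicate accepts before it stops,
// in the same way a parser skips over an attribute value.
template<class Pred>
std::size_t scan_value(const wchar_t* text)
{
    std::size_t n = 0;
    while (Pred::test(text[n]))
        ++n;
    return n;
}
```

For the input `L"\u6700\u5C0F\u5316\" bkcolor"` (最小化 followed by the closing quote), `scan_value<unguarded_pred>` stops after 0 characters, while `scan_value<guarded_pred>` accepts all 3 characters and stops at the quote.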
