Original link: http://rvelthuis.blogspot.tw/2018/01/strings-on-other-platforms-than-32-bit.html

Strings too slow outside WIN32?

 

In a recent debate, someone claimed that strings in the Win64 runtime are too slow to be useful. That is, in my opinion, a gross exaggeration. It is true that the Win32 runtime library (RTL) has benefited a lot from the work of the FastCode project, usually with routines in extremely clever assembler. On all other platforms, the routines are often in plain Object Pascal, with no assembler used. Also, far fewer routines have been replaced by clever implementations.

One very obvious example of this is the Pos function, which checks whether a certain string (I call that the Needle) can be found in a larger one (the Haystack). The Win32 implementation is in highly optimized assembler, written by Aleksandr Sharahov from the FastCode project, and licensed by CodeGear. The Win64 implementation is in plain Pascal (PUREPASCAL). But the implementation for UnicodeString is not the same as, or even similar to, the implementation for AnsiString!

The Win64 implementation for UnicodeString is slower than the same routine for Win32. On my system a search in Win64 takes approx. 1.8 × the time it needs in Win32. On Win32, Pos for AnsiString is about as fast as (or sometimes even slightly faster than) Pos for UnicodeString. But on Win64, Pos for AnsiString takes 2 × the time Pos for UnicodeString needs!

If you look at the sources in System.pas, you'll see that the Unicode version is slightly better optimized (searching for the first Char in the Needle first, and only checking the rest if a match was found).
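The "first Char first" idea can be sketched in a few lines. This is not the actual System.pas code; FirstCharPos is a hypothetical name, and the sketch assumes 1-based string indexing (the desktop compilers' default):

```pascal
program FirstCharDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper, not the System.pas implementation: scan for the
// first Char of Needle and compare the remaining Chars only after a hit.
// Assumes 1-based string indexing.
function FirstCharPos(const Needle, Haystack: string): Integer;
var
  I, J, LastStart: Integer;
begin
  Result := 0;
  if (Needle = '') or (Length(Needle) > Length(Haystack)) then
    Exit;
  // Last index at which Needle can still start.
  LastStart := Length(Haystack) - Length(Needle) + 1;
  for I := 1 to LastStart do
    if Haystack[I] = Needle[1] then
    begin
      // First Char matched: now check the rest of the Needle.
      J := 2;
      while (J <= Length(Needle)) and (Haystack[I + J - 1] = Needle[J]) do
        Inc(J);
      if J > Length(Needle) then
        Exit(I);
    end;
end;

begin
  Writeln(FirstCharPos('lo w', 'hello world')); // prints 4
  Writeln(FirstCharPos('xyz', 'hello world'));  // prints 0
end.
```

The inner comparison only runs at positions where the first Char already matches, so most Haystack positions cost a single comparison.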

For fun, I took the code for the UnicodeString implementation and converted it to work for AnsiString. It was slightly faster than System.Pos for UnicodeString, instead of 2 times as slow. I wonder why, in System.pas, the AnsiString implementation does not simply use the same code as that for UnicodeString, like I did. If I were a suspicious person, I would think it was done on purpose, to deprecate AnsiString by making it less usable.

But even that can be improved upon. I wrote three implementations of my own routine, one for AnsiString, one for UnicodeString and one for TBytes (many people have complained that TBytes lacks something like Pos and that was the reason they maintained the incredibly bad habit of using strings to store binary data — <shudder> — I wanted to take away that silly argument).

Code

Here is the code for my RVPosExA function (for what it's worth: these days, there is no difference between PosEx and Pos anymore: both have the exact same functionality and signature):

function RVPosExA(const Needle, Haystack: AnsiString;
  Offset: Integer = 1): Integer;
type
  PUInt32 = ^UInt32;
  PUInt16 = ^UInt16;
{$IFNDEF CPU32BITS}
var
  LNeedleTip: UInt32;
  PNeedle: PAnsiChar;
  PHaystack, PEnd: PAnsiChar;
  LLenNeedle: Integer;
  LCmpMemOffset: Integer;
{$ENDIF}
begin
{$IFDEF CPU32BITS}
  // FastCode (asm) implementation.
  Result := System.Pos(Needle, Haystack, Offset);
{$ELSE}
  // Reject invalid offsets and Needles that cannot fit after Offset.
  if (Offset < 1) or (Offset - 1 + Length(Needle) > Length(Haystack)) then
    Exit(0);
  Result := 0;
  PHaystack := PAnsiChar(Haystack) + Offset - 1;
  // One past the last position at which Needle can still start. Measured
  // from the start of Haystack, not from PHaystack, so that Offset > 1
  // does not push the scan past the end of the string.
  PEnd := PAnsiChar(Haystack) + Length(Haystack) - Length(Needle) + 1;
  case Length(Needle) of
    0: Exit(0);
    1:
      begin
        LNeedleTip := PByte(Needle)^;
        while PHaystack < PEnd do
          if PByte(PHaystack)^ = LNeedleTip then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    2:
      begin
        LNeedleTip := PUInt16(Needle)^;
        while PHaystack < PEnd do
          // Compare at the current position (PHaystack), not at the start.
          if PUInt16(PHaystack)^ = LNeedleTip then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    3:
      begin
        // If Needle is length 3, then the top byte is the #0 terminator.
        LNeedleTip := PUInt32(Needle)^;
        while PHaystack < PEnd do
          // Mask out the top byte so only the 3 Needle bytes are compared.
          if ((PUInt32(PHaystack)^ xor LNeedleTip) and $FFFFFF) = 0 then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
    4:
      begin
        LNeedleTip := PUInt32(Needle)^;
        while PHaystack < PEnd do
          if PUInt32(PHaystack)^ = LNeedleTip then
            Exit(PHaystack - PAnsiChar(Haystack) + 1)
          else
            Inc(PHaystack);
        Exit(0);
      end;
  else
    begin
      // Longer Needles: compare the first 4 bytes as one UInt32, and only
      // on a match compare the remainder with CompareMem.
      LCmpMemOffset := SizeOf(UInt32) div SizeOf(AnsiChar);
      PNeedle := PAnsiChar(Needle) + LCmpMemOffset;
      LLenNeedle := Length(Needle) - LCmpMemOffset;
      LNeedleTip := PUInt32(Needle)^;
      while PHaystack < PEnd do
        if (PUInt32(PHaystack)^ = LNeedleTip) and
          CompareMem(PHaystack + LCmpMemOffset, PNeedle, LLenNeedle) then
          Exit(PHaystack - PAnsiChar(Haystack) + 1)
        else
          Inc(PHaystack);
    end;
  end;
{$ENDIF}
end;

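As you can see, under Win32, it simply jumps to System.Pos, as that is the fastest anyway. But on all other platforms, it walks the Haystack one element at a time and, if the Needle is longer than 4 elements, compares the first 4 bytes of the Needle at each position in a single 32-bit comparison. Only if those match does it compare the rest using CompareMem. Needles of 1 to 4 elements are handled with a single 1-, 2- or 4-byte comparison per position.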
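The same "4-byte tip, then CompareMem" idea can be applied to TBytes. The following is only a rough sketch under my own assumptions and not the RVPosExB routine, whose source is not shown here: BytesPosSketch is a hypothetical name, it returns a 0-based index (or -1 if the Needle is not found), and it takes no Offset parameter. Because a byte array has no terminator, Needles shorter than 4 bytes take a separate path so that no read goes past the end of the array:

```pascal
program BytesPosDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical sketch, not the author's RVPosExB. Returns the 0-based
// index of Needle in Haystack, or -1 if it does not occur.
function BytesPosSketch(const Needle, Haystack: TBytes): Integer;
type
  PUInt32 = ^UInt32;
var
  P, PStart, PEnd, PNeedle: PByte;
  Tip: UInt32;
  RestLen: Integer;
begin
  Result := -1;
  if (Length(Needle) = 0) or (Length(Needle) > Length(Haystack)) then
    Exit;
  PNeedle := @Needle[0];
  PStart := @Haystack[0];
  // One past the last position at which Needle can still start.
  PEnd := PStart + (Length(Haystack) - Length(Needle) + 1);
  P := PStart;
  if Length(Needle) >= SizeOf(UInt32) then
  begin
    // Fast path: compare the first 4 bytes as one UInt32, then the rest.
    Tip := PUInt32(PNeedle)^;
    RestLen := Length(Needle) - SizeOf(UInt32);
    while P < PEnd do
    begin
      if (PUInt32(P)^ = Tip) and
        CompareMem(P + SizeOf(UInt32), PNeedle + SizeOf(UInt32), RestLen) then
        Exit(Integer(P - PStart));
      Inc(P);
    end;
  end
  else
    // Short Needles: a 4-byte read could run past the end of the array,
    // so compare the first byte and then the remainder.
    while P < PEnd do
    begin
      if (P^ = PNeedle^) and
        CompareMem(P + 1, PNeedle + 1, Length(Needle) - 1) then
        Exit(Integer(P - PStart));
      Inc(P);
    end;
end;

var
  H: TBytes;
begin
  H := TBytes.Create(10, 20, 30, 40, 50, 60, 70);
  Writeln(BytesPosSketch(TBytes.Create(30, 40, 50, 60, 70), H)); // prints 2
  Writeln(BytesPosSketch(TBytes.Create(60, 70), H));             // prints 5
  Writeln(BytesPosSketch(TBytes.Create(70, 80), H));             // prints -1
end.
```

Like the AnsiString routine, the sketch reads unaligned 32-bit values, which is fine on x86 and x64 but may need checking on other CPUs.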

Timing

Here is a slightly reformatted output of a test program (I put the WIN32 and the WIN64 columns beside each other, to save space):

Different versions of Pos(Needle, Haystack: <sometype>; Offset: Integer): Integer
where <sometype> is UnicodeString, AnsiString or TBytes

Testing with Haystack lengths of 50, 200, 3000, 4000 and 300000
and Needle lengths of 1, 3, 8 and 20
5 * 4 * 2000 = 40000 loops

WIN64                             WIN32

UnicodeString                     UnicodeString
-------------                     -------------
System.Pos:          2428 ms      System.Pos:          1051 ms
StrUtils.PosEx:      2258 ms      StrUtils.PosEx:      1070 ms
RVPosExU:            1071 ms      RVPosExU:            1050 ms

AnsiString                        AnsiString
----------                        ----------
System.Pos:          4956 ms      System.Pos:          1046 ms
AnsiStrings.PosEx:   4959 ms      AnsiStrings.PosEx:   1051 ms
OrgPosA:             5129 ms      OrgPosA:             5712 ms
PosUModForA:         1958 ms      PosUModForA:         3744 ms
RVPosExA:            1322 ms      RVPosExA:            1086 ms

TBytes                            TBytes
------                            ------
RVPosEXB:             998 ms      RVPosEXB:            2754 ms

Haystack: random string of 500000000 ASCII characters or bytes
Needle: last 10 characters of Haystack = 'WRDURJVDFA'

WIN64                             WIN32

UnicodeString                     UnicodeString
-------------                     -------------
System.Pos:           847 ms      System.Pos:           421 ms
Strutils.PosEx:       827 ms      Strutils.PosEx:       414 ms
RVPosExU:             421 ms      RVPosExU:             438 ms

AnsiString                        AnsiString
----------                        ----------
System.Pos:          1735 ms      System.Pos:           428 ms
AnsiStrings.PosEx:   1831 ms      AnsiStrings.PosEx:    428 ms
OrgPosA:             1749 ms      OrgPosA:             2687 ms
PosUModForA:          708 ms      PosUModForA:         1525 ms
RVPosExA:             368 ms      RVPosExA:             423 ms
RvPosExA(,,Offset):   200 ms      RvPosExA(,,Offset):   220 ms

TBytes                            TBytes
------                            ------
RVPosExB(TBytes):     385 ms      RVPosExB(TBytes):    1095 ms

The routines RVPosExA, RVPosExU and RVPosExB are my implementations for AnsiString, UnicodeString and TBytes respectively. OrgPosA is the original code for Pos for AnsiString, while PosUModForA is the original PUREPASCAL code for Pos for UnicodeString, modified for AnsiString.
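For readers who want to take a similar measurement themselves, here is a minimal sketch of how one such timing might be done with TStopwatch from System.Diagnostics. It is not the test program that produced the output above; the Haystack contents and the loop count are assumptions chosen only for illustration:

```pascal
program PosTimingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

const
  Loops = 2000; // assumption: illustrative loop count only

var
  Haystack, Needle: string;
  Watch: TStopwatch;
  I, Found: Integer;
begin
  // Assumption: a simple Haystack with the Needle at the very end, so
  // that every search has to walk the whole string.
  Haystack := StringOfChar('A', 3000) + 'NEEDLE';
  Needle := 'NEEDLE';
  Found := 0;
  Watch := TStopwatch.StartNew;
  for I := 1 to Loops do
    Found := Pos(Needle, Haystack);
  Watch.Stop;
  Writeln('Found at ', Found, ', System.Pos: ',
    Watch.ElapsedMilliseconds, ' ms');
end.
```

Compiling the same program for Win32 and for Win64 and swapping in the routine under test gives numbers that can be compared in the same way as the columns above.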

As you can see, the PosUModForA routine is roughly 1.5 to 2.6 times as fast as the rather braindead OrgPosA, and in WIN64, the RVPosEx<A/U/B> implementations are faster than the others.

I didn't check, but it is quite possible that one of the plain Pascal versions of the FastCode project is faster. But for me, this implementation is a start, and proof that with a few simple optimizations string routines could be made faster. Perhaps, one day, Embarcadero will adopt more of the plain Pascal code from the FastCode project.

The code for the routines and the program that produces the output above can be downloaded from my website.
