May 31, 2016

Calling a virtual method through an interface has always been a lot slower than calling a static method through an interface. But why is that? Sure, the virtual method call costs some time, but the gap is far larger than the difference between a normal static and a normal virtual method call, as the timings below show.

i7-4790 3.6GHz, 10,000,000 calls to an empty method

                  Instance call   Interface call
Static method     12 ms           17 ms
Virtual method    17 ms           164 ms
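
The post does not show the benchmark program. A minimal sketch of how such timings could be reproduced might look like the following. The names IBench, TBench and RunBenchmark are hypothetical, and TStopwatch from System.Diagnostics is assumed as the timer, since the original measurement method is unknown. Absolute numbers will differ by CPU and compiler version.

```pascal
program InterfaceCallBench;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics;

type
  // Hypothetical interface with one statically and one virtually
  // implemented method, so all four table cells can be measured.
  IBench = interface
    procedure StaticCall(A, B: Integer);
    procedure VirtualCall(A, B: Integer);
  end;

  TBench = class(TInterfacedObject, IBench)
  public
    procedure StaticCall(A, B: Integer);
    procedure VirtualCall(A, B: Integer); virtual;
  end;

procedure TBench.StaticCall(A, B: Integer);
begin
  // intentionally empty: only the call overhead is measured
end;

procedure TBench.VirtualCall(A, B: Integer);
begin
  // intentionally empty: only the call overhead is measured
end;

procedure RunBenchmark;
const
  CallCount = 10000000;
var
  Obj: TBench;
  Intf: IBench;
  SW: TStopwatch;
  I: Integer;
begin
  Obj := TBench.Create;
  Intf := Obj; // the interface reference keeps the object alive

  SW := TStopwatch.StartNew;
  for I := 1 to CallCount do
    Obj.StaticCall(I, I);
  Writeln('Static method,  instance call:  ', SW.ElapsedMilliseconds, ' ms');

  SW := TStopwatch.StartNew;
  for I := 1 to CallCount do
    Obj.VirtualCall(I, I);
  Writeln('Virtual method, instance call:  ', SW.ElapsedMilliseconds, ' ms');

  SW := TStopwatch.StartNew;
  for I := 1 to CallCount do
    Intf.StaticCall(I, I);
  Writeln('Static method,  interface call: ', SW.ElapsedMilliseconds, ' ms');

  SW := TStopwatch.StartNew;
  for I := 1 to CallCount do
    Intf.VirtualCall(I, I);
  Writeln('Virtual method, interface call: ', SW.ElapsedMilliseconds, ' ms');
end;

begin
  RunBenchmark;
end.
```

To see the effect described in this post, the program would have to be compiled for Win32.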

Let’s assume we have this declaration:

type
  IMyInterface = interface
    procedure Test(A, B: Integer);
  end;
  TTest = class(TInterfacedObject, IMyInterface)
  public
    procedure Test(A, B: Integer); virtual;
  end;

The compiler will generate a helper function for the interface method “Test”. This helper converts the “MyIntf” interface reference in the call “MyIntf.Test()” to the object reference behind the interface and then jumps to the virtual method.

add eax,-$0C   // convert the interface reference to the object reference
push eax       // save the object reference on the stack
mov eax,[eax]  // access the VMT
mov eax,[eax]  // get the “Test” VMT method address
xchg [esp],eax // swap the object ref on the stack with the method address
ret            // do the jump to the method address
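
The constant in the first instruction is the distance between the interface reference and the object reference: an interface reference points at a hidden field inside the object instance rather than at the start of the instance. A small sketch to inspect that distance for the declaration above, assuming a Delphi version with NativeInt (the ShowOffset procedure is hypothetical and only for illustration):

```pascal
program InterfaceOffset;

{$APPTYPE CONSOLE}

type
  IMyInterface = interface
    procedure Test(A, B: Integer);
  end;

  TTest = class(TInterfacedObject, IMyInterface)
  public
    procedure Test(A, B: Integer); virtual;
  end;

procedure TTest.Test(A, B: Integer);
begin
end;

procedure ShowOffset;
var
  Obj: TTest;
  Intf: IMyInterface;
begin
  Obj := TTest.Create;
  Intf := Obj; // the interface reference keeps the object alive

  // Distance in bytes between the interface reference and the object
  // reference. The helper subtracts this value to get back to Self.
  Writeln('Interface reference offset: ',
    NativeInt(Pointer(Intf)) - NativeInt(Pointer(Obj)));
end;

begin
  ShowOffset;
end.
```

Compiled for Win32, the printed value should correspond to the $0C used in the helper. Other platforms have a different instance layout and therefore a different offset.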

This helper is very slow, as the table above shows. If you know the “XCHG mem,reg” instruction, then you also know that it has an implicit LOCK that slows down the method call a lot. But why does the helper use XCHG in the first place? Well, we are in the middle of a method call. All the parameters are already loaded into EAX, EDX and ECX, so we can’t use those registers to do the swap. The only way is to use the stack as a temporary variable, and XCHG seems to have been the choice of the compiler engineer at the time interfaces were introduced to Delphi.

Let’s change that code to not use XCHG.

add eax,-$0C       // convert the interface reference to the object reference
push eax           // reserve space for the method address used by RET
push eax           // save the object reference on the stack
mov eax,[eax]      // access the VMT
mov eax,[eax]      // get the “Test” VMT method address
mov [esp+04],eax   // write the method address to the reserved space
pop eax            // restore the object reference
ret                // do the jump to the method address

i7-4790 3.6GHz, 10,000,000 calls to an empty method

                          Instance call   Interface call
Static method             12 ms           17 ms
Virtual method            17 ms           99 ms
Virtual method (XCHG)                     164 ms

This is a lot faster, but still slow compared to the “Instance call”. The helper has a lot of memory accesses, but they shouldn’t slow it down that much, especially not in a tight loop where everything comes from the CPU’s cache.

So where does the code spend the time? Well, modern CPUs (after P1) have a feature called “return stack buffer”. The CPU puts the return address on the “return stack buffer” for every CALL instruction so it can predict where the matching RET instruction will jump to. This requires that every CALL is matched by a RET. But wait, the helper uses a RET for an indirect jump. We have the CALL from the interface method call, the RET in the helper and the RET in the actual method. That doesn’t match up. In other words, this helper renders the “return stack buffer” invalid, which comes with a performance hit because the CPU can’t predict where to jump.
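
The mismatch can be illustrated with a toy model. This is only a sketch of the idea, not of how a real CPU implements its return stack buffer: TReturnStackModel, CountMispredictions and the string "addresses" are hypothetical, and System.Generics.Collections is assumed to be available.

```pascal
program ReturnStackModel;

{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

type
  // Toy model: CALL pushes a predicted return address, RET pops one
  // and compares it with the address the RET really jumps to.
  TReturnStackModel = class
  private
    FPredictions: TStack<string>;
    FMispredictions: Integer;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Call(const ReturnAddress: string);
    procedure Ret(const ActualTarget: string);
    property Mispredictions: Integer read FMispredictions;
  end;

constructor TReturnStackModel.Create;
begin
  inherited Create;
  FPredictions := TStack<string>.Create;
end;

destructor TReturnStackModel.Destroy;
begin
  FPredictions.Free;
  inherited;
end;

procedure TReturnStackModel.Call(const ReturnAddress: string);
begin
  FPredictions.Push(ReturnAddress);
end;

procedure TReturnStackModel.Ret(const ActualTarget: string);
begin
  // An empty buffer and a wrong prediction both count as a miss.
  if (FPredictions.Count = 0) or (FPredictions.Pop <> ActualTarget) then
    Inc(FMispredictions);
end;

function CountMispredictions(HelperUsesRet: Boolean): Integer;
var
  Model: TReturnStackModel;
begin
  Model := TReturnStackModel.Create;
  try
    Model.Call('outer return');      // some caller invokes our code
    Model.Call('after Intf.Test');   // CALL into the interface helper
    if HelperUsesRet then
      Model.Ret('TTest.Test');       // helper "returns" into the method
    Model.Ret('after Intf.Test');    // the method's real RET
    Model.Ret('outer return');       // our code returns to its caller
    Result := Model.Mispredictions;
  finally
    Model.Free;
  end;
end;

begin
  Writeln('Helper ends with RET: ', CountMispredictions(True), ' mispredictions');
  Writeln('Helper ends with JMP: ', CountMispredictions(False), ' mispredictions');
end.
```

In this model the extra RET in the helper consumes the prediction that belongs to the method's own RET, so every following return is checked against the wrong entry. Without the extra RET, each CALL is paired with exactly one RET and all predictions match.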

Let’s see what happens if we replace the RET with a JMP.

add eax,-$0C        // convert the interface reference to the object reference
push eax            // save the object reference on the stack
mov eax,[eax]       // access the VMT
push dword ptr [eax] // save the “Test” VMT entry method address on the stack
add esp,$04         // skip the method address stack entry
pop eax             // restore the object reference
jmp [esp-$08]       // jump to the method address

i7-4790 3.6GHz, 10,000,000 calls to an empty method

                          Instance call   Interface call
Static method             12 ms           17 ms
Virtual method            17 ms           24 ms
Virtual method (RET)                      99 ms
Virtual method (XCHG)                     164 ms

UPDATE: As fast as this implementation may be, it has a problem. As Allen and Mark pointed out, it reads memory below the stack pointer, which the system treats as free memory. So if a hardware interrupt is triggered between the “add esp,$04” and the “jmp [esp-$08]”, the data on the stack could be overwritten and the jump would end up somewhere other than the intended method.

UPDATE 2: Thorsten Engler sent me an e-mail that invalidates the “hardware interrupt problem”. All interrupts are handled in kernel mode and kernel mode code doesn’t touch the user stack. The CPU itself switches the SS:ESP before invoking the interrupt handler.

Based on AMD64 Architecture Programmer’s Manual Volume 2 – System Programming Rev.3.22 Section 8.7.3 Interrupt To Higher Privilege:

When a control transfer to an exception or interrupt handler running at a higher privilege occurs (numerically lower CPL value), the processor performs a stack switch using the following steps:

  1. The target CPL is read by the processor from the target code-segment DPL and used as an index into the TSS for selecting the new stack pointer (SS:ESP). For example, if the target CPL is 1, the processor selects the SS:ESP for privilege-level 1 from the TSS.
  2. Pushes the return stack pointer (old SS:ESP) onto the new stack. The SS value is padded with two bytes to form a doubleword.

Category: Delphi


20 thoughts on “What’s wrong with virtual methods called through an interface”

    1. Ian Barker, May 31, 2016

      I’m starting to think you might be a witch or at the very least you’ve promised one or more body parts to the evil demons of The Dark Arts.

      This is an INCREDIBLE find, a huge performance improvement for virtual methods – seriously, look at the figures!

      Good catch, again. 

       
    2. Bas, May 31, 2016

      Nasty problem… What a shame you didn’t have a solution for us Delphi developers.
      Let’s hope the compiler builders pick it up quickly.

       
    3. Allen Bauer, May 31, 2016

      While your code is certainly faster, it does reference memory *above* the stack pointer. If you consider the stack as a mark-and-release memory allocator, the memory above the stack pointer is *freed* memory. This code is referencing freed memory. For the CPU stack this is not considered universally safe at all.

       
      1. Andreas Hausladen (post author), May 31, 2016

        At least the freed stack slot was accessed before it is read, so the memory page is already committed.

         
        1. Mark Griffiths, June 1, 2016

          I could well be wrong on this, but I think that the issue that Allen is alluding to is that between: add esp,$04 and the jump, if a CPU interrupt occurs, the contents of the stack that you’re going to use later will be overwritten – leading to unpredictable crashes…

           
          1. Andreas Hausladen (post author), June 1, 2016

            Ah, the interrupts. Yes that could be a problem here.

             
        2. Allen Bauer, June 1, 2016

          That isn’t the problem. As Mark pointed out, interrupts can scribble over the area above the stack pointer, rendering that data invalid.

          While there are valid reasons to make the implementing method virtual, for the most part it isn’t necessary. By definition interfaces already are virtual.

          Cases where it is valid are those where you are adding interface implementations to an existing hierarchy of classes and still want to preserve the normal inheritance and virtual override functionality. Sometimes you want to “hide” the notion of an interface since it’s only used internally in the class hierarchy.

          If you think the virtual method thunk is expensive, try it with a dynamic method.

           
    4. Hallvard Vassbotn, June 1, 2016

      Well, people should not be mixing interface methods and virtual methods anyway – it is a kind of two-level indirection that doesn’t really provide anything useful – other than slowing down the code, of course… 

       
      1. Stefan Glienke, June 1, 2016

        If I have an abstract base class that implements a given number of interfaces and I build a class hierarchy on top of that, what’s wrong with that?

         
        1. Dave Nottage, June 2, 2016

          Stole the words from my mouth, and I have at least one situation that does it. Having said that, the calls are not made often, so the performance hit is not much of a concern.

           
        2. Hallvard Vassbotn, June 2, 2016

          Performance. And mixed abstraction models.

          To “override” a method you should consciously know that you are modifying the behavior of a contract (the interface), thus you should include the interface in the sub-class and re-implement the method. No virtual needed.

           
    5. Hallvard Vassbotn, June 1, 2016

      But a very nice find and write-up! 

       
    6. Rafael, June 2, 2016

      When I test it under 32-bit, I get similar results, but when I test it on 64-bit, I do not get such slowdowns. I would say the 64-bit compiler works more intelligently in this case.

       
    7. Rafael, June 2, 2016

      I tested this case under Delphi XE7 and get the same results under 32-bit, but on 64-bit I do not get such slowdowns. I would say the 64-bit compiler is smarter in this case.

       
    8. Andreas Hausladen (post author), June 2, 2016

      The Win64 compiler has more CPU registers available and doesn’t need to use the stack to jump to the virtual method.

       
    9. eg, June 2, 2016

      Does this issue impact the iOS implementation?

       
      1. Andreas Hausladen (post author), June 2, 2016

        Did I show any ARM assembler code? This is Win32 only.

         
    10. Mark Griffiths, June 3, 2016

      I’ve been trying to think of a possible solution to this problem.

      One thought that I had was to use self modifying code – i.e. instead of storing the destination address on the stack, you could update the address in memory for an unconditional jump. The downside to this would be that the code then wouldn’t be thread safe.

      My next thought was to have thread-specific memory allocated so that each thread had its own section of memory for the self-modifying code. The problem with this is that you’re then back to having to determine an address and jump to it without having enough registers or being able to use the stack.

      A possible solution then would be to store the address on the stack, jump to the self modifying code which first adjusts the stack pointer to remove the temporary address and then do the unconditional jump.

      I think that this would work and be safe, but it does increase the overhead and introduces possible problems from using self-modifying code, either of which could end up causing more problems than we started with…

       
      1. Andreas Hausladen (post author), June 3, 2016

        I tried to use a “CALL” that jumps to the next line, so that the CALL+RET match, but modifying the return address on the stack already kills the return address prediction.

        What the compiler could do: if the method has only one parameter, the ECX register is free to use. And the XCHG can be removed in all cases.

        The only good solution (for Win32) is to not use virtual methods in interfaces at all if this is an actual bottleneck in the application. In most cases it isn’t because the other code takes much more time.

         
    11. Arthur Hoornweg, June 3, 2016

      Hi Andreas,
      I think the problem can be easily overcome by using explicit interface method resolution. Let the interface point to a non-virtual method that chains into the virtual method.

      Regards,
      Arthur Hoornweg
      ———————-

      type
        IMyInterface = interface
          procedure DoSomething;
        end;

        TMyClass = class(TInterfacedObject, IMyInterface)
          procedure DoSomething; virtual;  { virtual/slow }
          procedure DoSomethingFast;       { fast }
          procedure IMyInterface.DoSomething = DoSomethingFast;
        end;

      procedure TMyClass.DoSomethingFast;
      begin
        DoSomething; // chain into the virtual method
      end;
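
A sketch of how the forwarding suggested in the last comment could be checked with an overriding descendant. TMyDescendant and the Writeln texts are assumptions added for illustration and are not part of the comment. The interface slot is bound to the non-virtual DoSomethingFast, which then makes an ordinary virtual call on the instance.

```pascal
program ForwardingDemo;

{$APPTYPE CONSOLE}

type
  IMyInterface = interface
    procedure DoSomething;
  end;

  TMyClass = class(TInterfacedObject, IMyInterface)
  public
    procedure DoSomething; virtual;
    procedure DoSomethingFast;
    // Method resolution clause: the interface method maps to the
    // non-virtual forwarder instead of the virtual method.
    procedure IMyInterface.DoSomething = DoSomethingFast;
  end;

  // Hypothetical descendant that overrides the virtual method.
  TMyDescendant = class(TMyClass)
  public
    procedure DoSomething; override;
  end;

procedure TMyClass.DoSomething;
begin
  Writeln('TMyClass.DoSomething');
end;

procedure TMyClass.DoSomethingFast;
begin
  DoSomething; // normal virtual call on the instance
end;

procedure TMyDescendant.DoSomething;
begin
  Writeln('TMyDescendant.DoSomething');
end;

procedure Run;
var
  Intf: IMyInterface;
begin
  Intf := TMyDescendant.Create;
  Intf.DoSomething; // prints "TMyDescendant.DoSomething"
end;

begin
  Run;
end.
```

The override is still honoured, because the forwarder dispatches through the VMT. Whether this is faster in a given application would have to be measured.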

https://andy.jgknet.de/blog/2016/05/whats-wrong-with-virtual-methods-called-through-an-interface/
