Matt Pietrek

Download the code for this article: Hood0101.exe (45KB)
Way back in my October 1996 column in MSJ, I addressed a question concerning the size of executable files. Back then, a simple Hello World program compiled to a 32KB executable. Two compiler versions later, the problem is only slightly better. The same program with the Visual C++® 6.0 compiler is now 28KB.
      In that column, I provided a replacement runtime library that lets you create very small executable programs. It had some restrictions on where it could be used, but it worked well for a large number of my own programs. After living with these restrictions for quite a while, I decided it was time to fix some of them. These modifications also provide a great opportunity to describe a little-known linker option that can further reduce program size.

EXE and DLL Size

Before jumping into the code for my replacement runtime library, it's worth taking the time to review why simple EXEs and DLLs are bigger than you might expect. Consider the canonical Hello World program:

#include <stdio.h>

void main()
{
    printf("Hello World!\n");
}

Let's compile this program for size and generate a map file. Using the command-line Visual C++ compiler, the syntax would be:

CL /O1 Hello.CPP /link /MAP

First, look at the .MAP file; a trimmed-down version is shown in Figure 1. From the addresses of main (0001:00000000) and printf (0001:0000000C), you can infer that function main's code is only 0xC (12) bytes long. The last line of the file shows the __chkstk function at address 0001:00003B10, so you can also infer that there are at least 0x3B10 (15,120) bytes of code in the executable. That's over 14KB of code to send Hello World to the screen.
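The inference above is just subtraction of offsets, and can be sketched in C++. This is a minimal sketch: the MapSymbol type and both helper functions are hypothetical, and the sample list used with it holds only the three addresses quoted above, deliberately leaving out the symbols that sit between printf and __chkstk. So only main's size and the overall lower bound are meaningful for that list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One public symbol from a linker .MAP file: its name and its offset within
// section 0001 (the code section), e.g. 0001:0000000C -> 0xC.
struct MapSymbol {
    std::string name;
    std::uint32_t offset;
};

// Bytes between a symbol and the next one in the same section. This is an
// upper bound on the function's size (it includes any padding), and it is
// only meaningful if 'syms' lists every symbol, sorted by offset.
// The last symbol has no successor, so 0 is returned for it.
std::uint32_t inferred_size(const std::vector<MapSymbol>& syms, std::size_t i) {
    assert(i < syms.size());
    if (i + 1 == syms.size()) return 0;
    return syms[i + 1].offset - syms[i].offset;
}

// Lower bound on the amount of code in the section: the offset of the
// last symbol (that symbol's own bytes come on top of this).
std::uint32_t min_code_bytes(const std::vector<MapSymbol>& syms) {
    return syms.empty() ? 0 : syms.back().offset;
}
```

With main at 0x0, printf at 0xC, and __chkstk at 0x3B10, inferred_size for main is 12 and min_code_bytes is 15,120, which is where the "over 14KB" figure comes from.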
      Now, start looking through some of the other .MAP file lines. Some items make sense, for example the __initstdio function. After all, printf writes its output to a file, so some underlying runtime library support for stdio is to be expected. Likewise, it's reasonable to expect that the printf code might call strlen, so its inclusion isn't a surprise.
      However, take a look at some of the other functions, for instance __sbh_heap_init. This is the initialization function for the runtime library's small block heap. The Win32®-based operating systems provide their own heap in the form of the HeapAlloc family of functions. Whatever performance gains its own heap might offer, the Visual C++ library could use the Win32 heap APIs instead, but doesn't. Thus, you end up with more code than necessary in your executable.
      While some people might not care that the runtime library implements its own heap, there are other less defensible examples. Consider the __crtMessageBoxA function near the bottom of the map file. This function allows the runtime library to call the MessageBox API without forcing the executable to link against USER32.DLL. For a simple Hello World program, it's hard to anticipate the need to call MessageBox.
      Consider another example: the __crtLCMapStringA function, which does locale-dependent transformations of strings. While Microsoft is somewhat obligated to provide locale support, it's not really needed for a large number of programs. Why make programs that don't use locales pay the cost for those that do?
      I could continue with other examples of unneeded code, but I've made my point. A typical small program contains lots of little nuggets of code that aren't used. By themselves, they don't contribute much to the code size, but add up all the cases and you're into serious amounts of code!

What About the C++ Runtime Library DLL?

Alert readers might say, "Hey Matt! Why don't you just use the DLL version of the runtime library?" In the past, I could make the argument that there was no consistently named version of the C++ runtime library DLL available on Windows® 95, Windows 98, Windows NT 3.51, Windows NT® 4.0, and so forth. Luckily, we've moved past those days, and in most cases you can rely on MSVCRT.DLL being available on your target machines.
      If you make this switch (the /MD compiler option) and recompile Hello.CPP, the resulting executable is only 16KB. Not bad, but you can do better. More importantly, you're just shifting all of this unneeded code to someplace else (that is, to MSVCRT.DLL). In addition, when your program starts up, another DLL has to be loaded and initialized. This initialization includes items like locale support, which you may not care about. If MSVCRT.DLL suits your needs, then by all means use it. However, I believe that a stripped-down, statically linked runtime library still has merit.
      I may be tilting at windmills here, but my e-mail conversations with readers show that I'm not alone. There are people out there who want the leanest possible code. In this day of writable CDs, DVDs, and fast Internet connections, it's easy not to worry about code size. However, the best Internet connection I can get at home is only 24Kbps. I hate wasting time downloading bloated controls for a Web page.
      As a matter of principle, I want my code to have as small a footprint as possible. I don't want to load any extra DLLs that I don't really need. Even if I might need a DLL, I'll try to delayload it so that I don't incur the cost of loading it until I use the DLL. Delayloading is a topic I've described in previous columns, and I strongly encourage you to become familiar with it. See Under the Hood in the December 1998 issue of MSJ for starters.

Digging Deeper

Now that I've beaten up the unneeded code within the program, let's turn to the executable file itself. If you were to run DUMPBIN /HEADERS on my Hello.EXE, you'd see the following two lines in the output:

1000 section alignment
1000 file alignment

The second line is the interesting one. It says that every code and data section in the executable file is aligned on a 4KB (0x1000-byte) boundary. Because sections are stored contiguously in the file, up to 4KB can be wasted between the end of one section and the start of the next.
      If I had linked the program with a linker older than the one in Visual C++ 6.0, I would have seen something different:

1000 section alignment
200 file alignment

The key difference is that sections are aligned on only 512-byte (0x200) boundaries in the file, so there's much less space to waste. In Visual C++ 6.0, the linker defaults were changed to make the file alignment of sections equal to their alignment in memory. This provides a slight load-time performance improvement on Windows 9x, but makes executables bigger.
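To put numbers on the waste, here's a small C++ sketch of how the file alignment pads each section's raw data. It's only arithmetic: align_up and on_disk_bytes are hypothetical helpers (not linker APIs), and the section sizes used with them below are made up for illustration. The PE headers, which are also rounded up to the file alignment, are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round 'size' up to the next multiple of 'alignment' (a power of two),
// which is how each section's raw data is padded in the file.
std::uint32_t align_up(std::uint32_t size, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Bytes a set of sections occupies on disk when each one must start on a
// 'file_alignment' boundary.
std::uint32_t on_disk_bytes(const std::vector<std::uint32_t>& section_sizes,
                            std::uint32_t file_alignment) {
    std::uint32_t total = 0;
    for (std::uint32_t s : section_sizes) {
        total += align_up(s, file_alignment);
    }
    return total;
}
```

For three hypothetical sections of 10,000, 1,500, and 700 bytes, on_disk_bytes gives 20,480 bytes with 0x1000 file alignment but only 12,800 bytes with 0x200, because each section can waste up to one alignment unit minus a byte.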
      Luckily, the Visual C++ linker has a way to go back to the previous behavior. The magic switch is /OPT:NOWIN98. Rebuilding Hello.CPP as before, but with the addition of this linker switch, gets the executable file down to 21KB, a savings of 7KB. If I also link against MSVCRT.DLL while using /OPT:NOWIN98, the executable size drops to 2560 bytes!

LIBCTINY: A Minimal Runtime Library

Now that you understand the problem of why simple EXEs and DLLs are so large, it's time to introduce my new and improved replacement runtime library. In the October 1996 column (mentioned earlier), I created a small static .LIB file designed to replace or augment the Microsoft LIBC.LIB and LIBCMT.LIB libraries. I called this replacement runtime library LIBCTINY.LIB, since it was a very stripped-down version of Microsoft's own runtime library sources.
      LIBCTINY.LIB is intended for simple applications that don't require a huge amount of runtime library support. Thus, it's not suitable for MFC applications or other complicated scenarios that make extensive use of the C++ runtime. LIBCTINY's ideal target is small programs or DLLs that call some Win32 APIs and perhaps display some simple output. 
      There are two guiding principles behind LIBCTINY.LIB. First, it replaces the standard Visual C++ startup routines with much simpler code. This simpler code doesn't refer to any of the more esoteric runtime library functions like __crtLCMapStringA. Because of this, much less extraneous code is linked into your binary. As I'll show shortly, the LIBCTINY routines perform a bare minimum of tasks before calling your WinMain, main, or DllMain routines.
      The second guiding principle of LIBCTINY.LIB is to implement relatively large functions like malloc or printf with code that's already in the Win32 system DLLs. Beyond the minimal startup code, most of the other LIBCTINY source files are simple implementations of standard C++ runtime library functions such as malloc, free, new, delete, printf, strupr, strlwr, and so on. Take a look at the implementation of printf in printf.cpp (see Figure 2) to get an idea of what I'm talking about.
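Figure 2 isn't reproduced here, but the idea can be sketched roughly as follows. This is not the code from printf.cpp: tiny_printf is a hypothetical name chosen so it won't collide with the real printf. On Windows it hands formatting to wvsprintfA (exported by USER32.DLL, so link with USER32.LIB) and output to WriteFile; elsewhere it falls back to vsnprintf purely so the sketch can be compiled and tried.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical tiny_printf: formats into a fixed buffer using code the OS
// already provides, then writes the result to standard output.
int tiny_printf(const char* format, ...) {
    char buffer[1024];  // wvsprintf never writes more than 1024 bytes
    std::va_list args;
    va_start(args, format);
#ifdef _WIN32
    // USER32's formatter; note that it has no floating-point conversions.
    int len = wvsprintfA(buffer, format, args);
#else
    // Portable fallback so the sketch also builds outside Windows.
    int len = std::vsnprintf(buffer, sizeof buffer, format, args);
    if (len >= static_cast<int>(sizeof buffer)) {
        len = static_cast<int>(sizeof buffer) - 1;  // output was truncated
    }
#endif
    va_end(args);
    if (len < 0) return len;
#ifdef _WIN32
    DWORD written = 0;
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), buffer,
              static_cast<DWORD>(len), &written, nullptr);
#else
    std::fwrite(buffer, 1, static_cast<std::size_t>(len), stdout);
#endif
    return len;
}
```

The price of delegating to wvsprintf is that output is capped at 1024 bytes and floating-point formats aren't supported, which is typical of the trade-offs a tiny runtime library makes.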
      In my original version of LIBCTINY.LIB there were two restrictions that annoyed me. First, the original version did not support DLLs. You could make tiny console and GUI executable programs, but if you wanted to create a tiny DLL, you were out of luck. 
      Second, the original LIBCTINY did not support static C++ constructors and destructors. By this, I mean constructors and destructors declared at global scope. In the new version, I've added the basic code that implements this support. Along the way, I learned quite a bit about how the compiler and runtime library play a complicated game to make static constructors and destructors work.

The Dark Underbelly of Constructors

When the compiler processes a source file that has a static constructor, it generates two things. The first is a small blob of code with a name like $E2 that calls the constructor. The second thing the compiler emits is a pointer to this blob of code. This pointer is written to a specially named section in the .OBJ called .CRT$XCU. 
      Why the funny section name? It's a bit complicated. Let me throw another piece of data at you to help explain. If you examine the Visual C++ runtime library sources (for instance, CINITEXE.C), you'll find the following:

#pragma data_seg(".CRT$XCA")
_PVFV __xc_a[] = { NULL };
#pragma data_seg(".CRT$XCZ")
_PVFV __xc_z[] = { NULL };

The previous lines of code create two data segments, .CRT$XCA and .CRT$XCZ. In each segment it places a variable (__xc_a and __xc_z, respectively). Note that the segment names are very similar to the .CRT$XCU segment to which the compiler emits the constructor code pointer.
      At this point, a little linker theory is needed. When processing all of the segments to create the final portable executable (PE) file, the linker concatenates all the data from identically named segments. Thus, if A.OBJ has a section called .data, and B.OBJ also has a .data section, all the data from A.OBJ and B.OBJ will be written contiguously into a single .data section in the PE file.
      The use of a $ in a segment name puts a new twist on things. When encountering segment names with a $ in them, the linker treats the portion of the name preceding the $ as the final segment name. Thus, the .CRT$XCA, .CRT$XCU, and .CRT$XCZ segments all end up together in the final executable in a segment called .CRT.
      What about the part of the segment name following the $? When combining these types of sections, the linker writes out the segments in the order dictated by the string following the $. The ordering is alphabetical, so all the data from .CRT$XCA goes first, followed by all of the data from .CRT$XCU, and finally all of the data from .CRT$XCZ. This is a crucial point to understand.
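A toy model of this ordering rule may help. The sketch below is a simplification of what the linker actually does, and Contribution, group_of, and merge_group are hypothetical names: it collects every contribution whose name shares the part before the $, then orders them by full name, keeping contributions with identical names in the order they were encountered.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A section contribution from one .OBJ: its full name (e.g. ".CRT$XCU")
// and a label standing in for its data.
struct Contribution {
    std::string section;
    std::string data;
};

// The final section name: the part before '$', or the whole name if
// there is no '$' (find returns npos, and substr(0, npos) is the whole).
std::string group_of(const std::string& section) {
    return section.substr(0, section.find('$'));
}

// Keep only the contributions that end up in 'group', ordered by full
// name (so $XCA < $XCU < $XCZ). stable_sort preserves the input order of
// identically named contributions, e.g. one .CRT$XCU per object file.
std::vector<Contribution> merge_group(std::vector<Contribution> input,
                                      const std::string& group) {
    input.erase(std::remove_if(input.begin(), input.end(),
                               [&group](const Contribution& c) {
                                   return group_of(c.section) != group;
                               }),
                input.end());
    std::stable_sort(input.begin(), input.end(),
                     [](const Contribution& a, const Contribution& b) {
                         return a.section < b.section;
                     });
    return input;
}
```

Feeding it .CRT$XCU contributions from two object files, plus the .CRT$XCA and .CRT$XCZ sentinels in arbitrary positions, yields __xc_a, the two constructor pointers, and __xc_z, in that order.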
      What's going on here is that the runtime library code has no idea how many static constructor calls are needed for a given EXE or DLL. However, it does know that only pointers to constructor code blobs will be in the .CRT$XCU segment. When the linker concatenates all the .CRT$XCU sections, it has the net effect of creating a function pointer array. By defining .CRT$XCA and .CRT$XCZ segments along with the __xc_a and __xc_z symbols, the runtime library can reliably locate the beginning and end of the function pointer array. 
      As you might expect, calling all the static constructors in a module is a simple matter of enumerating through the function pointer array, calling each pointer in turn. The routine that does this is _initterm, shown in Figure 3. This routine is identical to the version from the Visual C++ runtime library sources. 
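Figure 3 has the real routine. As a rough, portable illustration of the same loop, the sketch below simulates the merged .CRT data as an ordinary array, with a NULL slot standing in for __xc_a at the front and another for __xc_z at the end. PVFV, initterm, and the construct_ functions are hypothetical stand-ins, and the __cdecl calling convention is left out so it compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for _PVFV: pointer to a void function taking no arguments.
typedef void (*PVFV)(void);

// Call every non-null function pointer in [begin, end).
void initterm(PVFV* begin, PVFV* end) {
    for (; begin < end; ++begin) {
        if (*begin != nullptr) {
            (**begin)();
        }
    }
}

// Simulated constructor blobs; each one records that it ran.
static int constructed = 0;
static void construct_first()  { constructed += 1; }
static void construct_second() { constructed += 10; }

// Simulated merged .CRT data: __xc_a's NULL, the .CRT$XCU pointers from
// each object file, then __xc_z's NULL.
PVFV crt_section[] = { nullptr, construct_first, construct_second, nullptr };

void run_static_constructors() {
    PVFV* xc_a = &crt_section[0];  // plays the role of __xc_a
    PVFV* xc_z = &crt_section[3];  // plays the role of __xc_z
    initterm(xc_a, xc_z);
}
```

The end pointer is exclusive, so __xc_z's slot is never touched, while __xc_a's slot is visited but skipped because it's NULL.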
      All things considered, getting static constructors to work in LIBCTINY was relatively easy. It was mostly a matter of defining the right data segments (specifically, .CRT$XCA and .CRT$XCZ), and calling _initterm from the correct spot in the startup code. Getting static destructors to work was a bit trickier.
      Unlike the function pointer array that the compiler and linker conspire to create for static constructors, the list of static destructors to call is built at runtime. To build this list, the compiler generates calls to the atexit function, which is part of the Visual C++ runtime: the generated code that runs a static object's constructor also calls atexit, passing a pointer to another small blob that calls the object's destructor. The atexit function adds that pointer to a first-in, last-out list. When the EXE or DLL unloads, the runtime library iterates through the list and calls each function pointer.
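To make the division of labor concrete, here's roughly what that generated code amounts to, written out by hand in portable C++. This is a conceptual model, not actual compiler output: Widget, g_widget_storage, the two blob functions, and fake_atexit (standing in for the runtime's atexit) are all hypothetical names.

```cpp
#include <cassert>
#include <new>

typedef void (*PVFV)(void);

static bool widget_alive = false;

// A class with a constructor and destructor, as in "Widget g_widget;".
struct Widget {
    Widget()  { widget_alive = true; }
    ~Widget() { widget_alive = false; }
};

// Hypothetical stand-in for the runtime's atexit: remembers one function.
static PVFV registered_exit_func = nullptr;
static int fake_atexit(PVFV func) {
    registered_exit_func = func;
    return 0;
}

// Raw storage standing in for the global object itself.
alignas(Widget) unsigned char g_widget_storage[sizeof(Widget)];
static Widget* g_widget = nullptr;

// Destructor blob: this is the pointer that goes into the atexit list.
static void destroy_g_widget() {
    g_widget->~Widget();
    g_widget = nullptr;
}

// Constructor blob (like $E2): build the object in place, then register
// the destructor blob so it runs when the module unloads.
static void construct_g_widget() {
    g_widget = new (g_widget_storage) Widget();
    fake_atexit(destroy_g_widget);
}

// The pointer the compiler would place in the .CRT$XCU section.
PVFV xcu_entry = construct_g_widget;
```

Calling xcu_entry constructs the object and registers its destructor; calling the registered pointer later tears it down, which is exactly the sequence the startup and shutdown code drive.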
      LIBCTINY's implementation of the atexit functionality is significantly simpler than what the Visual C++ runtime library does. There are three functions and a handful of static variables for this support, which is also in initterm.cpp. The _atexit_init function simply allocates an array to hold 32 function pointers, and stores the pointer in the pf_atexitlist static variable.
      The atexit function checks to see if there's room in the array, and if so, adds the pointer to the end of the list. A more robust version of this code would reallocate the array to a larger size if necessary. Finally, the _DoExit function uses your friend, _initterm, to iterate through the array and call each function pointer. In an ideal world, _DoExit would iterate through the array in reverse order, mimicking the behavior of the Visual C++ runtime library implementation. However, the whole purpose of LIBCTINY is to be simple and small, rather than striving for perfect compatibility.
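Here's a portable sketch of the same fixed-size scheme. The names tiny_atexit_init, tiny_atexit, and tiny_do_exit are hypothetical stand-ins for LIBCTINY's _atexit_init, atexit, and _DoExit (the real atexit name is already taken by the standard library). Unlike LIBCTINY, this version walks the array backwards, giving the first-in, last-out order that the Visual C++ runtime uses.

```cpp
#include <cassert>
#include <cstdlib>

typedef void (*PVFV)(void);

static const int kMaxExitFuncs = 32;
static PVFV* pf_atexitlist = nullptr;  // the 32-slot array
static int atexit_count = 0;           // number of slots in use

// Allocate room for 32 function pointers (no regrowth in this sketch).
void tiny_atexit_init() {
    pf_atexitlist = static_cast<PVFV*>(std::calloc(kMaxExitFuncs, sizeof(PVFV)));
    atexit_count = 0;
}

// Append a function; returns 0 on success, -1 if the list is missing or full.
int tiny_atexit(PVFV func) {
    if (pf_atexitlist == nullptr || atexit_count >= kMaxExitFuncs) return -1;
    pf_atexitlist[atexit_count++] = func;
    return 0;
}

// Call the registered functions, last registered first, then free the list.
void tiny_do_exit() {
    for (int i = atexit_count - 1; i >= 0; --i) {
        if (pf_atexitlist[i] != nullptr) {
            pf_atexitlist[i]();
        }
    }
    std::free(pf_atexitlist);
    pf_atexitlist = nullptr;
    atexit_count = 0;
}
```

In a startup sequence like LIBCTINY's, the init call must come before the static constructors run, because the constructor code is what registers the destructors.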

LIBCTINY's Minimal Startup Routines

Now let's take a look at LIBCTINY's new support for small DLLs. As with EXEs, the trick is to make the DLL's entry point code as small as possible and omit calls to unneeded routines that bring in lots of other code. Figure 4 shows the minimal DLL startup code. When your DLL is loaded, it is this code, not your DllMain routine, that executes first.
      _DllMainCRTStartup is the very first place execution begins in your DLL. In LIBCTINY's implementation, it first checks whether the notification is DLL_PROCESS_ATTACH. If so, the code calls _atexit_init (described earlier), and then _initterm to invoke any static constructors. The heart of the function is the call to DllMain, which is the routine you supply as part of your DLL's code. This DllMain call is made for all four notification types (process attach/detach, and thread attach/detach).
      The last thing _DllMainCRTStartup does is check whether the notification is DLL_PROCESS_DETACH. If so, the code calls _DoExit, which, as described earlier, causes any static destructors to be called. If you're curious about the startup code for console and GUI mode EXEs, be sure to check out CRT0TCON.CPP and CRT0TWIN.CPP, respectively. (These modules accompany the code download, found at the link at the top of this article.)
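To summarize that control flow, here's a small simulation. It isn't the code from Figure 4: the reason codes are defined locally with the same values as the Win32 DLL_PROCESS_ATTACH family of constants, the helper functions are hypothetical stand-ins that just record what would have been called, and returning DllMain's result to the loader is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same values as the Win32 DLL_* notification codes.
const unsigned long kDllProcessDetach = 0;
const unsigned long kDllProcessAttach = 1;
const unsigned long kDllThreadAttach  = 2;
const unsigned long kDllThreadDetach  = 3;

// Record of what the simulated startup code did, in order.
std::vector<std::string> trace;

// Hypothetical stand-ins for _atexit_init, _initterm, _DoExit, and DllMain.
void atexit_init()             { trace.push_back("atexit_init"); }
void run_static_constructors() { trace.push_back("constructors"); }
void do_exit()                 { trace.push_back("destructors"); }
bool user_dll_main(unsigned long reason) {
    trace.push_back("DllMain(" + std::to_string(reason) + ")");
    return true;
}

// Set up on process attach, always call DllMain, tear down on detach.
bool dll_main_crt_startup(unsigned long reason) {
    if (reason == kDllProcessAttach) {
        atexit_init();
        run_static_constructors();
    }
    bool result = user_dll_main(reason);
    if (reason == kDllProcessDetach) {
        do_exit();
    }
    return result;
}
```

Note the ordering on detach: your DllMain sees DLL_PROCESS_DETACH while static objects are still alive, and only afterwards are their destructors run.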
      One other thing worth checking out in DLLCRT0.CPP (see Figure 4) is this line near the top:

#pragma comment(linker, "/OPT:NOWIN98")

This puts a linker directive into the DLLCRT0.OBJ file that tells the linker to use the /OPT:NOWIN98 switch. The benefit is that you don't have to add /OPT:NOWIN98 to your make files or project files by hand. I figure if you're using LIBCTINY, you'd probably want to use /OPT:NOWIN98 as well.

Using LIBCTINY.LIB

Using LIBCTINY is very simple. All you have to do is add LIBCTINY.LIB to the linker's list of .LIB files to search. If you're using the Visual Studio® IDE, this would be in the Project | Settings | Link tab. It doesn't matter what type of binary you're building (console EXE, GUI EXE, or DLL), since LIBCTINY.LIB contains appropriate entry point routines for each of them.
      Take a look at TEST.CPP in Figure 5. This program simply exercises a few of the routines that LIBCTINY.LIB implements, and includes a static constructor and destructor invocation. When I compile it normally with Visual C++ 6.0,

CL /O1 TEST.CPP

the resulting executable is 32768 bytes. By simply adding LIBCTINY.LIB to the command line

CL /O1 TEST.CPP LIBCTINY.LIB

the resulting executable shrinks to 3072 bytes.
      You might be wondering about the runtime library routines that LIBCTINY doesn't implement. For instance, in TEST.CPP, there's a call to strrchr. There's no problem here because that function exists in the regular LIBC.LIB or LIBCMT.LIB that Visual C++ provides. Both LIBCTINY.LIB and LIBC.LIB implement a variety of routines. LIBCTINY's list is obviously smaller than what LIBC.LIB provides. The important thing for your purposes is that the linker finds the LIBCTINY routines first when resolving function calls, and so LIBCTINY's routines are what's used. If something isn't implemented in LIBCTINY, the linker finds it in LIBC.LIB instead.
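The lookup order can be modeled in a few lines. This is a deliberately simplified sketch with hypothetical names: the real linker pulls whole .OBJ members out of a library (which can bring in further references), but the rule that matters here, namely that a symbol comes from the first library in search order that defines it, is what the model captures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A library reduced to its name and the set of symbols it defines.
struct Library {
    std::string name;
    std::set<std::string> symbols;
};

// For each needed symbol, pick the first library (in search order) that
// defines it. Symbols no library defines are left out of the result.
std::map<std::string, std::string> resolve(const std::vector<std::string>& needed,
                                           const std::vector<Library>& search_order) {
    std::map<std::string, std::string> chosen;
    for (const std::string& sym : needed) {
        for (const Library& lib : search_order) {
            if (lib.symbols.count(sym) != 0) {
                chosen[sym] = lib.name;
                break;
            }
        }
    }
    return chosen;
}
```

With LIBCTINY.LIB searched before LIBC.LIB, printf resolves to LIBCTINY.LIB, while strrchr, which only LIBC.LIB provides, falls through to LIBC.LIB.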
      Finally, it's worth repeating that LIBCTINY isn't suitable for all purposes. For example, if your code makes use of multiple threads and relies on the runtime library's per-thread data support, then LIBCTINY isn't for you. What I do is try LIBCTINY with a prospective program. If it works, great! If not, I simply use the normal runtime library.

Metadata Article Correction

In my October 2000 MSDN Magazine article "Avoiding DLL Hell: Introducing Application Metadata in the Microsoft .NET Framework," I said that using the Visual C++ 6.0 #import directive causes the compiler to read in a COM type library and generate ATL-ready header files for all the interfaces contained within. While header files are generated by #import, it turns out they don't use ATL.
      Richard Grimes, author of Professional ATL COM Programming (Wrox Press, 1998), kindly pointed out to me that #import generates what Microsoft calls "compiler COM support classes," which are supported by the COMDEF.H header. Richard goes on to say, "There are many differences between the COM compiler support classes and the equivalent in ATL. The most important is that ATL does not use C++ exceptions. In fact, the ATL classes are more lightweight than the COM compiler support classes and so I would have preferred if Microsoft had decided to generate ATL code."
      I have to confess that I should have studied this more before I wrote it. My experience with ATL is limited to the wizards in Visual C++, and tweaking the resulting code. I have used #import on a few occasions, but not enough to have made the connection that the resulting code wasn't ATL. Thanks to Richard for pointing this out to me, and for giving me even more incentive to verify everything before I write about it.

Matt Pietrek does advanced research for the NuMega Labs of Compuware Corporation, and is the author of several books. His Web site, at http://www.wheaty.net, has a FAQ page and information on previous columns and articles.

From the January 2001 issue of MSDN Magazine
