I need to make a detour for a few moments, and discuss how to handle strings in COM code. If you are familiar with how Unicode and ANSI strings work, and know how to convert between the two, then you can skip this section. Otherwise, read on.

Whenever a COM method returns a string, that string will be in Unicode. (Well, all methods that are written to the COM spec, that is!) Unicode is a character encoding scheme, like ASCII, only on Windows it is stored as UTF-16, so characters are built from 2-byte units and most characters are 2 bytes long. If you want to get the string into a more manageable state, you should convert it to a TCHAR string.

TCHAR and the _t functions (for example, _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, you'll be writing code that uses ANSI strings and the ANSI Windows APIs, so for the rest of this article, I will refer to chars instead of TCHARs, just for simplicity. You should definitely read up on the TCHAR types, though, to be aware of them in case you ever come across them in code written by others.
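To make the idea concrete, here is a rough sketch of the mechanism behind TCHAR. It does not use the real <tchar.h>; the names SKETCH_UNICODE, SKETCH_TCHAR, SKETCH_T, sketch_tcslen and GreetingLength are made-up stand-ins so the snippet compiles anywhere. The real header works along the same lines, switching on the _UNICODE symbol.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T() and _tcslen().
// Define SKETCH_UNICODE before this point to get the wide-character build.
#ifdef SKETCH_UNICODE
typedef wchar_t SKETCH_TCHAR;          // wide build: 'TCHAR' is wchar_t
#define SKETCH_T(s) L##s               // literal becomes L"..."
#define sketch_tcslen std::wcslen      // generic name maps to the wide function
#else
typedef char SKETCH_TCHAR;             // ANSI build: 'TCHAR' is char
#define SKETCH_T(s) s                  // literal stays "..."
#define sketch_tcslen std::strlen      // generic name maps to the char function
#endif

// The same source line works in both builds.
std::size_t GreetingLength()
{
    const SKETCH_TCHAR* greeting = SKETCH_T("hello");
    return sketch_tcslen(greeting);    // length in characters, not bytes
}
```

Only the typedef and macros change between the two builds; the function body stays the same, which is the whole point of the TCHAR approach.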

When you get a Unicode string back from a COM method, you can convert it to a char string in one of several ways:

  1. Call the WideCharToMultiByte() API.
  2. Call the CRT function wcstombs().
  3. Use the CString constructor or assignment operator (MFC only).
  4. Use an ATL string conversion macro.

    WideCharToMultiByte()

    You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:

    int WideCharToMultiByte (
        UINT    CodePage,
        DWORD   dwFlags,
        LPCWSTR lpWideCharStr,
        int     cchWideChar,
        LPSTR   lpMultiByteStr,
        int     cbMultiByte,
        LPCSTR  lpDefaultChar,
        LPBOOL  lpUsedDefaultChar );

    The parameters are:

    CodePage

    The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. Code pages are sets of 256 characters. Characters 0-127 are always identical to the ASCII encoding. Characters 128-255 differ, and can contain graphics or letters with diacritics. Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.

    dwFlags

    dwFlags determines how Windows deals with "composite" Unicode characters, which are a letter followed by a diacritic. An example of a composite character is è. If this character is in the code page specified in CodePage, then nothing special happens. However, if it is not in the code page, Windows has to convert it to something else.
    Passing WC_COMPOSITECHECK makes the API check for non-mapping composite characters. Passing WC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example e`. Passing WC_DISCARDNS makes Windows discard the diacritics. Passing WC_DEFAULTCHAR makes Windows replace the composite characters with a "default" character, specified in the lpDefaultChar parameter. The default behavior is WC_SEPCHARS.

    lpWideCharStr

    The Unicode string to convert.

    cchWideChar

    The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.

    lpMultiByteStr

    A char buffer that will hold the converted string.

    cbMultiByte

    The size of lpMultiByteStr, in bytes.

    lpDefaultChar

    Optional - a one-character ANSI string that contains the "default" character to be inserted when dwFlags contains WC_COMPOSITECHECK | WC_DEFAULTCHAR and a Unicode character cannot be mapped to an equivalent ANSI character. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).

    lpUsedDefaultChar

    Optional - a pointer to a BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.

    Whew, a lot of boring details! Like always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:

    // Assuming we already have a Unicode string wszSomeString...
    char szANSIString [MAX_PATH];

    WideCharToMultiByte ( CP_ACP,                // ANSI code page
                          WC_COMPOSITECHECK,     // Check for accented characters
                          wszSomeString,         // Source Unicode string
                          -1,                    // -1 means string is zero-terminated
                          szANSIString,          // Destination char string
                          sizeof(szANSIString),  // Size of buffer
                          NULL,                  // No default character
                          NULL );                // Don't care about this flag

    After this call, szANSIString will contain the ANSI version of the Unicode string.
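A fixed MAX_PATH buffer works only when you know the string is short. If you don't, one common pattern is to call the API twice: the first call passes 0 for cbMultiByte, which makes the API return the required buffer size instead of converting anything. The sketch below assumes a Windows build; the helper name WideToAnsi and the use of std::string are my own choices for illustration, not something the API requires.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: converts a zero-terminated Unicode string to the
// current ANSI code page, sizing the buffer from the API's own answer.
std::string WideToAnsi(const wchar_t* wszSource)
{
    // First call: cbMultiByte == 0 means "tell me how many bytes I need".
    // Because cchWideChar is -1, the count includes the terminating zero.
    int cbNeeded = WideCharToMultiByte(CP_ACP, 0, wszSource, -1,
                                       NULL, 0, NULL, NULL);
    if (cbNeeded <= 0)
        return std::string();          // conversion failed; see GetLastError()

    std::string result(cbNeeded, '\0');

    // Second call: do the real conversion into the correctly sized buffer.
    WideCharToMultiByte(CP_ACP, 0, wszSource, -1,
                        &result[0], cbNeeded, NULL, NULL);

    result.resize(cbNeeded - 1);       // drop the terminator the API wrote
    return result;
}
#endif
```

With this in place, something like std::string s = WideToAnsi(wszSomeString); replaces the fixed buffer, at the cost of a second API call.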

    wcstombs()

    The CRT function wcstombs() is a bit simpler, but it just ends up calling WideCharToMultiByte(), so in the end the results are the same. The prototype for wcstombs() is:

    size_t wcstombs (
        char*          mbstr,
        const wchar_t* wcstr,
        size_t         count );

    The parameters are:

    mbstr

    A char buffer to hold the resulting ANSI string.

    wcstr

    The Unicode string to convert.

    count

    The size of the mbstr buffer, in bytes.

    wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:

    wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) );
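Two details of wcstombs() are easy to miss: it returns (size_t)-1 if it hits a character it cannot convert, and it does not write a terminating zero if the converted text fills the whole buffer. Here is a minimal sketch of a wrapper that handles both. The name NarrowWithCrt is hypothetical, and it relies on the behavior documented for both Microsoft's CRT and POSIX that a NULL destination makes the function return the required size. Newer Visual C++ versions may also warn that wcstombs() is deprecated in favor of wcstombs_s().

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a zero-terminated wide string using the
// current C locale, returning an empty string if conversion fails.
std::string NarrowWithCrt(const wchar_t* wszSource)
{
    // With a NULL destination, wcstombs() reports how many bytes the
    // converted string needs, not counting the terminating zero.
    std::size_t cbNeeded = std::wcstombs(NULL, wszSource, 0);
    if (cbNeeded == static_cast<std::size_t>(-1))
        return std::string();          // a character could not be converted

    // Leave room for the terminator so the result is always terminated.
    std::string result(cbNeeded + 1, '\0');
    std::wcstombs(&result[0], wszSource, cbNeeded + 1);

    result.resize(cbNeeded);           // trim the terminator again
    return result;
}
```

Note that an empty return value is ambiguous here (empty input or failed conversion); real code would probably want a separate error signal.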

    CString

    The MFC CString class contains constructors and assignment operators that accept Unicode strings, so you can let CString do the conversion work for you. For example:

    // Assuming we already have wszSomeString...

    CString str1 ( wszSomeString ); // Convert with a constructor.
    CString str2;

    str2 = wszSomeString; // Convert with an assignment operator.

    ATL macros

    ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use the W2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should use OLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.

    #include <atlconv.h>

    // Again assuming we have wszSomeString...

    {
        char szANSIString [MAX_PATH];
        USES_CONVERSION; // Declare local variable used by the macros.

        lstrcpy ( szANSIString, OLE2A(wszSomeString) );
    }

    The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so we need to make our own copy of it with lstrcpy(). Other macros you should look into are W2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).

    There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is a const char*, but I didn't want to throw too much at you at once.
