I have been quite annoyed by a Windows bug that causes a huge number of open-source command-line tools to choke on multi-byte characters at the Windows Command Prompt. The MSVCRT.DLL shipped with Windows Vista or later has been having big troubles with such characters. While Microsoft tools and compilers after Visual Studio 6.0 do not use this DLL anymore, the GNU tools on Windows, usually built by MinGW or Mingw-w64, are dependent on this DLL and suffer from this problem. One cannot even use ls to display a Chinese file name, when the system locale is set to Chinese.

The following simple code snippet demonstrates the problem:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <locale.h>
#include <stdio.h>
 
char msg[] = "\xd7\xd6\xb7\xfb Char";
wchar_t wmsg[] = L"字符 char";
 
void Test1()
{
    char* ptr = msg;
    printf("Test 1: ");
    while (*ptr) {
        putchar(*ptr++);
    }
    putchar('\n');
}
 
void Test2()
{
    printf("Test 2: ");
    puts(msg);
}
 
void Test3()
{
    wchar_t* ptr = wmsg;
    printf("Test 3: ");
    while (*ptr) {
        putwchar(*ptr++);
    }
    putwchar(L'\n');
}
 
int main()
{
    char buffer[32];
    puts("Default C locale");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("Chinese locale");
    setlocale(LC_CTYPE, "Chinese_China.936");
    Test1();
    Test2();
    Test3();
    putchar('\n');
    puts("English locale");
    setlocale(LC_CTYPE, "English_United States.1252");
    Test1();
    Test2();
    Test3();
}

When built with a modern version of Visual Studio, it gives the expected output (console code page is 936):

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: char Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3: char

I.e. when the locale is the default ‘C’, the ‘ANSI’ version of character output routines can successfully output single-byte and multi-byte characters, while putwchar, the ‘Unicode’ version of putchar, fails at the multi-byte characters (reasonably, as the C locale does not understand how to translate Chinese characters). When the locale is set correctly to code page 936 (Simplified Chinese), everything is correct. When the locale is set to code page 1252 (Latin), the corresponding characters at the same code points of the original Chinese characters (‘×Ö·û’ instead of ‘字符’) are shown with the ‘ANSI’ routines, though ‘Ö’ (\xd6) and ‘û’ (\xfb) are shown as ‘?’ because they do not exist in code page 936. The Chinese characters, of course, cannot be shown with putwchar in this locale, just like the C locale.

When built with GCC, the result is woeful:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: char Chinese locale
Test 1: Char
Test 2: 字符 Char
Test 3: char English locale
Test 1: ×?·? Char
Test 2: ×?·? Char
Test 3: char

Two things are worth noticing:

  • putchar stops working for Chinese when the locale is correctly set.
  • putwchar never works for Chinese.

Horrible and thoroughly broken! (Keep in mind that Microsoft is to blame here. You can compile the program with MSVC 6.0 using the /MD option, and the result will be the same—an executable that works in Windows XP but not in Windows Vista or later.)

I attacked this problem a few years ago, and tried some workarounds. The solution I came up with looked so fragile that I did not push it up to the MinGW library. It was a personal failure, as well as an indication that working around a buggy implementation without affecting the application code can be very difficult or just impossible.


The problem occurs only with the console, where the Microsoft runtime does some translation (broken in MSVCRT.DLL, but OK in newer MSVC runtimes). It vanishes when users redirect the output from the console. So one solution is not to use the Command Prompt at all. The Cygwin Terminal may be a good choice, especially for people familiar with Linux/Unix. I have Cygwin installed, but sometimes I still want to do things in the more Windows-y way. I figured I could make a small tool (like cat) to get the input from stdin, and forward everything to stdout. As long as this tool is compiled by a Microsoft compiler, things should be OK. Then I thought a script could be faster. Finally, I came up with putting the following line into an mbf.bat:

@perl -p -e ""

(Perl is still wonderful for text processing, even in this ‘empty’ program!)

Now the executables built by GCC and MSVC give the same result, if we append ‘|mbf’ on the command line:

Default C locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: char Chinese locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: 字符 char English locale
Test 1: 字符 Char
Test 2: 字符 Char
Test 3: char

If you know how to make Microsoft fix the DLL problem, do it. Otherwise you know at least a workaround now. 


The following code is my original partial solution to the problem, and it may be helpful to your GCC-based project. I don’t claim any copyright of it, nor will I take any responsibilities for its use.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
/* mingw_mbcs_safe_io.c */
 
#include <mbctype.h>
#include <stdio.h>
 
/* Output functions that work with the Windows 7+ MSVCRT.DLL
 * for multi-byte characters on the console.  Please notice
 * that buffering must not be enabled for the console (e.g.
 * by calling setvbuf); otherwise weird things may occur. */
 
int __cdecl _mgw_flsbuf(int ch, FILE* fp)
{
  static char lead = '\0';
  int ret = 1;
 
  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
      if (ret < 0)
        return EOF;
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    return _flsbuf(ch, fp);
 
  return ch;
}
 
int __cdecl putc(int ch, FILE* fp)
{
  static __thread char lead = '\0';
  int ret = 1;
 
  if (lead != '\0')
    {
      ret = fprintf(fp, "%c%c", lead, ch);
      lead = '\0';
    }
  else if (_ismbblead(ch))
    lead = ch;
  else
    ret = fprintf(fp, "%c", ch);
 
  if (ret < 0)
    return EOF;
  else
    return ch;
}
 
int __cdecl putchar(int ch)
{
  putc(ch, stdout);
}
 
int __cdecl _mgwrt_putchar(int ch)
{
  putc(ch, stdout);
}

 

https://yongweiwu.wordpress.com/tag/mingw-w64/

MSVCRT.DLL Console I/O Bug(setlocale(LC_CTYPE, "Chinese_China.936"))的更多相关文章

  1. abap调vb写的dll实现电子天平的读数(带控件版)

    废话不多说,直接上. 鉴于abap调研的dll文件需要在wins注册,自己尝试过delphi和C#感觉不是很好,最后毅然选择了VB来写 因为需要用到MScomm控件,所以对于将要写的dll需要带for ...

  2. MinGW gcc 生成动态链接库 dll 的一些问题汇总(由浅入深,很详细)

    网络上关于用 MinGW gcc 生成动态链接库的文章很多.介绍的方法也都略有不同.这次我在一个项目上刚好需要用到,所以就花了点时间将网上介绍的各种方法都实验了一遍.另外,还根据自己的理解试验了些网上 ...

  3. 调用DATASNAP+FIREDAC的远程方法有时会执行二次SQL或存储过程的BUG(转永喃兄)

    调用DATASNAP+FIREDAC的远程方法有时会执行二次SQL或存储过程的BUG 1)查询会重复执行的情形:Result := DATASETPROVIDER.Data会触发它关联的DATASET ...

  4. 简单解决 Javascrip 浮点数计算的 Bug(.toFixed(int 小数位数))

    众所周知,Javascript 在进行浮点数运算时,结果会非预期地出现一大长串小数. 解决: 如果变量 result 是计算结果,则在返回时这样写,return result.toFixed(2): ...

  5. 【原创】IE11惊现无厘头Crash BUG(三招搞死你的IE11,并提供可重现代码)!

    前言 很多人都知道我们在做FineUI控件库,而且我们也做了超过 9 年的时间,在和浏览器无数次的交往中,也发现了多个浏览器自身的BUG,并公开出来方便大家查阅: 分享IE7一个神奇的BUG(不是封闭 ...

  6. 无法定位程序输入点到_ftol2于动态链接库msvcrt.dll的错误的解决

    作者:朱金灿 来源:http://blog.csdn.net/clever101 今天同事在Windows XP系统上运行程序遇到这样一个错误: 我试了一下,在Win7上运行则没有这个错误.只是程序运 ...

  7. 无法定位程序输入点_except_handler4_common于动态链接库msvcrt.dll

    这是由于sp3加载的驱动造成的:只需要将C:\WINDOWS\system32\dwmapi.dll重新命名一下即可以解决. 可以调试程序当系统加载到“c:\Program Files\China M ...

  8. 海王星给你好看!FineUI v4.0公测版发布暨《你找BUG我送书》活动开始(活动已结束!)

    <FineUI v4.0 你找BUG我送书>活动已结束,恭喜如下三位网友获得由 FineUI 作者亲自翻译的图书<jQuery实战 第二版>! 奋斗~ 吉吉﹑ purplebo ...

  9. DLL中传递STL参数(如Vector或者list等)会遇到的问题[转载]

    最近的一个项目中遇到了调用别人的sdk接口(dll库)而传给我的是一个vector指针,用完之后还要我来删除的情况.这个过程中首先就是在我的exe中将其vector指针转为相应指针再获取vector中 ...

随机推荐

  1. Quartz.Net线程处理用到的两个Attribute

    1.DisallowConcurrentExecution 加到IJob实现类上,主要防止相同JobDetail并发执行. 简单来说,现在有一个实现了IJob接口的CallJob,触发器设置的时间是每 ...

  2. 在VS2013 IIS Express 添加MIME映射

    打开VS2013返回json提示MIME映射问题 1.在DOS窗口下进入IIS Express安装目录,默认是C:\Program Files\IIS Express,cmd  命令行cd 到 该目录 ...

  3. 通过windows自带的系统监视器来查看IIS并发连接数(perfmon.msc)

    如果要查看IIS连接数,最简单方便的方法是通过“网站统计”来查看,“网站统计”的当前在线人数可以认为是当前IIS连接数.然而,“网站统计”的当前在线人数统计时间较长,一般为10分钟或15分钟,再加上统 ...

  4. RandomForest 调参

    在scikit-learn中,RandomForest的分类器是RandomForestClassifier,回归器是RandomForestRegressor,需要调参的参数包括两部分,第一部分是B ...

  5. RapidIO协议(1)

    RapidIO协议 1.概述 1.1介绍 RapidIO是基于包交换互联协议,主要作为系统内部接口使用,如:芯片间.板间的通讯,速度能在GB/S数量级.如连接处理器.内存.内存映射的I/O设备.这些设 ...

  6. Closure闭包示例

    var foo = function(){ var cnt = 0; return function(){ return cnt++; }; }; var closure = foo(); conso ...

  7. ThinkPHP使用PHPExcel实现Excel数据导入导出完整实例

    这篇文章主要介绍了ThinkPHP使用PHPExcel实现Excel数据导入导出,非常实用的功能,需要的朋友可以参考下 本文所述实例是使用在Thinkphp的开发框架上,要是使用在其他框架也是同样的方 ...

  8. Windows Server 2008 R2入门之用户管理

    一.用户账户概述: ”用户”是计算机的使用者在计算机系统中的身份映射,不同的用户身份拥有不同的权限,每个用户包含一个名称和一个密码: 在Windows中,每个用户帐户有一个唯一的安全标识符(Secur ...

  9. IOS设计模式浅析之建造者模式(Builder)

    定义 "将一个复杂对象的构建与它的表现分离,使得同样的构建过程可以创建不同的表现". 最初的定义出现于<设计模式>(Addison-Wesley,1994). 看这个概 ...

  10. ini_set() php.ini设置的功能

    ini_set()具有更改php.ini设置的功能.此函数接收两个参数:需要调整的配置变量名,以及变量的新值. [c-sharp] view plaincopyprint? <?php ini_ ...