I need to convert between UTF-8, UTF-16 and UTF-32 for different API's/modules and since I know have the option to use C++11 am looking at the new string types.

It looks like I can use stringu16string and u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16 which look to be able to do a conversion between charor char16_t and char32_t and what looks like a higher level wstring_convert but that only appears to work with bytes/std::string and not a great deal of documentation.

Am I meant to use a wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 case? I only really found examples for UTF-8 to UTF-16, which I am not even sure will be correct on Linux where wchar_t is normally considered UTF-32... Or do something more complex with those codecvt things directly?

Or is this just still not really in a usable state and I should stick with my own existing small routines using 8, 16 and 32bit unsigned integers?

---------------------------------------------------------------------------------------------------------------------------------------------------------------

answer:

If you read the documentation at CppReference.com for wstring_convertcodecvt_utf8codecvt_utf16, and codecvt_utf8_utf16, the pages include a table that tells you exactly what you can use for the various UTF conversions.

And yes, you would use std::wstring_convert to facilitate the conversion between the various UTFs. Despite its name, it is not limited to just std::wstring, it actually operates with any std::basic_string type (which std::stringstd::wstring, and std::uXXstring are all based on).

Class template std::wstring_convert performs conversions between byte string std::stringand wide string std::basic_string<Elem>, using an individual code conversion facet Codecvt. std::wstring_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale. The standard facets suitable for use with std::wstring_convert are std::codecvt_utf8 for UTF-8/UCS2 and UTF-8/UCS4 conversions and std::codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.

For example:

typedef std::string u8string;

u8string To_UTF8(const std::u16string &s)
{
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
return conv.to_bytes(s);
} u8string To_UTF8(const std::u32string &s)
{
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
return conv.to_bytes(s);
} std::u16string To_UTF16(const u8string &s)
{
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
return conv.from_bytes(s);
} std::u16string To_UTF16(const std::u32string &s)
{
std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
std::string bytes = conv.to_bytes(s);
return std::u16string(reinterpret_cast<const char16_t*>(bytes.c_str()), bytes.length()/sizeof(char16_t));
} std::u32string To_UTF32(const u8string &s)
{
std::wstring_convert<codecvt_utf8<char32_t>, char32_t> conv;
return conv.from_bytes(s);
} std::u32string To_UTF32(const std::u16string &s)
{
const char16_t *pData = s.c_str();
std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
return conv.from_bytes(reinterpret_cast<const char*>(pData), reinterpret_cast<const char*>(pData+s.length()));
}

std::u32string conversion to/from std::string and std::u16string的更多相关文章

  1. 对std::string和std::wstring区别的解释,807个赞同,有例子

    807down vote string? wstring? std::string is a basic_string templated on a char, and std::wstring on ...

  2. C++ MFC std::string转为 std::wstring

    std::string转为 std::wstring std::wstring UTF8_To_UTF16(const std::string& source) { unsigned long ...

  3. 实战c++中的string系列--std:vector 和std:string相互转换(vector to stringstream)

    string.vector 互转 string 转 vector vector  vcBuf;string        stBuf("Hello DaMao!!!");----- ...

  4. no match for call to ‘(std::__cxx11::string {aka std::__cxx11::basic_string

    问题: t->package().ship_id(sqlRow[1]);其中 ship_id为 结构体package中的string类型.如下: typedef struct Package{  ...

  5. Item 25: 对右值引用使用std::move,对universal引用则使用std::forward

    本文翻译自<effective modern C++>,由于水平有限,故无法保证翻译完全正确,欢迎指出错误.谢谢! 博客已经迁移到这里啦 右值引用只能绑定那些有资格被move的对象上去.如 ...

  6. 如何使用 window api 转换字符集?(std::string与std::wstring的相互转换)

    //宽字符转多字节 std::string W2A(const std::wstring& utf8) { int buffSize = WideCharToMultiByte(CP_ACP, ...

  7. std::string与std::wstring互相转换

    作者:zzandyc来源:CSDN原文:https ://blog.csdn.net/zzandyc/article/details/77540056 版权声明:本文为博主原创文章,转载请附上博文链接 ...

  8. 实战c++中的string系列--std::string与MFC中CString的转换

    搞过MFC的人都知道cstring,给我们提供了非常多便利的方法. CString 是一种非常实用的数据类型. 它们非常大程度上简化了MFC中的很多操作,使得MFC在做字符串操作的时候方便了非常多.无 ...

  9. gcc编译链接std::__cxx11::string和std::string的问题

    今天公司的小伙伴遇到一个问题,这里做一个记录. 问题是这样的,他编译了公司的基础库,然后在程序中链接的时候遇到点问题,报错找不到定义. 用到的函数声明大概是这样的: void function(con ...

随机推荐

  1. winform 用户控件事件的写法

    public partial class UcTest : UserControl { public UcTest() { InitializeComponent(); } //定义事件 public ...

  2. C# 执行CMD命令的方法

    /// <summary> /// 执行CMD命令 /// </summary> /// <param name="str"></para ...

  3. C# 抓取并导出网页里面所有超链接方法

    public class app { // 获取指定网页的HTML代码 public static string GetPageSource(string URL) { Uri uri = new U ...

  4. javascript对数据处理

    数组去重 法一: // 遍历数组,建立新数组,利用indexOf判断是否存在于新数组中,不存在则push到新数组,最后返回新数组 function unique(ar) { var ret = []; ...

  5. Python爬虫学习——获取网页

    通过GET请求获取返回的网页,其中加入了User-agent信息,不然会抛出"HTTP Error 403: Forbidden"异常, 因为有些网站为了防止这种没有User-ag ...

  6. [原]使用Fiddler捕获java的网络通信数据

    [原]使用Fiddler捕获java的网络通信数据 System.setProperty("http.proxySet", "true"); System.se ...

  7. 用外部物理路由器时与外部dhcp服务时怎样使用metadata服务(by quqi99)

    作者:张华  发表于:2015-12-31版权声明:能够随意转载,转载时请务必以超链接形式标明文章原始出处和作者信息及本版权声明 ( http://blog.csdn.net/quqi99 ) 用外部 ...

  8. SharePoint 2013 显示“以其他用户身份登录”菜单项

    最近在SharePoint 2013的网站上发现,没有看到有切换不同用户登录的入口,在SharePoint 2010中是存在这样的菜单项能够很方便的进行用户切换的,不知道为什么,SharePoint ...

  9. ubuntu-14.04.2-desktop-i386.iso:ubuntu-14.04.2-desktop-i386:安装Oracle11gR2

    ubuntu 桌面版的安装不介绍. 如何安装oracle:核心步骤和关键点. ln -sf /bin/bash /bin/sh ln -sf /usr/bin/basename /bin/basena ...

  10. innodb分区

    当 MySQL的总记录数超过了100万后,性能会大幅下降,可以采用分区方案 分区允许根据指定的规则,跨文件系统分配单个表的多个部分.表的不同部分在不同的位置被存储为单独的表. 1.先看下innodb的 ...