Unicode, UTF, ASCII, ANSI format differences
Going down your list:
- "Unicode" isn't an encoding, although unfortunately, a lot of documentation imprecisely uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to any particular encoding.
- UTF-16: 2 bytes per "code unit". This is the native format of strings in .NET, and generally in Windows and Java. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.)
- UTF-8: Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
- UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
- UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET
Utf32String
class as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.) - ASCII: Single byte encoding only using the bottom 7 bits. (Unicode code points 0-127.) No accents etc.
- ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained viaEncoding.Default, and is often Windows-1252 but can be other locales.
There's more on my Unicode page and tips for debugging Unicode problems.
The other big resource of code is unicode.org which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.
Unicode, UTF, ASCII, ANSI format differences的更多相关文章
- 字符集、字符编码、国际化、本地化简要总结(UNICODE/UTF/ASCII/GB2312/GBK/GB18030)
PS:要转载请注明出处,本人版权所有. PS: 这个只是基于<我自己>的理解, 如果和你的原则及想法相冲突,请谅解,勿喷. 环境说明 普通的linux 和 普通的windows. ...
- Unicode(UTF&UCS)深度历险
Unicode(UTF&UCS)深度历险 计算机网络诞生后,大家慢慢地发现一个问题:一个字节放不下一个字符了!因为需要交流,本地化的文字需要能够被支持. 最初的字符集使用7bit来存储字符,因 ...
- UNICODE与ASCII
1.ASCII的特点 ASCII 是用来表示英文字符的一种编码规范.每个ASCII字符占用1 个字节,因此,ASCII 编码可以表示的最大字符数是255(00H—FFH).这对于英文而言,是没有问题的 ...
- utf-8,Unicode和ASCII区别
一.ASCII 码 我们知道,计算机内部,所有信息最终都是一个二进制值.每一个二进制位(bit)有0和1两种状态,因此八个二进制位就可以组合出256种状态,这被称为一个字节(byte).也就是说,一个 ...
- V++ MFC CEdit输出数组 UNICODE TO ASCII码
MFC怎么在静态编辑框中输出数组 //字符转ASCII码void CUTF8Dlg::OnBnClickedButtonCharAscii(){ // TODO: 在此添加控件通知处理程序代码 Upd ...
- 创建文件夹并解决解决unicode和ASCII码转换的问题
# -*- coding: UTF-8 -*-import sysimport timeimport os #解决unicode和ASCII码转换的问题reload(sys) #解决unicode和A ...
- c# Unicode 转换 ASCII
/// <summary> /// Unicode 转换 ASCII /// </summary> /// <param name="theText" ...
- python2.7 处理unicode和ascii字符串混用问题
python2.7默认的编码方式为ascii码,如下可以查询: import sys sys.getdefaultencoding() 如果直接在unicode和ascii字符串之间做计算.比较.连接 ...
- ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
转载请注明原文地址:https://www.cnblogs.com/cnodoo/p/9281366.html ValueError: All strings must be XML compati ...
随机推荐
- ora-01033和ora-12560错误的解决方案
1.登录pl sql 报01033的错误,如下图: 2.登录cmd中,报12560的错误,如下图: 3.查看服务和注册表都没有问题,如下: 查看服务,已启动,如下图: 运行regedit,进入HKEY ...
- Java多线程-并发容器
Java多线程-并发容器 在Java1.5之后,通过几个并发容器类来改进同步容器类,同步容器类是通过将容器的状态串行访问,从而实现它们的线程安全的,这样做会消弱了并发性,当多个线程并发的竞争容器锁的时 ...
- Linux tcp黏包解决方案
tcpip协议使用"流式"(套接字)进行数据的传输,就是说它保证数据的可达以及数据抵达的顺序,但并不保证数据是否在你接收的时候就到达,特别是为了提高效率,充分利用带宽,底层会使用缓 ...
- [原创]纯CSS3打造的3D翻页翻转特效
刚接触CSS3动画,心血来潮实现了一个心目中自己设计的翻页效果的3D动画,页面纯CSS3,目前只能在Chrome中玩,理论上可以支持Safari. 1. 新建HTML,代码如下(数据和翻页后的数据都是 ...
- ELF Format 笔记(十三)—— 段权限
ilocker:关注 Android 安全(新手) QQ: 2597294287 一个可被系统加载的程序至少拥有一个可加载段.当系统创建可加载段的内存映像时,会根据 p_flags 赋予一定的访问权限 ...
- Ubuntu配置Ruby和Rails
安装curl sudo apt-get install curl 安装RVM curl -L https://get.rvm.io | bash -s stable 通过RVM来安装Ruby rvm ...
- CF219D. Choosing Capital for Treeland [树形DP]
D. Choosing Capital for Treeland time limit per test 3 seconds memory limit per test 256 megabytes i ...
- js中的hasOwnProperty和isPrototypeOf方法
hasOwnProperty:是用来判断一个对象是否有你给出名称的属性或对象.不过需要注意的是,此方法无法检查该对象的原型链中是否具有该属性,该属性必须是对象本身的一个成员. isPrototypeO ...
- Mac下打开eclipse 始终提示 你需要安装Java SE 6 Runtime
Mac下打开eclipse 始终提示 你需要安装Java SE 6 Runtime 周银辉 我的mac os 版本是10.9.2, JDK配置得好好的,但打开eclipse时还是提示需 ...
- SecureCRT连接虚拟机(ubuntu)配置
使用SecureCRT连接虚拟机(ubuntu)配置记录 这种配置方法,可以非常方便的操作虚拟机里的Linux系统,且让VMware在后台运行,因为有时候我直接在虚拟机里操作会稍微卡顿,或者切换速 ...