Unicode, UTF, ASCII, ANSI format differences
Going down your list:
- "Unicode" isn't an encoding, although unfortunately, a lot of documentation imprecisely uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to any particular encoding.
- UTF-16: 2 bytes per "code unit". This is the native format of strings in .NET, and generally in Windows and Java. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.)
- UTF-8: Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
- UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
- UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET
Utf32Stringclass as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.) - ASCII: Single byte encoding only using the bottom 7 bits. (Unicode code points 0-127.) No accents etc.
- ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained viaEncoding.Default, and is often Windows-1252 but can be other locales.
There's more on my Unicode page and tips for debugging Unicode problems.
The other big resource of code is unicode.org which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.
Unicode, UTF, ASCII, ANSI format differences的更多相关文章
- 字符集、字符编码、国际化、本地化简要总结(UNICODE/UTF/ASCII/GB2312/GBK/GB18030)
PS:要转载请注明出处,本人版权所有. PS: 这个只是基于<我自己>的理解, 如果和你的原则及想法相冲突,请谅解,勿喷. 环境说明 普通的linux 和 普通的windows. ...
- Unicode(UTF&UCS)深度历险
Unicode(UTF&UCS)深度历险 计算机网络诞生后,大家慢慢地发现一个问题:一个字节放不下一个字符了!因为需要交流,本地化的文字需要能够被支持. 最初的字符集使用7bit来存储字符,因 ...
- UNICODE与ASCII
1.ASCII的特点 ASCII 是用来表示英文字符的一种编码规范.每个ASCII字符占用1 个字节,因此,ASCII 编码可以表示的最大字符数是255(00H—FFH).这对于英文而言,是没有问题的 ...
- utf-8,Unicode和ASCII区别
一.ASCII 码 我们知道,计算机内部,所有信息最终都是一个二进制值.每一个二进制位(bit)有0和1两种状态,因此八个二进制位就可以组合出256种状态,这被称为一个字节(byte).也就是说,一个 ...
- V++ MFC CEdit输出数组 UNICODE TO ASCII码
MFC怎么在静态编辑框中输出数组 //字符转ASCII码void CUTF8Dlg::OnBnClickedButtonCharAscii(){ // TODO: 在此添加控件通知处理程序代码 Upd ...
- 创建文件夹并解决解决unicode和ASCII码转换的问题
# -*- coding: UTF-8 -*-import sysimport timeimport os #解决unicode和ASCII码转换的问题reload(sys) #解决unicode和A ...
- c# Unicode 转换 ASCII
/// <summary> /// Unicode 转换 ASCII /// </summary> /// <param name="theText" ...
- python2.7 处理unicode和ascii字符串混用问题
python2.7默认的编码方式为ascii码,如下可以查询: import sys sys.getdefaultencoding() 如果直接在unicode和ascii字符串之间做计算.比较.连接 ...
- ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
转载请注明原文地址:https://www.cnblogs.com/cnodoo/p/9281366.html ValueError: All strings must be XML compati ...
随机推荐
- JavaSe:-javaagent,-agentlib,-agentpath
内容简述 -javaagent,-agentlib, -agentpath 说明 -javaagent示例 -javaagent.-agentlib.-agentpath -agentlib:li ...
- 实时事件统计项目:优化solr和morphline的时间字段
morphline优化,如下: 传过来的时间戳被复制到3个字段:eventTimeInMinuteChina_tdt ,eventTimeInMinuteUTC_tdt ,eventTimeInHou ...
- java常用的文件读写操作
现在算算已经做java开发两年了,回过头想想还真是挺不容易的,java的东西是比较复杂但是如果基础功扎实的话能力的提升就很快,这次特别整理了点有关文件操作的常用代码和大家分享 1.文件的读取(普通方式 ...
- linux文件拼接命令 paste
paste [文件名1 [文件名2] --] [选项] -s 把文件以行的方式拼接 -d 制定分隔符,默认以制表符分隔 [root@dagege ~]# >.txt [root@dagege ~ ...
- Jenkins学习九:Jenkins插件之构建MSBuild
Jenkins是Java语言编写的,一直好奇是否可以构建NET语言的项目,目前只了解到有一个插件MSBuild支持构建NET项目. 一.Jenkins安装插件MSBuild 二.VS构建CsharpH ...
- RS-232 vs. TTL Serial Communication(转载)
RS-232串口一度像现在的USB接口一样,是PC的标准接口,用来连接打印机.Modem和其他一些外设.后来逐渐被USB接口所取代,现在PC上已经看不到它的身影了.开发调试时如果用到串口,一般都是用U ...
- Linux下管道编程
功能: 父进程创建一个子进程父进程负责读用户终端输入,并写入管道 子进程从管道接收字符流写入另一个文件 代码: #include <stdio.h> #include <unistd ...
- iOS第八课——Navigation Controller和Tab bar Controller
今天我们要学习Navigation Controller和Tab bar Controller. Navigation Controller是iOS编程中比较常用的一种容器,用来管理多个视图控制器. ...
- 使用scrollpagination实现页面底端自动加载无需翻页功能
当阅读到页面最底端的时候,会自动显示一个"加载中"的功能,并自动从服务器端无刷新的将内容下载到本地浏览器显示. 这样的自动加载功能是如何实现的?jQuery的插件 ScrollPa ...
- usb驱动开发之大结局
从usb总线的那个match函数usb_device_match()开始到现在,遇到了设备,遇到了设备驱动,遇到了接口,也遇到了接口驱动,期间还多次遇到usb_device_match(),又多次与它 ...