Going down your list:

  • "Unicode" isn't an encoding, although unfortunately, a lot of documentation imprecisely uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to any particular encoding.
  • UTF-16: 2 bytes per "code unit". This is the native format of strings in .NET, and generally in Windows and Java. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.)
  • UTF-8: Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
  • UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
  • UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET Utf32String class as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.)
  • ASCII: Single byte encoding only using the bottom 7 bits. (Unicode code points 0-127.) No accents etc.
  • ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained viaEncoding.Default, and is often Windows-1252 but can be other locales.

There's more on my Unicode page and tips for debugging Unicode problems.

The other big resource of code is unicode.org which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.

Unicode, UTF, ASCII, ANSI format differences的更多相关文章

  1. 字符集、字符编码、国际化、本地化简要总结(UNICODE/UTF/ASCII/GB2312/GBK/GB18030)

    PS:要转载请注明出处,本人版权所有. PS: 这个只是基于<我自己>的理解, 如果和你的原则及想法相冲突,请谅解,勿喷. 环境说明   普通的linux 和 普通的windows.    ...

  2. Unicode(UTF&UCS)深度历险

    Unicode(UTF&UCS)深度历险 计算机网络诞生后,大家慢慢地发现一个问题:一个字节放不下一个字符了!因为需要交流,本地化的文字需要能够被支持. 最初的字符集使用7bit来存储字符,因 ...

  3. UNICODE与ASCII

    1.ASCII的特点 ASCII 是用来表示英文字符的一种编码规范.每个ASCII字符占用1 个字节,因此,ASCII 编码可以表示的最大字符数是255(00H—FFH).这对于英文而言,是没有问题的 ...

  4. utf-8,Unicode和ASCII区别

    一.ASCII 码 我们知道,计算机内部,所有信息最终都是一个二进制值.每一个二进制位(bit)有0和1两种状态,因此八个二进制位就可以组合出256种状态,这被称为一个字节(byte).也就是说,一个 ...

  5. V++ MFC CEdit输出数组 UNICODE TO ASCII码

    MFC怎么在静态编辑框中输出数组 //字符转ASCII码void CUTF8Dlg::OnBnClickedButtonCharAscii(){ // TODO: 在此添加控件通知处理程序代码 Upd ...

  6. 创建文件夹并解决解决unicode和ASCII码转换的问题

    # -*- coding: UTF-8 -*-import sysimport timeimport os #解决unicode和ASCII码转换的问题reload(sys) #解决unicode和A ...

  7. c# Unicode 转换 ASCII

    /// <summary> /// Unicode 转换 ASCII /// </summary> /// <param name="theText" ...

  8. python2.7 处理unicode和ascii字符串混用问题

    python2.7默认的编码方式为ascii码,如下可以查询: import sys sys.getdefaultencoding() 如果直接在unicode和ascii字符串之间做计算.比较.连接 ...

  9. ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

    转载请注明原文地址:https://www.cnblogs.com/cnodoo/p/9281366.html  ValueError: All strings must be XML compati ...

随机推荐

  1. java中遍历map的两种方式

    1.先将map对象转成set,然后再转为迭代器 Iterator iterator = map.entrySet().iterator(); while(iterator.hasNext()){ En ...

  2. JavaScript的写类方式(6)

    时间到了2015年6月18日,ES6正式发布了,到了ES6,前面的各种模拟类写法都可以丢掉了,它带来了关键字 class,extends,super. ES6的写类方式 // 定义类 Person c ...

  3. S3C6410移植apache和php

    需要准备的东西: Apache-1.3.39 for linux Php-4.4.8 for linux Ubuntu amd64位 PC机 6410开发板,我用的是友善之臂 交叉编译: 交叉编译呢, ...

  4. Spring中常用类型的bean配置(Map,List,Set,基本类型)

    给自己做个笔记... 有时会用到配置文件中配置一下映射关系,方便以后扩展.此时可采用集合类型的bean配置方式配置.程序中直接注入即可. map类型的: <!-- 旧版方式,无需util包 -- ...

  5. 如何捕捉并分析SIGSEGV的现场

    linux下程序对SIGSEGV信号的默认处理方式是产生coredump并终止程序,可以参考man 7 signal Signal Value Action Comment ───────────── ...

  6. 帆软报表FineReport中数据连接之Jboss配置JNDI连接

    使用sqlsever 2000数据库数据源来做实例讲解,帆软报表FineReport数据连接中Jboss配置JNDI大概的过程和WEBSPHERE以及WEBLOGIC基本相同,用JDBC连接数据库制作 ...

  7. 初识【Windows API】--文本去重

    最近学习操作系统中,老师布置了一个作业,运用系统调用函数删除文件夹下两个重复文本类文件,Linux玩不动,于是就只能在Windows下进行了. 看了一下介绍Windows API的博客: 点击打开 基 ...

  8. hadoop安装

    环境 RedHad Linux9.0  java6   hadoop1.2.1 hadoop下载地址:http://mirror.bit.edu.cn/apache/hadoop/common/ 版本 ...

  9. Hibernate对象标识符

    Hibernate提供的内置标识符生成器 Java语言按内存地址来识别或区分同一个类的不同对象,而关系数据库按主键来识别或区分同一个表的不同记录.Hibernate使用OID(对象标识符)来统一两者之 ...

  10. android第一行代码-2.activity基本用法

    摘要: 本节主要涉及到的有activity的创建,标题栏隐藏,button绑定方法(toast的使用),menu使用,活动销毁 1.activity的创建跟注册 创建: public class Te ...