Going down your list:

  • "Unicode" isn't an encoding, although unfortunately, a lot of documentation imprecisely uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to any particular encoding.
  • UTF-16: 2 bytes per "code unit". This is the native format of strings in .NET, and generally in Windows and Java. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.)
  • UTF-8: Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
  • UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
  • UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET Utf32String class as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.)
  • ASCII: Single byte encoding only using the bottom 7 bits. (Unicode code points 0-127.) No accents etc.
  • ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained viaEncoding.Default, and is often Windows-1252 but can be other locales.

There's more on my Unicode page and tips for debugging Unicode problems.

The other big resource of code is unicode.org which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.

Unicode, UTF, ASCII, ANSI format differences的更多相关文章

  1. 字符集、字符编码、国际化、本地化简要总结(UNICODE/UTF/ASCII/GB2312/GBK/GB18030)

    PS:要转载请注明出处,本人版权所有. PS: 这个只是基于<我自己>的理解, 如果和你的原则及想法相冲突,请谅解,勿喷. 环境说明   普通的linux 和 普通的windows.    ...

  2. Unicode(UTF&UCS)深度历险

    Unicode(UTF&UCS)深度历险 计算机网络诞生后,大家慢慢地发现一个问题:一个字节放不下一个字符了!因为需要交流,本地化的文字需要能够被支持. 最初的字符集使用7bit来存储字符,因 ...

  3. UNICODE与ASCII

    1.ASCII的特点 ASCII 是用来表示英文字符的一种编码规范.每个ASCII字符占用1 个字节,因此,ASCII 编码可以表示的最大字符数是255(00H—FFH).这对于英文而言,是没有问题的 ...

  4. utf-8,Unicode和ASCII区别

    一.ASCII 码 我们知道,计算机内部,所有信息最终都是一个二进制值.每一个二进制位(bit)有0和1两种状态,因此八个二进制位就可以组合出256种状态,这被称为一个字节(byte).也就是说,一个 ...

  5. V++ MFC CEdit输出数组 UNICODE TO ASCII码

    MFC怎么在静态编辑框中输出数组 //字符转ASCII码void CUTF8Dlg::OnBnClickedButtonCharAscii(){ // TODO: 在此添加控件通知处理程序代码 Upd ...

  6. 创建文件夹并解决解决unicode和ASCII码转换的问题

    # -*- coding: UTF-8 -*-import sysimport timeimport os #解决unicode和ASCII码转换的问题reload(sys) #解决unicode和A ...

  7. c# Unicode 转换 ASCII

    /// <summary> /// Unicode 转换 ASCII /// </summary> /// <param name="theText" ...

  8. python2.7 处理unicode和ascii字符串混用问题

    python2.7默认的编码方式为ascii码,如下可以查询: import sys sys.getdefaultencoding() 如果直接在unicode和ascii字符串之间做计算.比较.连接 ...

  9. ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

    转载请注明原文地址:https://www.cnblogs.com/cnodoo/p/9281366.html  ValueError: All strings must be XML compati ...

随机推荐

  1. [翻译]——SQL Server使用链接服务器的5个性能杀手

    前言: 本文是对博客http://www.dbnewsfeed.com/2012/09/08/5-performance-killers-when-working-with-linked-server ...

  2. Linux命令学习总结:hexdump

    命令简介: hexdump是Linux下的一个二进制文件查看工具,它可以将二进制文件转换为ASCII.八进制.十进制.十六进制格式进行查看. 指令所在路径:/usr/bin/hexdump 命令语法: ...

  3. MS Sql Server 数据库或表修复(Log日志文件损坏的修复方法)

    ----------------- [1] use master go sp_configure reconfigure with override go ----------------- [2] ...

  4. mysql启动失败:不能创建pid文件

    2016-03-09T07:51:38.905444Z 0 [ERROR] /usr/sbin/mysqld: Can't create/write to file '/var/run/mysqld/ ...

  5. Centos7中systemctl命令详解

    Linux Systemctl是一个系统管理守护进程.工具和库的集合,用于取代System V.service和chkconfig命令,初始进程主要负责控制systemd系统和服务管理器.通过Syst ...

  6. linux下shell脚本执行jar文件

    最近在搞一个shell脚本启动jar文件个关闭jar文件的东东.搞得我都蛋疼了.今天晚上终于弄好了 话说,小弟的linux只是刚入门,经过各方查资料终于搞定了.话不多说,下面开始上小弟写的shell脚 ...

  7. CF724D. Dense Subsequence[贪心 字典序!]

    D. Dense Subsequence time limit per test 2 seconds memory limit per test 256 megabytes input standar ...

  8. NOIP2015斗地主[DFS 贪心]

    题目描述 牛牛最近迷上了一种叫斗地主的扑克游戏.斗地主是一种使用黑桃.红心.梅花.方片的A到K加上大小王的共54张牌来进行的扑克牌游戏.在斗地主中,牌的大小关系根据牌的数码表示如下:3<4< ...

  9. 如何用Maven创建web项目

    使用eclipse插件创建一个web project 首先创建一个Maven的Project如下图 我们勾选上Create a simple project (不使用骨架) 这里的Packing 选择 ...

  10. [No000075]有没有安全的工作?

    如果你经常使用互联网,可能知道有一种东西叫做Flash. 它是一种软件,用来制作网页游戏.动画,以及视频播放器.只要观看网络视频,基本都会用到它. 七八年前,它是最热门的互联网技术之一.如果不安装Fl ...