Unicode, UTF, ASCII, ANSI format differences
Going down your list:
- "Unicode" isn't an encoding, although unfortunately, a lot of documentation imprecisely uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to any particular encoding.
- UTF-16: 2 bytes per "code unit". This is the native format of strings in .NET, and generally in Windows and Java. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.)
- UTF-8: Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
- UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
- UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET
Utf32Stringclass as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.) - ASCII: Single byte encoding only using the bottom 7 bits. (Unicode code points 0-127.) No accents etc.
- ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained viaEncoding.Default, and is often Windows-1252 but can be other locales.
There's more on my Unicode page and tips for debugging Unicode problems.
The other big resource of code is unicode.org which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.
Unicode, UTF, ASCII, ANSI format differences的更多相关文章
- 字符集、字符编码、国际化、本地化简要总结(UNICODE/UTF/ASCII/GB2312/GBK/GB18030)
PS:要转载请注明出处,本人版权所有. PS: 这个只是基于<我自己>的理解, 如果和你的原则及想法相冲突,请谅解,勿喷. 环境说明 普通的linux 和 普通的windows. ...
- Unicode(UTF&UCS)深度历险
Unicode(UTF&UCS)深度历险 计算机网络诞生后,大家慢慢地发现一个问题:一个字节放不下一个字符了!因为需要交流,本地化的文字需要能够被支持. 最初的字符集使用7bit来存储字符,因 ...
- UNICODE与ASCII
1.ASCII的特点 ASCII 是用来表示英文字符的一种编码规范.每个ASCII字符占用1 个字节,因此,ASCII 编码可以表示的最大字符数是255(00H—FFH).这对于英文而言,是没有问题的 ...
- utf-8,Unicode和ASCII区别
一.ASCII 码 我们知道,计算机内部,所有信息最终都是一个二进制值.每一个二进制位(bit)有0和1两种状态,因此八个二进制位就可以组合出256种状态,这被称为一个字节(byte).也就是说,一个 ...
- V++ MFC CEdit输出数组 UNICODE TO ASCII码
MFC怎么在静态编辑框中输出数组 //字符转ASCII码void CUTF8Dlg::OnBnClickedButtonCharAscii(){ // TODO: 在此添加控件通知处理程序代码 Upd ...
- 创建文件夹并解决解决unicode和ASCII码转换的问题
# -*- coding: UTF-8 -*-import sysimport timeimport os #解决unicode和ASCII码转换的问题reload(sys) #解决unicode和A ...
- c# Unicode 转换 ASCII
/// <summary> /// Unicode 转换 ASCII /// </summary> /// <param name="theText" ...
- python2.7 处理unicode和ascii字符串混用问题
python2.7默认的编码方式为ascii码,如下可以查询: import sys sys.getdefaultencoding() 如果直接在unicode和ascii字符串之间做计算.比较.连接 ...
- ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
转载请注明原文地址:https://www.cnblogs.com/cnodoo/p/9281366.html ValueError: All strings must be XML compati ...
随机推荐
- javascript-装饰者模式
装饰者模式笔记 在不改变原对象的基础上, 通过对其进行包装拓展(添加属性或方法)使原有对象可以满足用户的更复杂要求. 需求不是一成不变的,需求会不断改进,以增强用户体验 demo实例:对输入框添加fo ...
- javascript-代理模式
JavaScript代理模式笔记 由于一个对象不能直接引用另一个对象,所以要用过代理对象在这两个对象之间起到中介作用 1.代理对象形式是通过script标签 demo实例实现的方式也被人称之为JSON ...
- 【转】Hive的insert操作
insert 语法格式为: 1. 基本的插入语法: insert overwrite table tablename [partition(partcol1=val1,partclo2=val2)] ...
- 【hive】——Hive sql语法详解
Hive 是基于Hadoop 构建的一套数据仓库分析系统,它提供了丰富的SQL查询方式来分析存储在Hadoop 分布式文件系统中的数据,可以将结构 化的数据文件映射为一张数据库表,并提供完整的SQL查 ...
- SQL Server DAC——专用管理员连接
今天打开数据库刚要连接时,看到“连接到服务器”窗口,突发的想到:要是SQL Server 不再响应正常的连接请求,又想使用数据库时,我们该怎么办? 其实我们还能通过“SQL Server D ...
- redis 源码阅读 数值转字符 longlong2str
redis 在底层中会把long long转成string 再做存储. 主个功能是在sds模块里. 下面两函数是把long long 转成 char 和 unsiged long long 转成 ...
- Serial Communication Protocol Design Hints And Reference
前面转载的几篇文章详细介绍了UART.RS-232和RS-485的相关内容,可以知道,串口通信的双方在硬件层面需要约定如波特率.数据位.校验位和停止位等属性,才可以正常收发数据.实际项目中使用串口通信 ...
- Save()saveOrUpdate()Hibernate的merge()方法
Save save()方法能够保存实体到数据库,正如方法名称save这个单词所表明的意思.我们能够在事务之外调用这个方法,这也是我不喜欢使用这个方法保存数据的原因.假如两个实体之间有关系(例如empl ...
- insert、update select from
1.insert select from <一棵树-博客园> 收集整理,转载请注明出处! 使用SELECT INTO 和 INSERT INTO SELECT 表复制语句了. 1.INSE ...
- 访问服务端的HttpProxy
tip:加密部分暂时先注释掉 package com.zqc.share.manager.framework; import java.net.HttpURLConnection; import ja ...