[转帖]字符集 AL32UTF8 和 UTF8

https://blog.51cto.com/comtv/383254#

文章标签职场休闲字符集 AL32UTF8 和 UTF8文章分类数据库阅读数1992

The difference between UTF8 and AL32UTF8 are:

UTF8 stores Unicode characters with code points > U+FFFF as two surrogate characters, three bytes each

AL32UTF8 stores Unicode characters with code points > U+FFFF as one four-byte character

UTF8 will not be updated anymore when new Unicode versions are released, only AL32UTF8 and AL16UTF16 will.

Due to compatibility problems with pre-9i versions use UTF8 if you have Oracle8i clients connecting to the database. Use AL32UTF8 in pure Oracle9i environment.

UTF-8 encoding is variable-width. In UTF-8, each character can be represented by either one, two, or three bytes.

UTF8 is a varying width 1-3 bytes per character Unicode encoding. It is supported for both database and national character sets. It is a binary superset of US7ASCII. UTF8 corresponds to Unicode CESU-8 encoding.

AL32UTF8 is a varying width 1-4 bytes per character. It is supported for CHAR, VARCHAR2, LONG and CLOB only (database character set). It is a binary superset of UTF8 (in 9.2 only) and US7ASCII. AL32UTF8 corresponds to Unicode UTF-8 encoding.

This is what Metalink says. In Note: 237593.1

There is a possible problem for 817 and lower versions: Problems connecting to AL32UTF8 databases from older versions 8i and lower.

The default UTF8 characterset for 9i/10G is AL32UTF8, however this characterset is NOT recognised by any pre-9i clients/server systems.

Oracle recommends that you use UTF8 instead of AL32UTF8 as database characterset if you have 8i (or older) servers and clients connecting to the 9i/10g system until you can upgrade the older versions.

UTF8 is unicode revision 3.0 in 8.1.7 and up. AL32UTF8 is Unicode 3.0 in 9.0.1, Unicode 3.1 in 9.2, Unicode 3.2 in 10.1 and Unicode 4.01 in 10.2

Besides the difference in Unicode version the "big difference" is that AL32UTF8 has build in support for "Surrogate Pairs" (also known as Surrogate characters or "Supplementary characters").
Practically this means that in 99% of the cases you can use UTF8 instead of AL32UTF8 without any problem.

There are only a few situations where Surrogate Pairs are already used on client side. Windows system with HKSCS2001 (hong kong extension) is one of those. Note that you actually can *store* Surrogate Pairs in UTF8 but will store 2 * 3 byte characters and not like AL32UTF8 one 4 byte character.

Note that if you now use UTF8 as database characterset and -in the future you do a roll out of new 9i or higher clients and all your other databases are upgraded to 9i or higher, you can simply do a alter database characterset to go from UTF8 to AL32UTF8 so downtime will be limited to a few minutes if the need to go to AL32UTF8 should arise. There is no performance impact on staying on UTF8

NOTE:
This note is ONLY relevant if you have already a 9i AL32UTF8 database with data in. If you still need to create the 9i system then choose UTF8 instead of AL32UTF8 as database characterset in the database creation assistant.

So, *IF* you have already a 9i system running with AL32UTF8 then you can use the following steps in this note to change the database characterset to UTF8 without losing data.

You can't simply use "ALTER DATABASE CHARACTERSET" to go from AL32UTF8 to UTF8 because UTF8 is a SUB-set of AL32UTF8 (some codepoints which are correct in AL32UTF8 are not known in UTF8)

But again, UTF8 *contains* all characters know in AL32UTF8, the difference between them is pure the way some characters are stored (AL32UTF8 is a bit more efficient for some characters)

So you will run into ORA-12712 if you try alter database.....

source:http://space.itpub.net/160455/viewspace-499942

[转帖]字符集 AL32UTF8 和 UTF8的更多相关文章

oracle 字符集 AL32UTF8、UTF8
简介:ORACLE数据库字符集,即Oracle全球化支持(Globalization Support),或即国家语言支持(NLS)其作用是用本国语言和格式来存储.处理和检索数据.利用全球化支持,ORA ...
AL32UTF8 and UTF8 and ZHS16GBK
About AL32UTF8 ORACLE数据库字符集,即Oracle全球化支持(Globalization Support), 或即国家语言支持(NLS)其作用是用本国语言和格式来存储.处理和检索数 ...
Oracle11g字符集AL32UTF8修改为ZHS16GBK详解
此问题发生在数据库迁移过程中.源数据库:自己笔记本上win7 64位系统的oracle11g个人版,字符集ZHS16GBK :目标数据库,HP的sqlserver2008 系统 64位数据库服务器,字 ...
Oracle11g字符集AL32UTF8修改为ZHS16GBK详解【转】
------感谢作者,确实解决了问题.分享下,希望帮到更多人此问题发生在数据库迁移过程中.源数据库:自己笔记本上win7 64位系统的oracle11g个人版,字符集ZHS16GBK :目标数据库, ...
Oracle 11g修改字符集AL32UTF8为ZHS16GBK
oracle11g更改字符集AL32UTF8为ZHS16GBK当初安装oracle的时候选择的默认安装,结果字符集不是以前经常用的16GBK,要改字符集,从网上找到了方法并试了一下,果然好用! 具体如 ...
MySQL 解决 emoji表情的方法，使用utf8mb4 字符集(4字节 UTF-8 Unicode 编码)
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px 'Helvetica Neue'; color: #454545} span.s1 {font: ...
Oracle 12C 新特性之 db默认字符集AL32UTF8、PDB支持不同字符集
一. db默认字符集AL32UTF8Specify the database character set when you create the database. Starting from Ora ...
（原创）Linux下MySQL 5.5/5.6的修改字符集编码为UTF8（彻底解决中文乱码问题）
« CloudStack+XenServer详细部署方案(10):高级网络功能应用 (总结)CentOS Linux 5.x在GPT分区不能引导的解决方法 » 2013-1 11 (原创)Linux下 ...
mac 设置 MySQL 数据库默认编码（字符集）为 UTF-8
mac 设置 MySQL 数据库默认编码(字符集)为 UTF-8 原文链接:https://juejin.im/post/5bbdca76e51d45021147de44 鉴于有些刚接触 MySQ ...
centos6.5和centos7.5统一字符集为zh_CN.UTF-8解决系统和MySQL数据库乱码问题
linux的服务器需要做的操作 centos6.5下: 修改默认字符集为 zh_CN.UTF-8,如果没有中文语言包可能需要安装中文语言包支持 [root@meinv01 ~]# yum groupi ...

随机推荐

Kiractf
信息收集主机发现和端口扫描只开放了80的web服务 ‍ WEB打点访问首页有文件上传,肯定可以利用一波.language那个页面甚至文件包含都写脸上了. root@Lockly tmp/ki ...
JavaFx之横向布局左右两侧对齐（十九）
JavaFx之横向布局左右两侧对齐(十九) 横向布局HBox在子节点A.B中添加<HBox HBox.hgrow="ALWAYS"></HBox> 即可做到 ...
Java中单体应用锁的局限性&分布式锁
互联网系统架构的演进在互联网系统发展之初,系统比较简单,消耗资源小,用户访问量也比较少,我们只部署一个Tomcat应用就可以满足需求.系统架构图如下: 一个Tomcat可以看作是一个JVM进程,当大 ...
java中synchronized和ReentrantLock的加锁和解锁能在不同线程吗？如果能，如何实现？
java中synchronized和ReentrantLock的加锁和解锁能在不同线程吗?如果能,如何实现? 答案2023-06-21: java的: 这个问题,我问了一些人,部分人是回答得有问题的. ...
SignalR：React + ASP.NET Core Api
一. 后台WebApi配置: 注:Vision为业务名称,以此为例,可随意修改 1. 安装包:Microsoft.AspNetCore.SignalR 2. 注入 Startup.cs Configu ...
详解CCE服务：一站式告警配置和云原生日志视图
本文分享自华为云社区<新一代云原生可观测平台之CCE服务日志和告警篇>,作者:云容器大未来. 告警和日志是运维人员快速定位问题.恢复异常的主要手段.运维人员日常的工作模式往往是先接收告警信 ...
Web 全栈开发利器: 强大的在线 Cloud IDE
摘要:近年来,敏捷.DevOps的理念已逐步成为主流.基于云计算的开发环境也正获得越来越多开发者的青睐.不难想象,云端IDE已成未来的趋势. 学了Web全栈开发,就得动手实践,要动手,得先有开发环境. ...
GaussDB(DWS)字符串处理函数返回错误结果集排查
摘要:在使用字符串处理函数时,有时会出现非预期结果的场景.在排除使用问题后,应该从encoding和数据本身开始排查. 本文分享自华为云社区<GaussDB(DWS)字符串处理函数返回错误结果集 ...
详解NLP和时序预测的相似性【附赠AAAI21最佳论文INFORMER的详细解析】
摘要:本文主要分析自然语言处理和时序预测的相似性,并介绍Informer的创新点. 前言时序预测模型无外乎RNN(LSTM, GRU)以及现在非常火的Transformer.这些时序神经网络模型的主 ...
揭秘GES超大规模图计算引擎HyG：图切分
摘要:GES大规模图计算引擎HyG通过实现不同的点边分区算法,可以灵活地供用户选择多种多样的切分策略,进而达到更好的运算性能. 本文分享自华为云社区<GES超大规模图计算引擎HyG揭秘之图切分& ...

[转帖]字符集 AL32UTF8 和 UTF8

[转帖]字符集 AL32UTF8 和 UTF8的更多相关文章

随机推荐

热门专题