Storage format of Chinese characters on a SIM card
The UCS2 format on the SIM card
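Chinese text on a SIM card is stored as UCS2. For characters in the Basic Multilingual Plane, the SIM's UCS2 and the "Unicode" of Windows wide-character strings use the same 16-bit values and differ only in byte order. Windows "Unicode" is little-endian (low byte first), while UCS2 on the SIM is big-endian (high byte first).
In VC, UCS2 and GB2312 can be converted with the WideCharToMultiByte and MultiByteToWideChar functions. First swap the two bytes of each 16-bit unit so they match the Windows byte order.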
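The following is a minimal sketch of the same conversion outside Windows. It assumes Python's built-in `gb2312` and `utf-16-be` codecs are an acceptable stand-in for MultiByteToWideChar / WideCharToMultiByte. The helper names are illustrative and do not come from any SIM library.

```python
# Sketch: GB2312 <-> SIM-order UCS2 using Python's standard codecs.
# Assumption: the "gb2312" and "utf-16-be" codecs stand in for the Win32
# MultiByteToWideChar / WideCharToMultiByte calls named in the text.


def gb2312_to_ucs2_be(gb_bytes: bytes) -> bytes:
    """Decode GB2312 bytes and re-encode them as big-endian UCS2 (SIM order)."""
    return gb_bytes.decode("gb2312").encode("utf-16-be")


def ucs2_be_to_gb2312(ucs2_bytes: bytes) -> bytes:
    """Decode big-endian UCS2 read from the SIM and re-encode it as GB2312."""
    return ucs2_bytes.decode("utf-16-be").encode("gb2312")


def swap_byte_order(data: bytes) -> bytes:
    """Swap every 2-byte unit: Windows little-endian <-> SIM big-endian."""
    if len(data) % 2:
        raise ValueError("UCS2 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


if __name__ == "__main__":
    sim = gb2312_to_ucs2_be(bytes.fromhex("D6D0B9FA"))   # "中国"
    print(sim.hex().upper())                             # 4E2D56FD
    print(swap_byte_order(sim).hex().upper())            # 2D4EFD56
```

The explicit byte swap shows the step the Win32 route needs. Python's codecs handle the byte order directly when the big-endian codec is named.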
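The SIM specification defines three ways to code UCS2 in an alpha field. The most common is the 80 scheme: the field starts with 80, and every two bytes after it represent one character. The 81 and 82 schemes can represent a Chinese character with a single byte. Under certain conditions, the 80, 81 and 82 schemes and GB2312 can be converted into one another.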
Below is a brief explanation of the specification.
Annex B (normative):
Coding of Alpha fields in the SIM for UCS2
If 16 bit UCS2 characters as defined in ISO/IEC 10646 [31] are being used in an alpha field, the coding can take one of three forms. If the ME supports UCS2 coding of alpha fields in the SIM, the ME shall support all three coding schemes for character sets containing 128 characters or less; for character sets containing more than 128 characters, the ME shall at least support the first coding scheme. If the alpha field record contains GSM default alphabet characters only, then none of these schemes shall be used in that record. Within a record, only one coding scheme, either GSM default alphabet, or one of the three described below, shall be used.
1) If the first octet in the alpha string is '80', then the remaining octets are 16 bit UCS2 characters, with the more significant octet (MSO) of the UCS2 character coded in the lower numbered octet of the alpha field, and the less significant octet (LSO) of the UCS2 character is coded in the higher numbered alpha field octet, i.e. octet 2 of the alpha field contains the more significant octet (MSO) of the first UCS2 character, and octet 3 of the alpha field contains the less significant octet (LSO) of the first UCS2 character (as shown below). Unused octets shall be set to 'FF', and if the alpha field is an even number of octets in length, then the last (unusable) octet shall be set to 'FF'.
Example 1
Octet 1 | Octet 2 | Octet 3 | Octet 4 | Octet 5 | Octet 6 | Octet 7 | Octet 8 | Octet 9
'80' | Ch1MSO | Ch1LSO | Ch2MSO | Ch2LSO | Ch3MSO | Ch3LSO | 'FF' | 'FF'
In other words, a field that starts with 80 is UCS2 with the high byte first and the low byte second, and unused bytes are filled with FF.
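For example, for the Chinese characters "中国":
the GB2312 code is D6D0B9FA;
in the UCS2 80 scheme the characters are 4E2D56FD, so the alpha field is 804E2D56FD followed by FF padding.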
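This sketch encodes and decodes the 80 scheme: a tag byte, big-endian UCS2, then 'FF' padding. `field_len` is a hypothetical parameter for the alpha-field size of the record. On a real card that size comes from the record length of the file being written.

```python
# Sketch of the '80' alpha-field scheme: tag 0x80, big-endian UCS2 characters,
# then 0xFF padding. field_len is a hypothetical parameter standing for the
# alpha-field size of the record being written.


def encode_80(text: str, field_len: int) -> bytes:
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS2 only covers the Basic Multilingual Plane")
    body = b"\x80" + text.encode("utf-16-be")   # MSO before LSO
    if len(body) > field_len:
        raise ValueError("text does not fit in the alpha field")
    # Unused octets, including a final unusable odd octet, are set to 'FF'.
    return body + b"\xff" * (field_len - len(body))


def decode_80(field: bytes) -> str:
    if not field or field[0] != 0x80:
        raise ValueError("not an '80'-coded alpha field")
    chars = []
    # Read 2-byte units after the tag; a lone trailing octet is never read.
    for i in range(1, len(field) - 1, 2):
        unit = (field[i] << 8) | field[i + 1]
        if unit == 0xFFFF:                      # start of the 'FF' padding
            break
        chars.append(chr(unit))
    return "".join(chars)


print(encode_80("中国", 9).hex().upper())               # 804E2D56FDFFFFFFFF
print(decode_80(bytes.fromhex("804E2D56FDFFFFFFFF")))   # 中国
```

Treating FFFF as the end of the text is safe here because U+FFFF is a Unicode noncharacter. It never appears as real text.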
2) If the first octet of the alpha string is set to '81', then the second octet contains a value indicating the number of characters in the string, and the third octet contains an 8 bit number which defines bits 15 to 8 of a 16 bit base pointer, where bit 16 is set to zero, and bits 7 to 1 are also set to zero. These sixteen bits constitute a base pointer to a "half-page" in the UCS2 code space, to be used with some or all of the remaining octets in the string. The fourth and subsequent octets in the string contain codings as follows; if bit 8 of the octet is set to zero, the remaining 7 bits of the octet contain a GSM Default Alphabet character, whereas if bit 8 of the octet is set to one, then the remaining seven bits are an offset value added to the 16 bit base pointer defined earlier, and the resultant 16 bit value is a UCS2 code point, and completely defines a UCS2 character.
Example 2
Octet 1 | Octet 2 | Octet 3 | Octet 4 | Octet 5 | Octet 6 | Octet 7 | Octet 8 | Octet 9
'81' | '05' | '13' | '53' | '95' | 'A6' | 'XX' | 'FF' | 'FF'
In the above example:
- Octet 2 indicates there are 5 characters in the string.
- Octet 3 indicates bits 15 to 8 of the base pointer, and indicates a bit pattern of 0hhh hhhh h000 0000 as the 16 bit base pointer number. Bengali characters for example start at code position 0980 (0000 1001 1000 0000), which is indicated by the coding '13' in octet 3.
- Octet 4 indicates GSM Default Alphabet character '53', i.e. "S".
- Octet 5 indicates a UCS2 character offset to the base pointer of '15', expressed in binary as follows 001 0101, which, when added to the base pointer value results in a sixteen bit value of 0000 1001 1001 0101, i.e. '0995', which is the Bengali letter KA.
- Octet 8 contains the value 'FF', but as the string length is 5, this is a valid character in the string, where the bit pattern 111 1111 is added to the base pointer, yielding a sixteen bit value of 0000 1001 1111 1111 for the UCS2 character (i.e. '09FF').
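The arithmetic in Example 2 can be checked with a few lines. This is a sketch, and the helper names are illustrative. Octet 3 is shifted left by 7 to form the base, and an octet with bit 8 set contributes its low 7 bits as an offset.

```python
# Worked arithmetic for Example 2 ('81' scheme). Octet 3 carries bits 15..8
# of the base pointer, so shifting it left by 7 yields the half-page base;
# an octet with bit 8 set contributes its low 7 bits as an offset.


def base_pointer_81(octet3: int) -> int:
    return octet3 << 7            # top bit and low 7 bits of the base are zero


def code_point_81(base: int, octet: int) -> int:
    if not octet & 0x80:
        raise ValueError("bit 8 clear: this octet is a GSM default character")
    return base + (octet & 0x7F)


base = base_pointer_81(0x13)
print(hex(base))                        # 0x980
print(hex(code_point_81(base, 0x95)))   # 0x995 (octet 5, Bengali KA)
print(hex(code_point_81(base, 0xFF)))   # 0x9ff (octet 8)
```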
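In other words, the 81 scheme has a base pointer, and each single byte represents one UCS2 character relative to that base. To display the string as UCS2, first compute the base. Then turn each byte into a 16-bit UCS2 code, which is the same value the 80 scheme would store.
Once you have the 80-scheme codes, the rest is straightforward.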
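Layout: 81 is the tag, followed by a one-byte character count, then the base byte, then the data. The base byte is shifted left by 7 bits to form the 16-bit base pointer, so the top bit and the low 7 bits of the pointer are zero (see the English text above for details).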
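Because of this layout, an 81 string holds at most 255 characters, since the count is a single byte. Octets 00-7F are GSM default alphabet characters, and octets 80-FF are offsets 0-127 from the base. One string can therefore contain at most 128 distinct non-GSM characters (such as Chinese), and their 80-scheme codes must all lie within the same range of 128 consecutive code points. Chinese characters can only use octets 80-FF, so a byte of 30 and a byte of 80 are handled differently: 30 is the GSM character '0' directly, while 80 means offset 0 from the base. The 82 scheme works the same way.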
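The decoding steps above can be sketched as follows. One simplification is labeled in the code: `GSM_SUBSET` is a hypothetical, deliberately partial GSM default-alphabet table. It covers only positions that coincide with ASCII. A real decoder needs the full 128-entry table from the GSM alphabet specification.

```python
# Simplified GSM default-alphabet table: only positions that match ASCII
# (space, most punctuation, digits, letters). Hypothetical helper for this
# sketch; a real decoder needs the full 128-entry table.
GSM_SUBSET = {b: chr(b) for b in [*range(0x20, 0x24), *range(0x25, 0x40),
                                  *range(0x41, 0x5B), *range(0x61, 0x7B)]}


def decode_81(field: bytes) -> str:
    if len(field) < 3 or field[0] != 0x81:
        raise ValueError("not an '81'-coded alpha field")
    count = field[1]                   # number of characters in the string
    base = field[2] << 7               # half-page base pointer
    data = field[3:3 + count]          # anything after this is 'FF' padding
    if len(data) < count:
        raise ValueError("field is shorter than its character count")
    out = []
    for octet in data:
        if octet & 0x80:               # bit 8 set: offset from the base
            out.append(chr(base + (octet & 0x7F)))
        elif octet in GSM_SUBSET:      # bit 8 clear: GSM default character
            out.append(GSM_SUBSET[octet])
        else:
            raise ValueError(f"GSM octet 0x{octet:02X} not in this sketch's table")
    return "".join(out)


print(decode_81(bytes.fromhex("810A9C80818283848586878889")))  # 一丁丂七丄丅丆万丈三
print(decode_81(bytes.fromhex("81039C308031")))                # 0一1
```

The second call shows the 30-versus-80 distinction. 30 decodes as the GSM character '0', while 80 decodes as base + 0, which is U+4E00.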
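Example: the characters 一丁丂七丄丅丆万丈三
GBK code: D2BBB6A18140C6DF814181428143CDF2D5C9C8FD (丂丄丅丆, 8140-8143, are GBK extensions and are not in GB2312)
80 scheme: 804E004E014E024E034E044E054E064E074E084E09 (consecutive code points)
81 scheme: 810A9C80818283848586878889 (0A = 10 characters; 9C shifted left by 7 gives the base 4E00; 80-89 are offsets 0-9)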
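Going the other way, here is a minimal 81 encoder sketch. It assumes only ASCII letters, digits and space are written as GSM default characters, since these sit at the same positions in the GSM default alphabet. Every other character must fit in one 128-aligned half-page below 8000.

```python
# Sketch of an '81' encoder. Assumption: only ASCII letters, digits and space
# are written as GSM default characters; every other character must fall in
# one 128-code-point half-page whose base has bit 16 clear.


def _is_simple_gsm(ch: str) -> bool:
    return ch.isascii() and (ch.isalnum() or ch == " ")


def encode_81(text: str) -> bytes:
    if len(text) > 255:
        raise ValueError("the length octet holds at most 255 characters")
    wide = [ord(ch) for ch in text if not _is_simple_gsm(ch)]
    base = (min(wide) & 0xFF80) if wide else 0   # round down to a half-page
    if base > 0x7F80:
        raise ValueError("base would need bit 16 set; use the '82' scheme")
    out = bytearray([0x81, len(text), base >> 7])
    for ch in text:
        if _is_simple_gsm(ch):
            out.append(ord(ch))                  # bit 8 clear: GSM character
            continue
        offset = ord(ch) - base
        if offset > 0x7F:
            raise ValueError("characters span more than one half-page")
        out.append(0x80 | offset)                # bit 8 set: offset from base
    return bytes(out)


print(encode_81("一丁丂七丄丅丆万丈三").hex().upper())   # 810A9C80818283848586878889
```

Choosing the base as the lowest code point rounded down to a multiple of 128 is one reasonable policy, not the only one. Any aligned base that keeps every character within 127 of it works.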
3) If the first octet of the alpha string is set to '82', then the second octet contains a value indicating the number of characters in the string, and the third and fourth octets contain a 16 bit number which defines the complete 16 bit base pointer to a "half-page" in the UCS2 code space, for use with some or all of the remaining octets in the string. The fifth and subsequent octets in the string contain codings as follows; if bit 8 of the octet is set to zero, the remaining 7 bits of the octet contain a GSM Default Alphabet character, whereas if bit 8 of the octet is set to one, the remaining seven bits are an offset value added to the base pointer defined in octets three and four, and the resultant 16 bit value is a UCS2 code point, and defines a UCS2 character.
Example 3
Octet 1 | Octet 2 | Octet 3 | Octet 4 | Octet 5 | Octet 6 | Octet 7 | Octet 8 | Octet 9
'82' | '05' | '05' | '30' | '2D' | '82' | 'D3' | '2D' | '31'
In the above example
- Octet 2 indicates there are 5 characters in the string.
- Octets 3 and 4 contain a sixteen bit base pointer number of '0530', pointing to the first character of the Armenian character set.
- Octet 5 contains a GSM Default Alphabet character of '2D', which is a dash "-".
- Octet 6 contains a value '82', which indicates it is an offset of '02' added to the base pointer, resulting in a UCS2 character code of '0532', which represents Armenian character Capital BEN.
- Octet 7 contains a value 'D3', an offset of '53', which when added to the base pointer results in a UCS2 code point of '0583', representing Armenian Character small PIWR.
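Example 3 can be decoded the same way. The only difference from 81 is that octets 3 and 4 hold the full 16-bit base. As before, `GSM_SUBSET` is a hypothetical partial table covering only the ASCII-compatible GSM positions.

```python
# Simplified GSM default-alphabet table (ASCII-compatible positions only);
# hypothetical helper for this sketch, not the full 128-entry table.
GSM_SUBSET = {b: chr(b) for b in [*range(0x20, 0x24), *range(0x25, 0x40),
                                  *range(0x41, 0x5B), *range(0x61, 0x7B)]}


def decode_82(field: bytes) -> str:
    if len(field) < 4 or field[0] != 0x82:
        raise ValueError("not an '82'-coded alpha field")
    count = field[1]                     # number of characters in the string
    base = (field[2] << 8) | field[3]    # full 16-bit base pointer
    data = field[4:4 + count]
    if len(data) < count:
        raise ValueError("field is shorter than its character count")
    out = []
    for octet in data:
        if octet & 0x80:                 # bit 8 set: offset from the base
            out.append(chr(base + (octet & 0x7F)))
        elif octet in GSM_SUBSET:        # bit 8 clear: GSM default character
            out.append(GSM_SUBSET[octet])
        else:
            raise ValueError(f"GSM octet 0x{octet:02X} not in this sketch's table")
    return "".join(out)


print([hex(ord(c)) for c in decode_82(bytes.fromhex("820505302D82D32D31"))])
# ['0x2d', '0x532', '0x583', '0x2d', '0x31']
```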
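The 82 scheme is similar to 81. In 81, the base pointer is given in one byte and shifted left by 7 bits, so it must lie on a 128-aligned half-page below 8000. In 82, the full 16-bit base pointer is given in two bytes.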
Using the same characters 一丁丂七丄丅丆万丈三 as in the 81 example above, the 82 scheme gives:
82 scheme: 820A4E0080818283848586878889 (0A = 10 characters; 4E00 is the base pointer; 80-89 are offsets 0-9, all consecutive)
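A matching 82 encoder sketch follows, under the same assumption as the 81 sketch: only ASCII letters, digits and space are emitted as GSM characters. Because the base is a full 16-bit value, it can sit exactly at the lowest character and can exceed 7F80. This covers CJK characters at 8000 and above, which the 81 scheme cannot reach.

```python
# Sketch of an '82' encoder. Assumption: only ASCII letters, digits and space
# are emitted as GSM default characters; all other characters must lie within
# 127 code points of the lowest one, which becomes the 16-bit base.


def _is_simple_gsm(ch: str) -> bool:
    return ch.isascii() and (ch.isalnum() or ch == " ")


def encode_82(text: str) -> bytes:
    if len(text) > 255:
        raise ValueError("the length octet holds at most 255 characters")
    wide = [ord(ch) for ch in text if not _is_simple_gsm(ch)]
    base = min(wide) if wide else 0          # no half-page alignment needed
    if base > 0xFFFF:
        raise ValueError("UCS2 only covers the Basic Multilingual Plane")
    out = bytearray([0x82, len(text), base >> 8, base & 0xFF])
    for ch in text:
        if _is_simple_gsm(ch):
            out.append(ord(ch))              # bit 8 clear: GSM character
            continue
        offset = ord(ch) - base
        if offset > 0x7F:
            raise ValueError("characters span more than 128 code points")
        out.append(0x80 | offset)            # bit 8 set: offset from base
    return bytes(out)


print(encode_82("一丁丂七丄丅丆万丈三").hex().upper())   # 820A4E0080818283848586878889
print(encode_82(chr(0x9F99)).hex().upper())              # 82019F9980 (U+9F99, above 81's range)
```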
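Correction
In the 81 and 82 schemes, a string can contain at most 128 distinct Chinese characters. The number of English (GSM default alphabet) characters can range from 0 up to the full string length, since a string may consist entirely of them.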