A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit is 0, followed by its Unicode code.
For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, followed by n-1 bytes whose most significant 2 bits are 10.
This is how the UTF-8 encoding would work:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
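To make the table concrete, here is a minimal Java sketch that packs a code point into the byte patterns listed above. The class name `Utf8EncodeSketch` and the method `encode` are hypothetical names chosen for illustration, and the sketch is simplified: it does not reject surrogate code points (D800-DFFF), which a full encoder would.

```java
import java.util.Arrays;

class Utf8EncodeSketch {
    // Hypothetical helper: encodes one code point following the table above.
    // Assumption: surrogates are not rejected, to keep the sketch short.
    static int[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range: " + cp);
        }
        if (cp <= 0x7F) {
            return new int[] { cp };                            // 0xxxxxxx
        }
        if (cp <= 0x7FF) {
            return new int[] {
                0b11000000 | (cp >> 6),                         // 110xxxxx
                0b10000000 | (cp & 0b00111111)                  // 10xxxxxx
            };
        }
        if (cp <= 0xFFFF) {
            return new int[] {
                0b11100000 | (cp >> 12),                        // 1110xxxx
                0b10000000 | ((cp >> 6) & 0b00111111),          // 10xxxxxx
                0b10000000 | (cp & 0b00111111)                  // 10xxxxxx
            };
        }
        return new int[] {
            0b11110000 | (cp >> 18),                            // 11110xxx
            0b10000000 | ((cp >> 12) & 0b00111111),             // 10xxxxxx
            0b10000000 | ((cp >> 6) & 0b00111111),              // 10xxxxxx
            0b10000000 | (cp & 0b00111111)                      // 10xxxxxx
        };
    }

    public static void main(String[] args) {
        // U+0142 falls in the 0080-07FF row, so it needs two bytes.
        System.out.println(Arrays.toString(encode(0x142)));    // prints [197, 130]
    }
}
```

The x positions in each row are filled with the code point's bits from most significant to least significant, six bits per continuation byte.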
Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.

Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.

Example 1: data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true.
It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.

Example 2: data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100. Return false.
The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
The next byte is a continuation byte that starts with 10, which is correct.
But the second continuation byte does not start with 10, so the sequence is invalid.
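When working through examples like these, it can help to print the integers as octets. The following is a small sketch; `OctetDump` and `toOctets` are hypothetical names introduced only for this illustration.

```java
class OctetDump {
    // Hypothetical helper: renders the low 8 bits of each integer
    // as an 8-digit binary string, separated by spaces.
    static String toOctets(int[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            String bits = Integer.toBinaryString(data[i] & 0xFF);
            // Left-pad with zeros so every value shows a full octet.
            for (int pad = bits.length(); pad < 8; pad++) sb.append('0');
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOctets(new int[] {197, 130, 1}));  // 11000101 10000010 00000001
        System.out.println(toOctets(new int[] {235, 140, 4}));  // 11101011 10001100 00000100
    }
}
```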

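The problem statement explains how to recognize one single UTF-8 character, then gives a sequence of bytes and asks whether it is a valid sequence of UTF-8 characters. (It took me a long time to read and understand the problem.)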

The key takeaway from this problem is how to use & to extract a few bits from a bit sequence.
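As a rough illustration of the & technique, the sketch below masks off the leading bits of a byte to classify it, then masks the remaining bits to read its payload. The class `BitMaskDemo` and its methods are hypothetical names used only here.

```java
class BitMaskDemo {
    // Hypothetical helper: keeps only the bits selected by mask
    // and compares them with the expected pattern.
    static boolean matches(int b, int mask, int pattern) {
        return (b & mask) == pattern;
    }

    // Hypothetical helper: rebuilds the code point of a 2-byte character
    // from the 5 payload bits of the lead byte and the 6 payload bits
    // of the continuation byte.
    static int decodeTwoBytes(int lead, int continuation) {
        return ((lead & 0b00011111) << 6) | (continuation & 0b00111111);
    }

    public static void main(String[] args) {
        int lead = 197;          // 11000101
        int continuation = 130;  // 10000010
        System.out.println(matches(lead, 0b11100000, 0b11000000));          // true: 110xxxxx
        System.out.println(matches(continuation, 0b11000000, 0b10000000));  // true: 10xxxxxx
        System.out.println(decodeTwoBytes(lead, continuation));             // 322
    }
}
```

The mask has ones exactly at the positions to inspect, so every other bit is forced to 0 before the comparison.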

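Integer literal notation: prefix a binary number with 0b and a hexadecimal number with 0x. For octal, Java uses a leading 0 (for example 017); the 0o prefix belongs to languages such as Python.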
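A short sketch of the same mask written in each notation, assuming Java 7 or later (binary literals and underscore separators were added in Java 7). `LiteralDemo` is a hypothetical class name for this example.

```java
class LiteralDemo {
    static final int BINARY = 0b11000000;       // binary literal, prefix 0b
    static final int GROUPED = 0b1100_0000;     // underscores may separate digits
    static final int HEX = 0xC0;                // hexadecimal literal, prefix 0x
    static final int OCTAL = 0300;              // octal literal in Java: leading 0
    static final int DECIMAL = 192;

    public static void main(String[] args) {
        // All five spellings denote the same int value.
        System.out.println(BINARY == HEX && HEX == OCTAL
                && OCTAL == DECIMAL && GROUPED == DECIMAL);  // prints true
    }
}
```

Binary literals keep the masks visually aligned with the bit patterns in the problem statement, which is likely why the solution uses them.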

public class Solution {
    public boolean validUtf8(int[] data) {
        // Treat missing or empty input as invalid.
        if (data == null || data.length == 0) return false;
        for (int i = 0; i < data.length; i++) {
            // Only the least significant 8 bits of each integer carry data.
            int lead = data[i] & 0xFF;
            // moreChecks is the number of continuation bytes this character still needs.
            int moreChecks;
            if ((lead & 0b10000000) == 0) moreChecks = 0;                // 0xxxxxxx
            else if ((lead & 0b11100000) == 0b11000000) moreChecks = 1;  // 110xxxxx
            else if ((lead & 0b11110000) == 0b11100000) moreChecks = 2;  // 1110xxxx
            else if ((lead & 0b11111000) == 0b11110000) moreChecks = 3;  // 11110xxx
            else return false; // stray continuation byte, or five or more leading ones
            for (int j = 1; j <= moreChecks; j++) {
                if (i + j >= data.length) return false; // the sequence is truncated
                if ((data[i + j] & 0b11000000) != 0b10000000) return false; // must be 10xxxxxx
            }
            i += moreChecks; // skip the continuation bytes just verified
        }
        return true;
    }
}
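The solution above looks ahead from each lead byte. An alternative, sketched below, walks the array once and keeps a counter of continuation bytes still expected. This is a variant for comparison, not the author's code: the names `Utf8CounterSketch` and `isValid` are hypothetical, and unlike the solution above it treats an empty array as valid. Like the solution above, it only checks the bit patterns from the problem statement, not overlong encodings or code point ranges.

```java
class Utf8CounterSketch {
    // Hypothetical variant: single pass with a "remaining" counter.
    static boolean isValid(int[] data) {
        int remaining = 0; // continuation bytes still expected
        for (int value : data) {
            int b = value & 0xFF; // only the low 8 bits carry data
            if (remaining > 0) {
                // Inside a multi-byte character: the byte must be 10xxxxxx.
                if ((b & 0b11000000) != 0b10000000) return false;
                remaining--;
            } else if ((b & 0b10000000) == 0) {
                remaining = 0;                       // 0xxxxxxx
            } else if ((b & 0b11100000) == 0b11000000) {
                remaining = 1;                       // 110xxxxx
            } else if ((b & 0b11110000) == 0b11100000) {
                remaining = 2;                       // 1110xxxx
            } else if ((b & 0b11111000) == 0b11110000) {
                remaining = 3;                       // 11110xxx
            } else {
                return false; // stray continuation byte, or five or more leading ones
            }
        }
        // A character cut off at the end of the array is invalid.
        return remaining == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new int[] {197, 130, 1}));  // true  (Example 1)
        System.out.println(isValid(new int[] {235, 140, 4}));  // false (Example 2)
    }
}
```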
