A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

For 1-byte character, the first bit is a 0, followed by its unicode code.
For n-bytes character, the first n-bits are all one's, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.
This is how the UTF-8 encoding would work: Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Given an array of integers representing the data, return whether it is a valid utf-8 encoding. Note:
The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data. Example 1: data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.
Example 2: data = [235, 140, 4], which represented the octet sequence: 11101011 10001100 00000100. Return false.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.

这道题题干给出了判断 one single UTF-8 char的方法,然后给一个UTF-8 char sequence,判断是不是正确sequence. (读题读了很久)

这道题关键是要学到用 & 取出一个bit sequence当中几位的方法

二进制数表示法:在前面加 0b, 八进制加0o, 十六进制加0x

 public class Solution {
public boolean validUtf8(int[] data) {
if (data==null || data.length==0) return false;
for (int i=0; i<data.length; i++) {
if (data[i] > 255) return false;
int moreChecks = 0; //moreCheck is the number of more bytes that need to check for this char
if ((data[i] & 0b10000000) == 0) moreChecks = 0;
else if ((data[i] & 0b11100000) == 0b11000000) moreChecks = 1;
else if ((data[i] & 0b11110000) == 0b11100000) moreChecks = 2;
else if ((data[i] & 0b11111000) == 0b11110000) moreChecks = 3;
else return false;
for (int j=1; j<=moreChecks; j++) {
if (i+j >= data.length) return false;
if ((data[i+j] & 0b11000000) != 0b10000000) return false;
}
i = i + moreChecks;
}
return true;
}
}

Leetcode: UTF-8 Validation的更多相关文章

  1. [LeetCode] 393. UTF-8 Validation 编码验证

    A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte char ...

  2. [LeetCode] UTF-8 Validation 编码验证

    A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte char ...

  3. LeetCode赛题393----UTF-8 Validation

    393. UTF-8 Validation A character in UTF8 can be from 1 to 4 bytes long, subjected to the following ...

  4. 【LeetCode】393. UTF-8 Validation 解题报告(Python)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 题目地址: https://leetcode.com/problems/utf-8-va ...

  5. LeetCode All in One 题目讲解汇总(持续更新中...)

    终于将LeetCode的免费题刷完了,真是漫长的第一遍啊,估计很多题都忘的差不多了,这次开个题目汇总贴,并附上每道题目的解题连接,方便之后查阅吧~ 477 Total Hamming Distance ...

  6. leetcode N-Queens/N-Queens II, backtracking, hdu 2553 count N-Queens, dfs 分类: leetcode hdoj 2015-07-09 02:07 102人阅读 评论(0) 收藏

    for the backtracking part, thanks to the video of stanford cs106b lecture 10 by Julie Zelenski for t ...

  7. 393. UTF-8 Validation

    393. UTF-8 Validation 这个题很明确,刚开始我以为只能是一个utf,长度大于5的都判断为false,后来才明白题意. 有个小trick,就是长度大于1的时候,判断第一个数字开始1的 ...

  8. LeetCode 53. Maximum Subarray(最大的子数组)

    Find the contiguous subarray within an array (containing at least one number) which has the largest ...

  9. LeetCode 26. Remove Duplicates from Sorted Array (从有序序列里移除重复项)

    Given a sorted array, remove the duplicates in place such that each element appear only once and ret ...

随机推荐

  1. linux回到上次目录与历史命令查找快捷方式

    # cd -进入上次访问目录 二.历史命令搜索操作快捷键:[Ctrl + r], [Ctrl + p], [Ctrl + n] 在终端中按捉 [Ctrl] 键的同时 [r] 键,出现提示:(rever ...

  2. 控制变量法-初中物理-Nobel Lecture, December 12, 1929-php执行SET GLOBAL connect_timeout=2效果

    $link = mysqli_connect("localhost", "wu", "wp", "wdb"); $sql ...

  3. A BRIEF HISTORY OF COMPUTERS

    COMPUTER ORGANIZATION AND ARCHITECTURE DESIGNING FOR PERFORMANCE NINTH EDITION Vacuum Tubes   Transi ...

  4. ubuntu日志清理

    由于ubuntu日志文件syslog 和 kern.log 时刻在增长,一会儿就使得根目录文件夹不够用了,需使用如下命令清理 sudo -i输入密码echo  > /var/log/syslog ...

  5. SQL控制语句中内置函数讲解

    一.伪表.系统内置的只有一行一列数据的表.常用来执行函数. select 函数名 from dual 注:以下所有函数为了方便理解均用 伪表 做为事例! 二. 时间函数 1.sysdate:获取数据库 ...

  6. NRF51822之动态广播使用

    本教程基于nRF51_SDK_10.0.0_dc26b5e\examples\ble_peripheral\ble_app_uart工程 本教程主要是演示 现在演示通过nus来修改ADV中maufac ...

  7. 转 创建 JavaScript XML 文档注释

    http://www.cnblogs.com/chenxizhang/archive/2009/07/12/1522058.html 如何:创建 JavaScript XML 文档注释 Visual ...

  8. netbeans环境搭建

    1.下载文件http://pan.baidu.com/s/1kUu52mV 2.安装. 3.设置字体颜色,原先的太亮,我设置了保护色,参照sublime 我设置的字体高亮效果http://pan.ba ...

  9. 【译】Android系统架构

    让我们来快速预览一下整个android系统的架构.从下面的图中我们可以发现,这个架构分为几个不同的层,底层为上一层提供服务.  Linux Kernel android系统建立在一个坚固的基石上:Li ...

  10. Lazarus for Raspbian安装

    春节前看到树莓派 2代开始销售,第一时间在淘宝下单购买,无奈春节期间放假,要到3月份才可能收到,只能用QEMU模拟器先熟悉树莓系统.对从turbo Pascal开始的人来讲,如果能在树莓系统使用Pas ...