A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit is 0, followed by its Unicode code point.
For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, followed by n-1 bytes whose two most significant bits are 10.
This is how the UTF-8 encoding works:

Char. number range  | UTF-8 octet sequence
(hexadecimal)       | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.

Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.

Example 1: data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true.
It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.

Example 2: data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100. Return false.
The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
The next byte is a continuation byte that starts with 10, which is correct.
But the second continuation byte does not start with 10, so it is invalid.
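
To check the octet sequences quoted in the two examples, a small helper can print each integer as an 8-bit binary string. This is only a sketch for inspecting input; the class name `OctetDump` and the method `toOctets` are made up for illustration and are not part of the problem.

```java
class OctetDump {
    // Render the low 8 bits of each integer as an 8-character binary string,
    // separated by single spaces.
    static String toOctets(int[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            String bits = Integer.toBinaryString(data[i] & 0xFF);
            // Left-pad with zeros so every byte shows a full octet.
            for (int pad = bits.length(); pad < 8; pad++) sb.append('0');
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOctets(new int[] {197, 130, 1})); // 11000101 10000010 00000001
        System.out.println(toOctets(new int[] {235, 140, 4})); // 11101011 10001100 00000100
    }
}
```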

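The problem statement gives the rules for validating one single UTF-8 char, then hands you a UTF-8 char sequence and asks whether the whole sequence is valid. (It took me a long time to parse the statement.)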

The key takeaway from this problem is how to use & to pull a few chosen bits out of a bit sequence.
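
As a minimal sketch of that masking idea: AND the byte with a mask that keeps only the prefix bits, then compare the result with the expected prefix pattern. The class name `LeadByte` and both method names are hypothetical helpers chosen for this illustration.

```java
class LeadByte {
    // Total length in bytes of the character that starts with this lead byte,
    // or -1 if the byte cannot start a character.
    static int sequenceLength(int b) {
        if ((b & 0b10000000) == 0b00000000) return 1; // 0xxxxxxx
        if ((b & 0b11100000) == 0b11000000) return 2; // 110xxxxx
        if ((b & 0b11110000) == 0b11100000) return 3; // 1110xxxx
        if ((b & 0b11111000) == 0b11110000) return 4; // 11110xxx
        return -1;                                    // 10xxxxxx or 11111xxx
    }

    // A continuation byte has the two-bit prefix 10.
    static boolean isContinuation(int b) {
        return (b & 0b11000000) == 0b10000000;
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(197)); // 2
        System.out.println(sequenceLength(235)); // 3
        System.out.println(isContinuation(130)); // true
        System.out.println(isContinuation(4));   // false
    }
}
```

The mask always covers one bit more than the run of ones, so the terminating 0 is checked together with the ones in a single comparison.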

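Literal notation: prefix a binary number with 0b and a hexadecimal number with 0x. Note that Java writes octal with a plain leading 0 (for example 0300); the 0o prefix belongs to languages such as Python.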
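
A short sketch of the same value written in each Java notation; the class and constant names are made up for illustration. Underscores inside a literal are allowed since Java 7 and can make the bit groups easier to read.

```java
class LiteralPrefixes {
    static final int BINARY = 0b1100_0000; // binary literal with a digit separator
    static final int HEX = 0xC0;           // hexadecimal literal
    static final int OCTAL = 0300;         // octal literal: leading 0, no letter
    static final int DECIMAL = 192;

    public static void main(String[] args) {
        // All four constants hold the same value.
        System.out.println(BINARY == HEX && HEX == OCTAL && OCTAL == DECIMAL); // true
    }
}
```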

    public class Solution {
        public boolean validUtf8(int[] data) {
            // Treat missing or empty input as invalid.
            if (data == null || data.length == 0) return false;
            for (int i = 0; i < data.length; i++) {
                // Only the low 8 bits of each integer carry data (see the Note),
                // and every mask below inspects just those bits.
                // moreChecks is the number of continuation bytes that must follow this lead byte.
                int moreChecks;
                if ((data[i] & 0b10000000) == 0) moreChecks = 0;                 // 0xxxxxxx
                else if ((data[i] & 0b11100000) == 0b11000000) moreChecks = 1;   // 110xxxxx
                else if ((data[i] & 0b11110000) == 0b11100000) moreChecks = 2;   // 1110xxxx
                else if ((data[i] & 0b11111000) == 0b11110000) moreChecks = 3;   // 11110xxx
                else return false; // 10xxxxxx or 11111xxx cannot start a character
                for (int j = 1; j <= moreChecks; j++) {
                    if (i + j >= data.length) return false; // sequence is truncated
                    if ((data[i + j] & 0b11000000) != 0b10000000) return false; // not 10xxxxxx
                }
                i += moreChecks; // skip the continuation bytes just verified
            }
            return true;
        }
    }
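
For comparison, here is a sketch of a different formulation, not the solution above: a single pass that keeps a counter of continuation bytes still owed instead of looking ahead with an inner loop. The class name `Utf8Streaming` and the variable `remaining` are invented for this example. Two assumptions differ from the solution above: each value is masked with 0xFF up front, and an empty array is treated as valid.

```java
class Utf8Streaming {
    static boolean validUtf8(int[] data) {
        int remaining = 0; // continuation bytes still expected for the current character
        for (int value : data) {
            int b = value & 0xFF; // assumption: keep only the low 8 bits
            if (remaining > 0) {
                // Inside a multi-byte character: the byte must look like 10xxxxxx.
                if ((b & 0b11000000) != 0b10000000) return false;
                remaining--;
            } else if ((b & 0b10000000) == 0) {
                remaining = 0; // 0xxxxxxx: complete 1-byte character
            } else if ((b & 0b11100000) == 0b11000000) {
                remaining = 1; // 110xxxxx
            } else if ((b & 0b11110000) == 0b11100000) {
                remaining = 2; // 1110xxxx
            } else if ((b & 0b11111000) == 0b11110000) {
                remaining = 3; // 11110xxx
            } else {
                return false;  // 10xxxxxx or 11111xxx cannot start a character
            }
        }
        // A trailing lead byte without all its continuation bytes is invalid.
        return remaining == 0;
    }

    public static void main(String[] args) {
        System.out.println(validUtf8(new int[] {197, 130, 1})); // true
        System.out.println(validUtf8(new int[] {235, 140, 4})); // false
    }
}
```

Like the solution above, this checks only the bit-prefix rules stated in the problem; it does not reject overlong encodings or code points beyond 10FFFF.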
