A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

For 1-byte character, the first bit is a 0, followed by its unicode code.
For n-bytes character, the first n-bits are all one's, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.
This is how the UTF-8 encoding would work: Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Given an array of integers representing the data, return whether it is a valid utf-8 encoding. Note:
The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data. Example 1: data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001. Return true.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.
Example 2: data = [235, 140, 4], which represented the octet sequence: 11101011 10001100 00000100. Return false.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.

这道题题干给出了判断 one single UTF-8 char的方法,然后给一个UTF-8 char sequence,判断是不是正确sequence. (读题读了很久)

这道题关键是要学到用 & 取出一个bit sequence当中几位的方法

二进制数表示法:在前面加 0b, 八进制加0o, 十六进制加0x

 public class Solution {
public boolean validUtf8(int[] data) {
if (data==null || data.length==0) return false;
for (int i=0; i<data.length; i++) {
if (data[i] > 255) return false;
int moreChecks = 0; //moreCheck is the number of more bytes that need to check for this char
if ((data[i] & 0b10000000) == 0) moreChecks = 0;
else if ((data[i] & 0b11100000) == 0b11000000) moreChecks = 1;
else if ((data[i] & 0b11110000) == 0b11100000) moreChecks = 2;
else if ((data[i] & 0b11111000) == 0b11110000) moreChecks = 3;
else return false;
for (int j=1; j<=moreChecks; j++) {
if (i+j >= data.length) return false;
if ((data[i+j] & 0b11000000) != 0b10000000) return false;
}
i = i + moreChecks;
}
return true;
}
}

Leetcode: UTF-8 Validation的更多相关文章

  1. [LeetCode] 393. UTF-8 Validation 编码验证

    A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte char ...

  2. [LeetCode] UTF-8 Validation 编码验证

    A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte char ...

  3. LeetCode赛题393----UTF-8 Validation

    393. UTF-8 Validation A character in UTF8 can be from 1 to 4 bytes long, subjected to the following ...

  4. 【LeetCode】393. UTF-8 Validation 解题报告(Python)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 题目地址: https://leetcode.com/problems/utf-8-va ...

  5. LeetCode All in One 题目讲解汇总(持续更新中...)

    终于将LeetCode的免费题刷完了,真是漫长的第一遍啊,估计很多题都忘的差不多了,这次开个题目汇总贴,并附上每道题目的解题连接,方便之后查阅吧~ 477 Total Hamming Distance ...

  6. leetcode N-Queens/N-Queens II, backtracking, hdu 2553 count N-Queens, dfs 分类: leetcode hdoj 2015-07-09 02:07 102人阅读 评论(0) 收藏

    for the backtracking part, thanks to the video of stanford cs106b lecture 10 by Julie Zelenski for t ...

  7. 393. UTF-8 Validation

    393. UTF-8 Validation 这个题很明确,刚开始我以为只能是一个utf,长度大于5的都判断为false,后来才明白题意. 有个小trick,就是长度大于1的时候,判断第一个数字开始1的 ...

  8. LeetCode 53. Maximum Subarray(最大的子数组)

    Find the contiguous subarray within an array (containing at least one number) which has the largest ...

  9. LeetCode 26. Remove Duplicates from Sorted Array (从有序序列里移除重复项)

    Given a sorted array, remove the duplicates in place such that each element appear only once and ret ...

随机推荐

  1. Nginx_Lua

    http://www.ttlsa.com/nginx/nginx-lua/ 1.1. 介绍ngx_lua – 把lua语言嵌入nginx中,使其支持lua来快速开发基于nginx下的业务逻辑该模块不在 ...

  2. cPanel设置自定义404错误页

    利用这个cpanel的错误页工具,你就可以定制错误页面了.设置自定义404错误页,有两种简单的方法. 一,利用cpanel后台控制面板添加设置404自定义错误页的方法 步骤       1.登录cPa ...

  3. OmniThreadLibrary 3.03b发布了

    虽然版本号升的不大,但这也是一个重要的版本.作者发现了一个长期存在的bug,就是建立一个线程,如果不指定线程的优先级则默认设置为idle.(正确的应是Normal) 看一下具体的改动情况: 新功能: ...

  4. String作为方法参数传递 与 引用传递

    String作为方法参数传递 String 和 StringBuffer的区别见这里: http://wenku.baidu.com/view/bb670f2abd64783e09122bcd.htm ...

  5. php面向对象之__toString()

    似曾相识,在php面向对象编程之魔术方法__set,曾经介绍了什么是魔术方法,这一章又介绍一个魔术方法__tostring(). __toString()是快速获取对象的字符串信息的便捷方式,似乎魔术 ...

  6. fatal error C1189: #error : WINDOWS.H already included. MFC apps must not #include <windows.h>

    给对话框添加类, 报错 CalibrateMFCDlg.h(6) : error C2504: “CDialog”: 未定义基类 等多个错误 加上 #include "afxwin.h&qu ...

  7. BLE-NRF51822教程17-DFU使用手机升级

    演示的工程是 [application]    nRF51_SDK_10.0.0_dc26b5e\examples\ble_peripheral\ble_app_hrs\pca10028\s110_w ...

  8. Qt系统托盘

    Qt的系统托盘的使用,可比mfc中好多了!他封装了一个专门的QSystemTrayIcon类,建立系统托盘图标.其实在Qt提供的示例程序已经很不错了,$QTDIR\examples\desktop\s ...

  9. the differences between function and procedure

    一.自定义函数: 1. 可以返回表变量 2. 限制颇多,包括 不能使用output参数: 不能用临时表: 函数内部的操作不能影响到外部环境: 不能通过select返回结果集: 不能update,del ...

  10. 利用快速排序原理找出数组中前n大的数

    #include <stdio.h> #include <stdint.h> #include <stdlib.h> #define MAX_SIZE 400001 ...