Problem K. UTF-8 Decoder

Problem link:

http://opentrains.snarknews.info/~ejudge/team.cgi?SID=c75360ed7f2c7022&all_runs=1&action=140

Description

UTF-8 is a character encoding capable of encoding all possible characters, or code points, in Unicode.

Nowadays UTF-8 is the dominant character encoding for the World Wide Web, accounting for 85.1% of

all Web pages in September 2015.

Peter works in a large company as a software engineer and develops a new Internet search engine.

Its crawler needs a UTF-8 decoder to parse Web pages and put them into the index. Peter has already

checked if there are any ready-made solutions available. He used his own search engine to look for open-source

implementations on the Web and found nothing that satisfied him. Several huge libraries following

the ‘batteries included’ philosophy were rejected because they are too heavy and contain tons of code. Several

small but relevant libraries didn’t get to the top of the search results page because Peter’s search engine is

not perfect at present... So Peter decided to reinvent the wheel and write his custom lightweight UTF-8

decoder.

Let’s define a code point as an integer from the range [0, 2^31). One code point is encoded into a variable-length

sequence of 8-bit units (bytes).

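The design of UTF-8 can be seen in this table (the x characters are replaced by the bits of the code point):

Bits  Bytes  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
 7    1      0xxxxxxx
11    2      110xxxxx  10xxxxxx
16    3      1110xxxx  10xxxxxx  10xxxxxx
21    4      11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
26    5      111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
31    6      1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx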

One-byte codes are used only for the ASCII code point values 0 through 127. In this case the UTF-8 code

has the same value as the ASCII code. The high-order bit of these codes is always 0. This means that

ASCII text is valid UTF-8.

Code points larger than 127 are represented by multi-byte sequences, composed of a leading byte and

one or more continuation bytes. The leading byte has two or more high-order 1s followed by a 0, while

continuation bytes all have 10 in the high-order position. UTF-8 offers a clear distinction between multi-byte

and single-byte characters. The high-order bits of every byte determine the type of byte; single bytes

(0xxxxxxx), leading bytes (11xxxxxx), and continuation bytes (10xxxxxx) do not share values.
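The three byte types can be told apart with two bit masks. The following is a minimal sketch; the names ByteKind and classify are hypothetical and do not come from the problem statement.

```cpp
#include <cstdint>

// Hypothetical names, used only for illustration.
enum class ByteKind { Single, Continuation, Leading };

// Classify a byte by looking only at its high-order bits.
ByteKind classify(std::uint8_t b) {
    if ((b & 0x80) == 0x00) return ByteKind::Single;        // 0xxxxxxx
    if ((b & 0xC0) == 0x80) return ByteKind::Continuation;  // 10xxxxxx
    return ByteKind::Leading;                               // 11xxxxxx
}
```

Note that this only checks the pattern 11xxxxxx; whether a leading byte announces an acceptable sequence length is a separate check.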

The number of high-order 1s in the leading byte of a multi-byte sequence indicates the number of bytes

in the sequence. The remaining bits of the encoding (the x bits in the above patterns) are used for the

bits of the code point being encoded, padded with high-order 0s if necessary. The high-order bits go in

the lead byte, lower-order bits in succeeding continuation bytes.
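As a worked example of this layout, the two bytes C3 A9 split into 110|00011 and 10|101001, so the payload bits are 00011 101001, which is 233. A minimal sketch of that assembly step follows; leadingOnes and assemble are hypothetical helper names, and assemble assumes it is handed exactly one well-formed sequence (no validation is done here).

```cpp
#include <cstdint>
#include <vector>

// Count the high-order 1 bits of b (a value from 0 to 8).
int leadingOnes(std::uint8_t b) {
    int count = 0;
    for (int bit = 7; bit >= 0 && ((b >> bit) & 1); --bit) ++count;
    return count;
}

// Assemble a code point from one sequence that is assumed to be well formed:
// seq[0] is a single byte or a leading byte, followed by its continuation bytes.
std::uint32_t assemble(const std::vector<std::uint8_t>& seq) {
    int len = leadingOnes(seq[0]);
    if (len == 0) return seq[0];                      // single byte: value is the code point
    std::uint32_t cp = seq[0] & (0xFF >> (len + 1));  // payload bits of the leading byte
    for (int i = 1; i < len; ++i)
        cp = (cp << 6) | (seq[i] & 0x3F);             // 6 payload bits per continuation byte
    return cp;
}
```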

The standard specifies that the correct encoding of a code point use only the minimum number of bytes

required to hold the significant bits of the code point. Longer encodings are called overlong and are not

valid UTF-8 representations of the code point. This rule maintains a one-to-one correspondence between

code points and their valid encodings, so that there is a unique valid encoding for each code point. This

ensures that string comparisons and searches are well-defined.
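For instance, the bytes C0 80 would decode to code point 0 using two bytes, although one byte is enough, so that sequence is overlong. One way to express the rule is to compare the sequence length with the minimum length the code point needs. This is a sketch with hypothetical names (minLength, isOverlong), assuming code points below 2^31 as in the statement.

```cpp
#include <cstdint>

// Smallest number of bytes that can hold cp (cp is assumed to be below 2^31).
int minLength(std::uint32_t cp) {
    if (cp < 0x80) return 1;       // fits in 7 bits
    if (cp < 0x800) return 2;      // fits in 11 bits
    if (cp < 0x10000) return 3;    // fits in 16 bits
    if (cp < 0x200000) return 4;   // fits in 21 bits
    if (cp < 0x4000000) return 5;  // fits in 26 bits
    return 6;                      // needs up to 31 bits
}

// True if cp was encoded with more bytes than necessary.
bool isOverlong(std::uint32_t cp, int usedBytes) {
    return usedBytes > minLength(cp);
}
```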

Modern real-life UTF-8 encoding contains more restrictions. For instance, RFC 3629 removed all 5- and 6-byte

sequences and some 4-byte sequences in order to match the constraints of the UTF-16 character

encoding. Peter wants his decoder to be flexible and to be able to decode as many texts as possible, which is

why Peter does not implement these additional restrictions.

Input

The first line of input contains an integer N (1 ≤ N ≤ 100 000). The second line contains the values of

N bytes (in range between 0 and 255 each, inclusive) given in hexadecimal. A value consists of two hex

digits. The symbols 0–9 represent digits zero to nine, and A, B, C, D, E, F represent digits ten to fifteen.

Values of bytes are separated by single spaces.
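Reading this format does not require converting hex digits by hand. A minimal sketch, assuming the second input line has already been read into a string; parseHexBytes is a hypothetical helper name, and the input is assumed to be well formed as the statement guarantees.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse space-separated two-digit hex values into bytes.
// Assumes every token is valid hex, as guaranteed by the input format.
std::vector<std::uint8_t> parseHexBytes(const std::string& line) {
    std::vector<std::uint8_t> bytes;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        // std::stoi with base 16 accepts digits 0-9 and A-F.
        bytes.push_back(static_cast<std::uint8_t>(std::stoi(token, nullptr, 16)));
    }
    return bytes;
}
```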

Output

If the input sequence of N bytes can be decoded successfully into a sequence of L code points, then in the

first line print the number L and in the second line print L code point values (31-bit integers in usual

decimal notation with no leading zeros) separated by spaces.

If the input cannot be decoded, output a single line Epic Fail.

Sample Input

1

24

Sample Output

1

36

Hint

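Problem summary

In short, you are asked to write a UTF-8 decoder.

Roughly speaking, a byte of the form 0xxxxxxx is a single-byte character, 10xxxxxx is a continuation byte, and a byte that starts with two or more 1s followed by a 0 is a leading byte whose count of 1s gives the number of bytes in the sequence.

Then convert every byte to binary and concatenate the payload bits.

You also have to detect invalid input.

Finally, each code point must use its shortest possible encoding.

Solution:

A simulation problem: just implement everything the statement describes.

Code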

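#include <bits/stdc++.h>
using namespace std;

string s[100005];       // s[i]: byte i written as 8 binary digits
string tmp;
vector<long long> ans;  // decoded code points

// Map one uppercase hex digit to its 4-bit binary string.
string get(char c) {
    if (c == '0') return "0000";
    if (c == '1') return "0001";
    if (c == '2') return "0010";
    if (c == '3') return "0011";
    if (c == '4') return "0100";
    if (c == '5') return "0101";
    if (c == '6') return "0110";
    if (c == '7') return "0111";
    if (c == '8') return "1000";
    if (c == '9') return "1001";
    if (c == 'A') return "1010";
    if (c == 'B') return "1011";
    if (c == 'C') return "1100";
    if (c == 'D') return "1101";
    if (c == 'E') return "1110";
    if (c == 'F') return "1111";
    return "0000";  // unreachable for valid input; avoids falling off the end
}

int main() {
    int n;
    scanf("%d", &n);
    for (int i = 0; i < n; i++) {
        cin >> tmp;
        s[i] += get(tmp[0]);
        s[i] += get(tmp[1]);
    }
    string now;  // payload bits of the code point being decoded
    for (int i = 0; i < n; i++) {
        // A continuation byte (10xxxxxx) cannot start a sequence.
        if (s[i][0] == '1' && s[i][1] == '0') {
            printf("Epic Fail\n");
            return 0;
        }
        // j = number of leading 1s = sequence length (0 for a single byte).
        int j;
        for (j = 0; j < (int)s[i].size(); j++)
            if (s[i][j] == '0') break;
        // Seven or eight leading 1s (0xFE, 0xFF) is never a valid leading byte.
        if (j == 8 || j == 7) {
            printf("Epic Fail\n");
            return 0;
        }
        // Payload bits of the first byte follow the terminating 0.
        for (int t = j + 1; t < (int)s[i].size(); t++)
            now += s[i][t];
        // The sequence must not run past the end of the input.
        if (n < i + j) {
            printf("Epic Fail\n");
            return 0;
        }
        // The next j-1 bytes must be continuation bytes; each adds 6 bits.
        for (int t = i + 1; t < i + j; t++) {
            if (s[t][0] != '1' || s[t][1] != '0') {
                printf("Epic Fail\n");
                return 0;
            }
            for (int k = 2; k < (int)s[t].size(); k++)
                now += s[t][k];
        }
        if (j != 0) i = i + j - 1;  // skip the continuation bytes just consumed
        // After reversing, now[t] is the bit of weight 2^t.
        reverse(now.begin(), now.end());
        int k = 0;  // index of the highest set bit (0 if no bit is set)
        for (int t = 0; t < (int)now.size(); t++)
            if (now[t] == '1') k = t;
        // Overlong check: a sequence with 11, 16, 21, 26 or 31 payload bits
        // must need more bits than the next shorter form can hold.
        if ((now.size() == 11 && k < 7) || (now.size() == 16 && k < 11) ||
            (now.size() == 21 && k < 16) || (now.size() == 26 && k < 21) ||
            (now.size() == 31 && k < 26)) {
            printf("Epic Fail\n");
            return 0;
        }
        // Convert the bit string to an integer.
        long long weight = 1;
        long long Ans = 0;
        for (int t = 0; t < (int)now.size(); t++) {
            if (now[t] == '1') Ans += weight;
            weight *= 2;
        }
        ans.push_back(Ans);
        now = "";
    }
    cout << ans.size() << endl;
    for (int i = 0; i < (int)ans.size(); i++)
        cout << ans[i] << " ";
    cout << endl;
}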
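The same simulation can also be written with integer bit operations instead of binary strings. The following is only a sketch of that alternative, not the solution above: decodeUtf8 is a hypothetical function name, and it returns false in the cases where the program would print Epic Fail.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode bytes into code points using the rules of the statement
// (sequences of 1 to 6 bytes, overlong forms rejected).
// Returns false if the input cannot be decoded.
bool decodeUtf8(const std::vector<std::uint8_t>& bytes,
                std::vector<std::uint32_t>& out) {
    // minValue[len]: smallest code point that really needs len bytes.
    static const std::uint32_t minValue[7] =
        {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    out.clear();
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint8_t lead = bytes[i];
        int len = 0;  // number of high-order 1 bits in the first byte
        while (len < 8 && (lead & (0x80 >> len))) ++len;
        if (len == 0) {  // single byte 0xxxxxxx
            out.push_back(lead);
            ++i;
            continue;
        }
        if (len == 1 || len > 6) return false;        // stray continuation, or 0xFE/0xFF
        if (i + len > bytes.size()) return false;     // truncated sequence
        std::uint32_t cp = lead & (0xFF >> (len + 1));
        for (int k = 1; k < len; ++k) {
            std::uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minValue[len]) return false;         // overlong encoding
        out.push_back(cp);
        i += len;
    }
    return true;
}
```

For the sample input (the single byte 24) this yields one code point, 36.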

Source: Western Subregional of NEERC, Minsk, Wednesday, November 4, 2015, Problem K. UTF-8 Decoder (simulation problem)
