Western Subregional of NEERC, Minsk, Wednesday, November 4, 2015 Problem K. UTF-8 Decoder (simulation)
Problem K. UTF-8 Decoder
Problem link:
http://opentrains.snarknews.info/~ejudge/team.cgi?SID=c75360ed7f2c7022&all_runs=1&action=140
Description
UTF-8 is a character encoding capable of encoding all possible characters, or code points, in Unicode.
Nowadays UTF-8 is the dominant character encoding for the World Wide Web, accounting for 85.1% of
all Web pages in September 2015.
Peter works in a large company as a software engineer and develops a new Internet search engine.
Its crawler needs a UTF-8 decoder to parse Web pages and put them into index. Peter has already
checked if there are any ready-made solutions available. He used his own search engine to look for open-source
implementations on the Web and found nothing that satisfied him. Several huge libraries following
‘batteries included’ philosophy were rejected because they are too heavy and contain tons of code. Several
small but relevant libraries didn’t get to the top of search results page because Peter’s search engine is
not perfect at present... So Peter decided to reinvent the wheel and write his custom lightweight UTF-8
decoder.
Let’s define a code point as an integer from range [0, 2^31). One code point is encoded into a variable-length
sequence of 8-bit units (bytes).
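The design of UTF-8 can be seen in this table (the x characters are replaced by the bits of the code point):
Bits  First code point  Last code point  Bytes  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
7     0                 127              1      0xxxxxxx
11    128               2047             2      110xxxxx  10xxxxxx
16    2048              65535            3      1110xxxx  10xxxxxx  10xxxxxx
21    65536             2097151          4      11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
26    2097152           67108863         5      111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
31    67108864          2147483647       6      1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx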
One-byte codes are used only for the ASCII code point values 0 through 127. In this case the UTF-8 code
has the same value as the ASCII code. The high-order bit of these codes is always 0. This means that
ASCII text is valid UTF-8.
Code points larger than 127 are represented by multi-byte sequences, composed of a leading byte and
one or more continuation bytes. The leading byte has two or more high-order 1s followed by a 0, while
continuation bytes all have 10 in the high-order position. UTF-8 offers a clear distinction between multi-byte
and single-byte characters. The high-order bits of every byte determine the type of byte; single bytes
(0xxxxxxx), leading bytes (11xxxxxx), and continuation bytes (10xxxxxx) do not share values.
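The three byte types can be told apart with two bit masks. The following is a minimal sketch; the names `ByteKind` and `classify` are illustrative and do not come from the problem statement.

```cpp
#include <cstdint>

// Kinds of byte in the UTF-8 scheme described above (illustrative names).
enum class ByteKind { Single, Continuation, Leading };

// Classify a byte by its highest bits.
// Note: 0xFE and 0xFF also match 11xxxxxx here; whether they are usable
// as leading bytes has to be checked separately.
ByteKind classify(std::uint8_t b) {
    if ((b & 0x80) == 0x00) return ByteKind::Single;        // 0xxxxxxx
    if ((b & 0xC0) == 0x80) return ByteKind::Continuation;  // 10xxxxxx
    return ByteKind::Leading;                               // 11xxxxxx
}
```

Because the three patterns never overlap, a decoder can recognize the start of a character from any single byte.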
The number of high-order 1s in the leading byte of a multi-byte sequence indicates the number of bytes
in the sequence. The remaining bits of the encoding (the x bits in the above patterns) are used for the
bits of the code point being encoded, padded with high-order 0s if necessary. The high-order bits go in
the lead byte, lower-order bits in succeeding continuation bytes.
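As a rough illustration of the two rules above, the sketch below counts the high-order 1s of a leading byte and then concatenates the payload bits. The helper names `leadingOnes` and `assemble` are hypothetical, and `assemble` assumes it receives exactly one well-formed, non-empty sequence; it performs no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of consecutive 1 bits at the top of a byte (0..8).
int leadingOnes(std::uint8_t b) {
    int count = 0;
    while (count < 8 && (b & (0x80 >> count))) ++count;
    return count;
}

// Concatenate the payload bits of one sequence, high-order bits first.
// Assumption: seq is non-empty and well-formed (not checked here).
std::uint32_t assemble(const std::vector<std::uint8_t>& seq) {
    int ones = leadingOnes(seq[0]);
    // Drop the leading 1s and the 0 that terminates them.
    std::uint32_t value = seq[0] & (0xFF >> (ones + 1));
    for (std::size_t i = 1; i < seq.size(); ++i) {
        value = (value << 6) | (seq[i] & 0x3F);  // 6 payload bits per continuation byte
    }
    return value;
}
```

For example, the bytes C3 A9 carry the payload bits 00011 and 101001, which together give 233, and E2 82 AC gives 8364.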
The standard specifies that the correct encoding of a code point use only the minimum number of bytes
required to hold the significant bits of the code point. Longer encodings are called overlong and are not
valid UTF-8 representations of the code point. This rule maintains a one-to-one correspondence between
code points and their valid encodings, so that there is a unique valid encoding for each code point. This
ensures that string comparisons and searches are well-defined.
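One way to detect an overlong encoding is to compare the number of bytes actually used with the smallest number of bytes that can hold the decoded value. The sketch below assumes the 7/11/16/21/26/31-bit payload sizes of the 1- to 6-byte forms; `minimalLength` and `isOverlong` are illustrative names.

```cpp
#include <cstdint>

// Smallest number of bytes able to hold a code point in [0, 2^31).
int minimalLength(std::uint32_t cp) {
    if (cp < 0x80u)      return 1;  // fits in 7 bits
    if (cp < 0x800u)     return 2;  // fits in 11 bits
    if (cp < 0x10000u)   return 3;  // fits in 16 bits
    if (cp < 0x200000u)  return 4;  // fits in 21 bits
    if (cp < 0x4000000u) return 5;  // fits in 26 bits
    return 6;                       // up to 31 bits
}

// True if a sequence of usedBytes bytes is longer than necessary for cp.
bool isOverlong(std::uint32_t cp, int usedBytes) {
    return usedBytes > minimalLength(cp);
}
```

For instance, the two-byte sequence C0 80 decodes to 0, which fits in one byte, so it would be rejected as overlong.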
Modern real-life UTF-8 encoding contains more restrictions. For instance, RFC 3629 removed all 5-, 6-
byte sequences and some 4-byte sequences in order to match the constraints of the UTF-16 character
encoding. Peter wants his decoder to be flexible and to be able to decode as many texts as possible, that’s
why Peter does not implement these additional restrictions.
Input
The first line of input contains an integer N (1 ≤ N ≤ 100 000). The second line contains the values of
N bytes (in range between 0 and 255 each, inclusive) given in hexadecimal. A value consists of two hex
digits. The symbols 0–9 represent digits zero to nine, and A, B, C, D, E, F represent digits ten to fifteen.
Values of bytes are separated by single spaces.
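Reading the second input line could look like the sketch below. It assumes every token consists of exactly two uppercase hex digits, as the statement guarantees; `hexDigit` and `parseHexBytes` are illustrative names.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Value of one uppercase hex digit, or -1 if the character is not one.
int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Split a line such as "E2 82 AC" on whitespace and convert each token to a byte.
// Assumption: every token has two valid hex digits (not checked here).
std::vector<std::uint8_t> parseHexBytes(const std::string& line) {
    std::vector<std::uint8_t> bytes;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        int value = hexDigit(token[0]) * 16 + hexDigit(token[1]);
        bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return bytes;
}
```

Working with numeric bytes rather than strings of binary digits allows the later steps to use plain bit operations.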
Output
If the input sequence of N bytes can be decoded successfully into a sequence of L code points, then in the
first line print the number L and in the second line print L code point values (31-bit integers in usual
decimal notation with no leading zeros) separated by spaces.
If the input cannot be decoded, output a single line Epic Fail.
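Putting the rules together, a whole-stream decoder that produces the required output might be sketched as follows. This is one possible bitwise formulation under the rules stated above, not the reference solution; `decodeAll` and `render` are hypothetical names, and the returned text ends each line with a newline.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode the whole byte stream into out; returns false on any violation.
bool decodeAll(const std::vector<std::uint8_t>& bytes, std::vector<std::uint32_t>& out) {
    // Smallest code point that needs a sequence of the given length (index = length).
    static const std::uint32_t minValue[7] = {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    out.clear();
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint8_t lead = bytes[i];
        int ones = 0;
        while (ones < 8 && (lead & (0x80 >> ones))) ++ones;
        if (ones == 1 || ones >= 7) return false;      // stray continuation byte, or 0xFE/0xFF
        std::size_t len = (ones == 0) ? 1 : static_cast<std::size_t>(ones);
        if (i + len > bytes.size()) return false;      // sequence is cut off by end of input
        std::uint32_t cp = lead & (0xFF >> (ones + 1));
        for (std::size_t k = 1; k < len; ++k) {
            std::uint8_t b = bytes[i + k];
            if ((b & 0xC0) != 0x80) return false;      // expected 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minValue[len]) return false;          // overlong encoding
        out.push_back(cp);
        i += len;
    }
    return true;
}

// Build the text to print: count and code points, or "Epic Fail".
std::string render(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> points;
    if (!decodeAll(bytes, points)) return "Epic Fail\n";
    std::string text = std::to_string(points.size()) + "\n";
    for (std::size_t i = 0; i < points.size(); ++i) {
        if (i) text += ' ';
        text += std::to_string(points[i]);
    }
    text += '\n';
    return text;
}
```

With the sample input (the single byte 24), `render` yields the two lines 1 and 36.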
Sample Input
1
24
Sample Output
1
36
Hint
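Problem summary
Essentially, you are asked to write a UTF-8 decoder.
Roughly speaking, a byte of the form 0xxxxxxx is a single-byte code, 10xxxxxx is a continuation byte, and in a leading byte of the form 11...10x...x the number of leading 1s gives the number of bytes in the sequence.
So it suffices to convert every byte to binary and work on the bits.
You also have to detect invalid input.
In addition, every code point must use the shortest possible encoding.
Solution:
This is a simulation problem: implement everything the statement describes.
Code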
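#include<bits/stdc++.h>
using namespace std;

string s[100005];       // s[i] holds the i-th input byte as 8 binary digits
string tmp;
vector<long long>ans;   // decoded code points

// Map one hex digit (0-9, A-F) to its 4-bit binary string.
string get(char c){
    if(c=='0')return "0000";
    if(c=='1')return "0001";
    if(c=='2')return "0010";
    if(c=='3')return "0011";
    if(c=='4')return "0100";
    if(c=='5')return "0101";
    if(c=='6')return "0110";
    if(c=='7')return "0111";
    if(c=='8')return "1000";
    if(c=='9')return "1001";
    if(c=='A')return "1010";
    if(c=='B')return "1011";
    if(c=='C')return "1100";
    if(c=='D')return "1101";
    if(c=='E')return "1110";
    if(c=='F')return "1111";
    return "";          // not reached for valid input; avoids falling off the end
}

int main(){
    int n;
    scanf("%d",&n);
    for(int i=0;i<n;i++){
        cin>>tmp;
        s[i]+=get(tmp[0]);
        s[i]+=get(tmp[1]);
    }
    string now;         // payload bits of the code point being decoded
    for(int i=0;i<n;i++){
        // A sequence cannot start with a continuation byte (10xxxxxx).
        if(s[i][0]=='1'&&s[i][1]=='0'){
            printf("Epic Fail");
            return 0;
        }
        // j = number of leading 1s; 0 means a one-byte code, otherwise j bytes in total.
        int j;
        for(j=0;j<s[i].size();j++)
            if(s[i][j]=='0')break;
        // 11111110 and 11111111 are not valid leading bytes.
        if(j==8||j==7){
            printf("Epic Fail");
            return 0;
        }
        // Payload bits of the leading byte follow the terminating 0.
        for(int t=j+1;t<s[i].size();t++)
            now+=s[i][t];
        // The sequence must not run past the end of the input.
        if(n<i+j){
            printf("Epic Fail");
            return 0;
        }
        // The next j-1 bytes must be continuation bytes; append their 6 payload bits.
        for(int t=i+1;t<i+j;t++){
            if(s[t][0]!='1'||s[t][1]!='0'){
                printf("Epic Fail");
                return 0;
            }
            for(int k=2;k<s[t].size();k++)
                now+=s[t][k];
        }
        if(j!=0)i=i+j-1;   // skip the continuation bytes just consumed
        // Reverse so that index t corresponds to bit t (least significant first).
        reverse(now.begin(),now.end());
        // k = index of the highest set bit (stays 0 if no higher bit is set).
        int k = 0;
        for(int t=0;t<now.size();t++)
            if(now[t]=='1')k=t;
        // Reject overlong encodings: a payload of 11/16/21/26/31 bits must
        // really need more than 7/11/16/21/26 bits.
        if(now.size()==11&&k<7){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==16&&k<11){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==21&&k<16){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==26&&k<21){
            printf("Epic Fail");
            return 0;
        }
        if(now.size()==31&&k<26){
            printf("Epic Fail");
            return 0;
        }
        // Convert the bit string to an integer.
        long long tmp = 1;
        long long Ans = 0;
        for(int t=0;t<now.size();t++){
            if(now[t]=='1')Ans+=tmp;
            tmp*=2;
        }
        ans.push_back(Ans);
        now="";
    }
    cout<<ans.size()<<endl;
    for(int i=0;i<ans.size();i++)
        cout<<ans[i]<<" ";
    cout<<endl;
}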