SWERC13 Trending Topic
Brute force with a map.
Imagine you are in the hiring process for a company whose principal activity is the analysis
of information on the Web. One of the tests consists of writing a program that keeps a set of
trending topics up to date. You will be hired depending on the efficiency of your solution.
They provide you with text from the most active blogs. The text is organised daily and, when
asked, you have to provide the sorted list of the N most frequent words during the last 7 days.
INPUT
Each input file contains one test case. The text corresponding to a day is delimited by the tags
<text> and </text>. Queries for the top N words can appear between the texts of two different days.
A top N query appears as a tag like <top 10 />. To make reading the input easier, the number
will always be delimited by white space, as in the sample.
Notes:
• All words consist only of lowercase letters and have length at most 20.
• The maximum number of different words that can appear is 20000.
• The maximum number of words per day is 20000.
• Words of length less than four characters are considered of no interest.
• The number of days will be at most 1000.
• 1 ≤ N ≤ 20
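The input format and the notes above can be turned into a small tokenizer. The following is only a minimal sketch, assuming that tags and words are always separated by white space as in the sample; the names `Event` and `parseEvents` are hypothetical and do not appear in the original solution.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: one parsed item of the input,
// either the words of one day or a top-N query.
struct Event {
    bool isQuery;                    // true for <top N />, false for <text>...</text>
    int n;                           // N of the query (0 for a day)
    std::vector<std::string> words;  // words of length >= 4 (empty for a query)
};

// Hypothetical helper: reads whitespace-separated tokens and groups them into events.
std::vector<Event> parseEvents(std::istream& in) {
    std::vector<Event> events;
    std::string tok;
    while (in >> tok) {
        if (tok == "<text>") {
            Event day{false, 0, {}};
            while (in >> tok && tok != "</text>") {
                // Words shorter than four characters are of no interest.
                if (tok.size() >= 4) day.words.push_back(tok);
            }
            events.push_back(day);
        } else if (tok == "<top") {
            Event query{true, 0, {}};
            in >> query.n;  // the number is delimited by white space
            in >> tok;      // consume the closing "/>"
            assert(query.n >= 1 && query.n <= 20);  // constraint from the notes
            events.push_back(query);
        }
    }
    return events;
}
```

Calling `parseEvents` on a `std::istringstream` (or on `std::cin`) yields the days and queries in input order, so counting and answering can be handled separately from parsing.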
OUTPUT
Given a query, the list of the N most frequent words during the last 7 days must be shown. Words
must appear in decreasing order of frequency, and in alphabetical order when frequencies are equal.
Every word whose number of appearances equals that of the word at position N must also be shown,
even if the number of words shown then exceeds N.
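The tie rule is the part that is easy to get wrong, so here is a minimal sketch of it. It assumes the per-word counts for the last 7 days are already available and contain only words that actually occurred; `WordCount` and `topWithTies` are hypothetical names, and the approach (sort everything, then extend past position N while the count stays equal) is one straightforward option, not necessarily the intended one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: (word, occurrences within the 7-day window).
using WordCount = std::pair<std::string, int>;

// Hypothetical helper: returns the N most frequent words plus every word
// tied with the word at position N.
std::vector<WordCount> topWithTies(std::vector<WordCount> counts, int n) {
    assert(n >= 1);
    // Decreasing frequency first, alphabetical order on equal frequency.
    std::sort(counts.begin(), counts.end(),
              [](const WordCount& a, const WordCount& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    if (counts.size() <= static_cast<std::size_t>(n)) return counts;
    const int threshold = counts[n - 1].second;  // count of the word at position N
    std::size_t end = static_cast<std::size_t>(n);
    while (end < counts.size() && counts[end].second == threshold) ++end;
    counts.resize(end);  // drop everything strictly below the threshold
    return counts;
}
```

With the counts blah 9, text 4, corresponding 3, please 3, file 2 and N = 3, the result has four entries, because "please" ties with "corresponding" at position 3.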
SAMPLE INPUT
<text>
imagine you are in the hiring process of a company whose
main business is analyzing the information that appears
in the web
</text>
<text>
a simple test consists in writing a program for
maintaining up to date a set of trending topics
</text>
<text>
you will be hired depending on the efficiency of your solution
</text>
<top 5 />
<text>
they provide you with a file containing the text
corresponding to a highly active blog
</text>
<text>
the text is organized daily and you have to provide the
sorted list of the n most frequent words during last week
when asked
</text>
<text>
each input file contains one test case the text corresponding
to a day is delimited by tag text
</text>
<text>
the query of top n words can appear between texts corresponding
to two different days
</text>
<top 3 />
<text>
blah blah blah blah blah blah blah blah blah
please please please
</text>
<top 3 />
SAMPLE OUTPUT
<top 5>
analyzing 1
appears 1
business 1
company 1
consists 1
date 1
depending 1
efficiency 1
hired 1
hiring 1
imagine 1
information 1
main 1
maintaining 1
process 1
program 1
simple 1
solution 1
test 1
that 1
topics 1
trending 1
whose 1
will 1
writing 1
your 1
</top>
<top 3>
text 4
corresponding 3
file 2
provide 2
test 2
words 2
</top>
<top 3>
blah 9
text 4
corresponding 3
please 3
</top>
#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
#include <string>
#include <map>
#include <vector>

using namespace std;

typedef pair<int,int> pII;

map<string,int> Hash;       // word -> id (ids start at 1, 0 means "not seen yet")
vector<int> dy[11];         // dy[d]: ids of the words seen up to day slot d
string rHash[20200];        // id -> word
int day_sum[11][20200];     // day_sum[d][id]: cumulative count of id up to day slot d
char cache[30];             // current input token
int now=9,pre=0,id=1;       // current slot, previous slot (ring of 10), next free id
int arr[20020],na;
string rss[20020];
bool vis[20020];

// Debug helper: prints every word id stored for day slot x with its cumulative count.
void DEBUG(int x)
{
    int sz=dy[x].size();
    for(int i=0;i<sz;i++)
    {
        cout<<"ID: "<<dy[x][i]<<" : "<<rHash[dy[x][i]]<<endl;
        cout<<"sum: "<<day_sum[x][dy[x][i]]<<endl;
    }
}

struct RSP
{
    int times;
    string word;
}rsp[20020];

// Decreasing frequency, then alphabetical order.
bool cmpRSP(RSP a,RSP b)
{
    if(a.times!=b.times)
        return a.times>b.times;
    else
        return a.word<b.word;
}

void get_top(int now,int k)
{
    int sz=dy[now].size();
    na=0;
    int _7dayago=(now+3)%10;   // (now-7) mod 10: the slot from 7 days ago
    memset(vis,false,sizeof(vis));
    // Collect the 7-day count of every distinct word.
    for(int i=0;i<sz;i++)
    {
        if(vis[dy[now][i]]==false)
        {
            arr[na++]=day_sum[now][dy[now][i]]-day_sum[_7dayago][dy[now][i]];
            vis[dy[now][i]]=true;
        }
    }
    sort(arr,arr+na);
    int sig=arr[max(0,na-k)];  // count of the word at position k (the threshold)
    int rn=0;
    memset(vis,false,sizeof(vis));
    // Keep every word whose count reaches the threshold, ties included.
    for(int i=0;i<sz;i++)
    {
        int times=day_sum[now][dy[now][i]]-day_sum[_7dayago][dy[now][i]];
        if(times >= sig &&vis[dy[now][i]]==false)
        {
            rsp[rn].times=times;
            rsp[rn].word=rHash[dy[now][i]];
            rn++;
            vis[dy[now][i]]=true;
        }
    }
    sort(rsp,rsp+rn,cmpRSP);
    printf("<top %d>\n",k);
    for(int i=0;i<rn;i++)
    {
        cout<<rsp[i].word<<" "<<rsp[i].times<<endl;
    }
    printf("</top>\n");
}

int main()
{
    while(scanf("%s",cache)!=EOF)
    {
        if(strcmp(cache,"<text>")==0)
        {
            // New day: advance the ring and copy the previous cumulative state.
            pre=now;
            now=(now+1)%10;
            dy[now]=dy[pre];
            memcpy(day_sum[now],day_sum[pre],sizeof(day_sum[0]));
            // Read the words of this day up to the closing </text> tag.
            while(scanf("%s",cache)==1)
            {
                if(cache[0]=='<') break;
                if(strlen(cache)<4) continue;   // short words are of no interest
                string word=cache;
                if(Hash[word]==0)
                {
                    rHash[id]=word;
                    Hash[word]=id++;
                }
                int ID=Hash[word];
                if(day_sum[pre][ID]==0)
                    dy[now].push_back(ID);
                day_sum[now][ID]++;
            }
        }
        else if(strcmp(cache,"<top")==0)
        {
            int top;
            scanf("%d",&top);      // N
            scanf("%s",cache);     // consume the closing "/>"
            get_top(now,top);
        }
    }
    return 0;
}
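The core trick in the solution above is that each day slot stores cumulative counts, so the count over the last 7 days is the difference between the current slot and the slot from 7 days earlier (`day_sum[now] - day_sum[_7dayago]`). The sketch below isolates that idea. It is a simplified illustration, not the code above: `WindowCounter`, `kSlots` and `kWindow` are hypothetical names, and it uses a `std::map` per slot and a ring of 8 slots instead of the arrays and ring of 10.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>

// Hypothetical constants: a 7-day window needs the current slot plus the 7 before it.
constexpr int kWindow = 7;
constexpr int kSlots = kWindow + 1;

// Hypothetical class: ring buffer of cumulative word counts, one slot per day.
class WindowCounter {
public:
    // Starts a new day: the new slot begins as a copy of the previous cumulative state.
    void startDay() {
        const int prev = cur_;
        cur_ = (cur_ + 1) % kSlots;
        slots_[cur_] = slots_[prev];
    }

    // Records one occurrence of word in the current day.
    void add(const std::string& word) {
        assert(!word.empty());
        ++slots_[cur_][word];
    }

    // Occurrences during the last 7 days: cumulative now minus cumulative 7 days ago.
    int count(const std::string& word) const {
        const int old = (cur_ + kSlots - kWindow) % kSlots;
        return lookup(slots_[cur_], word) - lookup(slots_[old], word);
    }

private:
    static int lookup(const std::map<std::string, int>& m, const std::string& w) {
        const auto it = m.find(w);
        return it == m.end() ? 0 : it->second;
    }

    std::array<std::map<std::string, int>, kSlots> slots_;  // all slots start empty
    int cur_ = 0;
};
```

A word added twice on day 1 still counts as 2 on day 7 and drops to 0 once day 8 starts, because the slot that is subtracted then already contains those two occurrences.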