uva 1597 Searching the Web
The term "search engine" is probably familiar to you. Generally speaking, a search engine searches the web pages available on the Internet, extracts and organizes the information, and responds to users' queries with the most relevant pages. World-famous search engines, like GOOGLE, have become very important tools for us when we browse the web. Conversations like this are now common in our daily life:
"What does the word like ****** mean?" "Um... I am not sure, just google it."
In this problem, you are required to construct a small search engine. Sounds impossible, doesn't it? Don't worry: here is a step-by-step tutorial on how to organize a large collection of texts efficiently and respond to queries quickly. You don't need to worry about fetching web pages; all the web pages are provided to you in text format as the input data. In addition, many queries are provided to validate your system.

Modern search engines use a technique called inversion for dealing with very large sets of documents. The method relies on the construction of a data structure, called an inverted index, which associates terms (words) with their occurrences in the collection of documents. The set of terms of interest is called the vocabulary, denoted V. In its simplest form, an inverted index is a dictionary where each search key is a term ω ∈ V. The associated value b(ω) is a pointer to an additional intermediate data structure, called a bucket. The bucket associated with a term ω is essentially a list of pointers marking all the occurrences of ω in the text collection. Each entry in a bucket consists of the document identifier (DID), that is, the ordinal number of the document within the collection, and the ordinal line number of the term's occurrence within the document.

Take Figure-1 as an example; it describes the general structure. Assume that we have only three documents to handle, shown in the right part of Figure-1. First we need to tokenize the text into words (blanks, punctuation and other non-alphabetic characters separate words) and construct our vocabulary from the terms occurring in the documents. For simplicity, we don't consider phrases; a term is a single word. Furthermore, terms are case-insensitive (e.g. "book" and "Book" are the same term), and we don't consider morphological variants (e.g. "books" and "book", or "protected" and "protect", are different terms) or hyphenated words (e.g. "middle-class" is not a single term but is split by the hyphen into two terms, "middle" and "class"). The vocabulary is shown in the left part of Figure-1. Each term of the vocabulary has a pointer to its bucket; the collection of buckets is shown in the middle part of Figure-1. Each item in a bucket records the DID of the term's occurrence.

After constructing the whole inverted index, we can apply it to the queries. A query takes one of the following forms:

term
term AND term
term OR term
NOT term

A single term can be combined with the Boolean operators AND, OR and NOT: "term1 AND term2" queries the documents containing both term1 and term2; "term1 OR term2" queries the documents containing term1 or term2; "NOT term1" queries the documents that do not contain term1. Terms are single words as defined above. You are guaranteed that no non-alphabetic characters appear in a term and that all terms are in lowercase. Furthermore, meaningless stop words (common words such as articles, prepositions and adverbs, specified as "the, a, to, and, or, not" in this problem) will not appear in queries either. For each query, the engine looks up the terms in the vocabulary of the constructed inverted index, compares the terms' bucket information, and then returns the result to the user. Now, can you construct the engine?
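The tokenization rules and the bucket structure above can be sketched in C++, the language used by the solutions below. This is a minimal illustration, not a reference solution: it assumes that one index per document, mapping each term to the set of line numbers on which it occurs, is enough, since every query in this problem is answered document by document. The names `tokenize` and `buildIndex` are hypothetical.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

using namespace std;

// Split one line into lowercase terms. Every non-alphabetic character
// (blank, punctuation, digit, hyphen) acts as a separator, so
// "middle-class" yields "middle" and "class".
vector<string> tokenize(const string& line) {
    vector<string> terms;
    string cur;
    for (char ch : line) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (isalpha(c)) {
            cur += static_cast<char>(tolower(c));
        } else if (!cur.empty()) {
            terms.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) terms.push_back(cur);
    return terms;
}

// Build the buckets for one document: each term maps to the set of
// (0-based) line numbers on which it occurs.
map<string, set<int>> buildIndex(const vector<string>& lines) {
    map<string, set<int>> index;
    for (int i = 0; i < static_cast<int>(lines.size()); ++i)
        for (const string& term : tokenize(lines[i]))
            index[term].insert(i);
    return index;
}
```

For instance, indexing the second sample document would put "books." (line 1, 0-based) and "Books" (line 6) into the same bucket, and "middle-class" (line 4) into the buckets of both "middle" and "class".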

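Input
The first line contains an integer N, the number of documents. The N documents follow, each terminated by a line of ten asterisks (**********). The next line contains an integer M, the number of queries, followed by M lines with one query each.
Output
For each query, print the selected lines exactly as they appear in the input, grouped by document in input order, each line once and in its original order. For a single term, print the lines that contain the term. For "term1 AND term2", print, from the documents that contain both terms, the lines that contain either term. For "term1 OR term2", print the lines that contain either term. For "NOT term", print every line of each document that does not contain the term. Separate the output of different documents with a line of ten dashes (----------). If nothing matches, print "Sorry, I found nothing." End the output of every query with a line of ten equals signs (==========).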
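Judging from the sample output, the printing rules for one query fit in a small helper. This sketch assumes the query has already been resolved into, for each document in input order, the ascending list of line numbers to print (an empty list means the document contributes nothing); `formatAnswer` and its parameters are hypothetical names. Building the result in a string rather than printing directly only serves to keep the helper easy to check.

```cpp
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// docs[d] holds the original lines of document d; hits[d] holds the
// line numbers of document d to print, in ascending order
// (hits.size() is assumed to equal docs.size()).
// Returns the full text printed for one query.
string formatAnswer(const vector<vector<string>>& docs,
                    const vector<vector<int>>& hits) {
    ostringstream out;
    bool printedAny = false;
    for (size_t d = 0; d < docs.size(); ++d) {
        if (hits[d].empty()) continue;
        // Ten dashes separate the fragments of different documents.
        if (printedAny) out << "----------\n";
        printedAny = true;
        for (int line : hits[d]) out << docs[d][line] << '\n';
    }
    if (!printedAny) out << "Sorry, I found nothing.\n";
    out << "==========\n";  // every query's output ends with this line
    return out.str();
}
```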
Sample Input
4
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.
**********
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
**********
Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results
**********
I am very very very happy!
What about you?
**********
6
computer
books AND computer
books OR protected
NOT security
very
slick
Sample Output
want the computer only to write her
----------
computer system) is essential to the
==========
intend to read his books. She might
want the computer only to write her
fees. Books might be the only way she
==========
intend to read his books. She might
fees. Books might be the only way she
----------
for works protected by copyright law
==========
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
----------
I am very very very happy!
What about you?
==========
I am very very very happy!
==========
Sorry, I found nothing.
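==========
My solution got Time Limit Exceeded, and it still did after I improved it. Below is the code, which timed out on both of my first two submissions: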
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <sstream>
#include <cstdio>
#include <cctype>

using namespace std;

vector<string> lines[100][1500];   // at most 100 documents, each with at most 1500 lines
vector<string>::iterator t;
int line_num[100], N;              // line_num: number of lines in each document; N: number of documents

bool strCmp(const string a, const string b)   // compare a and b case-insensitively; return true if they are the same word
{
    int aLen = a.length();
    int bLen = b.length();
    bool flag = true;
    int p = 0;
    if(!isalpha(b[0])) p = 1;      // skip one leading non-letter, e.g. "("
    //cout << a << "a b--" << b << endl;
    for(int i = 0; i < aLen; i++){
        if(tolower(a[i]) != tolower(b[p++])){
            flag = false;
            break;
        }
    }
    if(flag && p < bLen){
        if(isalpha(b[p])) flag = false;   // b continues with more letters, so it is a longer word
    }
    return flag;
}

bool deal_find(string a, int p, int q)   // search line q of document p for a; return true if found
{
    for(t = lines[p][q].begin(); t != lines[p][q].end(); t++){
        if(strCmp(a, *t)) return true;
    }
    return false;
}

void output(int i, int j)
{
    for(t = lines[i][j].begin(); t != lines[i][j].end(); t++){
        if(t == lines[i][j].begin()) cout << *t;
        else cout << " " << *t;
    }
    cout << endl;
}

bool WORD(string a, int k)
{
    bool flag = false, re = false;
    for(int i = 0; i < N; i++){
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                re = true;
                if(flag && k) cout << "----------" << endl;
                flag = true;
                output(i, j);
            }
        }
    }
    return re;
}

bool AND(string a, string b)
{
    bool flag = false, re = false;
    for(int i = 0; i < N; i++){
        int a0 = 0, b0 = 0;   // whether a and b, respectively, were found in this document
        set<int> and_line;
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                a0 = 1;
                and_line.insert(j);
            }
            if(deal_find(b, i, j)){
                b0 = 1;
                and_line.insert(j);
            }
        }
        if(a0 && b0){
            re = true;
            if(flag) cout << "----------" << endl;
            flag = true;
            set<int>::iterator iter;
            for(iter = and_line.begin(); iter != and_line.end(); iter++) output(i, *iter);
        }
    }
    return re;
}

bool NOT(string a)
{
    bool flag, re = false, k = false;
    for(int i = 0; i < N; i++){
        flag = false;
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                flag = true;
                break;
            }
        }
        if(flag) continue;
        else{
            re = true;
            if(k) cout << "----------" << endl;
            k = true;
            for(int j = 0; j < line_num[i]; j++) output(i, j);
        }
    }
    return re;
}

int main()
{
    int num1 = 0, M;
    cin >> N;
    int n = N;
    while(n--){   // read the n documents
        int num2 = 0;
        string line;
        while(getline(cin, line)){
            bool flag = true;
            stringstream ss(line);
            string word;
            while(ss >> word){
                if(word[0] == '*'){
                    flag = false;
                    break;
                }
                lines[num1][num2].push_back(word);
            }
            if(!flag) break;
            num2++;
        }
        line_num[num1] = num2;
        num1++;
    }
    cin >> M;
    bool re1, re2;
    string com;
    getchar();
    while(M--){
        getline(cin, com);
        if(com.find("AND") != string::npos){
            re1 = AND(com.substr(0, com.find_first_of(' ')), com.substr(com.find_last_of(' ') + 1));
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        else if(com.find("OR") != string::npos){
            re1 = WORD(com.substr(0, com.find_first_of(' ')), 0);
            cout << "----------" << endl;
            re2 = WORD(com.substr(com.find_last_of(' ') + 1), 0);
            if(!re1 && !re2) cout << "Sorry, I found nothing." << endl;
        }
        else if(com.find("NOT") != string::npos){
            re1 = NOT(com.substr(com.find_last_of(' ') + 1));
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        else {
            re1 = WORD(com, 1);
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        cout << "==========" << endl;
    }
    //system("pause");
    return 0;
}
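Finally I ran it in both VS and CB (Visual Studio and Code::Blocks) and found something wrong with the return value of main: the program had already finished running, yet it still did not exit. Here is what happened:
I had to press Enter once more,
and then...
It must be some internal error in the program........ I understand computers less and less....T_T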
Then I kept revising it. The problem above went away and the results seemed correct, but!!!!! it still timed out!!!!
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <sstream>
#include <set>
#include <algorithm>
#include <iterator>
#include <cctype>

using namespace std;

vector<string> lines[100][1500];   // at most 100 documents, each with at most 1500 lines
vector<string>::iterator t;
int line_num[100], N;              // line_num: number of lines in each document; N: number of documents

bool deal_find(string a, int p, int q)   // search line q of document p for a; return true if found
{
    for(t = lines[p][q].begin(); t != lines[p][q].end(); t++){
        int aLen = a.length(), bLen = (*t).length();
        bool flag = true;
        int pos = 0;                          // position in the current token
        if(!isalpha((*t)[0])) pos = 1;        // skip one leading non-letter, e.g. "("
        for(int i = 0; i < aLen; i++){
            if(tolower(a[i]) != tolower((*t)[pos++])){
                flag = false;
                break;
            }
        }
        if(flag && pos < bLen){
            if(isalpha((*t)[pos])) flag = false;   // the token is a longer word
        }
        if(flag) return true;
    }
    return false;
}

void output(int i, int j)
{
    for(t = lines[i][j].begin(); t != lines[i][j].end(); t++){
        if(t == lines[i][j].begin()) cout << *t;
        else cout << " " << *t;
    }
    cout << endl;
}

bool WORD(string a)
{
    bool flag = false, re = false;
    for(int i = 0; i < N; i++){
        flag = false;
        int k = 0;   // k == 1 once a separator decision has been made for this document
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                if(re) flag = true;   // flag == true: fragments of an earlier document have already been printed
                re = true;
                if(flag && re && !k) cout << "----------" << endl;
                k = 1;
                output(i, j);
            }
        }
    }
    return re;
}

bool AND(string a, string b)
{
    bool flag = false, re = false;
    for(int i = 0; i < N; i++){
        int a0 = 0, b0 = 0;   // whether a and b, respectively, were found in this document
        set<int> and_line;
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                a0 = 1;
                and_line.insert(j);
            }
            if(deal_find(b, i, j)){
                b0 = 1;
                and_line.insert(j);
            }
        }
        if(a0 && b0){
            re = true;
            if(flag) cout << "----------" << endl;
            flag = true;
            set<int>::iterator iter;
            for(iter = and_line.begin(); iter != and_line.end(); iter++) output(i, *iter);
        }
    }
    return re;
}

bool OR(string a, string b)
{
    bool flag = false, re = false;
    for(int i = 0; i < N; i++){
        flag = true;
        int k = 1;
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                if(flag && k && re){
                    cout << "----------" << endl;
                    k = 0;
                }
                flag = false;
                re = true;
                output(i, j);
            }
            else if(deal_find(b, i, j)){   // "else": a line containing both terms is printed only once
                if(flag && k && re){
                    cout << "----------" << endl;
                    k = 0;
                }
                flag = false;
                re = true;
                output(i, j);
            }
        }
    }
    return re;
}

bool NOT(string a)
{
    bool flag, re = false, k = false;
    for(int i = 0; i < N; i++){
        flag = false;
        for(int j = 0; j < line_num[i]; j++){
            if(deal_find(a, i, j)){
                flag = true;
                break;
            }
        }
        if(flag) continue;
        else{
            re = true;
            if(k) cout << "----------" << endl;
            k = true;
            for(int j = 0; j < line_num[i]; j++){
                output(i, j);
            }
        }
    }
    return re;
}

int main(int argc, char* argv[])
{
    int M, num1 = 0, num2 = 0;
    string line;
    cin >> N;
    cin.get();   // consume the newline after N
    for(int i = 0; i < N; i++){
        num2 = 0;
        while(getline(cin, line)){
            if(line == "**********") break;
            stringstream ss(line);
            string word;
            while(ss >> word) lines[num1][num2].push_back(word);
            num2++;
        }
        line_num[num1] = num2;
        num1++;
    }
    cin >> M;
    bool re1, re2;
    string com;
    cin.get();
    for(int i = 0; i < M; i++)
    {
        getline(cin, com);
        if(com[0] == 'N')
        {
            re1 = NOT(com.substr(com.find_last_of(' ') + 1));
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        else if(com.find("AND") != string::npos)
        {
            re1 = AND(com.substr(0, com.find_first_of(' ')), com.substr(com.find_last_of(' ') + 1));
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        else if(com.find("OR") != string::npos)
        {
            re1 = OR(com.substr(0, com.find_first_of(' ')), com.substr(com.find_last_of(' ') + 1));
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        else
        {
            re1 = WORD(com);
            if(!re1) cout << "Sorry, I found nothing." << endl;
        }
        cout << "==========" << endl;
    }
    //system("pause");
    return 0;
}
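Speechless. This is enough to show that the problem lies in the approach itself: the wrong approach is what causes the timeout.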
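A sketch of an index-based alternative: build a term → line-numbers map for each document once while reading the input, then answer each query with a couple of map lookups per document instead of rescanning every word of every line as both versions above do. This assumes such a map has already been built for every document; the `Doc` struct and the functions `bucket` and `linesFor` are hypothetical names, not the author's code.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Hypothetical per-document record: the index maps each term to the
// 0-based line numbers on which it occurs.
struct Doc {
    int lineCount;
    map<string, set<int>> index;
};

// Bucket lookup; an absent term yields a shared empty set.
const set<int>& bucket(const Doc& doc, const string& term) {
    static const set<int> none;
    auto it = doc.index.find(term);
    return it == doc.index.end() ? none : it->second;
}

// Lines of one document selected by a query of the form
// "t", "t1 AND t2", "t1 OR t2" or "NOT t", in ascending order.
// An empty result means the document contributes nothing.
vector<int> linesFor(const Doc& doc, const string& query) {
    istringstream in(query);
    vector<string> w;
    for (string s; in >> s;) w.push_back(s);

    set<int> result;
    if (w.size() == 2 && w[0] == "NOT") {
        // The whole document, but only if the term never occurs in it.
        if (bucket(doc, w[1]).empty())
            for (int i = 0; i < doc.lineCount; ++i) result.insert(i);
    } else if (w.size() == 3 && (w[1] == "AND" || w[1] == "OR")) {
        const set<int>& a = bucket(doc, w[0]);
        const set<int>& b = bucket(doc, w[2]);
        // AND needs both terms somewhere in the document, OR needs either.
        bool selected = (w[1] == "AND") ? (!a.empty() && !b.empty())
                                        : (!a.empty() || !b.empty());
        if (selected) {
            // Set union: a line holding both terms appears only once.
            result = a;
            result.insert(b.begin(), b.end());
        }
    } else if (w.size() == 1) {
        result = bucket(doc, w[0]);
    }
    return vector<int>(result.begin(), result.end());
}
```

The caller then prints, in input order, the selected lines of every document whose result is non-empty, separating documents with "----------" and ending each query with "==========".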