[0.0]Analysis of Baidu search engine
Rencently, my two teammates and I is doing a project, a simplified Chinese search engine for children(in primary school). We call it "kidsearch".
Since our project will be based on Baidu search engine. I'd like to have a simple analysis of Baidu search engine.
First, Baidu is not for children to use totally. Baidu, as a commercial company, provides the public a free service of searching. It is natural that not all the contents shown on the search engine are what people need. Some of them are shown because of benefits and some other factors.Perhaps it doesn't have a great impact on adults who can distinguish the contents of good or bad. But the impact will be obvious when it comes to children. For example,we can search these keys on Baidu : "波"(notice its pictures),"交换群"(notice its results),"医院"(notice its advertisements). And these are some normal words. Don't mention the results of some even worse key words. These results of searching not just inappropriate, some of them even harmful. So, the situation has to be fixed, which is also the purpose of our project "kidsearch".
Actually, seaching on the Internet for children is easier to that for adults. So the problem is also simplified. We can just use Baidu as a tool(not exagerated), rearrange the result, fix the inproper or useless entries, and add some contents suitable for children. The search engine will be really better for children after we do some fix on it.
So, what are the contents appropriate to children?
Based on the thoughts above, I concluded the requirements of children, which are what children may need.(Perhaps it doesn't cover all at present and we will perfect it in the future)
1.Notion -- encyclopedia
2.Material -- picture, music, video
3.Entertainment -- game
4.Study -- homework, knowledge
Moreover, there are some kinds of content that children don't need:
1.advertisement
2.adult(mature) content
3.sexual or homosexual content
4.sidebar(ad. or adult content or useless for children mostly)
Now that we have known what children need, what we should do next is to tackle them one by one.
What the technology we will use?
After tried many approaches, such as PHP, Java, Python, etc. I decided to use Python to do this job because it's really convenient to do the crawl job. Although it is a bit more difficult to make webpages than PHP, it doesn't matter too much.
Besides, there are huge amount of extended library to use with Python, such as requests, flask, django, jieba, etc. I have tried all of them preliminarily.
More details will be illustrated later. And our aim is to create a search engine which children can use and like to use.
[0.0]Analysis of Baidu search engine的更多相关文章
- 开源搜索 Iveely Search Engine 0.6.0 发布 -- 黎明前的娇嫩
快两年了,Iveely Search Engine已经走过了5个版本的岁月,虽出生“贫寒”,没有任何开源基金会的支持,没有优秀的“干爹.干妈”,它凭着它的爱好者的支持,0.6.0终于破壳而出,7年前, ...
- Iveely Search Engine 0.4.0 的发布
千呼万唤始出来,Iveely Search Engine 0.4.0 的发布 经过无数个夜晚的奋战,以及无数个夜晚的失眠,Iveely Search Engine 0.4.0 终于熬出来了,这其中 ...
- dotnet cli 5.0 新特性——dotnet tool search
dotnet cli 5.0 新特性--dotnet tool search Intro .NET 5.0 SDK 的发布,给 dotnet cli 引入了一个新的特性,dotnet tool sea ...
- 微软的一篇ctr预估的论文:Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine。
周末看了一下这篇论文,觉得挺难的,后来想想是ICML的论文,也就明白为什么了. 先简单记录下来,以后会继续添加内容. 主要参考了论文Web-Scale Bayesian Click-Through R ...
- Search Engine Hacking – Manual and Automation
Search Engine Hacking – Manual and Automation Ethical Hacking Boot Camp OUR MOST POPULAR COURSE! CLI ...
- Known BREAKING CHANGES from NH3.3.3.GA to 4.0.0
Build 4.0.0.Alpha1 ============================= ** Known BREAKING CHANGES from NH3.3.3.GA to 4.0. ...
- Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models
Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models 2019-06-13 10:2 ...
- 未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11.0.0.0, Culture=neutral, PublicKeyToken...
刚开始看老师 用VS新建一个“ADO.NET 实体数据模型” 但是一直报错:未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11. ...
- 42 Bing Search Engine Hacks
42 Bing Search Engine Hacks November 13, 2010 By Ivan Remember Bing, the search engine Microsoft lau ...
随机推荐
- 如果你只会JQuery的插件式开发, 那么你可以进来看看?
对于JQuery的学习,已经有3年多的时间了,直到去年与我的组长一起做项目,看到他写的JS,确实特别漂亮,有时甚至还看不太懂, 我才发现其实我不太会JQuery.所以我有时间就会去看看他写的JS代码, ...
- #include<unistd.h>头文件的理解
1.百度百科定义 unistd.h 是 C 和 C++ 程序设计语言中提供对 POSIX 操作系统 API 的访问功能的头文件的名称.该头文件由 POSIX.1 标准(单一UNIX规范的基础)提出,故 ...
- Java单例模式和volatile关键字
单例模式是最简单的设计模式,实现也非常"简单".一直以为我写没有问题,直到被 Coverity 打脸. 1. 暴露问题 前段时间,有段代码被 Coverity 警告了,简化一下代码 ...
- Android 嵌套GridView,ListView只显示一行的解决办法
重写ListView.GridView即可: public class MyListView extends ListView { public MyListView(Context context) ...
- DevExpress 中根据数据库字典动态生成卡式菜单 z
第三方的Devexpress套件因为要使用权限机制控制不同用户进入系统显示菜单所以要配合字典数据动态生成.在WEB中这种问题灰常的轻松在winform里就稍微有点不同为了用DEV实现卡式菜单有组的概念 ...
- IOCP模型
IOCP http://blog.csdn.net/zhongguoren666/article/details/7386592 Winsock IO模型之IOCP模型 http://blog.csd ...
- 《Python核心编程》 第八章 条件和循环
8–1.条件语句. 请看下边的代码 # statement A if x > 0: # statement B pass elif x < 0: # statement C pass el ...
- js中的String数据类型
string中包含一些特殊的字符字面量,又叫转义序列,\n 意思是换行,\t 意为制表,\b意为空格,\r回车,\\斜杠. 1.ECMAScript中字符串是不可变的. 2.转换字符串的方法:toSt ...
- MFC特定函数的应用20160720(SystemParametersInfo,GetWindowRect,WriteProfileString,GetSystemMetrics)
1.SystemParametersInfo函数可以获取和设置数量众多的windows系统参数 MFC中可以用 SystemParametersInfo(……) 函数来获取和设置系统信息,如下面例子所 ...
- Bias/variance tradeoff
线性回归中有欠拟合与过拟合,例如下图: 则会形成欠拟合, 则会形成过拟合. 尽管五次多项式会精确的预测训练集中的样本点,但在预测训练集中没有的数据,则不能很好的预测,也就是说有较大的泛化误差,上面的右 ...