[0.0]Analysis of Baidu search engine
Rencently, my two teammates and I is doing a project, a simplified Chinese search engine for children(in primary school). We call it "kidsearch".
Since our project will be based on Baidu search engine. I'd like to have a simple analysis of Baidu search engine.
First, Baidu is not for children to use totally. Baidu, as a commercial company, provides the public a free service of searching. It is natural that not all the contents shown on the search engine are what people need. Some of them are shown because of benefits and some other factors.Perhaps it doesn't have a great impact on adults who can distinguish the contents of good or bad. But the impact will be obvious when it comes to children. For example,we can search these keys on Baidu : "波"(notice its pictures),"交换群"(notice its results),"医院"(notice its advertisements). And these are some normal words. Don't mention the results of some even worse key words. These results of searching not just inappropriate, some of them even harmful. So, the situation has to be fixed, which is also the purpose of our project "kidsearch".
Actually, seaching on the Internet for children is easier to that for adults. So the problem is also simplified. We can just use Baidu as a tool(not exagerated), rearrange the result, fix the inproper or useless entries, and add some contents suitable for children. The search engine will be really better for children after we do some fix on it.
So, what are the contents appropriate to children?
Based on the thoughts above, I concluded the requirements of children, which are what children may need.(Perhaps it doesn't cover all at present and we will perfect it in the future)
1.Notion -- encyclopedia
2.Material -- picture, music, video
3.Entertainment -- game
4.Study -- homework, knowledge
Moreover, there are some kinds of content that children don't need:
1.advertisement
2.adult(mature) content
3.sexual or homosexual content
4.sidebar(ad. or adult content or useless for children mostly)
Now that we have known what children need, what we should do next is to tackle them one by one.
What the technology we will use?
After tried many approaches, such as PHP, Java, Python, etc. I decided to use Python to do this job because it's really convenient to do the crawl job. Although it is a bit more difficult to make webpages than PHP, it doesn't matter too much.
Besides, there are huge amount of extended library to use with Python, such as requests, flask, django, jieba, etc. I have tried all of them preliminarily.
More details will be illustrated later. And our aim is to create a search engine which children can use and like to use.
[0.0]Analysis of Baidu search engine的更多相关文章
- 开源搜索 Iveely Search Engine 0.6.0 发布 -- 黎明前的娇嫩
快两年了,Iveely Search Engine已经走过了5个版本的岁月,虽出生“贫寒”,没有任何开源基金会的支持,没有优秀的“干爹.干妈”,它凭着它的爱好者的支持,0.6.0终于破壳而出,7年前, ...
- Iveely Search Engine 0.4.0 的发布
千呼万唤始出来,Iveely Search Engine 0.4.0 的发布 经过无数个夜晚的奋战,以及无数个夜晚的失眠,Iveely Search Engine 0.4.0 终于熬出来了,这其中 ...
- dotnet cli 5.0 新特性——dotnet tool search
dotnet cli 5.0 新特性--dotnet tool search Intro .NET 5.0 SDK 的发布,给 dotnet cli 引入了一个新的特性,dotnet tool sea ...
- 微软的一篇ctr预估的论文:Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine。
周末看了一下这篇论文,觉得挺难的,后来想想是ICML的论文,也就明白为什么了. 先简单记录下来,以后会继续添加内容. 主要参考了论文Web-Scale Bayesian Click-Through R ...
- Search Engine Hacking – Manual and Automation
Search Engine Hacking – Manual and Automation Ethical Hacking Boot Camp OUR MOST POPULAR COURSE! CLI ...
- Known BREAKING CHANGES from NH3.3.3.GA to 4.0.0
Build 4.0.0.Alpha1 ============================= ** Known BREAKING CHANGES from NH3.3.3.GA to 4.0. ...
- Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models
Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models 2019-06-13 10:2 ...
- 未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11.0.0.0, Culture=neutral, PublicKeyToken...
刚开始看老师 用VS新建一个“ADO.NET 实体数据模型” 但是一直报错:未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11. ...
- 42 Bing Search Engine Hacks
42 Bing Search Engine Hacks November 13, 2010 By Ivan Remember Bing, the search engine Microsoft lau ...
随机推荐
- 原创: 做一款属于自己风格的音乐播放器 (HTML5的Audio新特性)
灵感的由来是前些天看到了博: http://www.cnblogs.com/li-cheng 的首页有一个很漂亮的播放器,感觉很不错,是用Flex做的Flash播放器. 于是我也便想到了,自己也来来弄 ...
- [反汇编练习] 160个CrackMe之007
[反汇编练习] 160个CrackMe之007. 本系列文章的目的是从一个没有任何经验的新手的角度(其实就是我自己),一步步尝试将160个CrackMe全部破解,如果可以,通过任何方式写出一个类似于注 ...
- asp.net输出docx文档出现【文件已损坏 无法打开】问题的解决方案
在某个项目中,有个需求需要将一些附件文档以字节流的形式直接存储在数据库中. 功能实现后,尝试过很多格式文件的上传下载处理,均未发现问题, 唯独在下载docx格式文件后,一打开文件就提示: “无法打开文 ...
- 【转】iPhone屏幕尺寸、分辨率及适配
原文网址:http://blog.csdn.net/phunxm/article/details/42174937 1.iPhone尺寸规格 设备 iPhone 宽 Width 高 Height 对角 ...
- Ruby窗口程序
require 'tk' tkroot=TkRoot.new { title 'hellw word' geometry '300x200' } lb = TkLabel.new(tkroot) do ...
- 【转】从INF文件认识驱动
在工控机安装xp操作系统时,由于工控机的集成显卡驱动只支持win7,之前没接触过windows驱动相关内容,折腾了半天.下载的驱动是exe的,双击安装就提示安装失败(未签名) 上图是网上随便找的,现象 ...
- poj 1651 http://poj.org/problem?id=1651
http://poj.org/problem?id=1651Multiplication Puzzle Time Limit: 1000MS Memory Limit: 65536K To ...
- 基于MFC和opencv的FFT
在网上折腾了一阵子,终于把这个程序写好了,程序是基于MFC的,图像显示的部分和获取图像的像素点是用到了opencv的一些函数,不过FFT算法没有用opencv的(呵呵,老师不让),网上的二维的FFT程 ...
- motan解读:添加spring 支持
代码位置: motan-core的目录下 motan中使用spring管理配置对象.motan利用Spring的spi机制创建了自定义标签和相应的标签处理代码.具体使用方法见这篇.本文以m ...
- web服务器分析与设计(三)
面向对象分析与设计第二步:健壮性分析,完善对象 通过上一篇的分析,已经得到了构建系统中最重要的对象-----实体对象,它们封装着构成系统最重要的数据,实体数据是系统的生命. 但是光有实体还系统是运转不 ...