[0.0]Analysis of Baidu search engine
Rencently, my two teammates and I is doing a project, a simplified Chinese search engine for children(in primary school). We call it "kidsearch".
Since our project will be based on Baidu search engine. I'd like to have a simple analysis of Baidu search engine.
First, Baidu is not for children to use totally. Baidu, as a commercial company, provides the public a free service of searching. It is natural that not all the contents shown on the search engine are what people need. Some of them are shown because of benefits and some other factors.Perhaps it doesn't have a great impact on adults who can distinguish the contents of good or bad. But the impact will be obvious when it comes to children. For example,we can search these keys on Baidu : "波"(notice its pictures),"交换群"(notice its results),"医院"(notice its advertisements). And these are some normal words. Don't mention the results of some even worse key words. These results of searching not just inappropriate, some of them even harmful. So, the situation has to be fixed, which is also the purpose of our project "kidsearch".
Actually, seaching on the Internet for children is easier to that for adults. So the problem is also simplified. We can just use Baidu as a tool(not exagerated), rearrange the result, fix the inproper or useless entries, and add some contents suitable for children. The search engine will be really better for children after we do some fix on it.
So, what are the contents appropriate to children?
Based on the thoughts above, I concluded the requirements of children, which are what children may need.(Perhaps it doesn't cover all at present and we will perfect it in the future)
1.Notion -- encyclopedia
2.Material -- picture, music, video
3.Entertainment -- game
4.Study -- homework, knowledge
Moreover, there are some kinds of content that children don't need:
1.advertisement
2.adult(mature) content
3.sexual or homosexual content
4.sidebar(ad. or adult content or useless for children mostly)
Now that we have known what children need, what we should do next is to tackle them one by one.
What the technology we will use?
After tried many approaches, such as PHP, Java, Python, etc. I decided to use Python to do this job because it's really convenient to do the crawl job. Although it is a bit more difficult to make webpages than PHP, it doesn't matter too much.
Besides, there are huge amount of extended library to use with Python, such as requests, flask, django, jieba, etc. I have tried all of them preliminarily.
More details will be illustrated later. And our aim is to create a search engine which children can use and like to use.
[0.0]Analysis of Baidu search engine的更多相关文章
- 开源搜索 Iveely Search Engine 0.6.0 发布 -- 黎明前的娇嫩
快两年了,Iveely Search Engine已经走过了5个版本的岁月,虽出生“贫寒”,没有任何开源基金会的支持,没有优秀的“干爹.干妈”,它凭着它的爱好者的支持,0.6.0终于破壳而出,7年前, ...
- Iveely Search Engine 0.4.0 的发布
千呼万唤始出来,Iveely Search Engine 0.4.0 的发布 经过无数个夜晚的奋战,以及无数个夜晚的失眠,Iveely Search Engine 0.4.0 终于熬出来了,这其中 ...
- dotnet cli 5.0 新特性——dotnet tool search
dotnet cli 5.0 新特性--dotnet tool search Intro .NET 5.0 SDK 的发布,给 dotnet cli 引入了一个新的特性,dotnet tool sea ...
- 微软的一篇ctr预估的论文:Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine。
周末看了一下这篇论文,觉得挺难的,后来想想是ICML的论文,也就明白为什么了. 先简单记录下来,以后会继续添加内容. 主要参考了论文Web-Scale Bayesian Click-Through R ...
- Search Engine Hacking – Manual and Automation
Search Engine Hacking – Manual and Automation Ethical Hacking Boot Camp OUR MOST POPULAR COURSE! CLI ...
- Known BREAKING CHANGES from NH3.3.3.GA to 4.0.0
Build 4.0.0.Alpha1 ============================= ** Known BREAKING CHANGES from NH3.3.3.GA to 4.0. ...
- Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models
Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models 2019-06-13 10:2 ...
- 未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11.0.0.0, Culture=neutral, PublicKeyToken...
刚开始看老师 用VS新建一个“ADO.NET 实体数据模型” 但是一直报错:未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11. ...
- 42 Bing Search Engine Hacks
42 Bing Search Engine Hacks November 13, 2010 By Ivan Remember Bing, the search engine Microsoft lau ...
随机推荐
- 搜集的一些RTMP项目,有Server端也有Client端
查询一些RTMP的协议封装时找到了一些RTMP开源项目,在这里列举一下,以后有时间或是有兴趣可以参考一下: just very few of them. Red5 only contains a se ...
- Chrome 实用调试技巧
Chrome 实用调试技巧 2016-07-23 如今Chrome浏览器无疑是最受前端青睐的工具,原因除了界面简洁.大量的应用插件,良好的代码规范支持.强大的V8解释器之外,还因为Chrome开发者工 ...
- 基于Fragment实现Tab的切换,滑出侧边栏
最近在学习Fragment(碎片)这是android3.0以后提出的概念,很多pad上面的设置部分都是通过Fragment来实现的,先看看具体的效果吧(图一) (图二) (图三)第一章图片是初始时的 ...
- dell optiplex台式机 安装win7 清楚分区的方法
http://jingyan.baidu.com/article/92255446e1065f851648f42b.html
- webdriver(python)学习笔记二
自己开始一个脚本开始学习: # coding = utf-8 from selenium import webdriver browser = webdriver.Firefox() browser. ...
- UVA 10529-Dumb Bones(概率dp)
题意: 给出放一个多米诺骨牌,向左向右倒的概率,求要放好n个骨牌,需要放置的骨牌的期望次数. 分析: 用到区间dp的思想,如果一个位置的左面右面骨牌都已放好,考虑,放中间的情况, dp[i]表示放好前 ...
- Android应用启动时间及启动日志获取方法
1. Android应用中,可以使用如下方式进行应用启动时间的查看 2. 启动日志获取方法:
- Allegro从.brd文件中导出器件封装
打开.brd文件,File→Export→Libraries,除了No libraries dependencies之外,所有选项都勾选上,设定好存放路径之后,Export. 注意事项: 1. 一般的 ...
- 【Windows核心编程】重载类成员函数new / new[] / delete / delete[]
// Heap.cpp : 定义控制台应用程序的入口点. // #include "stdafx.h" #include <Windows.h> #include &l ...
- DX11&C++