Rencently, my two teammates and I is doing a project, a simplified Chinese search engine for children(in primary school). We call it "kidsearch".

Since our project will be based on Baidu search engine. I'd like to have a simple analysis of Baidu search engine.

First, Baidu is not for children to use totally. Baidu, as a commercial company, provides the public a free service of searching. It is natural that not all the contents shown on the search engine are what people need. Some of them are shown because of benefits and some other factors.Perhaps it doesn't have a great impact on adults who can distinguish the contents of good or bad. But the impact will be obvious when it comes to children. For example,we can search these keys on Baidu : "波"(notice its pictures),"交换群"(notice its results),"医院"(notice its advertisements). And these are some normal words. Don't mention the results of some even worse key words. These results of searching not just inappropriate, some of them even harmful. So, the situation has to be fixed, which is also the purpose of our project "kidsearch".

Actually, seaching on the Internet for children is easier to that for adults. So the problem is also simplified. We can just use Baidu as a tool(not exagerated), rearrange the result, fix the inproper or useless entries, and add some contents suitable for children. The search engine will be really better for children after we do some fix on it.

So, what are the contents appropriate to children?

Based on the thoughts above, I concluded the requirements of children, which are what children may need.(Perhaps it doesn't cover all at present and we will perfect it in the future)

1.Notion -- encyclopedia

2.Material -- picture, music, video

3.Entertainment -- game

4.Study -- homework, knowledge

Moreover, there are some kinds of content that children don't need:

1.advertisement

2.adult(mature) content

3.sexual or homosexual content

4.sidebar(ad. or adult content or useless for children mostly)

Now that we have known what children need, what we should do next is to tackle them one by one.

What the technology we will use?

After tried many approaches, such as PHP, Java, Python, etc. I decided to use Python to do this job because it's really convenient to do the crawl job. Although it is a bit more difficult to make webpages than PHP, it doesn't matter too much.

Besides, there are huge amount of extended library to use with Python, such as requests, flask, django, jieba, etc. I have tried all of them preliminarily.

More details will be illustrated later. And our aim is to create a search engine which children can use and like to use.

[0.0]Analysis of Baidu search engine的更多相关文章

  1. 开源搜索 Iveely Search Engine 0.6.0 发布 -- 黎明前的娇嫩

    快两年了,Iveely Search Engine已经走过了5个版本的岁月,虽出生“贫寒”,没有任何开源基金会的支持,没有优秀的“干爹.干妈”,它凭着它的爱好者的支持,0.6.0终于破壳而出,7年前, ...

  2. Iveely Search Engine 0.4.0 的发布

    千呼万唤始出来,Iveely Search Engine 0.4.0 的发布   经过无数个夜晚的奋战,以及无数个夜晚的失眠,Iveely Search Engine 0.4.0 终于熬出来了,这其中 ...

  3. dotnet cli 5.0 新特性——dotnet tool search

    dotnet cli 5.0 新特性--dotnet tool search Intro .NET 5.0 SDK 的发布,给 dotnet cli 引入了一个新的特性,dotnet tool sea ...

  4. 微软的一篇ctr预估的论文:Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine。

    周末看了一下这篇论文,觉得挺难的,后来想想是ICML的论文,也就明白为什么了. 先简单记录下来,以后会继续添加内容. 主要参考了论文Web-Scale Bayesian Click-Through R ...

  5. Search Engine Hacking – Manual and Automation

    Search Engine Hacking – Manual and Automation Ethical Hacking Boot Camp OUR MOST POPULAR COURSE! CLI ...

  6. Known BREAKING CHANGES from NH3.3.3.GA to 4.0.0

    Build 4.0.0.Alpha1 =============================   ** Known BREAKING CHANGES from NH3.3.3.GA to 4.0. ...

  7. Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models

    Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models  2019-06-13 10:2 ...

  8. 未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11.0.0.0, Culture=neutral, PublicKeyToken...

    刚开始看老师 用VS新建一个“ADO.NET 实体数据模型” 但是一直报错:未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11. ...

  9. 42 Bing Search Engine Hacks

    42 Bing Search Engine Hacks November 13, 2010 By Ivan Remember Bing, the search engine Microsoft lau ...

随机推荐

  1. Toast 用于一个页面有多个提示

    private Toast mToast; 2 初始化 mToast = Toast.makeText(this,"",Toast.LENGTH_SHORT); 3 方法 priv ...

  2. linux kernel 模块多文件编译

    /*************************************************************************** * linux kernel 模块多文件编译 ...

  3. 私有pod简记

    http://www.jianshu.com/p/7a82e977281c http://www.jianshu.com/p/ddc2490bff9f 两个工程 1 代码工程 在github上创建一个 ...

  4. HWM的实验

    HWM是数据段中使用空间和未使用空间之间的界限,假如现有自由链表上的数据块不能满足需求,Oracle把HWM指向的数据块加入到自由链表上,HWM向前移动到下一个数据块.简单说,一个数据段中,HWM左边 ...

  5. mysql互为主从复制配置笔记

    MySQL-master1:192.168.72.128 MySQL-master2:192.168.72.129 OS版本:CentOS 5.4MySQL版本:5.5.9(主从复制的master和s ...

  6. 【转】Android Studio -修改LogCat的颜色*美爆了*

    原文网址:http://www.2cto.com/kf/201505/400357.html 一. 先看效果 二.设置 File->Settings 或Ctrl + Alt +S 找到 Edit ...

  7. Java应用调优指南之-工具篇

    1. 土法调优两大件 先忆苦思甜,一般人在没有Profile工具的时候,调优的两大件,无非Heap Dump 与 Thread Dump. 1.1 Heap Dump jmap -dump:live, ...

  8. Android设计模式源码解析之桥接模式

    模式介绍 模式的定义 将抽象部分与实现部分分离,使它们都可以独立的变化. 模式的使用场景 如果一个系统需要在构件的抽象化角色和具体化角色之间添加更多的灵活性,避免在两个层次之间建立静态的联系. 设计要 ...

  9. 小技巧--让JS代码只执行一次

    有时候实在是没办法,就像我这个比赛系统中,有一个弹出框,这个弹出框之外都是模糊的(这是在ajax写出弹出框时,加了一个水印). 然而遇到的问题,也是蹊跷古怪,因为这个弹出框的事件是数据查询事件,但是因 ...

  10. winform实现自动更新并动态调用form实现

    winform实现自动更新并动态调用form实现 标签: winform作业dllbytenull服务器 2008-08-04 17:36 1102人阅读 评论(0) 收藏 举报  分类: c#200 ...