Rencently, my two teammates and I is doing a project, a simplified Chinese search engine for children(in primary school). We call it "kidsearch".

Since our project will be based on Baidu search engine. I'd like to have a simple analysis of Baidu search engine.

First, Baidu is not for children to use totally. Baidu, as a commercial company, provides the public a free service of searching. It is natural that not all the contents shown on the search engine are what people need. Some of them are shown because of benefits and some other factors.Perhaps it doesn't have a great impact on adults who can distinguish the contents of good or bad. But the impact will be obvious when it comes to children. For example,we can search these keys on Baidu : "波"(notice its pictures),"交换群"(notice its results),"医院"(notice its advertisements). And these are some normal words. Don't mention the results of some even worse key words. These results of searching not just inappropriate, some of them even harmful. So, the situation has to be fixed, which is also the purpose of our project "kidsearch".

Actually, seaching on the Internet for children is easier to that for adults. So the problem is also simplified. We can just use Baidu as a tool(not exagerated), rearrange the result, fix the inproper or useless entries, and add some contents suitable for children. The search engine will be really better for children after we do some fix on it.

So, what are the contents appropriate to children?

Based on the thoughts above, I concluded the requirements of children, which are what children may need.(Perhaps it doesn't cover all at present and we will perfect it in the future)

1.Notion -- encyclopedia

2.Material -- picture, music, video

3.Entertainment -- game

4.Study -- homework, knowledge

Moreover, there are some kinds of content that children don't need:

1.advertisement

2.adult(mature) content

3.sexual or homosexual content

4.sidebar(ad. or adult content or useless for children mostly)

Now that we have known what children need, what we should do next is to tackle them one by one.

What the technology we will use?

After tried many approaches, such as PHP, Java, Python, etc. I decided to use Python to do this job because it's really convenient to do the crawl job. Although it is a bit more difficult to make webpages than PHP, it doesn't matter too much.

Besides, there are huge amount of extended library to use with Python, such as requests, flask, django, jieba, etc. I have tried all of them preliminarily.

More details will be illustrated later. And our aim is to create a search engine which children can use and like to use.

[0.0]Analysis of Baidu search engine的更多相关文章

  1. 开源搜索 Iveely Search Engine 0.6.0 发布 -- 黎明前的娇嫩

    快两年了,Iveely Search Engine已经走过了5个版本的岁月,虽出生“贫寒”,没有任何开源基金会的支持,没有优秀的“干爹.干妈”,它凭着它的爱好者的支持,0.6.0终于破壳而出,7年前, ...

  2. Iveely Search Engine 0.4.0 的发布

    千呼万唤始出来,Iveely Search Engine 0.4.0 的发布   经过无数个夜晚的奋战,以及无数个夜晚的失眠,Iveely Search Engine 0.4.0 终于熬出来了,这其中 ...

  3. dotnet cli 5.0 新特性——dotnet tool search

    dotnet cli 5.0 新特性--dotnet tool search Intro .NET 5.0 SDK 的发布,给 dotnet cli 引入了一个新的特性,dotnet tool sea ...

  4. 微软的一篇ctr预估的论文:Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine。

    周末看了一下这篇论文,觉得挺难的,后来想想是ICML的论文,也就明白为什么了. 先简单记录下来,以后会继续添加内容. 主要参考了论文Web-Scale Bayesian Click-Through R ...

  5. Search Engine Hacking – Manual and Automation

    Search Engine Hacking – Manual and Automation Ethical Hacking Boot Camp OUR MOST POPULAR COURSE! CLI ...

  6. Known BREAKING CHANGES from NH3.3.3.GA to 4.0.0

    Build 4.0.0.Alpha1 =============================   ** Known BREAKING CHANGES from NH3.3.3.GA to 4.0. ...

  7. Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models

    Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models  2019-06-13 10:2 ...

  8. 未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11.0.0.0, Culture=neutral, PublicKeyToken...

    刚开始看老师 用VS新建一个“ADO.NET 实体数据模型” 但是一直报错:未能加载文件或程序集“Microsoft.SqlServer.Management.Sdk.Sfc, Version=11. ...

  9. 42 Bing Search Engine Hacks

    42 Bing Search Engine Hacks November 13, 2010 By Ivan Remember Bing, the search engine Microsoft lau ...

随机推荐

  1. Relativelayout属性

    // 相对于给定ID控件 android:layout_above 将该控件的底部置于给定ID的控件之上; android:layout_below 将该控件的底部置于给定ID的控件之下; andro ...

  2. [转] jQuery Infinite Ajax Scroll(ias) 分页插件介绍

    原文链接:http://justflyhigh.com/index.php/articlec/index/index.php?s=content&m=aticle&id=91 Infi ...

  3. cocoapods 终极方案

    最近各种错误, 全部刷新 再说 sudo gem install -n /usr/local/bin cocoapods $ sudo gem update --system // 先更新gem $ ...

  4. java 访问器方法中对象引用的问题

    "注意不要编写返回引用可变对象的访问器方法".因为会破坏类的封装性,引用的内容可能会被改变,产生业务逻辑上的错误. 什么是可变对象? 先要搞清楚java中值传递和引用传递的问题,总结如下: 1.对象就 ...

  5. 【转】Github轻松上手1-Git的工作原理与设置

    转自:http://blog.sina.com.cn/s/blog_4b55f6860100zzgp.html 作为一个程序猿,如果没有接触过stack overflow和Github,就如同在江湖中 ...

  6. Delphi打开窗体时报"Corrupt Portfolio Stream"

      今天在打开一个Delphi窗体时报了这么一个错误: Corrupt Portfolio Stream 查了一下,主要是由于Delphi窗体的*.ddp文件损坏引起的. 解决方法: 删除.ddp 文 ...

  7. ASP.NET 经典60道面试题

    转:http://bbs.chinaunix.net/thread-4065577-1-1.html ASP.NET 经典60道面试题 1. 简述 private. protected. public ...

  8. golang windows程序获取管理员权限(UAC ) via gocn

    golang windows程序获取管理员权限(UAC ) 在windows上执行有关系统设置命令的时候需要管理员权限才能操作,比如修改网卡的禁用.启用状态.双击执行是不能正确执行命令的,只有右键以管 ...

  9. CXF之八 RESTFul服务

    JAX-RS概述 JAX-RS是Java提供用于开发RESTful Web服务基于注解(annotation)的API.JAX-RS旨在定义一个统一的规范,使得Java程序员可以使用一套固定的接口来开 ...

  10. 二分+最短路 uvalive 3270 Simplified GSM Network(推荐)

    // 二分+最短路 uvalive 3270 Simplified GSM Network(推荐) // 题意:已知B(1≤B≤50)个信号站和C(1≤C≤50)座城市的坐标,坐标的绝对值不大于100 ...