Querying Wikipedia content through its API
Wikipedia provides an API for working with its content. The API documentation is at:
http://en.wikipedia.org/w/api.php
Some common uses:
1. Full-text search
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=fluoxetine
srsearch is the text to search for.
Result:
<?xml version="1.0"?>
<api>
  <query>
    <searchinfo totalhits="224" />
    <search>
      <p ns="0" title="Fluoxetine" snippet="<span class='searchmatch'>Fluoxetine</span> (also known by the tradenames Prozac, Sarafem) is an antidepressant of the selective serotonin reuptake inhibitor (SSRI) class <b>...</b> " size="53978" wordcount="7052" timestamp="2010-10-31T23:22:00Z" />
      <p ns="0" title="Olanzapine/fluoxetine" snippet="The drug combination olanzapine/<span class='searchmatch'>fluoxetine</span> (trade name Symbyax, created by Eli Lilly and Company ) is a single capsule containing the <b>...</b> " size="5703" wordcount="629" timestamp="2010-09-21T09:10:34Z" />
      <p ns="0" title="Sertraline" snippet="Evidence suggests that sertraline may work better than <span class='searchmatch'>fluoxetine</span> (Prozac) for some subtypes of depression. Sertraline is highly <b>...</b> " size="104510" wordcount="13933" timestamp="2010-10-28T22:13:04Z" />
      <p ns="0" title="Antidepressant" snippet="The first such compound to be patented was zimelidine in 1971, while the first released clinically was indalpine . <span class='searchmatch'>Fluoxetine</span> was <b>...</b> " size="128712" wordcount="17532" timestamp="2010-10-30T08:05:06Z" />
      <p ns="0" title="Selective serotonin reuptake inhibitor" snippet="four newer antidepressants (including the SSRIs paroxetine and <span class='searchmatch'>fluoxetine</span> , and two non-SSRI antidepressants nefazodone and venlafaxine ). <b>...</b> " size="78327" wordcount="10398" timestamp="2010-11-01T00:11:30Z" />
      <p ns="0" title="Paroxetine" snippet="Unlike two other popular SSRI antidepressants, <span class='searchmatch'>fluoxetine</span> and sertraline , paroxetine is associated with clinically significant weight <b>...</b> " size="48886" wordcount="6491" timestamp="2010-10-31T23:11:12Z" />
      <p ns="0" title="Venlafaxine" snippet="Its efficacy is similar to or better than sertraline (Zoloft) and <span class='searchmatch'>fluoxetine</span> (Prozac), depending on the criteria and rating scales used <b>...</b> " size="49655" wordcount="6574" timestamp="2010-11-01T00:38:00Z" />
      <p ns="0" title="Olanzapine" snippet="Olanzapine (trade names Zyprexa, Zalasta, Zolafren, Olzapin, Oferta, Zypadhera or in combination with <span class='searchmatch'>fluoxetine</span> Symbyax ) is an atypical <b>...</b> " size="34028" wordcount="4540" timestamp="2010-10-30T17:45:42Z" />
      <p ns="0" title="Prozac (disambiguation)" snippet="Prozac is a proprietary name for the antidepressant drug <span class='searchmatch'>fluoxetine</span>. Prozac may also refer to: Prozac+ , an Italian punk band <b>...</b> " size="581" wordcount="78" timestamp="2010-04-23T20:24:31Z" />
      <p ns="0" title="SSRI discontinuation syndrome" snippet="paroxetine having the highest number of withdrawal syndrome reports and <span class='searchmatch'>fluoxetine</span> the highest number of drug dependence reports; the note <b>...</b> " size="41099" wordcount="5444" timestamp="2010-09-23T06:19:55Z" />
    </search>
  </query>
  <query-continue>
    <search sroffset="10" />
  </query-continue>
</api>
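The search request and its result can be sketched in Python using only the standard library. This is a minimal illustration, not an official client. The helper names build_search_url and parse_search_results are hypothetical, format=xml is passed explicitly on the assumption that the default output format may differ between MediaWiki versions, and SAMPLE is a trimmed copy of the response shown above with the snippet attributes left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint used in this article


def build_search_url(term, offset=0):
    """Build a full-text search URL (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "sroffset": offset,  # paging offset, as reported in <query-continue>
        "format": "xml",     # assumption: ask for XML explicitly
    }
    return API_URL + "?" + urlencode(params)


def parse_search_results(xml_text):
    """Return (total hits, list of page titles) from a search response."""
    root = ET.fromstring(xml_text)  # root is the <api> element
    total = int(root.find("query/searchinfo").get("totalhits"))
    titles = [p.get("title") for p in root.findall("query/search/p")]
    return total, titles


# Trimmed copy of the response above; snippet attributes omitted.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <searchinfo totalhits="224" />
    <search>
      <p ns="0" title="Fluoxetine" size="53978" wordcount="7052" />
      <p ns="0" title="Olanzapine/fluoxetine" size="5703" wordcount="629" />
    </search>
  </query>
  <query-continue>
    <search sroffset="10" />
  </query-continue>
</api>"""

if __name__ == "__main__":
    print(build_search_url("fluoxetine"))
    print(parse_search_results(SAMPLE))
```

To fetch the next page of results, the same helper can be called with offset set to the sroffset value from query-continue (10 in the response above).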
2. List Wikipedia categories:
http://en.wikipedia.org/w/api.php?action=query&list=allcategories&acprefix=drug&aclimit=10
Returns 10 categories whose names begin with "drug".
Result:
<?xml version="1.0"?>
<api>
  <query>
    <allcategories>
      <c xml:space="preserve">Drug-induced Suicide</c>
      <c xml:space="preserve">Drug-realted suicides</c>
      <c xml:space="preserve">Drug-related Films</c>
      <c xml:space="preserve">Drug-related Suicides</c>
      <c xml:space="preserve">Drug-related death in California</c>
      <c xml:space="preserve">Drug-related deaths</c>
      <c xml:space="preserve">Drug-related deaths by country</c>
      <c xml:space="preserve">Drug-related deaths in Alabama</c>
      <c xml:space="preserve">Drug-related deaths in Alaska</c>
      <c xml:space="preserve">Drug-related deaths in Arizona</c>
    </allcategories>
  </query>
  <query-continue>
    <allcategories acfrom="Drug-related deaths in Arkansas" />
  </query-continue>
</api>
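The query-continue element in this response tells the client where the next batch starts. A minimal sketch of following it is given below, assuming the response format shown above; newer MediaWiki versions may report continuation differently. The helper name next_categories_url is hypothetical, and SAMPLE is a shortened copy of the response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint used in this article


def next_categories_url(xml_text, prefix, limit=10):
    """Return the URL for the next batch of categories, or None when done."""
    root = ET.fromstring(xml_text)
    cont = root.find("query-continue/allcategories")
    if cont is None:  # no continuation element: this was the last batch
        return None
    params = {
        "action": "query",
        "list": "allcategories",
        "acprefix": prefix,
        "aclimit": limit,
        "acfrom": cont.get("acfrom"),  # resume from the reported category
        "format": "xml",               # assumption: ask for XML explicitly
    }
    return API_URL + "?" + urlencode(params)


# Shortened copy of the response above.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <allcategories>
      <c xml:space="preserve">Drug-related deaths in Arizona</c>
    </allcategories>
  </query>
  <query-continue>
    <allcategories acfrom="Drug-related deaths in Arkansas" />
  </query-continue>
</api>"""

if __name__ == "__main__":
    print(next_categories_url(SAMPLE, "drug"))
```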
3. Return the timestamp, user, comment, and content (timestamp|user|comment|content) of the page with the given title:
Result:
<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="27697087" ns="0" title="API">
        <revisions>
          <rev user="Graham87" timestamp="2010-06-13T08:41:17Z" comment="Protected API: restore protection ([edit=sysop] (indefinite) [move=sysop] (indefinite))" xml:space="preserve">#REDIRECT [[Application programming interface]]{{R from abbreviation}}</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>
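The request URL for this item is not shown. A plausible form, assumed here rather than taken from the article, combines prop=revisions with the titles and rvprop parameters. The sketch below builds such a URL and reads the fields out of the response shown above; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint used in this article


def build_revisions_url(title):
    """Build a revisions query URL (assumed form, see the text above)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment|content",
        "format": "xml",  # assumption: ask for XML explicitly
    }
    return API_URL + "?" + urlencode(params)


def parse_latest_revision(xml_text):
    """Return the first <rev> element as a dict, or None if there is none."""
    root = ET.fromstring(xml_text)
    rev = root.find("query/pages/page/revisions/rev")
    if rev is None:
        return None
    return {
        "user": rev.get("user"),
        "timestamp": rev.get("timestamp"),
        "comment": rev.get("comment"),
        "content": rev.text,  # wikitext markup, not HTML
    }


# Copy of the response above.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="27697087" ns="0" title="API">
        <revisions>
          <rev user="Graham87" timestamp="2010-06-13T08:41:17Z" comment="Protected API: restore protection ([edit=sysop] (indefinite) [move=sysop] (indefinite))" xml:space="preserve">#REDIRECT [[Application programming interface]]{{R from abbreviation}}</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>"""

if __name__ == "__main__":
    print(build_revisions_url("API"))
    print(parse_latest_revision(SAMPLE))
```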
4. Parse a page:
http://en.wikipedia.org/w/api.php?action=parse&format=xml&page=fluoxetine
The content returned by the previous query is in Wikipedia's wikitext markup, whereas this API returns the text as HTML.
The HTML content can be selected with xpath="api/parse/text".
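A minimal Python sketch of that XPath lookup follows. The helper name extract_html is hypothetical, and SAMPLE is a made-up placeholder that only mimics the api/parse/text structure; it is not real API output.

```python
import xml.etree.ElementTree as ET


def extract_html(xml_text):
    """Return the HTML held in api/parse/text, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # root is already the <api> element, so the path is relative to it.
    node = root.find("parse/text")
    return node.text if node is not None else None


# Illustrative placeholder only, shaped like an action=parse response.
SAMPLE = """<?xml version="1.0"?>
<api>
  <parse>
    <text xml:space="preserve">&lt;p&gt;&lt;b&gt;Fluoxetine&lt;/b&gt; is an antidepressant.&lt;/p&gt;</text>
  </parse>
</api>"""

if __name__ == "__main__":
    print(extract_html(SAMPLE))
```

The HTML arrives as escaped text inside the text element, so the XML parser hands it back as an ordinary string.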
* action=parse *
  This module parses wikitext and returns parser output
  This module requires read rights.
Parameters:
  title     - Title of page the text belongs to
              Default: API
  text      - Wikitext to parse
  summary   - Summary to parse
  page      - Parse the content of this page. Cannot be used together with text and title
  redirects - If the page parameter is set to a redirect, resolve it
  oldid     - Parse the content of this revision. Overrides page
  prop      - Which pieces of information to get.
              NOTE: Section tree is only generated if there are more than 4 sections, or if the __TOC__ keyword is present
              Values (separate with '|'): text, langlinks, categories, links, templates, images, externallinks, sections, revid, displaytitle, headitems, headhtml
              Default: text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle
  pst       - Do a pre-save transform on the input before parsing it.
              Ignored if page or oldid is used.
  onlypst   - Do a PST on the input, but don't parse it.
              Returns PSTed wikitext. Ignored if page or oldid is used.
Example:
  api.php?action=parse&text={{Project:Sandbox}}
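When wikitext is passed through the text parameter, characters such as braces and colons need to be URL-encoded. A minimal sketch using the standard library is shown below; the helper name build_parse_text_url is hypothetical, and format=xml is an assumption added so that the output matches the XML examples in this article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint used in this article


def build_parse_text_url(wikitext, title=None):
    """Build an action=parse URL for a piece of raw wikitext."""
    params = {"action": "parse", "text": wikitext, "format": "xml"}
    if title is not None:
        params["title"] = title  # page the text is treated as belonging to
    # urlencode percent-encodes the braces and the colon in the wikitext
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_parse_text_url("{{Project:Sandbox}}")
    print(url)
    # Decoding the query string recovers the original wikitext.
    print(parse_qs(urlsplit(url).query)["text"][0])
```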
Source: http://john2007.iteye.com/blog/800446