利用wikipedia 的API实现对其内容的查询
wikipedia提供了api可以供我们对其内容进行操作。其API文档地址为:
http://en.wikipedia.org/w/api.php
列举一些常见用法:
1、全文搜索
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=fluoxetine
srsearch为要检索的内容
结果:
- <?xml version="1.0"?>
- <api>
- <query>
- <searchinfo totalhits="224" />
- <search>
- <p ns="0" title="Fluoxetine" snippet="<span class='searchmatch'>Fluoxetine</span> (also known by the tradenames Prozac, Sarafem) is an antidepressant of the selective serotonin reuptake inhibitor (SSRI) class <b>...</b> " size="53978" wordcount="7052" timestamp="2010-10-31T23:22:00Z" />
- <p ns="0" title="Olanzapine/fluoxetine" snippet="The drug combination olanzapine/<span class='searchmatch'>fluoxetine</span> (trade name Symbyax, created by Eli Lilly and Company ) is a single capsule containing the <b>...</b> " size="5703" wordcount="629" timestamp="2010-09-21T09:10:34Z" />
- <p ns="0" title="Sertraline" snippet="Evidence suggests that sertraline may work better than <span class='searchmatch'>fluoxetine</span> (Prozac) for some subtypes of depression. Sertraline is highly <b>...</b> " size="104510" wordcount="13933" timestamp="2010-10-28T22:13:04Z" />
- <p ns="0" title="Antidepressant" snippet="The first such compound to be patented was zimelidine in 1971, while the first released clinically was indalpine . <span class='searchmatch'>Fluoxetine</span> was <b>...</b> " size="128712" wordcount="17532" timestamp="2010-10-30T08:05:06Z" />
- <p ns="0" title="Selective serotonin reuptake inhibitor" snippet="four newer antidepressants (including the SSRIs paroxetine and <span class='searchmatch'>fluoxetine</span> , and two non-SSRI antidepressants nefazodone and venlafaxine ). <b>...</b> " size="78327" wordcount="10398" timestamp="2010-11-01T00:11:30Z" />
- <p ns="0" title="Paroxetine" snippet="Unlike two other popular SSRI antidepressants, <span class='searchmatch'>fluoxetine</span> and sertraline , paroxetine is associated with clinically significant weight <b>...</b> " size="48886" wordcount="6491" timestamp="2010-10-31T23:11:12Z" />
- <p ns="0" title="Venlafaxine" snippet="Its efficacy is similar to or better than sertraline (Zoloft) and <span class='searchmatch'>fluoxetine</span> (Prozac), depending on the criteria and rating scales used <b>...</b> " size="49655" wordcount="6574" timestamp="2010-11-01T00:38:00Z" />
- <p ns="0" title="Olanzapine" snippet="Olanzapine (trade names Zyprexa, Zalasta, Zolafren, Olzapin, Oferta, Zypadhera or in combination with <span class='searchmatch'>fluoxetine</span> Symbyax ) is an atypical <b>...</b> " size="34028" wordcount="4540" timestamp="2010-10-30T17:45:42Z" />
- <p ns="0" title="Prozac (disambiguation)" snippet="Prozac is a proprietary name for the antidepressant drug <span class='searchmatch'>fluoxetine</span>. Prozac may also refer to: Prozac+ , an Italian punk band <b>...</b> " size="581" wordcount="78" timestamp="2010-04-23T20:24:31Z" />
- <p ns="0" title="SSRI discontinuation syndrome" snippet="paroxetine having the highest number of withdrawal syndrome reports and <span class='searchmatch'>fluoxetine</span> the highest number of drug dependence reports; the note <b>...</b> " size="41099" wordcount="5444" timestamp="2010-09-23T06:19:55Z" />
- </search>
- </query>
- <query-continue>
- <search sroffset="10" />
- </query-continue>
- </api>
2、列举wikipedia 的 category:
http://en.wikipedia.org/w/api.php?action=query&list=allcategories&acprefix=drug&aclimit=10
返回10条以drug开头的category;
结果:
- <?xml version="1.0"?>
- <api>
- <query>
- <allcategories>
- <c xml:space="preserve">Drug-induced Suicide</c>
- <c xml:space="preserve">Drug-realted suicides</c>
- <c xml:space="preserve">Drug-related Films</c>
- <c xml:space="preserve">Drug-related Suicides</c>
- <c xml:space="preserve">Drug-related death in California</c>
- <c xml:space="preserve">Drug-related deaths</c>
- <c xml:space="preserve">Drug-related deaths by country</c>
- <c xml:space="preserve">Drug-related deaths in Alabama</c>
- <c xml:space="preserve">Drug-related deaths in Alaska</c>
- <c xml:space="preserve">Drug-related deaths in Arizona</c>
- </allcategories>
- </query>
- <query-continue>
- <allcategories acfrom="Drug-related deaths in Arkansas" />
- </query-continue>
- </api>
3、返回具有相应title页面的timestamp|user|comment|content 信息;
结果:
- <?xml version="1.0"?>
- <api>
- <query>
- <pages>
- <page pageid="27697087" ns="0" title="API">
- <revisions>
- <rev user="Graham87" timestamp="2010-06-13T08:41:17Z" comment="Protected API: restore protection ([edit=sysop] (indefinite) [move=sysop] (indefinite))" xml:space="preserve">#REDIRECT [[Application programming interface]]{{R from abbreviation}}</rev>
- </revisions>
- </page>
- </pages>
- </query>
- </api>
4、解析页面:
http://en.wikipedia.org/w/api.php?action=parse&format=xml&page=fluoxetine
用上面的查询返回的[content]是wikipedia的标记格式,这个api返回的是html格式的文本:
可以用xpath="api/parse/text" 返回html内容。
* action=parse *
This module parses wikitext and returns parser output
This module requires read rights.
Parameters:
title - Title of page the text belongs to
Default: API
text - Wikitext to parse
summary - Summary to parse
page - Parse the content of this page. Cannot be used together with text and title
redirects - If the page parameter is set to a redirect, resolve it
oldid - Parse the content of this revision. Overrides page
prop - Which pieces of information to get.
NOTE: Section tree is only generated if there are more than 4 sections, or if the __TOC__ keyword is present
Values (separate with '|'): text, langlinks, categories, links, templates, images, externallinks, sections, revid, displaytitle, headitems, headhtml
Default: text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle
pst - Do a pre-save transform on the input before parsing it.
Ignored if page or oldid is used.
onlypst - Do a PST on the input, but don't parse it.
Returns PSTed wikitext. Ignored if page or oldid is used.
Example:
api.php?action=parse&text={{Project:Sandbox}}
来源:http://john2007.iteye.com/blog/800446
利用wikipedia 的API实现对其内容的查询的更多相关文章
- 利用百度地图API实现地址和经纬度互换查询
import json import requests def baiduMap(input_para): headers = { 'User-Agent': 'Mozilla/5.0 (Window ...
- 利用百度词典API和Volley网络库开发的android词典应用
关于百度词典API的说明,地址在这里:百度词典API介绍 关于android网络库Volley的介绍说明,地址在这里:Android网络通信库Volley 首先我们看下大体的界面布局!
- 利用Google Speech API实现Speech To Text
很久很久以前, 网上流传着一个免费的,识别率暴高的,稳定的 Speech To Text API, 那就是Google Speech API. 但是最近再使用的时候,总是返回500 Error. 后来 ...
- 利用未公开API获取终端会话闲置时间(Idle Time)和登入时间(Logon Time)
利用未公开API获取终端会话闲置时间(Idle Time)和登入时间(Logon Time)作者:Tuuzed(土仔) 发表于:2008年3月3日23:12:38 版权声明:可以任意转载,转载时请 ...
- 【百度地图API】建立全国银行位置查询系统(五)——如何更改百度地图的信息窗口内容?
原文:[百度地图API]建立全国银行位置查询系统(五)--如何更改百度地图的信息窗口内容? 摘要: 酷讯.搜房.去哪儿网等大型房产.旅游酒店网站,用的是百度的数据库,却显示了自定义的信息窗口内容,这是 ...
- ASP.NET Core Web APi获取原始请求内容
前言 我们讲过ASP.NET Core Web APi路由绑定,本节我们来讲讲如何获取客户端请求过来的内容. ASP.NET Core Web APi捕获Request.Body内容 [HttpPos ...
- 利用WordPress REST API 开发微信小程序从入门到放弃
自从我发布并开源WordPress版微信小程序以来,很多WordPress网站的站长问有关程序开发的问题,其实在文章:<用微信小程序连接WordPress网站>讲述过一些基本的要点,不过仍 ...
- 利用百度翻译API,获取翻译结果
利用百度翻译API,获取翻译结果 translate.py #!/usr/bin/python #-*- coding:utf-8 -*- import sys reload(sys) sys.set ...
- 白话SpringCloud | 第十一章:路由网关(Zuul):利用swagger2聚合API文档
前言 通过之前的两篇文章,可以简单的搭建一个路由网关了.而我们知道,现在都奉行前后端分离开发,前后端开发的沟通成本就增加了,所以一般上我们都是通过swagger进行api文档生成的.现在由于使用了统一 ...
随机推荐
- 项目vue2.0仿外卖APP(二)
vue-cli开启vue.js项目 github地址:https://github.com/vuejs/vue-cli Vue.js开发利器vue-cli,是vue的脚手架工具. 在工地上,脚手架是工 ...
- Javascript模块化编程(一):模块的写法
Javascript模块化编程(一):模块的写法 作者: 阮一峰 原文链接:http://www.ruanyifeng.com/blog/2012/10/javascript_module.html ...
- matlab更改打开时候默认路径
每次打开matlab都会的修改默认路径,是一件有些烦恼的事情.所以,就想尝试更改默认路径 方法如下: 1.在matlab安装目录,找到toolbox文件夹,打开local文件件,打开matlabrc. ...
- 解决Mysql连接池被关闭 ,hibernate尝试连接不能连接的问题。 (默认mysql连接池可以访问的时间为8小时,如果超过8小时没有连接,mysql会自动关闭连接池。系统发布第二天访问链接关闭问题。
解决Mysql连接池被关闭 ,hibernate尝试连接不能连接的问题. (默认MySQL连接池可以访问的时间为8小时,如果超过8小时没有连接,mysql会自动关闭连接池. 所以系统发布第二天访问会 ...
- 转载:《.NET 编程结构》专题汇总(C#)
<.NET 编程结构>专题汇总(C#) - M守护神 - 博客园http://www.cnblogs.com/liusuqi/p/3213597.html 前言 掌握一门技术,首要 ...
- 升级python
一开始有这个需求,是因为用 YaH3C 替代 iNode 进行校园网认证时,一直编译错误,提示找不到 Python 的某个模块,百度了一下,此模块是在 Python2.7 以上才有的,但是系统的自带的 ...
- es6要用严格模式
实验let的块级作用域,在sublime的Tools--Babel--Babel Transform检测未出现错误,在html中也未出现错误,唯在控制台中一直报错. //js名为es6.js ---* ...
- microsoft office professional plus2013激活
激活工具一般使用KMS8,KMS8不支持零售版的激活, 而office professional plus2013零售版,需要先转化为VOL版 需要以下两步: 1.将word转化为vol版 链接: h ...
- windows
1.拷贝远程文件 net use \\10.130.80.62\ipc$ 密码 /user:用户名 xcopy "\\10.130.80.62\G$\yt\apache-tomcat-7.0 ...
- Lintcode 75.寻找峰值
--------------------------------------- 按照给定的峰值定义,峰值的左半部分一定是递增的,所以只要找到不递增的即可. AC代码: class Solution { ...