Querying Wikipedia content through the Wikipedia API
Wikipedia provides an API for working with its content. The API documentation is at:
http://en.wikipedia.org/w/api.php
Some common uses:
1. Full-text search
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=fluoxetine
srsearch is the text to search for.
Result:
<?xml version="1.0"?>
<api>
  <query>
    <searchinfo totalhits="224" />
    <search>
      <p ns="0" title="Fluoxetine" snippet="<span class='searchmatch'>Fluoxetine</span> (also known by the tradenames Prozac, Sarafem) is an antidepressant of the selective serotonin reuptake inhibitor (SSRI) class <b>...</b> " size="53978" wordcount="7052" timestamp="2010-10-31T23:22:00Z" />
      <p ns="0" title="Olanzapine/fluoxetine" snippet="The drug combination olanzapine/<span class='searchmatch'>fluoxetine</span> (trade name Symbyax, created by Eli Lilly and Company ) is a single capsule containing the <b>...</b> " size="5703" wordcount="629" timestamp="2010-09-21T09:10:34Z" />
      <p ns="0" title="Sertraline" snippet="Evidence suggests that sertraline may work better than <span class='searchmatch'>fluoxetine</span> (Prozac) for some subtypes of depression. Sertraline is highly <b>...</b> " size="104510" wordcount="13933" timestamp="2010-10-28T22:13:04Z" />
      <p ns="0" title="Antidepressant" snippet="The first such compound to be patented was zimelidine in 1971, while the first released clinically was indalpine . <span class='searchmatch'>Fluoxetine</span> was <b>...</b> " size="128712" wordcount="17532" timestamp="2010-10-30T08:05:06Z" />
      <p ns="0" title="Selective serotonin reuptake inhibitor" snippet="four newer antidepressants (including the SSRIs paroxetine and <span class='searchmatch'>fluoxetine</span> , and two non-SSRI antidepressants nefazodone and venlafaxine ). <b>...</b> " size="78327" wordcount="10398" timestamp="2010-11-01T00:11:30Z" />
      <p ns="0" title="Paroxetine" snippet="Unlike two other popular SSRI antidepressants, <span class='searchmatch'>fluoxetine</span> and sertraline , paroxetine is associated with clinically significant weight <b>...</b> " size="48886" wordcount="6491" timestamp="2010-10-31T23:11:12Z" />
      <p ns="0" title="Venlafaxine" snippet="Its efficacy is similar to or better than sertraline (Zoloft) and <span class='searchmatch'>fluoxetine</span> (Prozac), depending on the criteria and rating scales used <b>...</b> " size="49655" wordcount="6574" timestamp="2010-11-01T00:38:00Z" />
      <p ns="0" title="Olanzapine" snippet="Olanzapine (trade names Zyprexa, Zalasta, Zolafren, Olzapin, Oferta, Zypadhera or in combination with <span class='searchmatch'>fluoxetine</span> Symbyax ) is an atypical <b>...</b> " size="34028" wordcount="4540" timestamp="2010-10-30T17:45:42Z" />
      <p ns="0" title="Prozac (disambiguation)" snippet="Prozac is a proprietary name for the antidepressant drug <span class='searchmatch'>fluoxetine</span>. Prozac may also refer to: Prozac+ , an Italian punk band <b>...</b> " size="581" wordcount="78" timestamp="2010-04-23T20:24:31Z" />
      <p ns="0" title="SSRI discontinuation syndrome" snippet="paroxetine having the highest number of withdrawal syndrome reports and <span class='searchmatch'>fluoxetine</span> the highest number of drug dependence reports; the note <b>...</b> " size="41099" wordcount="5444" timestamp="2010-09-23T06:19:55Z" />
    </search>
  </query>
  <query-continue>
    <search sroffset="10" />
  </query-continue>
</api>
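The search request and its response can be handled from a script. The following is a minimal sketch, not a definitive client: it assumes Python's standard library, the HTTPS form of the endpoint, and an explicit format=xml parameter (none of which appear in the original post). The function names and the User-Agent string are made up for illustration, and SAMPLE is a trimmed copy of the response above with the snippet attributes left out.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET

# Assumption: HTTPS form of the endpoint used in this post.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, offset=None):
    """Build a full-text search URL; XML output is requested explicitly."""
    params = {"action": "query", "list": "search",
              "srsearch": term, "format": "xml"}
    if offset is not None:
        # Resume from the sroffset value reported in <query-continue>.
        params["sroffset"] = offset
    return API_URL + "?" + urlencode(params)


def parse_search(xml_text):
    """Return (total hits, page titles, next offset or None)."""
    root = ET.fromstring(xml_text)  # root element is <api>
    info = root.find("query/searchinfo")
    total = int(info.get("totalhits")) if info is not None else None
    titles = [p.get("title") for p in root.findall("query/search/p")]
    cont = root.find("query-continue/search")
    next_offset = int(cont.get("sroffset")) if cont is not None else None
    return total, titles, next_offset


def fetch_search(term, offset=None):
    """Network helper, not called below; the User-Agent is a placeholder."""
    req = Request(build_search_url(term, offset),
                  headers={"User-Agent": "wiki-api-example/0.1"})
    with urlopen(req) as resp:
        return parse_search(resp.read())


# Trimmed copy of the response shown above (snippet attributes omitted).
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <searchinfo totalhits="224" />
    <search>
      <p ns="0" title="Fluoxetine" size="53978" wordcount="7052" />
      <p ns="0" title="Olanzapine/fluoxetine" size="5703" wordcount="629" />
    </search>
  </query>
  <query-continue>
    <search sroffset="10" />
  </query-continue>
</api>"""

print(build_search_url("fluoxetine"))
print(parse_search(SAMPLE))
```

The next offset returned by parse_search can be passed back to build_search_url to request the following page of results.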
2. List Wikipedia categories:
http://en.wikipedia.org/w/api.php?action=query&list=allcategories&acprefix=drug&aclimit=10
This returns 10 categories whose names start with "drug".
Result:
<?xml version="1.0"?>
<api>
  <query>
    <allcategories>
      <c xml:space="preserve">Drug-induced Suicide</c>
      <c xml:space="preserve">Drug-realted suicides</c>
      <c xml:space="preserve">Drug-related Films</c>
      <c xml:space="preserve">Drug-related Suicides</c>
      <c xml:space="preserve">Drug-related death in California</c>
      <c xml:space="preserve">Drug-related deaths</c>
      <c xml:space="preserve">Drug-related deaths by country</c>
      <c xml:space="preserve">Drug-related deaths in Alabama</c>
      <c xml:space="preserve">Drug-related deaths in Alaska</c>
      <c xml:space="preserve">Drug-related deaths in Arizona</c>
    </allcategories>
  </query>
  <query-continue>
    <allcategories acfrom="Drug-related deaths in Arkansas" />
  </query-continue>
</api>
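The category listing pages through results using the acfrom value in query-continue. A rough sketch of reading one page and preparing the next request might look like this; it assumes Python's standard library, the HTTPS endpoint, and format=xml, and the function names are hypothetical. SAMPLE is a shortened copy of the response above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: HTTPS form of the endpoint used in this post.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_categories_url(prefix, limit=10, start=None):
    """Build an allcategories URL for names beginning with `prefix`."""
    params = {"action": "query", "list": "allcategories",
              "acprefix": prefix, "aclimit": limit, "format": "xml"}
    if start is not None:
        # Continue from the acfrom value reported in <query-continue>.
        params["acfrom"] = start
    return API_URL + "?" + urlencode(params)


def parse_categories(xml_text):
    """Return (category names, acfrom for the next page or None)."""
    root = ET.fromstring(xml_text)  # root element is <api>
    names = [c.text for c in root.findall("query/allcategories/c")]
    cont = root.find("query-continue/allcategories")
    return names, (cont.get("acfrom") if cont is not None else None)


# Shortened copy of the response shown above.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <allcategories>
      <c xml:space="preserve">Drug-related deaths</c>
      <c xml:space="preserve">Drug-related deaths by country</c>
    </allcategories>
  </query>
  <query-continue>
    <allcategories acfrom="Drug-related deaths in Arkansas" />
  </query-continue>
</api>"""

names, next_start = parse_categories(SAMPLE)
print(names)
print(build_categories_url("drug", start=next_start))
```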
3. Return the timestamp|user|comment|content information for the page with a given title:
Result:
<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="27697087" ns="0" title="API">
        <revisions>
          <rev user="Graham87" timestamp="2010-06-13T08:41:17Z" comment="Protected API: restore protection ([edit=sysop] (indefinite) [move=sysop] (indefinite))" xml:space="preserve">#REDIRECT [[Application programming interface]]{{R from abbreviation}}</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>
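The post does not show the request URL for this item. The sketch below assumes it is a prop=revisions query with rvprop=timestamp|user|comment|content and titles=API, which matches the response shape above but is a reconstruction, not something the original states. It uses Python's standard library; the function names are hypothetical, and SAMPLE is the response shown above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: HTTPS form of the endpoint used in this post.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title,
                        props=("timestamp", "user", "comment", "content")):
    """Build a revisions query (assumed request form, see the note above)."""
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "|".join(props), "format": "xml"}
    return API_URL + "?" + urlencode(params)


def parse_revisions(xml_text):
    """Return one dict per page holding its first listed revision."""
    root = ET.fromstring(xml_text)  # root element is <api>
    results = []
    for page in root.findall("query/pages/page"):
        rev = page.find("revisions/rev")
        if rev is None:
            continue  # page has no revision data (for example, missing page)
        results.append({
            "title": page.get("title"),
            "user": rev.get("user"),
            "timestamp": rev.get("timestamp"),
            "comment": rev.get("comment"),
            "content": rev.text,  # wikitext, not HTML
        })
    return results


# The response shown above.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="27697087" ns="0" title="API">
        <revisions>
          <rev user="Graham87" timestamp="2010-06-13T08:41:17Z" comment="Protected API: restore protection ([edit=sysop] (indefinite) [move=sysop] (indefinite))" xml:space="preserve">#REDIRECT [[Application programming interface]]{{R from abbreviation}}</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>"""

print(build_revisions_url("API"))
print(parse_revisions(SAMPLE)[0]["content"])
```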
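4. Parse a page:
http://en.wikipedia.org/w/api.php?action=parse&format=xml&page=fluoxetine
The [content] returned by the previous query is in Wikipedia's markup format (wikitext); this API returns the text as HTML instead.
The HTML can be extracted with xpath="api/parse/text".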
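As a minimal sketch of that extraction, assuming Python's standard xml.etree.ElementTree: because the parsed root is already the api element, the path relative to it is parse/text. The function names and the User-Agent string are hypothetical, and SAMPLE is a small hand-written response in the expected shape, not real API output.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET

# Assumption: HTTPS form of the endpoint used in this post.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page):
    """Build the action=parse URL for a page title."""
    return API_URL + "?" + urlencode(
        {"action": "parse", "format": "xml", "page": page})


def extract_html(xml_text):
    """Return the text of api/parse/text, or None if it is absent."""
    root = ET.fromstring(xml_text)  # root element is <api>
    return root.findtext("parse/text")


def fetch_html(page):
    """Network helper, not called below; the User-Agent is a placeholder."""
    req = Request(build_parse_url(page),
                  headers={"User-Agent": "wiki-api-example/0.1"})
    with urlopen(req) as resp:
        return extract_html(resp.read())


# Hand-written sample in the expected shape (not real API output).
SAMPLE = """<?xml version="1.0"?>
<api>
  <parse>
    <text xml:space="preserve">&lt;p&gt;&lt;b&gt;Fluoxetine&lt;/b&gt; is an antidepressant.&lt;/p&gt;</text>
  </parse>
</api>"""

print(build_parse_url("fluoxetine"))
print(extract_html(SAMPLE))
```

The HTML arrives entity-escaped inside the text element; the XML parser unescapes it, so extract_html returns ordinary HTML markup.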
* action=parse *
  This module parses wikitext and returns parser output
This module requires read rights.
Parameters:
  title      - Title of page the text belongs to
               Default: API
  text       - Wikitext to parse
  summary    - Summary to parse
  page       - Parse the content of this page. Cannot be used together with text and title
  redirects  - If the page parameter is set to a redirect, resolve it
  oldid      - Parse the content of this revision. Overrides page
  prop       - Which pieces of information to get.
               NOTE: Section tree is only generated if there are more than 4 sections, or if the __TOC__ keyword is present
               Values (separate with '|'): text, langlinks, categories, links, templates, images, externallinks, sections, revid, displaytitle, headitems, headhtml
               Default: text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle
  pst        - Do a pre-save transform on the input before parsing it.
               Ignored if page or oldid is used.
  onlypst    - Do a PST on the input, but don't parse it.
               Returns PSTed wikitext. Ignored if page or oldid is used.
Example:
  api.php?action=parse&text={{Project:Sandbox}}
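Wikitext passed in the text parameter contains characters such as braces and colons that need URL-encoding, and prop values are joined with '|'. The sketch below shows one way to assemble such a request under the rules listed in the help text; it assumes Python's standard library, the HTTPS endpoint, and format=xml, and build_parse_url is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

# Assumption: HTTPS form of the endpoint used in this post.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page=None, text=None, title=None, oldid=None, prop=None):
    """Build an action=parse URL from the parameters in the help text."""
    # The help text says page cannot be used together with text and title.
    if page is not None and (text is not None or title is not None):
        raise ValueError("page cannot be combined with text or title")
    params = {"action": "parse", "format": "xml"}
    for name, value in (("page", page), ("title", title),
                        ("text", text), ("oldid", oldid)):
        if value is not None:
            params[name] = value
    if prop:
        # Multiple prop values are separated with '|'.
        params["prop"] = "|".join(prop)
    # urlencode percent-encodes braces, colons, and the '|' separator.
    return API_URL + "?" + urlencode(params)


print(build_parse_url(text="{{Project:Sandbox}}"))
print(build_parse_url(page="fluoxetine", prop=["text", "sections"]))
```

Restricting prop to the pieces actually needed keeps the response smaller than the default set.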
Source: http://john2007.iteye.com/blog/800446