URL Parsing

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:


Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.


The scheme argument gives the default addressing scheme.

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.

urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.

urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.

urllib.parse.urlunparse(parts)

Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL.

urllib.parse.urlunsplit(parts)

Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The parts argument can be any five-item iterable.

urllib.parse.urldefrag(url)

If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.

URL Parsing的更多相关文章

  1. w3 parse a url

     最新链接:https://www.w3.org/TR/html53/ 2.6 URLs — HTML5 li, dd li { margin: 1em 0; } dt, dfn { font-wei ...

  2. GO语言的开源库

    Indexes and search engines These sites provide indexes and search engines for Go packages: godoc.org ...

  3. Python Web Crawler

    Python版本:3.5.2 pycharm URL Parsing¶ https://docs.python.org/3.5/library/urllib.parse.html?highlight= ...

  4. Curl的编译

    下载 curl的官网:https://curl.haxx.se/ libcurl就是一个库,curl就是使用libcurl实现的. curl是一个exe,也可以说是整个项目的名字,而libcurl就是 ...

  5. Go语言(golang)开源项目大全

    转http://www.open-open.com/lib/view/open1396063913278.html内容目录Astronomy构建工具缓存云计算命令行选项解析器命令行工具压缩配置文件解析 ...

  6. Go by Example

    Go by Example Go is an open source programming language designed for building simple, fast, and reli ...

  7. Node.js API

    Node.js v4.4.7 Documentation(官方文档) Buffer Prior to the introduction of TypedArray in ECMAScript 2015 ...

  8. [转]Go语言(golang)开源项目大全

    内容目录 Astronomy 构建工具 缓存 云计算 命令行选项解析器 命令行工具 压缩 配置文件解析器 控制台用户界面 加密 数据处理 数据结构 数据库和存储 开发工具 分布式/网格计算 文档 编辑 ...

  9. python爬虫从入门到放弃(三)之 Urllib库的基本使用

    官方文档地址:https://docs.python.org/3/library/urllib.html 什么是Urllib Urllib是python内置的HTTP请求库包括以下模块urllib.r ...

随机推荐

  1. Do It Wrong, Get It Right

    Do It Wrong, Get It Right Time Limit: 5000ms, Special Time Limit:12500ms, Memory Limit:65536KB Total ...

  2. 一些java方法,一些感想

    String.valueof(),string,toString; 都是转化为字符串,如果为null的话用tostring会报空指针异常,String的效率最高,string.valuof效果最好. ...

  3. 论文笔记之:Speed Up Tracking by Ignoring Features

    Speed Up Tracking by Ignoring Features CVPR 2014 Abstract:本文提出一种特征选择的算法,来实现用最"精简"的特征以进行目标跟 ...

  4. MQTT服务器搭建-mosquitto1.4.4安装指南

    Mosquitto mosquitto是一款实现了 MQTT v3.1 协议的开源的消息代理服务软件. 其提供了非常轻量级的消息数据传输协议,采用发布/订阅模式进行工作,可用于物联设备.中间件.APP ...

  5. linux服务之maven

    curl -O http://mirrors.noc.im/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip [root@d ...

  6. ubuntu 编译安装 srilm

    Ubuntu 64bit系统下SRILM的配置 依赖软件包(先进行): 1.c/c++ compiler:编译器gcc 3.4.3及以上版本,我的是gcc 4.4 2.GNU make:构建和管理工程 ...

  7. SQL2005:SQL Server 2005还原数据库时出现“不能选择文件或文件组XXX_log用于此操作的解决办法

    SQL2005 还原数据库失败,提示如下: SQL Server 2005还原数据库时出现“不能选择文件或文件组XXX_log用于此操作的解决办法 出现错误时操作步骤为:右击数据库--->任务- ...

  8. 建工财务搬家NC变更|rman各种测试|

    1,使用全备份之后的还原不需要建立表空间. 2,归档日志备份之后,使用delete all input,在backup database plus achivelog之后,会在完成备份之后自动删除归档 ...

  9. JavaScript常用小技巧

    1.获取访问地址URL的参数 <script type="text/javascript"> var param = ""; var nowUrl ...

  10. 【学】CSS3基础实例1 - 用CSS3做网页中的小三角,以及transition的用法

    自开了博客园已经有2周了吧,虽然转载了一些觉得比较有用的文章之外还没有开始写自己的一些学习记录,那就从今天开始. 目前看了妙味的不少视频,有css+html,js的基础和中级也都看完了,作业也都做了, ...