URL Parsing
【URL Parsing】
urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)
Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
The scheme argument gives the default addressing scheme.
If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.
urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.
urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')
Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.
urllib.parse.urlunparse(parts)
Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse(), but does not split the params from the URL.
urllib.parse.urlunsplit(parts)
Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The parts argument can be any five-item iterable.
urllib.parse.urldefrag(url)
If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.

URL Parsing的更多相关文章
- w3 parse a url
最新链接:https://www.w3.org/TR/html53/ 2.6 URLs — HTML5 li, dd li { margin: 1em 0; } dt, dfn { font-wei ...
- GO语言的开源库
Indexes and search engines These sites provide indexes and search engines for Go packages: godoc.org ...
- Python Web Crawler
Python版本:3.5.2 pycharm URL Parsing¶ https://docs.python.org/3.5/library/urllib.parse.html?highlight= ...
- Curl的编译
下载 curl的官网:https://curl.haxx.se/ libcurl就是一个库,curl就是使用libcurl实现的. curl是一个exe,也可以说是整个项目的名字,而libcurl就是 ...
- Go语言(golang)开源项目大全
转http://www.open-open.com/lib/view/open1396063913278.html内容目录Astronomy构建工具缓存云计算命令行选项解析器命令行工具压缩配置文件解析 ...
- Go by Example
Go by Example Go is an open source programming language designed for building simple, fast, and reli ...
- Node.js API
Node.js v4.4.7 Documentation(官方文档) Buffer Prior to the introduction of TypedArray in ECMAScript 2015 ...
- [转]Go语言(golang)开源项目大全
内容目录 Astronomy 构建工具 缓存 云计算 命令行选项解析器 命令行工具 压缩 配置文件解析器 控制台用户界面 加密 数据处理 数据结构 数据库和存储 开发工具 分布式/网格计算 文档 编辑 ...
- python爬虫从入门到放弃(三)之 Urllib库的基本使用
官方文档地址:https://docs.python.org/3/library/urllib.html 什么是Urllib Urllib是python内置的HTTP请求库包括以下模块urllib.r ...
随机推荐
- .jre下的lib和jdk下的lib的区别
jre是JDK的一个子集.提供一个运行环境.JDK的lib目录是给JDK用的,例如JDK下有一些工具,可能要用该目录中的文件.例如,编译器等.JRE的lib目录是为JVM,运行时候用的.包括所有的标准 ...
- POJ 3321 Apple Tree(树状数组)
Apple Tree Time Limit: 2000MS Memory Lim ...
- ASP.NET MVC+EF框架+EasyUI实现权限管理系列
http://www.cnblogs.com/hanyinglong/archive/2013/03/22/2976478.html ASP.NET MVC+EF框架+EasyUI实现权限管理系列之开 ...
- WebJars 进行 css js 资源文件管理
WebJars是将这些通用的Web前端资源打包成Java的Jar包,然后借助Maven工具对其管理,保证这些Web资源版本唯一性,升级也比较容易.关于webjars资源,有一个专门的网站http:// ...
- Linux(Debian)+Apache+Django 配置
配置Apache和Django连接的过程可谓是一波三折,在此记录. 零.基本的安装 软件环境 l Linux-3.2.0-4-amd64-x86_64-with-debian-7.7 l py ...
- 50 道 Java 线程面试题(转载自牛客网)
下面是 Java 线程相关的热门面试题,你可以用它来好好准备面试. 1) 什么是线程? 线程是操作系统能够进行运算调度的最小单位,它被包含在进程之中,是进程中的实际运作单位.程序员可以通过它进行多处理 ...
- Intellij IDEA连接Git@OSC
错误提示:fatal: remote origin already exists. 解决办法:$ git remote rm origin http://my.oschina.net/lujianin ...
- 各种数据分析图demo
极地蛛网图:http://www.hcharts.cn/demo/index.php?p=61 各种数据分析图demo: http://www.hcharts.cn/demo/index.php?p= ...
- Spring AOP 简单理解
AOP技术即(面向切面编程)技术是在面向对象编程基础上的发展,AOP技术是对所有对象或一类对象编程.核心是在不增加代码的基础上,还增加了新的功能.AOP编程在开发框架本身用的比较多,而实际项目中,用的 ...
- <<精益创业>>读书笔记
不要以严格地职能部门来组成公司,而是要以人们在各自专长的领域做出表现,组建跨部门的团队 在普通的管理中,如果无法实现目标,要么是计划不足,要么是技术不足 这点我感触比较深,以前在老东家的时间,刚开始是 ...