URL Parsing

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

Parse a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:


Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.


The scheme argument gives the default addressing scheme.

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.

urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.

urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.

urllib.parse.urlunparse(parts)

Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL.

urllib.parse.urlunsplit(parts)

Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The parts argument can be any five-item iterable.

urllib.parse.urldefrag(url)

If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.

URL Parsing的更多相关文章

  1. w3 parse a url

     最新链接:https://www.w3.org/TR/html53/ 2.6 URLs — HTML5 li, dd li { margin: 1em 0; } dt, dfn { font-wei ...

  2. GO语言的开源库

    Indexes and search engines These sites provide indexes and search engines for Go packages: godoc.org ...

  3. Python Web Crawler

    Python版本:3.5.2 pycharm URL Parsing¶ https://docs.python.org/3.5/library/urllib.parse.html?highlight= ...

  4. Curl的编译

    下载 curl的官网:https://curl.haxx.se/ libcurl就是一个库,curl就是使用libcurl实现的. curl是一个exe,也可以说是整个项目的名字,而libcurl就是 ...

  5. Go语言(golang)开源项目大全

    转http://www.open-open.com/lib/view/open1396063913278.html内容目录Astronomy构建工具缓存云计算命令行选项解析器命令行工具压缩配置文件解析 ...

  6. Go by Example

    Go by Example Go is an open source programming language designed for building simple, fast, and reli ...

  7. Node.js API

    Node.js v4.4.7 Documentation(官方文档) Buffer Prior to the introduction of TypedArray in ECMAScript 2015 ...

  8. [转]Go语言(golang)开源项目大全

    内容目录 Astronomy 构建工具 缓存 云计算 命令行选项解析器 命令行工具 压缩 配置文件解析器 控制台用户界面 加密 数据处理 数据结构 数据库和存储 开发工具 分布式/网格计算 文档 编辑 ...

  9. python爬虫从入门到放弃(三)之 Urllib库的基本使用

    官方文档地址:https://docs.python.org/3/library/urllib.html 什么是Urllib Urllib是python内置的HTTP请求库包括以下模块urllib.r ...

随机推荐

  1. 转--Eclipse中ctrl+shift+r与ctrl+shift+t的区别

    Eclipse中ctrl+shift+r与ctrl+shift+t的区别 标签: 快捷键工作区文件搜索打开资源文件 2016-09-13 15:44 292人阅读 评论(0) 收藏 举报  分类: e ...

  2. Morphia 学习一 注解

    http://blog.csdn.net/liumm0000/article/details/7535858 生命周期方法注解(delete没有生命周期事件)@PrePersist save之前被调用 ...

  3. WinForm中使用XML文件存储用户配置及操作本地Config配置文件

    大家都开发winform程序时候会大量用到配置App.config作为保持用户设置的基本信息,比如记住用户名,这样的弊端就是每个人一些个性化的设置每次更新程序的时候会被覆盖. 故将配置文件分两大类: ...

  4. SSIS 参数与环境

    微软 BI 系列随笔 - SSIS 基础 - 参数与环境 简介 在上一篇博客中,主要讲述了如何实现SSIS的项目部署,参见 微软 BI 系列随笔 - SSIS 2012 基础 - SSIS 项目部署模 ...

  5. MySQL查询今天/昨天/本周、上周、本月、上个月份数据的sql代码

    MySQL查询本周.上周.本月.上个月份数据的sql代码 作者: 字体:[增加 减小] 类型:转载 时间:2012-11-29我要评论 MySQL查询的方式很多,下面为您介绍的MySQL查询实现的是查 ...

  6. python中获取当前所有的logger

    获得当前所有logger的列表的程序如下: import logging for name in logging.Logger.manager.loggerDict.keys(): logger = ...

  7. OAF_JDBC系列1 - 数据库交互取值方式(案例)

    2014-06-15 Created By BaoXinjian

  8. (DP)3.Longest Substring Without Repeating Characters

    Given a string, find the length of the longest substring without repeating characters. For example, ...

  9. 19. Palindrome Partitioning && Palindrome Partitioning II (回文分割)

    Palindrome Partitioning Given a string s, partition s such that every substring of the partition is ...

  10. 飞凌OK6410开发板SDIO无线8189WIFI模块驱动移植

    为什么要移植?开发板不是已经提供了无线驱动吗? 貌似是这样的..本来是好用的.加入自己第三方驱动后发现WIFI用不了...最后发现飞凌提供的内核里面没有8189芯片的代码...问售后他们说那边是好的. ...