URL Parsing

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True)

Parse a URL into six components, returning a 6-item named tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters shown above are not part of the result, except for a leading slash in the path component, which is retained if present.
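As a minimal sketch of the six components, the snippet below parses a made-up URL (www.example.com and its path are placeholders, not taken from the original text) and prints each field by name:

```python
from urllib.parse import urlparse

# Illustrative URL; the host, path and query are placeholders.
url = "http://www.example.com:80/docs/index.html;type=a?lang=en#intro"
parts = urlparse(url)

print(parts.scheme)    # http
print(parts.netloc)    # www.example.com:80
print(parts.path)      # /docs/index.html
print(parts.params)    # type=a
print(parts.query)     # lang=en
print(parts.fragment)  # intro
```

Note that the port stays inside netloc and the leading slash stays in the path, as described above.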


Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
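The effect of the leading '//' can be seen by parsing the same (hypothetical) host and path with and without it:

```python
from urllib.parse import urlparse

# With '//' the host is recognized as the network location.
with_slashes = urlparse("//www.example.com/a/b")
print(with_slashes.netloc)  # www.example.com
print(with_slashes.path)    # /a/b

# Without '//' the whole string is treated as a relative path.
without_slashes = urlparse("www.example.com/a/b")
print(repr(without_slashes.netloc))  # ''
print(without_slashes.path)          # www.example.com/a/b
```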


The scheme argument gives the default addressing scheme, to be used only if the URL does not specify one.
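A short sketch of the default scheme in action, using placeholder URLs: the default applies only when the input carries no scheme of its own.

```python
from urllib.parse import urlparse

# No scheme in the input, so the default is used.
no_scheme = urlparse("//www.example.com/index.html", scheme="https")
print(no_scheme.scheme)  # https

# The input already names a scheme, so the default is ignored.
has_scheme = urlparse("http://www.example.com/index.html", scheme="https")
print(has_scheme.scheme)  # http
```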

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.
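To illustrate, the sketch below parses one placeholder URL twice. With allow_fragments=False the '#top' suffix is left attached to the preceding component, here the query:

```python
from urllib.parse import urlparse

url = "http://www.example.com/page?x=1#top"  # illustrative URL

default = urlparse(url)
print(default.query)     # x=1
print(default.fragment)  # top

no_frag = urlparse(url, allow_fragments=False)
print(no_frag.query)           # x=1#top
print(repr(no_frag.fragment))  # ''
```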

urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name.
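A minimal example with a made-up query string shows how repeated names are collected into one list, and how keep_blank_values controls whether empty values are kept:

```python
from urllib.parse import parse_qs

qs = "name=alice&tag=a&tag=b&empty="  # illustrative query string

result = parse_qs(qs)
print(result["name"])     # ['alice']
print(result["tag"])      # ['a', 'b']
print("empty" in result)  # False: blank values are dropped by default

with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks["empty"])  # ['']
```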

urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a list of name, value pairs.
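For comparison with parse_qs(), this sketch parses an illustrative query string into a flat list. The pairs come back in their original order, and repeated names are not merged:

```python
from urllib.parse import parse_qsl

pairs = parse_qsl("tag=a&name=alice&tag=b")  # illustrative query string
print(pairs)  # [('tag', 'a'), ('name', 'alice'), ('tag', 'b')]

# The list form is convenient when order or duplicates matter.
for name, value in pairs:
    print(name, value)
```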

urllib.parse.urlunparse(parts)

Construct a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.
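As a small sketch, the six parts can be given as a plain tuple (the values here are placeholders), or taken from urlparse() and modified before reassembly:

```python
from urllib.parse import urlparse, urlunparse

# Order: scheme, netloc, path, params, query, fragment.
built = urlunparse(("https", "www.example.com", "/docs/index.html", "", "lang=en", "intro"))
print(built)  # https://www.example.com/docs/index.html?lang=en#intro

# Named tuples are immutable, so _replace() returns a modified copy.
parts = urlparse(built)
switched = urlunparse(parts._replace(scheme="http", fragment=""))
print(switched)  # http://www.example.com/docs/index.html?lang=en
```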

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL. It returns a 5-tuple: (scheme, netloc, path, query, fragment).
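The difference from urlparse() is easiest to see on a URL that contains ';' parameters. The URL below is a placeholder chosen for that purpose:

```python
from urllib.parse import urlparse, urlsplit

url = "http://www.example.com/docs/index.html;type=a?lang=en"  # illustrative

split = urlsplit(url)
print(len(split))  # 5
print(split.path)  # /docs/index.html;type=a  (params stay in the path)

parsed = urlparse(url)
print(len(parsed))    # 6
print(parsed.path)    # /docs/index.html
print(parsed.params)  # type=a
```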

urllib.parse.urlunsplit(parts)

Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The parts argument can be any five-item iterable.
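A brief sketch with placeholder values: any five-item iterable works, so a list is used here, and a result from urlsplit() round-trips back to the same string for this input:

```python
from urllib.parse import urlsplit, urlunsplit

# Order: scheme, netloc, path, query, fragment.
built = urlunsplit(["https", "www.example.com", "/search", "q=url+parsing", ""])
print(built)  # https://www.example.com/search?q=url+parsing

round_trip = urlunsplit(urlsplit(built))
print(round_trip == built)  # True
```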

urllib.parse.urldefrag(url)

If url contains a fragment identifier, return a modified version of url with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in url, return url unmodified and an empty string.
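Both cases can be sketched with placeholder URLs. The result unpacks into the fragment-free URL and the fragment string:

```python
from urllib.parse import urldefrag

# URL with a fragment identifier.
base, frag = urldefrag("http://www.example.com/page.html#section-2")
print(base)  # http://www.example.com/page.html
print(frag)  # section-2

# URL without a fragment: returned unchanged, with an empty fragment.
same, empty = urldefrag("http://www.example.com/page.html")
print(same)         # http://www.example.com/page.html
print(repr(empty))  # ''
```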
