What is URL Encoding and How does it work?
Introduction
A URL (Uniform Resource Locator) is the address of a resource on the World Wide Web. URLs have a well-defined structure, which was first specified in RFC 1738, co-authored by Tim Berners-Lee, the inventor of the World Wide Web.
Every URL conforms to a generic syntax, which looks like this:
scheme:[//[user:password@]host[:port]]path[?query][#fragment]
Some parts of the URL syntax, such as [user:password@], are deprecated and seldom used for security reasons. The following is an example of the kind of URL you see more often on the internet:
https://www.google.com/search?q=hello+world#brs
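To see how the example URL maps onto the generic syntax, here is a minimal sketch using `urlsplit` from Python's standard `urllib.parse` module. The variable name `parts` is only illustrative.

```python
from urllib.parse import urlsplit

# Split the example URL into the components of the generic syntax.
parts = urlsplit("https://www.google.com/search?q=hello+world#brs")

print(parts.scheme)    # https
print(parts.netloc)    # www.google.com
print(parts.path)      # /search
print(parts.query)     # q=hello+world
print(parts.fragment)  # brs
```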
The initial RFC defining the syntax of Uniform Resource Locators (URLs) has been revised several times. The current RFC that defines the generic URI syntax is RFC 3986. This post contains information from that document.
URL Encoding (Percent Encoding)
A URL is composed of a limited set of characters belonging to the US-ASCII character set. These characters include digits (0-9), letters (A-Z, a-z), and a few special characters ("-", ".", "_", "~").
ASCII control characters (e.g. backspace, vertical tab, horizontal tab, line feed), unsafe characters such as space, \, <, >, {, }, and any character outside the ASCII charset are not allowed to be placed directly within URLs.
Moreover, some characters have a special meaning within URLs. These are called reserved characters. Some examples of reserved characters are ?, /, #, and :. Any data transmitted as part of the URL, whether in the query string or a path segment, must not contain these characters in their literal form.
So what do we do when we need to transmit data in the URL that contains these disallowed characters? Well, we encode them!
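A short sketch of why reserved characters inside data cause trouble, using `quote` and `parse_qs` from Python's standard `urllib.parse` module. The parameter name `q` and the value `rock&roll` are made up for illustration.

```python
from urllib.parse import parse_qs, quote

value = "rock&roll"  # hypothetical data containing the reserved character '&'

raw = "q=" + value                       # '&' is read as a parameter separator
encoded = "q=" + quote(value, safe="")   # '&' becomes %26

print(parse_qs(raw))      # {'q': ['rock']}
print(parse_qs(encoded))  # {'q': ['rock&roll']}
```

Without encoding, the receiver sees only part of the value. With encoding, the full value survives the round trip.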
URL Encoding converts reserved, unsafe, and non-ASCII characters in URLs to a format that is universally accepted and understood by all web browsers and servers. It first converts the character to one or more bytes. Then each byte is represented by two hexadecimal digits preceded by a percent sign (%), e.g. %xy. The percent sign is used as an escape character.
URL encoding is also called percent encoding since it uses the percent sign (%) as an escape character.
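The two steps described above (character to bytes, then each byte to `%` plus two hex digits) can be sketched by hand. This is an illustrative implementation that assumes UTF-8 as the byte encoding and leaves only the unreserved characters untouched. `percent_encode` and `UNRESERVED` are hypothetical names; real code would normally call `urllib.parse.quote` instead.

```python
# Characters that may appear literally: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every character that is not unreserved."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Step 1: convert the character to one or more bytes (UTF-8 assumed).
            # Step 2: write each byte as '%' followed by two hex digits.
            for byte in ch.encode("utf-8"):
                out.append(f"%{byte:02X}")
    return "".join(out)

print(percent_encode("hello world"))  # hello%20world
print(percent_encode("a/b?c"))        # a%2Fb%3Fc
print(percent_encode("caf\u00e9"))    # caf%C3%A9
```

The last call shows a non-ASCII character (é) becoming two bytes and therefore two percent-encoded triplets.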
URL Encoding Example
Space: One of the most frequent URL-encoded characters you're likely to encounter is the space. The ASCII value of the space character in decimal is 32, which is 20 in hexadecimal. We then precede the hexadecimal representation with a percent sign (%), which gives us the URL-encoded value %20.
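The same arithmetic for the space character, written out as a small sketch (the variable names are only illustrative):

```python
from urllib.parse import unquote

code_point = ord(" ")                    # decimal ASCII value of space
hex_digits = format(code_point, "02X")   # two uppercase hexadecimal digits
encoded = "%" + hex_digits               # prefix with the escape character

print(code_point, hex_digits, encoded)   # 32 20 %20
print(unquote(encoded) == " ")           # True
```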
ASCII Character Encoding Reference
The following table maps ASCII characters to their corresponding URL-encoded form.
Note that encoding alphanumeric ASCII characters is not required. For example, you don't need to encode the character '0' as %30, as shown in the following table. It can be transmitted as is, although the encoded form is still valid as per the RFC. The characters that are safe to transmit unencoded inside URLs are the digits, the letters, and "-", ".", "_", "~".
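A small sketch of that point using Python's standard `urllib.parse` module: an encoder leaves alphanumeric characters alone, while a decoder still accepts their percent-encoded form.

```python
from urllib.parse import quote, unquote

# Alphanumeric characters pass through the encoder unchanged.
print(quote("0aZ", safe=""))  # 0aZ

# Their percent-encoded forms are still valid and decode to the same text.
print(unquote("%30%61%5A"))   # 0aZ
```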
The following table uses rules defined in RFC 3986 for URL encoding.
| Decimal | Character | URL Encoding (UTF-8) |
|---|---|---|
| 0 | NUL(null character) | %00 |
| 1 | SOH(start of header) | %01 |
| 2 | STX(start of text) | %02 |
| 3 | ETX(end of text) | %03 |
| 4 | EOT(end of transmission) | %04 |
| 5 | ENQ(enquiry) | %05 |
| 6 | ACK(acknowledge) | %06 |
| 7 | BEL(bell (ring)) | %07 |
| 8 | BS(backspace) | %08 |
| 9 | HT(horizontal tab) | %09 |
| 10 | LF(line feed) | %0A |
| 11 | VT(vertical tab) | %0B |
| 12 | FF(form feed) | %0C |
| 13 | CR(carriage return) | %0D |
| 14 | SO(shift out) | %0E |
| 15 | SI(shift in) | %0F |
| 16 | DLE(data link escape) | %10 |
| 17 | DC1(device control 1) | %11 |
| 18 | DC2(device control 2) | %12 |
| 19 | DC3(device control 3) | %13 |
| 20 | DC4(device control 4) | %14 |
| 21 | NAK(negative acknowledge) | %15 |
| 22 | SYN(synchronize) | %16 |
| 23 | ETB(end transmission block) | %17 |
| 24 | CAN(cancel) | %18 |
| 25 | EM(end of medium) | %19 |
| 26 | SUB(substitute) | %1A |
| 27 | ESC(escape) | %1B |
| 28 | FS(file separator) | %1C |
| 29 | GS(group separator) | %1D |
| 30 | RS(record separator) | %1E |
| 31 | US(unit separator) | %1F |
| 32 | space | %20 |
| 33 | ! | %21 |
| 34 | " | %22 |
| 35 | # | %23 |
| 36 | $ | %24 |
| 37 | % | %25 |
| 38 | & | %26 |
| 39 | ' | %27 |
| 40 | ( | %28 |
| 41 | ) | %29 |
| 42 | * | %2A |
| 43 | + | %2B |
| 44 | , | %2C |
| 45 | - | %2D |
| 46 | . | %2E |
| 47 | / | %2F |
| 48 | 0 | %30 |
| 49 | 1 | %31 |
| 50 | 2 | %32 |
| 51 | 3 | %33 |
| 52 | 4 | %34 |
| 53 | 5 | %35 |
| 54 | 6 | %36 |
| 55 | 7 | %37 |
| 56 | 8 | %38 |
| 57 | 9 | %39 |
| 58 | : | %3A |
| 59 | ; | %3B |
| 60 | < | %3C |
| 61 | = | %3D |
| 62 | > | %3E |
| 63 | ? | %3F |
| 64 | @ | %40 |
| 65 | A | %41 |
| 66 | B | %42 |
| 67 | C | %43 |
| 68 | D | %44 |
| 69 | E | %45 |
| 70 | F | %46 |
| 71 | G | %47 |
| 72 | H | %48 |
| 73 | I | %49 |
| 74 | J | %4A |
| 75 | K | %4B |
| 76 | L | %4C |
| 77 | M | %4D |
| 78 | N | %4E |
| 79 | O | %4F |
| 80 | P | %50 |
| 81 | Q | %51 |
| 82 | R | %52 |
| 83 | S | %53 |
| 84 | T | %54 |
| 85 | U | %55 |
| 86 | V | %56 |
| 87 | W | %57 |
| 88 | X | %58 |
| 89 | Y | %59 |
| 90 | Z | %5A |
| 91 | [ | %5B |
| 92 | \ | %5C |
| 93 | ] | %5D |
| 94 | ^ | %5E |
| 95 | _ | %5F |
| 96 | ` | %60 |
| 97 | a | %61 |
| 98 | b | %62 |
| 99 | c | %63 |
| 100 | d | %64 |
| 101 | e | %65 |
| 102 | f | %66 |
| 103 | g | %67 |
| 104 | h | %68 |
| 105 | i | %69 |
| 106 | j | %6A |
| 107 | k | %6B |
| 108 | l | %6C |
| 109 | m | %6D |
| 110 | n | %6E |
| 111 | o | %6F |
| 112 | p | %70 |
| 113 | q | %71 |
| 114 | r | %72 |
| 115 | s | %73 |
| 116 | t | %74 |
| 117 | u | %75 |
| 118 | v | %76 |
| 119 | w | %77 |
| 120 | x | %78 |
| 121 | y | %79 |
| 122 | z | %7A |
| 123 | { | %7B |
| 124 | \| | %7C |
| 125 | } | %7D |
| 126 | ~ | %7E |
| 127 | DEL(delete (rubout)) | %7F |
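Every row of the table follows one rule: the encoded form is `%` followed by the character's code as two uppercase hexadecimal digits. As a sketch, the encoding column can be regenerated and checked in a few lines. The name `table` is only illustrative.

```python
from urllib.parse import unquote_to_bytes

# Build the encoding column for all 128 ASCII codes.
table = {code: f"%{code:02X}" for code in range(128)}

# Each entry decodes back to the single byte it came from.
assert all(unquote_to_bytes(enc) == bytes([code]) for code, enc in table.items())

print(table[0], table[32], table[124], table[127])  # %00 %20 %7C %7F
```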