How to Determine the End of an HTTP Message: A Reading of the Python Source

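HTTP/1.1 uses persistent connections by default, so the end of an HTTP message cannot be detected simply by waiting for the TCP connection to close.

The following are the ways to determine where an HTTP message ends: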

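1. The HTTP protocol specifies that responses with status code 1xx, 204, or 304 must not include a message body, so any entity content is simply ignored.

[Applies to response messages]

HTTP message = HTTP header

2. If the request method is HEAD, the response to it carries no message body, so the body is ignored. [Applies to responses to HEAD requests]

HTTP message = HTTP header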

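3. If the message header contains "Transfer-Encoding: chunked", the length is determined from the chunk sizes.

4. If the message header contains Content-Length and no Transfer-Encoding, the body length is given by Content-Length. (If both Content-Length and Transfer-Encoding are present, Content-Length is ignored.)

5. If a short-lived connection is used (the header contains "Connection: close"), the transfer length is determined by the server closing the connection.

[Applies to response messages; the length of a request message cannot be determined this way]

6. A receive timeout can also be used, but it is unreliable. Python Proxy, an HTTP proxy server, relies on a timeout mechanism; its source (see References [7]) is only a little over 100 lines.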

Section 4.4, "Message Length", of the HTTP specification RFC 2616 covers this topic in detail (https://tools.ietf.org/html/rfc2616#section-4.4).
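The rules above can be condensed into a single decision function. The following is a minimal sketch in Python 3, not code taken from httplib: the name body_framing, its return values, and the assumption that headers arrive as a dict with lower-case keys are all illustrative choices. The timeout approach (rule 6) is left out because it is a fallback rather than a protocol rule.

```python
def body_framing(request_method, status, headers):
    """Decide how the body of an HTTP/1.1 response is delimited.

    headers is assumed to be a dict keyed by lower-case header names.
    Returns (kind, length), where kind is "none", "chunked", "length"
    or "close".
    """
    # Rules 1 and 2: these responses never carry a body.
    if request_method == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return ("none", 0)

    # Rule 3: chunked transfer coding takes precedence over Content-Length.
    codings = [c.strip().lower()
               for c in headers.get("transfer-encoding", "").split(",")]
    if codings[-1] == "chunked":
        return ("chunked", None)

    # Rule 4: a valid, non-negative Content-Length gives the exact size.
    raw_length = headers.get("content-length")
    if raw_length is not None:
        try:
            length = int(raw_length)
        except ValueError:
            length = -1
        if length >= 0:
            return ("length", length)

    # Rule 5: otherwise the body runs until the server closes the connection.
    return ("close", None)
```

For example, body_framing("GET", 200, {"content-length": "10"}) yields ("length", 10), while the same call with an empty header dict falls through to ("close", None).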

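A worked example: reading the source of httplib.py from the Python standard library (an HTTP client implementation). httplib is the Python 2 module; in Python 3 it was renamed http.client.

The simplest way to use httplib:

    import httplib

    conn = httplib.HTTPConnection("google.com")
    conn.request('GET', '/')
    print conn.getresponse().read()
    conn.close()

In practice httplib is rarely used directly; the higher-level wrappers urllib and urllib2 are used instead.

conn = httplib.HTTPConnection("google.com") creates an HTTPConnection object and specifies the web server to send requests to.

conn.request('GET', '/') sends an HTTP request to google.com with the GET method.

conn.getresponse() creates an HTTPResponse object, then receives and parses the header of the HTTP response; read() reads the response body.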

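Call sequence:

getresponse() -> [creates the HTTPResponse object response] -> response.begin() -> response.read()

The important parts are begin() and read(). begin() does four things:

(1) Creates an HTTPMessage object and parses the header of the HTTP response.

(2) Checks whether the header contains "Transfer-Encoding: chunked".

(3) Checks whether the TCP connection will be closed once the response has been received (by calling _check_close()).

(4) If the header contains "Content-Length" and does not contain "Transfer-Encoding: chunked", obtains the length of the message body.

_check_close() decides that the TCP connection will be closed after the response is received if the response header contains "Connection: close"; it also contains some code for backward compatibility with HTTP/1.0. In HTTP/1.1 the default is "Connection: Keep-Alive", even when that header is absent.

read() reads the response body according to Content-Length or the chunked encoding. It can read everything at once or only a specified number of bytes. For a chunked body it calls _read_chunked().

_read_chunked() reads the chunks one by one according to each chunk size. Once the last chunk (the one whose chunk size is 0) has been read, the HTTP response has been received completely. For the relevant parts of the specification, see RFC 2616 3.6.1 and RFC 2616 19.4.6.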
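The state that begin() sets up can be observed without a network connection. The sketch below uses Python 3's http.client (the successor of httplib) and feeds it canned bytes through a stand-in socket. FakeSocket and parse are hypothetical helpers written for this illustration; it relies on HTTPResponse only calling makefile() on the object it is given, and on the attributes chunked, length and will_close, which mirror the Python 2 names discussed here.

```python
import http.client
import io


class FakeSocket:
    """Stand-in for a socket: HTTPResponse only needs makefile()."""

    def __init__(self, raw_bytes):
        self._file = io.BytesIO(raw_bytes)

    def makefile(self, *args, **kwargs):
        return self._file


def parse(raw_bytes, method="GET"):
    """Run begin() and read() on canned bytes and report the framing state."""
    response = http.client.HTTPResponse(FakeSocket(raw_bytes), method=method)
    response.begin()  # parses the status line and header
    state = (response.chunked, response.length, response.will_close)
    return state, response.read()


chunked = parse(b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
                b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
fixed = parse(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
until_close = parse(b"HTTP/1.1 200 OK\r\n\r\nbody ends at EOF")
```

In the third case neither chunked encoding nor Content-Length is present, so will_close is set and read() consumes everything up to the end of the stream.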

RFC 2616 19.4.6 contains pseudocode showing how to decode an HTTP message that uses the chunked encoding:

    length := 0
    read chunk-size, chunk-extension (if any) and CRLF
    while (chunk-size > 0) {
       read chunk-data and CRLF
       append chunk-data to entity-body
       length := length + chunk-size
       read chunk-size and CRLF
    }
    read entity-header
    while (entity-header not empty) {
       append entity-header to existing header fields
       read entity-header
    }
    Content-Length := length
    Remove "chunked" from Transfer-Encoding
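The pseudocode translates almost line for line into Python. The following Python 3 sketch is an illustration, not the httplib implementation: decode_chunked is a hypothetical name, and it assumes a binary file-like object whose read(n) returns n bytes unless the stream has ended (true of io.BytesIO and of the buffered file returned by socket.makefile("rb")).

```python
import io


def decode_chunked(stream):
    """Decode a chunked body from a binary file-like object.

    Returns (body, trailer_lines), following the RFC 2616 19.4.6 pseudocode.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the last chunk")
        # Drop any chunk-extension after ';' and parse the hexadecimal size.
        size_token = size_line.split(b";", 1)[0].strip()
        chunk_size = int(size_token.decode("ascii"), 16)
        if chunk_size == 0:
            break  # last-chunk: the body is complete
        data = stream.read(chunk_size)
        # Every chunk-data must be followed by CRLF.
        if len(data) != chunk_size or stream.read(2) != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += data

    # Trailer: zero or more header lines, terminated by an empty line.
    trailers = []
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:
            break
        trailers.append(line)
    return bytes(body), trailers


raw = b"4\r\nWiki\r\n5;note=x\r\npedia\r\n0\r\nX-Checksum: abc\r\n\r\n"
body, trailers = decode_chunked(io.BytesIO(raw))
```

The decoded length is simply len(body), which corresponds to the "Content-Length := length" step of the pseudocode.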

Now let us look at the main code of begin(), _check_close(), read() and _read_chunked():

(1) begin():

    def begin(self):
        ......
        self.msg = HTTPMessage(self.fp, 0)
        # don't let the msg keep an fp
        self.msg.fp = None

        # are we using the chunked-style of transfer encoding?
        tr_enc = self.msg.getheader('transfer-encoding')
        if tr_enc and tr_enc.lower() == "chunked":
            self.chunked = 1
            self.chunk_left = None
        else:
            self.chunked = 0

        # will the connection close at the end of the response?
        self.will_close = self._check_close()

        # do we have a Content-Length?
        # NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked"
        length = self.msg.getheader('content-length')
        if length and not self.chunked:
            try:
                self.length = int(length)
            except ValueError:
                self.length = None
            else:
                if self.length < 0:  # ignore nonsensical negative lengths
                    self.length = None
        else:
            self.length = None

        # does the body have a fixed length? (of zero)
        # NO_CONTENT = 204, NOT_MODIFIED = 304
        # End of the HTTP response message: see rules 1 and 2 in the list
        # at the top of this article
        if (status == NO_CONTENT or status == NOT_MODIFIED or
            100 <= status < 200 or      # 1xx codes
            self._method == 'HEAD'):
            self.length = 0

        # if the connection remains open, and we aren't using chunked, and
        # a content-length was not provided, then assume that the connection
        # WILL close.
        # End of the HTTP response message: if neither chunked encoding nor
        # Content-Length is used, the connection will be closed
        if not self.will_close and \
           not self.chunked and \
           self.length is None:
            self.will_close = 1

(2) _check_close():

    def _check_close(self):
        # End of the HTTP response message: see rule 5 in the list at the
        # top of this article
        conn = self.msg.getheader('connection')
        if self.version == 11:
            # An HTTP/1.1 proxy is assumed to stay open unless
            # explicitly closed.
            conn = self.msg.getheader('connection')
            if conn and "close" in conn.lower():
                return True
            return False

        # Some HTTP/1.0 implementations have support for persistent
        # connections, using rules different than HTTP/1.1.

        # For older HTTP, Keep-Alive indicates persistent connection.
        if self.msg.getheader('keep-alive'):
            return False

        # At least Akamai returns a "Connection: Keep-Alive" header,
        # which was supposed to be sent by the client.
        if conn and "keep-alive" in conn.lower():
            return False

        # Proxy-Connection is a netscape hack.
        pconn = self.msg.getheader('proxy-connection')
        if pconn and "keep-alive" in pconn.lower():
            return False

        # otherwise, assume it will close
        return True
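To experiment with this decision outside of HTTPResponse, the same logic can be written as a free function. This is a sketch for illustration only: connection_will_close is a hypothetical name, headers is assumed to be a dict with lower-case keys, and http_version uses the same 10 / 11 convention as self.version above.

```python
def connection_will_close(http_version, headers):
    """Return True if the connection is expected to close after the response.

    http_version is 10 or 11; headers is a dict keyed by lower-case names.
    """
    connection = headers.get("connection", "").lower()

    if http_version == 11:
        # HTTP/1.1 is persistent unless "close" is stated explicitly.
        return "close" in connection

    # HTTP/1.0: persistent only when the peer asks for it in some form.
    if headers.get("keep-alive"):
        return False
    if "keep-alive" in connection:
        return False
    if "keep-alive" in headers.get("proxy-connection", "").lower():
        return False
    return True
```

An HTTP/1.1 response with no Connection header gives False (the connection stays open), whereas an HTTP/1.0 response with no keep-alive hint gives True.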

(3) read():

    def read(self, amt=None):
        if self.fp is None:
            return ''

        if self._method == 'HEAD':
            self.close()
            return ''

        if self.chunked:
            return self._read_chunked(amt)

        if amt is None:
            # unbounded read
            if self.length is None:
                s = self.fp.read()
            else:
                try:
                    s = self._safe_read(self.length)
                except IncompleteRead:
                    self.close()
                    raise
                self.length = 0
            self.close()        # we read everything
            return s

        if self.length is not None:
            if amt > self.length:
                # clip the read to the "end of response"
                amt = self.length

        # we do not use _safe_read() here because this may be a .will_close
        # connection, and the user is reading more bytes than will be provided
        # (for example, reading in 1k chunks)
        s = self.fp.read(amt)
        if not s:
            # Ideally, we would raise IncompleteRead if the content-length
            # wasn't satisfied, but it might break compatibility.
            self.close()
        if self.length is not None:
            # compute the remaining length for the next read
            self.length -= len(s)
            if not self.length:
                self.close()

        return s
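read() relies on _safe_read() to fetch exactly self.length bytes, but that helper is not shown above. The Python 3 sketch below illustrates the general idea of such a helper, looping until the requested number of bytes has arrived. It is not the httplib code: read_exact, IncompleteBody and the DribbleStream test helper are hypothetical names introduced here.

```python
import io


class IncompleteBody(Exception):
    """Raised when the stream ends before the expected number of bytes."""


def read_exact(stream, amount):
    """Read exactly `amount` bytes, looping over short reads."""
    pieces = []
    remaining = amount
    while remaining > 0:
        piece = stream.read(remaining)
        if not piece:
            # End of stream: the Content-Length promise was not kept.
            raise IncompleteBody("expected %d more bytes" % remaining)
        pieces.append(piece)
        remaining -= len(piece)
    return b"".join(pieces)


class DribbleStream(io.BytesIO):
    """Test helper: never returns more than 3 bytes per read() call."""

    def read(self, size=-1):
        return super().read(3 if size < 0 else min(size, 3))


payload = read_exact(DribbleStream(b"hello world"), 11)
```

A single read() call on a raw socket may return fewer bytes than requested, which is why a loop of this kind is needed when the body length is known in advance.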

(4) _read_chunked():

    def _read_chunked(self, amt):
        assert self.chunked != _UNKNOWN
        # self.chunk_left is None when reading chunk for the first time(see self.begin())
        # chunk_left: bytes left in certain chunk
        # chunk_left = None means that reading hasn't been started.
        chunk_left = self.chunk_left
        value = []
        while True:
            if chunk_left is None:
                # read a new chunk
                line = self.fp.readline(_MAXLINE + 1)
                if len(line) > _MAXLINE:
                    raise LineTooLong("chunk size")
                i = line.find(';')
                if i >= 0:
                    line = line[:i] # strip chunk-extensions
                try:
                    chunk_left = int(line, 16)
                except ValueError:
                    # close the connection as protocol synchronisation is
                    # probably lost
                    self.close()
                    raise IncompleteRead(''.join(value))
                if chunk_left == 0:
                    # RFC 2616 3.6.1 last-chunk: chunk_left = 0
                    break
            if amt is None:
                value.append(self._safe_read(chunk_left))
            elif amt < chunk_left:
                value.append(self._safe_read(amt))
                self.chunk_left = chunk_left - amt
                return ''.join(value)
            elif amt == chunk_left:
                value.append(self._safe_read(amt))
                self._safe_read(2)  # toss the CRLF at the end of the chunk
                self.chunk_left = None
                return ''.join(value)
            else:
                value.append(self._safe_read(chunk_left))
                amt -= chunk_left

            # we read the whole chunk, get another
            self._safe_read(2)      # toss the CRLF at the end of the chunk
            chunk_left = None

        ......

        # we read everything; close the "file"
        self.close()

        return ''.join(value)

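Another piece of real-world source code: PythonProxy stops receiving once the timeout is reached. _read_write() reads from and writes to the sockets that are already open.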

    def _read_write(self):
        time_out_max = self.timeout/3
        socs = [self.client, self.target]
        count = 0
        while 1:
            count += 1
            # time_out = 3
            (recv, _, error) = select.select(socs, [], socs, 3)
            if error:
                break
            if recv:
                for in_ in recv:
                    data = in_.recv(BUFLEN)
                    if in_ is self.client:
                        out = self.target
                    else:
                        out = self.client
                    if data:
                        out.send(data)
                        count = 0
            # stop receiving and sending after time_out_max consecutive
            # rounds without any data [timed out]
            if count == time_out_max:
                break
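The timeout idea can be reduced to a few lines. The following Python 3 sketch is not PythonProxy's code: recv_until_idle, its idle_timeout default and the socketpair demonstration are assumptions made for illustration. It stops reading when the peer closes the connection or stays silent for idle_timeout seconds, which also shows why the method is unreliable: a slow sender is indistinguishable from a finished one.

```python
import select
import socket


def recv_until_idle(sock, idle_timeout=0.2, bufsize=8192):
    """Collect data until the peer closes or is silent for idle_timeout seconds."""
    received = []
    while True:
        readable, _, _ = select.select([sock], [], [], idle_timeout)
        if not readable:
            break  # timed out: assume (unreliably) that the message is over
        data = sock.recv(bufsize)
        if not data:
            break  # the peer closed the connection
        received.append(data)
    return b"".join(received)


# Demonstration on a local socket pair; the sender never closes its end,
# so only the idle timeout terminates the read.
sender, receiver = socket.socketpair()
sender.sendall(b"HTTP/1.1 200 OK\r\n\r\nhello")
message = recv_until_idle(receiver)
sender.close()
receiver.close()
```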

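With the analysis and source code above, the following question should now be easy to answer:

When HTTP runs in keep-alive mode and the server has responded to the client's request, how does the client determine that the HTTP response message has been received completely?

Finally, for Stack Overflow answers on how to detect the end of an HTTP message, see References [2] and [3].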

References

[1] Hypertext Transfer Protocol -- HTTP/1.1
https://tools.ietf.org/html/rfc2616

[2] Detect end of HTTP request body
http://stackoverflow.com/questions/4824451/detect-end-of-http-request-body

[3] Detect the end of a HTTP packet
http://stackoverflow.com/questions/3718158/detect-the-end-of-a-http-packet

[4] Detecting the end of an HTTP request in Keep-Alive mode
http://blog.quanhz.com/archives/141

[5] Sentenced to death just like that!
http://www.cnblogs.com/skynet/archive/2010/12/11/1903347.html

[6] Notes on Nginx and the HTTP protocol
http://blog.xiuwz.com/tag/content-length/

[7] Python Proxy - A Fast HTTP proxy
https://code.google.com/p/python-proxy/

[8] HTTP programming in Python: httplib, urllib and urllib2
http://www.cnblogs.com/chenzehe/archive/2010/08/30/1812995.html
