Down to the TLP: How PCI express devices talk (Part II)
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-2
Data Link Layer Packets
Aside from wrapping TLPs with its header (2 bytes) and adding a CRC at the end (LCRC actually, 4 bytes), the Data Link layer runs packets of its own to maintain reliable transmission. These special packets are Data Link Layer Packets (DLLPs). The main types are:
- Ack DLLP for acknowledging successfully received TLPs.
- Nak DLLP for indicating that a TLP arrived corrupted, and that a retransmit is due. Note that there's also a timeout mechanism in case nothing that looks like a TLP arrives.
- Flow Control DLLPs: InitFC1, InitFC2 and UpdateFC, used to announce credits, as described below.
- Power Management DLLPs.
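
To make the framing concrete, here's a minimal C sketch of decoding an Ack/Nak DLLP. It's illustrative only: the 6-byte layout and type codes are my reading of the spec, so verify them against the PCIe Base Specification, and the CRC check is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* A DLLP is 6 bytes on the wire: one type byte, 3 bytes of
 * type-specific content and a 16-bit CRC. */

#define DLLP_TYPE_ACK 0x00
#define DLLP_TYPE_NAK 0x10

static bool parse_acknak(const uint8_t dllp[6], uint16_t *seq_num)
{
    if (dllp[0] != DLLP_TYPE_ACK && dllp[0] != DLLP_TYPE_NAK)
        return false;                    /* not an Ack/Nak DLLP */

    /* The 12-bit AckNak_Seq_Num sits in the low 12 bits of bytes
       2-3; byte 1 and the top 4 bits of byte 2 are reserved. */
    *seq_num = ((uint16_t)(dllp[2] & 0x0f) << 8) | dllp[3];

    return true;   /* CRC-16 check over dllp[0..3] omitted here */
}
```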
 
Flow control
As mentioned before, the data link layer has a Flow Control (FC) mechanism, which makes sure that a TLP is transmitted only when the link partner has enough buffer space to accept it.
I used the term “link partner” and not “destination” deliberately. For example, when a peripheral is connected to the Root Complex through a switch, it runs its flow control mechanism against the switch and not the final destination. In other words, once the TLP is transmitted from the peripheral, it’s still subject to the flow control mechanism between the switch and the Root Complex. If there are more switches on the way, each leg has its own flow control.
The mechanism is not the simplest, and its description in the spec will give you goosebumps. So I'll try to put it fairly clearly.
The flow control mechanism runs independent accounting for 6 (six!) distinct buffer consumers:
- Posted Request TLP headers
- Posted Request TLP data
- Non-Posted Request TLP headers
- Non-Posted Request TLP data
- Completion TLP headers
- Completion TLP data
 
These are the six credit types.
The accounting is done in flow control units, each corresponding to 4 DWs of traffic (16 bytes), always rounded up to the nearest integer. Since headers are always 3 or 4 DWs long, every TLP transmitted consumes one unit from the respective header credit. When data is transmitted, the number of consumed units is the number of data DWs in the TLP, divided by four and rounded upwards. So we can imagine 16-byte data buckets at the receiver, in which data from different TLPs must not be mixed. Each bucket is a flow control unit.
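As a quick sanity check of this arithmetic, here's a minimal C sketch (the names are mine, not the spec's) of how many flow control units a TLP consumes:

```c
/* The six credit types; enum names are invented for this example. */
enum fc_credit_type {
    FC_POSTED_HDR,    FC_POSTED_DATA,
    FC_NONPOSTED_HDR, FC_NONPOSTED_DATA,
    FC_CPL_HDR,       FC_CPL_DATA,
};

/* A header always costs exactly one unit: headers are 3 or 4 DWs,
   and one flow control unit covers 4 DWs. */
static unsigned header_units(void)
{
    return 1;
}

/* Data costs ceil(payload_dws / 4) units: 16-byte buckets that
   can't be shared between TLPs. */
static unsigned data_units(unsigned payload_dws)
{
    return (payload_dws + 3) / 4;
}
```

For example, a posted write with a 9-DW payload consumes one Posted-header unit and data_units(9) = 3 Posted-data units: the last bucket holds just one DW, but is consumed whole.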
Now let's imagine that there's a doorkeeper at the transmitter, who counts the total number of flow control units consumed since the link was established, separately for each credit type. That's six numbers to keep track of. The doorkeeper also knows the maximum number each of these credit types is allowed to reach. If a certain TLP for transmission would make any of these counted units exceed its limit, it's not allowed through. Another TLP may be transmitted instead (subject to reordering rules), or the doorkeeper simply waits for the limit to rise.
This is how the flow control works. When the link is established, both sides exchange their initial limits. As each receiver processes incoming packets, it raises the limits for its link partner, so the buffer space just released can be used again. UpdateFC DLLPs are sent periodically to announce the new credit limits.
Well, I overlooked a small detail: since we're counting the total number of units since the link started, there's always a potential for overflow. The PCIe standard allocates a certain number of bits for each credit type counter and its limit (8 bits for header credits, 12 bits for data credits), knowing that they will overflow pretty soon. This is worked around by performing the comparison between each counter and its limit in straightforward modulo arithmetic. So given some restrictions on not setting the limit too far above the counter, the flow control mechanism implements the doorkeeper described above.
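Here's a sketch of the doorkeeper's test for an 8-bit header credit counter (data credits work the same way with 12 bits). The formula is my rendering of the spec's gating equation: subtraction in modulo arithmetic does the right thing as long as the limit never runs more than half the counter range ahead of the counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Free-running modulo-256 counters: credits_consumed counts units
 * sent since link-up, credit_limit is the latest value announced
 * by the link partner (via InitFC/UpdateFC DLLPs). */
static bool may_transmit(uint8_t credits_consumed,
                         uint8_t credit_limit,
                         uint8_t units_required)
{
    /* Allowed if (limit - (consumed + required)) mod 2^8 is within
       half the counter range, i.e. the TLP doesn't overshoot. */
    uint8_t margin = (uint8_t)(credit_limit -
                               (uint8_t)(credits_consumed + units_required));

    return margin <= 128;   /* 2^8 / 2 */
}
```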
Bus entities are allowed to announce an infinite credit limit for any or all of the six credit types, meaning that flow control for that specific credit type is disabled. As a matter of fact, endpoints (as opposed to switches and the Root Complex) must advertise an infinite credit for completion headers and data. In other words, an endpoint can't refuse to accept a completion TLP on flow control grounds. So the Requester of a non-posted transaction must take responsibility for being able to accept the completion, by verifying that it has enough buffer space when making the request. The same applies to Root Complexes that don't support peer-to-peer transactions.
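Since completions can't be refused, a requester typically budgets its own receive buffer before issuing reads. A minimal sketch of that bookkeeping, with invented names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Requester-side completion buffer budget, in bytes. */
struct cpl_budget {
    uint32_t buffer_free;   /* receive buffer space not yet claimed */
};

/* Issue a read only if its completion data is guaranteed to fit. */
static bool try_issue_read(struct cpl_budget *b, uint32_t read_bytes)
{
    if (read_bytes > b->buffer_free)
        return false;               /* hold the request back */

    b->buffer_free -= read_bytes;   /* reserved until completions land */
    return true;
}

/* Called when a completion's payload has been consumed. */
static void completion_consumed(struct cpl_budget *b, uint32_t bytes)
{
    b->buffer_free += bytes;
}
```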
Virtual channels
In part I of this guide, I marked the TC fields in the example TLPs green, saying that those fields are almost always zero. TC stands for Traffic Class, an identifier used to create Virtual Channels. These Virtual Channels are merely separate sets of data buffers, each with its own flow control credits and counters. So by choosing a TC other than zero (and setting up the bus entities accordingly), one can have TLPs subject to independent flow control systems, preventing TLPs belonging to one channel from blocking the traffic of TLPs belonging to another.
The mapping from TCs to Virtual Channels is done by software for each bus entity. Anyhow, the real-life PCIe elements I've seen so far support only one Virtual Channel, VC0, and hence only TC0 is used, which is the minimum required by the spec. So unless some special application requires otherwise, TC will remain zero in all TLPs, and this whole issue can be disregarded.
Packet reordering
One of the issues that comes to mind in a packet network is to what extent TLPs may arrive in an order different from the one in which they were sent. The Internet Protocol (IP, as in TCP/IP), for example, allows any packet reshuffling on the way. The PCIe specification allows a certain extent of TLP reordering, and in fact, in some cases reordering is mandatory to avoid deadlocks.
Fortunately, legacy PCI compatibility was taken into account in this matter as well, unless the “relaxed ordering” bit is set in the TLP, which it rarely is. This is one of the bits in the Attr field, marked green in the TLP examples in part I of this guide. So all in all, one can trust that things will work as if we were talking to a good old bus. Those of us who write to a few registers, and then trigger an event by writing to another one, can go on doing it. I turn off the BAR's Prefetch bit to be on the safe side, even though there's nothing to imply that it has anything to do with writes.
The spec defines the reordering rules in full detail, but it's not easy to extract the bottom line. So I'll mention a few consequences of those rules. Everything said here assumes the relaxed ordering bit is cleared in all transactions. I'm also ignoring I/O space completely (why use it?):
- Posted writes and MSIs arrive in the order they were sent. Now, all memory writes are posted, and MSIs are in fact (posted) memory writes. So we know for sure that memory writes are executed in order, and that if we issue an MSI after filling a buffer (writes…), it will arrive after the buffer was actually written to.
- A read request will never arrive before a write request or MSI sent before it. As a matter of fact, performing a Read Request is a safe way to wait for a write to complete (a driver-style sketch of this pattern follows below).
- Write requests may very well arrive before read requests sent before them. This mechanism prevents deadlock in certain exotic scenarios. So don't write to a certain memory area while waiting for a read completion to come in.
- Read completions for a certain request (i.e. with the same Tag and Requester ID) arrive in the order they were sent (so they arrive in order of rising addresses). Read completions of different requests may be reordered (but who cares).
 
Other than that, anything can change order of arrival, including read requests, which may be reordered among themselves and with read completions.
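To see what these rules buy us in practice, here's a Linux-kernel-flavored sketch of the classic pattern: set up registers with posted writes, then read one back to make sure everything has landed. The register offsets and the function are hypothetical; writel()/readl() are the standard MMIO accessors.

```c
#include <linux/io.h>
#include <linux/kernel.h>

/* Hypothetical MMIO layout of a DMA engine. */
#define REG_ADDR_LO  0x00
#define REG_ADDR_HI  0x04
#define REG_GO       0x08

static void start_dma(void __iomem *regs, u64 bus_addr)
{
    /* Posted writes arrive in order, so the GO write is guaranteed
       to reach the device after the address registers are set. */
    writel(lower_32_bits(bus_addr), regs + REG_ADDR_LO);
    writel(upper_32_bits(bus_addr), regs + REG_ADDR_HI);
    writel(1, regs + REG_GO);

    /* A read never passes writes sent before it: by the time this
       readl() returns, all three writes have reached the device. */
    (void)readl(regs + REG_GO);
}
```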
To relieve any paranoia about an interrupt message arriving before the write operations that preceded it, section 2.2.7 in the spec spells it out:
The Request format used for MSI/MSI-X transactions is identical to the Memory Write Request format defined above, and MSI/MSI-X Requests are indistinguishable from memory writes with regard to ordering, Flow Control, and data integrity.
Zero-length read request
As just mentioned, reading from a bus entity after writing to it is a safe way to wait for the write operation to finish for real. But why read anything, if we're not interested in the data? So they made up a zero-length read request, which reads nothing. All four Byte Enables are assigned zeroes, meaning nothing is read. As for the completion, section 2.2.5 in the spec says:
If a Read Request of 1 DW specifies that no bytes are enabled to be read (1st DW BE[3:0] field = 0000b), the corresponding Completion must specify a Length of 1 DW, and include a data payload of 1 DW
So we have one DW of rubbish data in the completion. That’s fair enough.
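For illustration, here's a C sketch filling in the three header DWs of such a request, following the 32-bit Memory Read header layout shown in part I. The helper and its field packing are mine; double-check the bit positions against the spec.

```c
#include <stdint.h>

/* Build a zero-length Memory Read: Length = 1 DW, but both byte
 * enable fields are 0000b, so no bytes are actually read. */
static void build_zero_length_read(uint32_t hdr[3],
                                   uint16_t requester_id,
                                   uint8_t tag,
                                   uint32_t addr)
{
    hdr[0] = 1;     /* Fmt=00, Type=00000 (MRd, 32-bit), Length=1 */
    hdr[1] = ((uint32_t)requester_id << 16) |  /* Requester ID       */
             ((uint32_t)tag << 8) |            /* Tag                */
             0x00;                             /* Last/1st DW BE = 0 */
    hdr[2] = addr & ~3u;                       /* DW-aligned address */
}
```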
Payload sizes and boundaries
Every TLP carrying data must limit the number of payload data DWs to Max_Payload_Size, a number allocated during configuration (typically 128 bytes). This number applies only to actual payloads, not to the Length field as such: Memory Read Requests are not restricted in length by Max_Payload_Size (spec section 2.2.2), but rather by Max_Read_Request_Size (spec section 2.2.7).
So a Memory Read Request may ask for more data than is allowed in a single TLP, and then multiple completion TLPs are inevitable.
Regardless of the Max_Payload_Size restrictions, completions of (memory) read requests may be split into several completion TLPs. The cuts must fall at addresses aligned to RCB bytes (Read Completion Boundary, 128 bytes; possibly 64 for a Root Complex) per spec section 2.3.1.1. If the request doesn't cross such an alignment boundary, only a single completion TLP is allowed. Multiple memory read completions for a single read request must return data in increasing address order (an order the switching network preserves).
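Here's a small sketch, with invented names, of the bookkeeping a requester might do: in the worst case, how many completion TLPs can a single read produce? The completer may cut at every RCB-aligned boundary the request spans (each piece trivially fits in Max_Payload_Size, since MPS is at least 128 bytes and RCB is at most 128).

```c
#include <stdint.h>

/* Worst-case number of completion TLPs for a read of len_bytes
 * starting at addr, given the completer's RCB (a power of two). */
static unsigned max_completions(uint64_t addr, uint32_t len_bytes,
                                uint32_t rcb)
{
    uint64_t first = addr & ~(uint64_t)(rcb - 1);
    uint64_t last  = (addr + len_bytes - 1) & ~(uint64_t)(rcb - 1);

    return (unsigned)((last - first) / rcb) + 1;
}
```

For example, a 512-byte read starting 32 bytes before an RCB boundary (RCB = 128) spans five RCB-aligned chunks, so up to five completion TLPs may arrive.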
And a last remark, citing the spec, section 2.2.7: Requests must not specify an Address/Length combination which causes a Memory Space access to cross a 4-KB boundary.
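Correspondingly, a TLP builder would apply a check like this (a sketch; the name is mine) before issuing a request:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + len_bytes) crosses a 4 KB boundary, in
   which case the access must be split into more than one request. */
static bool crosses_4k(uint64_t addr, uint32_t len_bytes)
{
    return (addr & ~0xfffULL) !=
           ((addr + len_bytes - 1) & ~0xfffULL);
}
```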
That’s it. I hope reading through the PCI Express specification will be easier now. There’s still a lot to read…