In an embodiment, the present invention describes a method and apparatus for detecting RAW condition earlier in an instruction pipeline. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation of all 'load' instructions against 'store' instructions within a group of fetched instructions and 'store' instructions previously stored in the SBB. If a match of instruction fields is found, the IDU 'speculates' that the load instruction has dependency on the 'store' instruction. A data cache unit (DCU) validates the dependency of the load instruction 'speculated' by the IDU. If a false dependency is 'speculated' by the IDU, the DCU forces a re-fetch of the load instruction.

FIELD OF THE INVENTION

[0001] Present invention relates to out of order processor architecture, specifically to read-after-write (RAW) bypass in the out of order processor.

DESCRIPTION OF THE RELATED ART

[0002] Generally, in out of order processors, when an instruction attempts to read a location that has been modified, it creates a condition called read-after-write (RAW). In most out of order processors, RAW condition is detected at the Store Queue boundary. Typically, the address (physical or virtual) of a store instruction is compared against the address (physical or virtual) of a load instruction. If a match is found, the data from store instruction is forwarded to the load instruction.

[0003] Typically, the RAW condition is detected before accessing the main memory for data in a cache (e.g., data cache unit or the like). The cache unit includes load and store queues. The load/store addresses are compared in the cache before accessing the main memory. Detecting the RAW condition late in the instruction pipeline (e.g., in the cache, before accessing the main memory or the like) degrades the performance of out of order processors. Further, it requires additional multiplexers to forward data from the store queue to the load instruction which complicates store queue design. Adding additional devices (e.g., multiplexers, comparator logic with each entry of store queue or the like) results in additional power dissipation in the out of order processors. A method and an apparatus are needed to detect the RAW condition earlier in the instruction pipeline.

SUMMARY

[0004] In one embodiment, a method and apparatus for detecting RAW condition earlier in an instruction pipeline is described. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation (e.g., immediate fields, register fields or the like) of all 'load' instructions against 'store' instructions within a group of fetched instructions and 'store' instructions previously stored in the SBB. If a match of instruction fields is found between the 'load' and the 'store' instruction, the IDU 'speculates' that the load instruction has dependency on the 'store' instruction. The IDU transfers the instruction fields' pointers from the 'store' instruction to the 'load' instruction and the 'load' instruction is then scheduled for execution with newly assigned instruction fields. A data cache unit (DCU) validates the dependency of the load instruction 'speculated' by the IDU by comparing the actual address of the load instruction and the store instruction. If a false dependency is 'speculated' by the IDU, the DCU forces a re-fetch of the load instruction.

[0005] In some embodiment, a method for determining dependency of a load instruction is described. The method includes identifying one or more instruction fields of the load instruction, comparing the one or more instruction fields of the load instruction with one or more instruction field of one or more store instructions and determining whether the one or more instruction fields of the load instruction match with the one or more instruction fields of the one or more store instructions. According to some variations of the present invention, the comparing is done using one or more instruction field identifications of the one or more instruction fields. In some variation the comparing is done using contents of the one or more instruction fields.

[0006] In some variation, the method includes declaring the dependency of the load instruction on one of the one or more store instructions if the one or more instruction fields of the load instruction match with the one or more instruction fields of one of the one or more store instructions. In some variation, the method includes declaring the dependency of the load instruction on a most recently fetched store instruction from one of the one or more store instruction whose one or more instruction fields matched with one or more instruction fields of the load instruction if the one or more instruction fields of the load instruction match with the one or more instruction fields of more than one of the one or more store instructions. According to an embodiment of the present invention, one of the instruction fields is an immediate field. According to some variations, one of the instruction fields is a register field. In some variation, the identifying, comparing and determining is done during instruction decoding.

[0007] The method includes fetching a group of instructions. In some variation, the load instruction is one of the group of instructions. According to other variation the present invention, the one or more store instructions are from the group of instructions. According to an embodiment of the present invention, the one or more store instructions are stored in a store bypass buffer.

[0008] The method includes performing a data bypass between the load instruction and one of the one or more store instructions whose one or more instruction fields matched with one or more instruction fields of the load instruction. According to some embodiment of the present invention, performing the data bypass further comprises assigning the one or more instruction field identifications of the one or more instruction fields of the one of the one or more store instructions to the load instruction. The method includes storing the one or more store instructions in the store bypass buffer if the group of instructions includes the one or more store instructions. The method includes forwarding the load instruction for execution.

[0009] The method includes validating the dependency of the load instruction upon one of the one or more store instructions. In some variations, the validating comprises comparing the physical addresses of one or more instruction fields of the load instruction with physical address of the one or more instruction fields of the one or more store instructions. The method includes, if the contents of the one or more instruction fields of the load instruction do not match with the one or more instruction fields of one or more store instructions, requesting a re-fetch of the load instruction.

[0010] The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.

[0012]FIG. 1 illustrates an example of processor architecture according to an embodiment of the present invention.

[0013]FIG. 2 is a flow diagram illustrating an exemplary sequence of operations performed during a process detecting instruction dependency according to an embodiment of the present invention.

[0014]FIG. 3 is a flow diagram illustrating an exemplary sequence of operations performed when a 'dependent' load instruction is received according to an embodiment of the present invention.

[0015] The use of the same reference symbols in different drawings indicates similar or identical items.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

[0016] The following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention which is defined in the claims following the description.

[0017] In addition, the following detailed description has been divided into sections in order to highlight the invention described herein; however, those skilled in the art will appreciate that such sections are merely for illustrative focus, and that the invention herein disclosed typically draws its support from multiple sections. Consequently, it is to be understood that the division of the detailed description into separate sections is merely done as an aid to understanding and is in no way intended to be limiting.

[0018] Introduction

[0019] In some embodiment, the present invention describes a method and apparatus for detecting RAW condition earlier in an instruction pipeline. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation (e.g., immediate fields, register fields or the like) of all 'load' instructions against 'store' instructions within a group of fetched instructions and 'store' instructions previously stored in the SBB. If a match of instruction fields is found between the 'load' and the 'store' instruction, the IDU 'speculates' that the load instruction has dependency on the 'store' instruction. The IDU transfers the instruction fields' pointers from the 'store' instruction to the 'load' instruction and the 'load' instruction is then scheduled for execution with newly assigned instruction fields. A data cache unit (DCU) validates the dependency of the load instruction 'speculated' by the IDU by comparing the actual address of the load instruction and the store instruction. If a false dependency is 'speculated' by the IDU, the DCU forces a re-fetch of the load instruction.

DETAILED ARCHITECTURAL DESCRIPTION

[0020]FIG. 1 illustrates an example of a core 100 for an out of order processor according to an embodiment of the present invention. A system may include one or more cores such as core 100. Core 100 includes an Instruction Fetch Unit (IFU) 110. IFU 110 is coupled via a link 115 to an Instruction Decode Unit (ID U)120. IFU 110 fetches a group of instructions to be executed (e.g., in one cycle or the like) and forwards the group of instructions to IDU 120. IDU 120 decodes the group of instructions fetched by IFU 110. IDU 120 includes a Store Bypass Buffer (SBB) 125. SBB125 can be configured as any storage element (e.g., first-in-first-out, first-in-last-out or the like). The size of SBB 125 can be of any length (e.g., same as the number of instructions fetched in a group or the like). In the present example, SBB 125 is configured as first-in-first-out storage element.

[0021] IDU 120 identifies 'store' instructions from the group of instructions fetched by IFU 110 and stores them into SBB 125. IDU assigns a temporary scratch register for each one of the 'store' and 'load' instructions. According to an embodiment of the present invention, IDU assigns a temporary scratch register 'rd' to each 'store' instruction, extends the instruction fields of each load instruction using a temporary register and assigns a temporary register 'rs' to each one of 'load' instruction. The 'rs' field is used to identify the source register for data bypass using 'rd' field of the store instruction once a dependency between a 'store' and a 'load' instruction is declared valid. The use of temporary register fields is one of several means that can be used to bypass data between dependent instructions. Other means such as, indirect addressing, transfer of data pointers between instructions or the like can be used to bypass data between dependent instructions.

[0022] IDU 120 identifies 'load' instructions from the group of fetched instructions and compares the instruction fields of the 'load' instructions with the instruction fields of 'store' instructions within the group of fetched instructions and with 'store' instructions stored in SBB 125. When a match of instruction fields between a load instruction and a 'store' instruction is found, IDU 120 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and declares the dependency of the 'load' instruction upon the 'store' instruction by forwarding the 'rd' field of the matching 'store' on to the newly assigned 'rs' field of the 'load.' IDU 120 then forwards the group of fetched instructions via a link 127 to a Rename Issue Unit (RIU) 130. RIU 130 renames the instruction fields (e.g., the source registers of the instructions or the like), checks the dependencies of instructions and when instructions are ready to be issued, issues the instructions via a link 135 to an Execution Unit (EXU) 140. IDU 120 renames the destination registers of 'store' and 'load' instructions.

[0023] EXU 140 includes a Working Register File (WRF) 142 and an Architectural Register File (ARF) 145. WRF 142 and ARF 145 can be any storage elements. In the present example, ARF 145 includes temporary scratch registers (e.g., register 'rd' or the like). EXU140 executes instructions and stores the results into WRF 142. EXU 140 is coupled to a Commit Unit (CMU) 150 via a common link 160. Link 160 couples various elements of core 100 as shown in FIG. 1. CMU 150 monitors instructions and determines whether the instructions are ready to be committed. When an instruction is ready to be committed, CMU150 writes the associated results from WRF 142 into ARF 145. The functions of RIU 130, WRF 142, ARF 145 and CMU 150 are known in art. CMU 150 is coupled to a Data Cache Unit (DCU) 170 via a link 180. DCU 170 is further coupled to various elements of core 100 via a link 190 as shown in FIG. 1. DCU170 further includes a Load Queue (LDQ) 172 and a Store Queue (STQ) 175. LDQ 172 is responsible for managing load and store requests and STQ 175 is responsible for managing store requests. DCU 170 is coupled via a link 192 to a memory sub-system 195. The functions of DCU 170 are known in art. Conventionally, DCU 170 performs load/store bypass after comparing the physical addresses of load and store destinations.

[0024] Determining Instruction Dependency

[0025] Initially, in the first fetch cycle, IFU 110 fetches a group of instructions. For purposes of illustration, in the present example, IFU 110 fetches a group of three instructions. However, IFU 150 can fetch any number of instructions supported by the architecture of core 100. IDU 120 decodes the group of fetched instructions and determines whether there are any 'load' and 'store' instructions in the group of fetched instructions. If there are 'load' and 'store' instructions in the group of fetched instructions, IDU 120 compares the instruction fields of the 'load' instruction (e.g., fields used in address generation such as register field, immediate field or the like) with the instruction fields of the 'store' instruction in the fetch group and writes the 'store' instructions into SBB125. If a match is found between the instruction fields of the 'load' and a 'store' instruction, IDU 120 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and identifies the 'load' instruction as dependent upon the 'store' instruction. The dependency of 'load' instructions can be identified using various techniques known in art. If there are no 'load' instructions but 'store' instructions in the fetch group, IDU 120 stores those 'store' instructions in SBB125. If there are 'load' instructions but no 'store' instructions in the fetch group then IDU does not force any dependency as SSB 125 in this case is empty. In the present example, the size of SBB 125 is same as the size of the group of fetched instructions (e.g., three or the like). However, SBB 125 can be of any size supported by core 100 architecture.

[0026] IDU 120 then determines whether there are any 'load' instructions in the group of fetched instructions. If there are 'load' instructions in the group of fetched instructions, IDU 120 compares the instruction fields of the 'load' instructions (e.g., register field, immediate field or the like) with the instruction fields of the 'store' instructions in SBB 125. If a match is found between the instruction fields of the 'load' and a 'store' instruction, EDU 120 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and identifies the 'load' instruction as dependent upon the 'store' instruction. The dependency of 'load' instructions can be identified using various techniques known in art.

[0027] Typically, IDU 120 does not analyze the instruction fields of 'load' and 'store' instructions (e.g., actual contents of the instruction fields, physical addresses of the instruction fields or the like). Because the contents of the instruction fields are not analyzed, the dependency of the 'load' instruction is a 'speculation' by IDU 120. 'Speculating' a dependency in IDU 120 and performing a data bypass in RIU 130 and EXU 140 allows a younger dependent instruction to be issued sooner than the conventional execution in the out of order processors. Conventionally, the younger dependent instructions are not issued until the comparison of addresses and data bypass is done in DCU 170.

[0028] In the following fetch cycle, before storing 'store' instructions in SBB 125, IDU 120 first identifies and compares the instruction fields of 'load' instructions with 'store' instructions in the group of fetched instructions and then with the instruction fields of 'store' instructions stored in SBB 125. Thus, in every following fetch cycle, load instructions are compared against 'store' instructions of incoming fetch group and previous fetch group in SBB 125.

[0029] Once IDU 120 identifies a 'load' instruction as dependent upon a 'store' instruction, IDU 120 forwards the assigned 'rd' field of the 'store' instruction on to the newly assigned instruction field 'rs' of the 'load' instruction and the load instruction is executed using the newly assigned instruction fields. The data between the 'load' instruction and the 'store' instruction is by-passed in RIU 130and EXU 140 using 'rd' and 'rs' register fields. Upon receiving the 'dependent' 'load' instruction, DCU 170 determines whether the dependency of the 'load' instruction was validly 'speculated' by IDU 120. The determination of dependency validity can be made using various techniques known in art (e.g., by comparing the physical addresses or the like). If the dependency of the 'load' instruction was validly 'speculated' by IDU 120, DCU 170 maintains the dependency of the 'load' instruction. If DCU 170 determines that dependency of the 'load' instruction was not validly 'speculated' by IDU 120 then DCU 170 forces a re-fetch of the 'load' instruction.

[0030] When IDU 120 identifies a 'load' instruction having its instruction fields (e.g., register, immediate, or the like) common with more than one 'store' instructions (i.e. multiple dependencies), IDU 120 forces the 'load' instruction to be dependent on the most recently fetched ('youngest') 'store' instruction prior to this 'load'. When a 'load' instruction is dependent upon a 'store' instruction that is not part of the group of instructions fetched by IFU 110 or stored in SBB 125 then the dependency is identified by DCU 170 using conventional means.

[0031]FIG. 2 is a flow diagram illustrating an exemplary sequence of operations performed during a process of detecting instruction dependency in an instruction decoding unit (e.g., IDU 120) according to an embodiment of the present invention. While the operations are described in a particular order, the operations described herein can be performed in other sequential orders (or in parallel) as long as dependencies between operations allow. In general, a particular sequence of operations is a matter of design choice and a variety of sequences can be appreciated by persons of skill in art based on the description herein.

[0032] Initially, the process identifies 'load' instructions from a group of instructions fetched by an instruction fetch unit (e.g., IFU 110 or the like) (205). The process then first compares the instruction fields of 'load' instructions (e.g., immediate field, register field or the like) with one or more 'store' instructions within the group of fetched instructions, if any (210). The process then compares the instruction fields of 'load' instructions with the instruction fields of 'store' instructions stored in a 'store' instruction buffer (e.g., SBB 125) (215).

[0033] The process then determines whether there was a match between the instruction fields of 'load' instructions and one or more 'store' instructions (220). If there was no match between the instruction fields of 'load' instructions and 'store' instructions, the process proceeds to store the 'store' instructions in the store bypass buffer (240). If there was match between the instruction fields of 'load' instructions and 'store' instructions, the process determines whether the instruction fields of the 'load' instructions matched with the instruction fields of more than one 'store' instructions within the group of fetched instructions and 'store' instructions stored in the store bypass buffer (225). If the instruction fields of the 'load' instructions matched with instruction fields of more than one 'store' instructions within the group of fetched instructions and 'store' instructions stored in the store bypass buffer, the process forces the 'load' instruction to be dependent on the 'youngest' 'store' instruction prior to this 'load' (230). The process then proceeds to store the 'store' instructions in the store bypass buffer (240).

[0034] If the instruction fields of the 'load' instruction did not match with instruction fields of more than one 'store' instructions from the group of fetched instructions and 'store' instructions stored in the store bypass buffer, the process 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and 'forces' a dependency of the 'load' instruction upon the 'store' instruction that matched the instruction fields of the 'load' instruction (235). According to one embodiment, after forcing the dependency, the process forwards the 'rd' field of the 'store' instruction to the newly assigned 'rs' field of the 'load' instruction.

[0035] The process stores the 'store' instructions, if any, from the group of fetched instructions, into the store bypass buffer (step240). The process then forwards the instruction for execution (e.g., forwarding the instruction to RIU 130 or the like) (250). The process executes the 'load' instruction using the newly assigned instruction fields. The process then performs the data bypass for the 'load' instruction (260). The data bypass can be performed using various techniques known in art.

[0036]FIG. 3 is a flow diagram illustrating an exemplary sequence of operations performed when a 'dependent load' instruction is received according to an embodiment of the present invention. The steps described herein can be performed in any order (e.g., sequentially, parallel or the like). Initially, the process receives a 'dependent' load instruction (e.g., at DCU170 or the like) (310). The dependency of the load instruction can be determined using the process such as the one described in FIG. 2. The process then determines whether the dependency of the 'load' instruction is valid (320). The validity of the dependencies can be determined using various techniques known in art. According to an embodiment of the present invention, the process determines the validity of the dependency by comparing the contents of instruction fields of the 'load' instructions and the 'store' instruction (e.g., actual comparison of immediate instruction fields, physical addresses or the like).

[0037] If dependency of the 'load' instruction is valid, the process completes the execution of instructions (330). If the dependency of the 'load' instruction is not valid, the process forces a re-fetch of the instruction (340). When the process forces a re-fetch, the 'load' instruction is fetched again (e.g., by IFU 110 or the like) and is processed again according to the instruction execution process.

SRC=https://www.google.com.hk/patents/US20040044881

Method and system for early speculative store-load bypass的更多相关文章

  1. Speculative store buffer

    A speculative store buffer is speculatively updated in response to speculative store memory operatio ...

  2. Failed to issue method call: Unit httpd.service failed to load: No such file or directory.

    centos7修改httpd.service后运行systemctl restart httpd.service提示 Failed to issue method call: Unit httpd.s ...

  3. 关于.ToList(): LINQ to Entities does not recognize the method ‘xxx’ method, and this method cannot be translated into a store expression.

    LINQ to Entities works by translating LINQ queries to SQL queries, then executing the resulting quer ...

  4. Failed to issue method call: Unit mysql.service failed to load: No such file or directory解决的方式

    Failed to issue method call: Unit mysql.service failed to load: No such file or directory解决的方式 作者:ch ...

  5. PatentTips - Method and system for browsing things of internet of things on ip using web platform

    BACKGROUND The following disclosure relates to a method and system for enabling a user to browse phy ...

  6. Ext.store.load callback

    var paramsReceivable = {};                paramsReceivable.querytext = Ext.getCmp('hiddquerytext').g ...

  7. Ext JS 5 关于Store load返回json错误信息或异常的处理

    关于在store load的时候服务器返回错误信息或服务器出错的处理.ExtJS4应该也能用,没测试过. 这里有两种情况: 服务器返回错误json错误信息,状态为200 服务器异常,状态为500 一. ...

  8. Method and system for providing security policy for linux-based security operating system

    A system for providing security policy for a Linux-based security operating system, which includes a ...

  9. Method and system for public-key-based secure authentication to distributed legacy applications

    A method, a system, an apparatus, and a computer program product are presented for an authentication ...

随机推荐

  1. iframe及其引出的页面跳转问题

    前提:在前一段的工作中碰到了一些页面跳转,子页面跳到父页面上的等等问题,当时页面总是跳不对,或者跳错,要不就是不需要重新打开窗口,却又重新打开一个了,特此搜寻网上各大博客论坛,加上项目经验整理一篇文章 ...

  2. Deprecated: Assigning the return value of new by reference is deprecated in报错

    出现了Deprecated: Assigning the return value of new by reference is deprecated in wwwroot\common.inc.ph ...

  3. js图片轮播效果常见的产品无缝轮播

    <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <m ...

  4. EF为什么向我的数据库再次插入已有对象?(ZT)

    最近做了个多对多对实体对象,结果发现每次只要增加一个子实体,就会自动添加一个父实体进去,而不管该父实体是否已经存在. 找了好久,终于找到这篇文章,照文章内容来看,应该是断开连接导致的. 原文地址:ht ...

  5. uva1442 Cav

    连通器向左向右扫描两次即可每一段有水的连通区域,高度必须相同,且不超过最低天花板高度if(p[i] > level) level = p[i]; 被隔断,要上升(隔断后,之前的就不变了,之后的从 ...

  6. CNN完成mnist数据集手写数字识别

    # coding: utf-8 import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data d ...

  7. GetForgroundWindow函数的不确定性——BUG笔记

    HWND GetForgoundWindows() 获取当前前置窗口在windows 7和windows 10下虚拟桌面切换后表现不同. 所以强烈不建议使用此函数!

  8. Error:Failed to resolve: com.afollestad:material-dialogs:

    http://www.chenruixuan.com/archives/1068.html 背景: 同事把Android项目直接考给了我...我在Android Studio上运行,然后提示: Err ...

  9. bzoj 1098 [POI2007] 办公楼 biu

    # 解题思路 画画图可以发现,只要是两个点之间没有相互连边,那么就必须将这两个人安排到同一个办公楼内,如图所示: 那,我们可以建立补图,就是先建一张完全图,然后把题目中给出的边都删掉,这就是一张补图, ...

  10. [JOYOI] 1071 LCIS

    拖了好久的LCIS f[i][j]表示a串前i个,b串以b[j]结尾的LCIS长度. 转移时考虑a[i]和b[j]是否相等,如果不等: 那么既然是以j结尾,说明a串前i-1位有一个字符和b匹配了,所以 ...