Several techniques have been proposed to reduce the number of pipeline stages. We categorize them into two types based on their design philosophy: stage merging and stage bypassing. Stage merging combines multiple stages of the traditional design int…
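To make the two categories concrete, here is a minimal C++ sketch (my illustration, not taken from the text): stages are modeled as callables, `merge_stages` fuses two adjacent stages into one, and `make_bypassable` skips a stage for work items that do not need it. All names here are hypothetical.

```cpp
#include <functional>
#include <vector>

// Hypothetical work item flowing through the pipeline.
struct Item { int payload = 0; bool needs_stage_b = true; };

using Stage = std::function<Item(Item)>;

// Stage merging: two adjacent stages are fused into a single stage,
// removing the boundary (and its per-stage overhead) between them.
Stage merge_stages(Stage a, Stage b) {
    return [a, b](Item x) { return b(a(x)); };
}

// Stage bypassing: the stage is skipped entirely for items that do not
// need its work, instead of every item paying for every stage.
Stage make_bypassable(Stage s, std::function<bool(const Item&)> needed) {
    return [s, needed](Item x) { return needed(x) ? s(x) : x; };
}

int main() {
    // In the traditional design, stage A and stage B would run back to back
    // as separate stages for every item.
    Stage stage_a = [](Item x) { x.payload += 1; return x; };
    Stage stage_b = [](Item x) { x.payload *= 2; return x; };

    // Merged design: one fused stage instead of two.
    std::vector<Stage> merged{merge_stages(stage_a, stage_b)};

    // Bypassing design: B runs only when the item actually needs it.
    std::vector<Stage> bypassing{
        stage_a,
        make_bypassable(stage_b, [](const Item& x) { return x.needs_stage_b; })};

    Item it;
    for (auto& s : merged) it = s(it);        // fused A+B
    Item it2;
    for (auto& s : bypassing) it2 = s(it2);   // A, then B only if needed
    return it.payload + it2.payload;
}
```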
BACKGROUND
Many algorithms on a graphics processing unit (GPU) may benefit from performing queries in a hierarchical tree structure (including quad-trees, oct-trees, kd-trees, R-trees, and so forth). However, these trees can be very deep, whereby traversi…
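The cost that deep trees impose usually comes from per-node recursion; on a GPU such queries are typically written iteratively with a small explicit stack. The C++ sketch below shows that shape only; the node layout, `MAX_DEPTH` bound, and the `intersects`/`visit_leaf` hooks are assumptions for illustration, not the patent's method.

```cpp
#include <cstdint>

// Hypothetical node layout for a binary spatial tree: leaf nodes have
// both child indices set to -1.
struct Node {
    float   bounds[6];   // axis-aligned box: min xyz, max xyz
    int32_t left;
    int32_t right;
};

constexpr int MAX_DEPTH = 64;  // assumed bound on tree depth

// Stand-in query predicate: point-in-box test.
bool intersects(const Node& n, const float q[3]) {
    return q[0] >= n.bounds[0] && q[0] <= n.bounds[3] &&
           q[1] >= n.bounds[1] && q[1] <= n.bounds[4] &&
           q[2] >= n.bounds[2] && q[2] <= n.bounds[5];
}

void visit_leaf(const Node&, const float[3]) { /* record a hit here */ }

// Iterative traversal with an explicit stack instead of recursion.
// Written as plain C++ here; a GPU kernel would use the same structure,
// since deep per-thread recursion is impractical.
void traverse(const Node* nodes, int32_t root, const float query[3]) {
    int32_t stack[MAX_DEPTH];
    int top = 0;
    stack[top++] = root;

    while (top > 0) {
        const Node& n = nodes[stack[--top]];
        if (!intersects(n, query)) continue;          // prune this subtree
        if (n.left < 0 && n.right < 0) {              // leaf
            visit_leaf(n, query);
            continue;
        }
        // Pushes past MAX_DEPTH are silently dropped in this sketch.
        if (n.left  >= 0 && top < MAX_DEPTH) stack[top++] = n.left;
        if (n.right >= 0 && top < MAX_DEPTH) stack[top++] = n.right;
    }
}

int main() {
    // Tiny two-level tree: a root with two leaf children.
    Node nodes[3] = {
        {{0, 0, 0, 10, 10, 10}, 1, 2},
        {{0, 0, 0, 5, 10, 10}, -1, -1},
        {{5, 0, 0, 10, 10, 10}, -1, -1},
    };
    float query[3] = {2.0f, 1.0f, 1.0f};
    traverse(nodes, 0, query);
    return 0;
}
```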
A processor employing a post-cache (LS2) buffer. Loads are stored into the LS2 buffer after probing the data cache. The load/store unit snoops the loads in the LS2 buffer against snoop requests received from an external bus. If a snoop invalidate requ…
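As a software analogy only (the abstract describes hardware), the following C++ sketch models an LS2-style buffer whose completed loads are checked against an incoming snoop-invalidate address. The field names, buffer depth, and 64-byte line granularity are assumptions.

```cpp
#include <array>
#include <cstdint>

constexpr uint64_t LINE_MASK = ~uint64_t(63);  // assume 64-byte cache lines

// One post-cache (LS2) buffer entry: a load that has already probed the
// data cache and is waiting to retire.
struct Ls2Entry {
    uint64_t address;
    bool     valid;
    bool     hit_by_snoop;   // set when a snoop invalidate matches
};

struct Ls2Buffer {
    std::array<Ls2Entry, 8> entries{};  // assumed buffer depth, zero-initialized

    // Snoop the buffered loads against an invalidating request from the
    // external bus; matching loads are flagged (e.g. to be replayed so
    // that memory ordering is preserved).
    void snoop_invalidate(uint64_t snoop_address) {
        const uint64_t line = snoop_address & LINE_MASK;
        for (auto& e : entries) {
            if (e.valid && (e.address & LINE_MASK) == line) {
                e.hit_by_snoop = true;
            }
        }
    }
};

int main() {
    Ls2Buffer ls2;
    ls2.entries[0] = {0x1040, true, false};
    ls2.snoop_invalidate(0x1060);          // same 64-byte line as 0x1040
    return ls2.entries[0].hit_by_snoop;    // 1: the load was hit by the snoop
}
```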
A method and apparatus for preserving memory ordering in a cache-coherent, link-based interconnect in light of partial and non-coherent memory accesses is herein described. In one embodiment, partial memory accesses, such as a partial read, are impleme…
Ensemble Methods for Deep Learning Neural Networks to Reduce Variance and Improve Performance
This blog is copied from: https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Deep learning neural ne…
Welcome back to what’s going to be the last “official” part of this series – I’ll do more GPU-related posts in the future, but this series is long enough already. We’ve been touring all the regular parts of the graphics pipeline, down to different le…
In this installment, I’ll be talking about the (early) Z pipeline and how it interacts with rasterization. Like the last part, the text won’t proceed in actual pipeline order; again, I’ll describe the underlying algorithms first, and then fill in the…
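Before the details, a toy C++ sketch of the basic idea behind early Z (my illustration, not the post's code): run the depth test right after rasterization, before the potentially expensive pixel shader, so occluded fragments never get shaded. The names and the single-sample framebuffer layout are assumptions.

```cpp
#include <cstdint>
#include <vector>

struct Fragment { int x, y; float z; };   // produced by the rasterizer

// Early Z: test (and update) depth immediately after rasterization and
// before shading, so occluded fragments never reach the pixel shader.
// This is only legal when the shader doesn't write depth or discard.
template <class ShadeFn>
void shade_with_early_z(const std::vector<Fragment>& frags,
                        std::vector<float>& depth, int width,
                        ShadeFn shade) {
    for (const Fragment& f : frags) {
        float& stored = depth[f.y * width + f.x];
        if (f.z >= stored) continue;   // fails depth test: skip shading
        stored = f.z;                  // early depth write
        shade(f);                      // expensive work only for survivors
    }
}

int main() {
    std::vector<float> depth(4 * 4, 1.0f);            // cleared to far plane
    std::vector<Fragment> frags{{1, 1, 0.5f}, {1, 1, 0.7f}};
    int shaded = 0;
    shade_with_early_z(frags, depth, 4,
                       [&](const Fragment&) { ++shaded; });
    return shaded;   // 1: the second fragment was rejected before shading
}
```

With late Z, the same test would sit after `shade` instead, which is what you fall back to when the shader modifies depth or discards fragments.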
Hybrid transaction memory systems and accompanying methods. A transaction to be executed is received, and an initial attempt is made to execute the transaction in a hardware path. Upon a failure to successfully execute the transaction in the hardware…
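The abstract describes the usual hybrid shape: attempt the transaction on a hardware path first and fall back when that attempt fails. A minimal C++ sketch of that pattern follows, using Intel RTM intrinsics with a global lock as the assumed software fallback; it requires x86 hardware with RTM and compilation with -mrtm, and it illustrates the general pattern, not the patented method.

```cpp
#include <immintrin.h>   // _xbegin, _xend, _xabort, _XBEGIN_STARTED (needs -mrtm)
#include <atomic>

std::atomic<bool> fallback_lock{false};   // assumed software path: a global lock

constexpr int MAX_HW_RETRIES = 3;

// Try the transaction on the hardware path first; after repeated aborts,
// execute it on the software path instead.
template <class Fn>
void run_transaction(Fn body) {
    for (int attempt = 0; attempt < MAX_HW_RETRIES; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Abort if a software-path transaction is in progress, so the
            // two paths stay mutually consistent.
            if (fallback_lock.load(std::memory_order_relaxed)) _xabort(0xff);
            body();
            _xend();                       // hardware commit
            return;
        }
        // status encodes the abort cause; this sketch simply retries.
    }

    // Software path: serialize through the fallback lock.
    while (fallback_lock.exchange(true, std::memory_order_acquire)) { /* spin */ }
    body();
    fallback_lock.store(false, std::memory_order_release);
}

int counter = 0;

int main() {
    run_transaction([] { ++counter; });
    return counter;
}
```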