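The previous few blog posts described the DRM driver, the graphics card's video memory management, and its interrupt mechanism, so reading through the initialization process of the AMD DRM driver should now be a lot easier.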

  Below is an article written by an AMD developer (I'm putting it here for now and will add my own comments when I have time).

Understanding GPUs from the ground up

I get asked a lot about learning how to program GPUs.  Bringing up evergreen kms support seems like a good place to start, so I figured I'd write a series of articles detailing the process based on the actual evergreen patches.  First, to get a better understanding of how GPUs work, take a look at the radeon drm.  This article assumes a basic understanding of C and computer architecture.  The basic process is that the driver loads, initializes the hardware, sets up non-hw specific things like the memory manager, and sets up the displays.  This first article describes the basic driver flow when the drm loads in kms mode.

radeon_driver_load_kms() (in radeon_kms.c) is where everything starts.  It calls radeon_device_init() to initialize the non-display hardware and radeon_modeset_init() (in radeon_display.c) to initialize the display hardware.

The main workhorse of the driver initialization is radeon_device_init() found in radeon_device.c.  First we initialize a bunch of the structs used in the driver.  Then radeon_asic_init() is called.  This function sets up the asic specific function pointers for various things such as suspend/resume callbacks, asic reset, set/process irqs, set/get engine clocks, etc.  The common code then uses these callbacks to call the asic specific code to achieve the requested functionality.  For example, enabling and processing interrupts works differently on an RV100 vs. an RV770.  Since functionality changes in stages, some routines are used for multiple asic families.  This lets us mix and match the appropriate functions for the specifics of how the chip is programmed.  For example, R1xx and R3xx chips both use the same interrupt scheme (as defined in r100_irq_set()/r100_irq_process()), but they have different initialization routines (r100_init() vs. r300_init()).
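To make the callback idea concrete, here is a minimal user-space sketch of the pattern. All names (`struct asic_ops`, `r100_like_*`, `r300_like_init`, `asic_setup()`, `device_init()`) are hypothetical stand-ins, not the real radeon structures: two families share the same irq routines but plug in different init routines, and the common code only ever calls through the table.

```c
/* Hypothetical per-asic callback table (not the real struct radeon_asic). */
struct device;

struct asic_ops {
    int (*init)(struct device *dev);
    int (*irq_set)(struct device *dev);
    int (*irq_process)(struct device *dev);
};

struct device {
    const char *family;
    const struct asic_ops *asic;
    int irq_enabled;
};

/* One interrupt scheme shared by two families. */
static int r100_like_irq_set(struct device *dev)
{
    dev->irq_enabled = 1;
    return 0;
}

static int r100_like_irq_process(struct device *dev)
{
    return dev->irq_enabled;   /* 1 if an irq would be handled */
}

/* Family-specific init routines (return values just identify the path). */
static int r100_like_init(struct device *dev) { (void)dev; return 100; }
static int r300_like_init(struct device *dev) { (void)dev; return 300; }

static const struct asic_ops r100_ops = {
    .init = r100_like_init,
    .irq_set = r100_like_irq_set,
    .irq_process = r100_like_irq_process,
};

static const struct asic_ops r300_ops = {
    .init = r300_like_init,
    .irq_set = r100_like_irq_set,        /* same irq code as r100 */
    .irq_process = r100_like_irq_process,
};

/* Done once at load time: pick the table for this chip. */
static void asic_setup(struct device *dev, int is_r300)
{
    dev->asic = is_r300 ? &r300_ops : &r100_ops;
}

/* Common code dispatches through the table, like a radeon_init()-style macro. */
#define device_init(dev) ((dev)->asic->init((dev)))
```

Sharing function pointers between tables is what lets one family reuse another's irq code while still getting its own init path.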

Next we set up the DMA masks for the driver.  These let the kernel know what size address space the card is able to address.  In the case of radeons, the mask governs GPU access to graphics buffers stored in system memory, which are accessed via a GART (Graphics Address Remapping Table).  AGP and the older on-chip GART mechanisms are limited to 32 bits.  Newer on-chip GART mechanisms have larger address spaces.
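A small sketch of what a DMA mask expresses. The `DMA_BIT_MASK()` macro below mirrors the Linux kernel's definition; in the kernel a driver passes such a mask to `dma_set_mask()` (or `dma_set_mask_and_coherent()`). The helper `dma_addr_ok()` is hypothetical and only illustrates why a 32-bit-limited GART cannot reach a buffer above 4 GiB while a wider mask (40 bits here, chosen purely for illustration) can.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the Linux kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can a device limited to `bits` address bits
 * reach every byte of [addr, addr + size)? */
static bool dma_addr_ok(uint64_t addr, uint64_t size, unsigned bits)
{
    uint64_t mask = DMA_BIT_MASK(bits);
    /* the last byte of the buffer must also be addressable */
    return size > 0 && addr <= mask && size - 1 <= mask - addr;
}
```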

After DMA masks, we set up the MMIO aperture.  PCI/PCIE/AGP devices are programmed via apertures called BARs (Base Address Registers).  These apertures provide access to resources on the card such as registers, framebuffers, and ROMs.  GPUs are configured via registers; if you want to access those registers, you’d map the register BAR.  If you want to write to the framebuffer (some of which may be displayed on your screen), you would map the framebuffer BAR.  In this case we map the register BAR; this register mapping is then used by the driver to configure the card.
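A user-space sketch of register access through a mapped BAR, using hypothetical helpers (`struct mmio`, `mmio_read32()`, `mmio_write32()`, `mmio_update_bits()`). In a real kernel driver the base pointer would come from mapping the BAR and accesses would go through the kernel's MMIO accessors (such as `readl()`/`writel()`); here a plain array stands in for the aperture so the example runs anywhere.

```c
#include <stdint.h>

/* Stand-in for a mapped register BAR. In a kernel driver this pointer
 * would come from mapping the PCI BAR; here it is ordinary memory. */
struct mmio {
    volatile uint32_t *base;
};

/* Registers are addressed by byte offset and accessed 32 bits at a time. */
static uint32_t mmio_read32(const struct mmio *m, uint32_t reg)
{
    return m->base[reg / 4];
}

static void mmio_write32(struct mmio *m, uint32_t reg, uint32_t val)
{
    m->base[reg / 4] = val;
}

/* Read-modify-write of a bitfield: change only the bits in `mask`. */
static void mmio_update_bits(struct mmio *m, uint32_t reg, uint32_t mask, uint32_t val)
{
    uint32_t tmp = mmio_read32(m, reg);
    tmp = (tmp & ~mask) | (val & mask);
    mmio_write32(m, reg, tmp);
}
```

The read-modify-write helper reflects the common case where a register packs several settings and the driver only wants to change one field.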

vga_client_register() comes next, and is beyond the scope of this article.  It’s basically a way to work around the limitations of VGA on PCI buses with multiple VGA devices.

Next up is radeon_init().  This is actually a macro defined in radeon.h that references the asic init callback we initialized in radeon_asic_init() several steps ago, so it calls the asic specific init function.  For an RV100, it would be r100_init() defined in r100.c; for RV770, it’s rv770_init().

That’s pretty much it for radeon_device_init().  Next let’s look at what happens in the asic specific init functions.  They all follow the same pattern, although some asics may do more or less depending on the functionality.  Let’s take a look at r100_init() in r100.c.  First we initialize debugfs; this is a kernel debugging framework and outside the scope of this article.  Next we call r100_vga_render_disable(), which disables the VGA engine on the card.  The VGA engine provides VGA compatibility; since we are going to be programming the card directly, we disable it.

Following that, we set up the GPU scratch registers (radeon_scratch_init() defined in radeon_device.c).  These are scratch registers used by the CP (Command Processor) to signal graphics events.  In general they are used for what we call fences.  A write to one of these scratch registers can be added to the command stream sent to the GPU.  When the GPU encounters that command, it writes the specified value to that scratch register.  The driver can then check the value of the scratch register to determine whether that fence has passed or not.  For example, if you want to know if the GPU is done rendering to a buffer, you’d insert a fence after the rendering commands.  You can then check the scratch register to determine if that fence has passed (and hence the rendering is done).
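The fence mechanism can be simulated in a few lines. This is a hypothetical model, not radeon code: the driver queues a scratch-register write carrying an increasing sequence number after some work, a simulated GPU executes the stream in order, and the driver treats a fence as passed once the scratch register holds a value at least as large as that fence's number.

```c
#include <stdbool.h>
#include <stdint.h>

enum cmd_type { CMD_DRAW, CMD_WRITE_SCRATCH };

struct cmd {
    enum cmd_type type;
    uint64_t value;          /* for CMD_WRITE_SCRATCH: fence sequence number */
};

#define MAX_CMDS 16          /* fixed-size stream; callers stay below this */

struct sim_gpu {
    struct cmd stream[MAX_CMDS];
    unsigned emitted;        /* commands queued by the driver */
    unsigned executed;       /* commands the simulated GPU has processed */
    uint64_t scratch;        /* the fence scratch register */
    uint64_t next_seq;       /* driver-side fence counter */
};

/* Driver: queue some rendering work. */
static void emit_draw(struct sim_gpu *g)
{
    g->stream[g->emitted++] = (struct cmd){ CMD_DRAW, 0 };
}

/* Driver: queue a fence (a scratch write) and return its sequence number. */
static uint64_t emit_fence(struct sim_gpu *g)
{
    uint64_t seq = ++g->next_seq;
    g->stream[g->emitted++] = (struct cmd){ CMD_WRITE_SCRATCH, seq };
    return seq;
}

/* Simulated GPU: execute up to n queued commands, in order. */
static void gpu_run(struct sim_gpu *g, unsigned n)
{
    while (n-- && g->executed < g->emitted) {
        const struct cmd *c = &g->stream[g->executed++];
        if (c->type == CMD_WRITE_SCRATCH)
            g->scratch = c->value;
    }
}

/* Driver: a fence has passed once the scratch register reaches its number. */
static bool fence_signaled(const struct sim_gpu *g, uint64_t seq)
{
    return g->scratch >= seq;
}
```

Because the GPU executes commands in order, one monotonically increasing value in one register is enough to answer "has everything before fence N finished?".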

radeon_get_bios() loads the video bios from the PCI ROM BAR.  The video bios contains data and command tables.  The data tables define things like the number and type of connectors on the card and how those connectors are mapped to encoders, the GPIO registers and bitfields used for DDC and other i2c buses, LVDS panel information for laptops, display and engine PLL limits, etc.  The command tables are used for initializing the hardware (normally done by the system bios during POST, but required for things like suspend/resume and initializing secondary cards), and on systems with ATOM bios the command tables are used for setting up the displays and changing things like engine and memory clocks.
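Before using any of those tables, a driver has to make sure it actually fetched a ROM image. Below is a minimal sketch of the standard PCI expansion ROM checks: the 0x55 0xAA signature at offset 0, and the 16-bit little-endian pointer at offset 0x18 to the `PCIR` data structure. The helper names are hypothetical, and parsing the ATOM/COMBIOS data and command tables themselves is out of scope here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 16-bit value from the ROM image. */
static uint16_t rom_read16(const uint8_t *rom, size_t off)
{
    return (uint16_t)(rom[off] | (rom[off + 1] << 8));
}

/* Basic sanity checks on a PCI expansion ROM image:
 *  - bytes 0-1 must be the 0x55 0xAA signature
 *  - the 16-bit value at offset 0x18 points to the "PCIR" structure */
static bool rom_image_valid(const uint8_t *rom, size_t len)
{
    if (len < 0x1A || rom[0] != 0x55 || rom[1] != 0xAA)
        return false;
    size_t pcir = rom_read16(rom, 0x18);
    if (pcir + 4 > len)
        return false;                 /* pointer runs past the image */
    return memcmp(rom + pcir, "PCIR", 4) == 0;
}
```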

Next, we initialize the bios scratch registers (radeon_combios_initialize_bios_scratch_regs() via radeon_combios_init()).  These registers are a way for the firmware on the system to communicate state to the graphics driver.  They contain things like connected outputs, whether the driver or the firmware will handle things like lid or mode change events, etc.

radeon_boot_test_post_card() checks to see whether the system bios has posted the card or not.  This is used to determine whether the card needs to be initialized by the driver using the bios command tables or if the system bios has already done it.

radeon_get_clock_info() gets the PLL (Phase Locked Loop, used to generate clocks) information from the bios tables.  This includes the display PLLs, engine and memory PLLs and the reference clock that the PLLs use to generate their final clocks.
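As a rough illustration of how PLL parameters are used, here is a sketch of a generic integer-N PLL, assuming `vco = ref * fb_div / ref_div` and `out = vco / post_div`, with all clocks in kHz. The limits structure and the brute-force `pll_find()` search are hypothetical; real radeon PLLs may have more features (for example fractional feedback dividers) and use a smarter search, but the roles of the reference clock and the VCO limits read from the bios are the same.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PLL limits in kHz; real values come from the bios tables. */
struct pll_limits {
    uint32_t ref_khz;                          /* reference clock */
    uint32_t vco_min_khz, vco_max_khz;         /* legal VCO range */
    uint32_t fb_div_max, ref_div_max, post_div_max;
};

struct pll_dividers { uint32_t fb_div, ref_div, post_div; };

/* Integer-N PLL: vco = ref * fb_div / ref_div. */
static uint32_t pll_vco_khz(const struct pll_limits *p, const struct pll_dividers *d)
{
    return (uint32_t)((uint64_t)p->ref_khz * d->fb_div / d->ref_div);
}

/* Output clock: out = vco / post_div. */
static uint32_t pll_out_khz(const struct pll_limits *p, const struct pll_dividers *d)
{
    return pll_vco_khz(p, d) / d->post_div;
}

/* Brute-force search for dividers whose output is closest to the target
 * while keeping the VCO in range. Returns false if no setting is legal. */
static bool pll_find(const struct pll_limits *p, uint32_t target_khz,
                     struct pll_dividers *best)
{
    uint32_t best_err = UINT32_MAX;
    for (uint32_t r = 1; r <= p->ref_div_max; r++)
        for (uint32_t post = 1; post <= p->post_div_max; post++)
            for (uint32_t fb = 1; fb <= p->fb_div_max; fb++) {
                struct pll_dividers d = { fb, r, post };
                uint32_t vco = pll_vco_khz(p, &d);
                if (vco < p->vco_min_khz || vco > p->vco_max_khz)
                    continue;
                uint32_t out = vco / post;
                uint32_t err = out > target_khz ? out - target_khz
                                                : target_khz - out;
                if (err < best_err) {
                    best_err = err;
                    *best = d;
                }
            }
    return best_err != UINT32_MAX;
}
```

For example, with a 27000 kHz reference, fb_div = 100, ref_div = 2 and post_div = 5 give a 1350000 kHz VCO and a 270000 kHz output; whether that VCO is acceptable depends on the limits the bios reports.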

radeon_pm_init() initializes the power management features of the chip.

Next the MC (Memory Controller) is initialized (r100_mc_init()).  The GPU has its own address space similar to the CPU.  Within that address space you map VRAM and GART.  The blocks on the chip (2D, 3D engines, display controllers, etc.) access these resources via the GPU’s address space.  VRAM is mapped at one offset and GART at another.  If you want to read from a texture located in GART memory, you’d point the texture base address at some offset in the GART aperture in the GPU’s address space.  If you want to display a buffer in VRAM on your monitor, you’d point one of your crtc base addresses to an address in the VRAM aperture in the GPU’s address space.  The MC init function determines how much VRAM is on the card and where to place VRAM and GART in the GPU’s address space.
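A sketch of that layout, assuming a hypothetical `struct gpu_aspace` and a naive placement policy (VRAM at 0, GART right after it, rounded up to an alignment). Real MC init code has to respect hardware-specific constraints; this only shows how a single GPU address space holds both apertures and how an address resolves to one of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GPU address-space layout chosen at MC init time. */
struct gpu_aspace {
    uint64_t vram_start, vram_size;
    uint64_t gart_start, gart_size;
};

enum region { REGION_NONE, REGION_VRAM, REGION_GART };

static bool in_range(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr >= start && addr - start < size;
}

/* Which aperture does a GPU address fall in, and at what offset? */
static enum region classify(const struct gpu_aspace *as, uint64_t addr,
                            uint64_t *offset)
{
    if (in_range(addr, as->vram_start, as->vram_size)) {
        *offset = addr - as->vram_start;
        return REGION_VRAM;
    }
    if (in_range(addr, as->gart_start, as->gart_size)) {
        *offset = addr - as->gart_start;
        return REGION_GART;
    }
    return REGION_NONE;
}

/* Naive placement: VRAM at 0, GART right after it, rounded up to
 * `align` (which must be a power of two). */
static void place(struct gpu_aspace *as, uint64_t vram_size,
                  uint64_t gart_size, uint64_t align)
{
    as->vram_start = 0;
    as->vram_size = vram_size;
    as->gart_start = (vram_size + align - 1) & ~(align - 1);
    as->gart_size = gart_size;
}
```

With 256 MiB of VRAM and a 512 MiB GART, a crtc base inside the first 256 MiB points at VRAM, while a texture address just above 256 MiB lands in the GART aperture.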

radeon_fence_driver_init() initializes the common code used for fences.  See above for more on fences.

radeon_irq_kms_init() initializes the common code used for irqs.

radeon_bo_init() initializes the memory manager.

r100_pci_gart_init() sets up the on-board GART mechanism and radeon_agp_init() initializes AGP GART.  This allows the GPU to access buffers in system memory.  Since system memory is paged, large allocations are generally not physically contiguous.  The GART provides a way to make many disparate pages look like one contiguous block by using address remapping.  With AGP, the northbridge provides the address remapping, and you just point the GPU’s AGP aperture at the one provided by the northbridge.  The on-board GART provides the same functionality for non-AGP systems (PCI or PCIE).
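The remapping a GART performs can be sketched as a per-page lookup table. Everything here is hypothetical (4 KiB pages, an 8-entry table, the `gart_bind()`/`gart_translate()` helpers, and 0 meaning "unmapped"): scattered system pages are bound into consecutive slots, so the GPU sees one contiguous range inside its aperture.

```c
#include <stdbool.h>
#include <stdint.h>

#define GART_PAGE_SHIFT 12                     /* 4 KiB pages (assumed) */
#define GART_PAGE_SIZE  (1ULL << GART_PAGE_SHIFT)
#define GART_ENTRIES    8                      /* tiny table for illustration */

/* Hypothetical GART: one entry per aperture page, holding the physical
 * address of a system-memory page (0 = unmapped). */
struct gart {
    uint64_t aperture_start;                   /* GART base in GPU space */
    uint64_t table[GART_ENTRIES];
};

/* Bind n scattered system pages into consecutive slots starting at `first`. */
static bool gart_bind(struct gart *g, unsigned first,
                      const uint64_t *pages, unsigned n)
{
    if (first > GART_ENTRIES || n > GART_ENTRIES - first)
        return false;
    for (unsigned i = 0; i < n; i++)
        g->table[first + i] = pages[i];
    return true;
}

/* Translate a GPU address in the aperture to a system physical address. */
static bool gart_translate(const struct gart *g, uint64_t gpu_addr,
                           uint64_t *phys)
{
    if (gpu_addr < g->aperture_start)
        return false;
    uint64_t off = gpu_addr - g->aperture_start;
    uint64_t idx = off >> GART_PAGE_SHIFT;
    if (idx >= GART_ENTRIES || g->table[idx] == 0)
        return false;
    *phys = g->table[idx] + (off & (GART_PAGE_SIZE - 1));
    return true;
}
```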

Next up we have r100_set_safe_registers().  This function sets the list of registers that command buffers from userspace are allowed to access.  When a userspace driver like the ddx (2D) or mesa (3D) sends commands to the GPU, the drm checks those command buffers to prevent access to unauthorized registers or memory.
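One simple way to represent such a list is a bitmap with one bit per 32-bit register. The layout below (a set bit means userspace may write the register) and the helper names are assumptions for illustration. The real checker parses PM4 packets and also validates buffer references; this sketch leaves that out by modelling a command buffer as plain (register, value) pairs.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 256u      /* registers covered by the bitmap (assumed) */

/* Hypothetical allow-list: one bit per 32-bit register, indexed by
 * register byte offset / 4. A set bit means userspace may write it. */
struct reg_allowlist {
    uint32_t bits[NUM_REGS / 32];
};

static void allow_reg(struct reg_allowlist *a, uint32_t reg)
{
    if (reg % 4 != 0 || reg / 4 >= NUM_REGS)
        return;            /* ignore unaligned or out-of-range offsets */
    uint32_t i = reg / 4;
    a->bits[i / 32] |= 1u << (i % 32);
}

static bool reg_allowed(const struct reg_allowlist *a, uint32_t reg)
{
    if (reg % 4 != 0 || reg / 4 >= NUM_REGS)
        return false;      /* unaligned or out of range: reject */
    uint32_t i = reg / 4;
    return (a->bits[i / 32] >> (i % 32)) & 1u;
}

/* Reject the whole buffer if any (register, value) pair touches a
 * register outside the allow-list. */
static bool check_cmdbuf(const struct reg_allowlist *a,
                         const uint32_t *buf, unsigned n_pairs)
{
    for (unsigned p = 0; p < n_pairs; p++)
        if (!reg_allowed(a, buf[2 * p]))
            return false;
    return true;
}
```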

Finally, r100_startup() programs the hardware with everything set up in r100_init().  It’s a separate function since it’s also called when resuming from suspend, as the current hardware configuration needs to be restored in that case as well.  The VRAM and GART setup is programmed in r100_mc_program() and r100_pci_gart_enable(); irqs are set up in r100_irq_set().

r100_cp_init() initializes the CP and sets up the ring buffer.  The CP is the part of the chip that feeds acceleration commands to the GPU.  It’s fed by a ring buffer that the driver (CPU) writes to and the GPU reads from.  Besides commands, you can also write pointers to command buffers stored elsewhere in the GPU’s address space (called indirect buffers).  For example, the 3D driver might send a command buffer to the drm; after checking it, the drm would put a pointer to that command buffer on the ring, followed by a fence.  When the CP gets to the pointer in the ring, it fetches the command buffer and processes the commands in it, then returns to where it left off in the ring.  Buffers referenced by the command buffer are “locked” until the fence passes since the GPU is accessing them in the execution of those commands.
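The ring itself can be sketched as a classic single-producer/single-consumer circular buffer. The names and sizes below are hypothetical: the driver (CPU) advances the write pointer, the CP advances the read pointer, the size is a power of two so wrapping is a simple mask, and one slot is kept empty so that `rptr == wptr` unambiguously means the ring is empty.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_DWORDS 8u           /* ring size in 32-bit words; power of two */

/* Hypothetical ring: CPU advances wptr as it writes, the CP advances rptr
 * as it consumes. */
struct ring {
    uint32_t buf[RING_DWORDS];
    uint32_t wptr, rptr;         /* indices, always < RING_DWORDS */
};

/* Free slots, keeping one slot empty to tell "full" from "empty". */
static uint32_t ring_free(const struct ring *r)
{
    return (r->rptr - r->wptr - 1) & (RING_DWORDS - 1);
}

/* CPU side: copy n dwords in, wrapping at the end of the buffer. */
static bool ring_write(struct ring *r, const uint32_t *data, uint32_t n)
{
    if (n > ring_free(r))
        return false;            /* caller must wait for the GPU to catch up */
    for (uint32_t i = 0; i < n; i++) {
        r->buf[r->wptr] = data[i];
        r->wptr = (r->wptr + 1) & (RING_DWORDS - 1);
    }
    return true;
}

/* GPU side (simulated): consume one dword if one is available. */
static bool ring_read(struct ring *r, uint32_t *out)
{
    if (r->rptr == r->wptr)
        return false;            /* ring empty */
    *out = r->buf[r->rptr];
    r->rptr = (r->rptr + 1) & (RING_DWORDS - 1);
    return true;
}
```

An indirect buffer fits the same picture: instead of copying a whole command buffer into the ring, the driver writes a short packet holding the buffer's GPU address and size.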

r100_wb_init() initializes scratch register writeback, a feature that lets the GPU update copies of the scratch registers in GART memory.  This allows the driver (running on the CPU) to read the contents of those registers without going through the MMIO register aperture, which requires a trip across the bus.

r100_ib_init() initializes the indirect buffers used for feeding command buffers to the CP from userspace drivers like the 3D driver.

The display side is set up in radeon_modeset_init().  First we set up the display limits and mode callbacks, then we set up the output properties (radeon_modeset_create_props()) that are exposed via xrandr properties when X is running.

Next, we initialize the crtcs in radeon_crtc_init().  crtcs (also called display controllers) are the blocks on the chip that provide the display timing and determine where in the framebuffer a particular monitor points to.  A crtc provides an independent “head.”  Most radeon asics have two crtcs; the new evergreen chips have six.
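A sketch of the address arithmetic behind "where in the framebuffer a monitor points": for a linear framebuffer, the scanout start is the base plus the row offset plus the column offset. The struct and helper are hypothetical, and real hardware adds constraints (alignment, tiled layouts) that are ignored here.

```c
#include <stdint.h>

/* Hypothetical description of where a crtc scans out from. */
struct crtc_scanout {
    uint64_t fb_base;            /* GPU address of the framebuffer in VRAM */
    uint32_t pitch_bytes;        /* bytes per framebuffer row */
    uint32_t bytes_per_pixel;
};

/* GPU address of the first pixel a crtc displays when its viewport's
 * top-left corner sits at (x, y) inside a larger linear framebuffer. */
static uint64_t crtc_start_addr(const struct crtc_scanout *s,
                                uint32_t x, uint32_t y)
{
    return s->fb_base + (uint64_t)y * s->pitch_bytes
                      + (uint64_t)x * s->bytes_per_pixel;
}
```

For example, two 1920-pixel-wide heads sharing a 3840x1080 desktop at 4 bytes per pixel have a pitch of 15360 bytes; head 0 starts at `fb_base`, and head 1, whose viewport begins at x = 1920, starts at `fb_base + 7680`.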

radeon_setup_enc_conn() sets up the connector and encoder mappings based on video bios data tables.  Encoders are things like DACs for analog outputs like VGA and TV, and TMDS or LVDS encoders for things like digital DVI or LVDS panels.  An encoder can be tied to one or more connectors (e.g., the TV DAC is often tied to both the S-video and a VGA port or the analog portion of a DVI-I port).  The mapping is important as you need to know what encoders are in use and what they are tied to in order to program the displays properly.

radeon_hpd_init() is a macro that points to the asic specific function that initializes the HPD (Hot Plug Detect) hardware for digital monitors.  HPD allows you to get an interrupt when a digital monitor is connected or disconnected.  When this happens the driver will take appropriate action and generate an event which userspace apps can listen for.  The app can then display a message asking the user what they want to do, etc.

Finally, radeon_fbdev_init() sets up the drm kernel fb interface.  This provides a kernel fb interface on top of the drm for the console or other kernel fb apps.

When the driver is unloaded the whole process happens in reverse; this time all the *_fini() functions are called to tear down the driver.
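The ordering rule (tear down in reverse, and on a failed init undo only what already succeeded) is commonly written with the kernel's goto-unwind idiom. The sketch below is hypothetical: numbered steps stand in for real init/fini pairs and record into a log so the order can be checked.

```c
#include <stdbool.h>

#define MAX_LOG 16

/* Hypothetical stand-ins for init/fini steps: a positive id records an
 * init, a negative id records the matching fini. */
static int log_ids[MAX_LOG];
static int log_len;
static int fail_at;              /* step whose init should fail (0 = none) */

static void record(int id)
{
    if (log_len < MAX_LOG)
        log_ids[log_len++] = id;
}

static bool step_init(int id)
{
    if (id == fail_at)
        return false;
    record(id);
    return true;
}

static void step_fini(int id) { record(-id); }

/* Bring-up: on failure, undo only the steps that succeeded, newest first. */
static int device_load(void)
{
    if (!step_init(1)) goto fail0;   /* e.g. memory controller */
    if (!step_init(2)) goto fail1;   /* e.g. GART */
    if (!step_init(3)) goto fail2;   /* e.g. command processor */
    return 0;
fail2:
    step_fini(2);
fail1:
    step_fini(1);
fail0:
    return -1;
}

/* Full unload: every fini in the reverse order of init. */
static void device_unload(void)
{
    step_fini(3);
    step_fini(2);
    step_fini(1);
}
```

Reversing the order matters because later steps depend on earlier ones; the command processor, for instance, cannot be stopped safely after the memory it reads from has already been torn down.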

The next set of articles will walk through the evergreen patches available here which have already been applied upstream and explain what each patch does to bring up support for evergreen chips.
