[Original] The Graphics System on Linux and AMD R600 GPU Programming (8): AMD GPU DRM Driver Initialization

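The previous posts described the DRM driver, the GPU's video memory management, and its interrupt mechanism, so reading through the initialization of the AMD drm driver should now be much easier.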

Below is an article written by an AMD developer (I am placing it here for now and will add my own comments later when I have time).

Understanding GPUs from the ground up

I get asked a lot about learning how to program GPUs. Bringing up evergreen kms support seems like a good place to start, so I figured I would write a series of articles detailing the process based on the actual evergreen patches. First, to get a better understanding of how GPUs work, take a look at the radeon drm. This article assumes a basic understanding of C and computer architectures. The basic process is that the driver loads, initializes the hardware, sets up non-hw specific things like the memory manager, and sets up the displays. This first article describes the basic driver flow when the drm loads in kms mode.

radeon_driver_load_kms() (in radeon_kms.c) is where everything starts. It calls radeon_device_init() to initialize the non-display hardware and radeon_modeset_init() (in radeon_display.c) to initialize the display hardware.

The main workhorse of the driver initialization is radeon_device_init() found in radeon_device.c. First we initialize a bunch of the structs used in the driver. Then radeon_asic_init() is called. This function sets up the asic specific function pointers for various things such as suspend/resume callbacks, asic reset, set/process irqs, set/get engine clocks, etc. The common code then uses these callbacks to call the asic specific code to achieve the requested functionality. For example, enabling and processing interrupts works differently on an RV100 vs. an RV770. Since functionality changes in stages, some routines are used for multiple asic families. This lets us mix and match the appropriate functions for the specifics of how the chip is programmed. For example, R1xx and R3xx chips both use the same interrupt scheme (as defined in r100_irq_set()/r100_irq_process()), but they have different initialization routines (r100_init() vs. r300_init()).
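
The callback arrangement can be sketched as a table of function pointers. This is a simplified illustration, not the actual radeon code: the struct `asic_ops`, the `demo_*` functions, and the family enum are hypothetical names. Only the idea that R1xx and R3xx share an interrupt routine while using different init routines comes from the text.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-asic callback table. */
struct asic_ops {
    int (*init)(void);     /* asic specific initialization */
    int (*irq_set)(void);  /* asic specific interrupt programming */
};

enum demo_family { DEMO_R100, DEMO_R300 };

/* Each function returns a distinct tag so the dispatch is observable. */
int demo_r100_init(void)    { return 100; }
int demo_r300_init(void)    { return 300; }
int demo_r100_irq_set(void) { return 1; }

/* The R3xx table reuses the R1xx interrupt routine but has its own init. */
static const struct asic_ops demo_r100_ops = { demo_r100_init, demo_r100_irq_set };
static const struct asic_ops demo_r300_ops = { demo_r300_init, demo_r100_irq_set };

/* Pick the callback table for a chip family, as the common code would. */
const struct asic_ops *demo_asic_init(enum demo_family family)
{
    switch (family) {
    case DEMO_R100: return &demo_r100_ops;
    case DEMO_R300: return &demo_r300_ops;
    }
    return NULL;
}
```

Common code would then call `demo_asic_init(family)->init()` without knowing which chip it is running on.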

Next we set up the DMA masks for the driver. These let the kernel know what size address space the card is able to address. In the case of radeons, it’s used for GPU access to graphics buffers stored in system memory which are accessed via a GART (Graphics Address Remapping Table). AGP and the older on-chip GART mechanisms are limited to 32 bits. Newer on-chip GART mechanisms have larger address spaces.
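
As a rough illustration of what a DMA mask expresses, the sketch below builds a mask from a bit width and checks whether a buffer falls inside it. The helper names are hypothetical; the mask computation follows the same idea as the kernel's DMA_BIT_MASK() macro, but this is standalone userspace code, not the kernel API.

```c
#include <stdint.h>

/* Build a mask with the low `bits` bits set. 64 is handled separately
 * because shifting a 64-bit value by 64 is undefined in C. */
uint64_t demo_dma_mask(unsigned bits)
{
    return (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* A buffer is reachable by the device if its last byte lies within the mask. */
int demo_dma_addressable(uint64_t addr, uint64_t size, uint64_t mask)
{
    if (size == 0 || addr > mask)
        return 0;
    return (size - 1) <= (mask - addr);
}
```

With a 32-bit mask, a page ending exactly at the 4 GiB boundary is reachable, while a buffer one byte longer is not.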

After DMA masks, we set up the MMIO aperture. PCI/PCIE/AGP devices are programmed via apertures called BARs (Base Address Register). These apertures provide access to resources on the card such as registers, framebuffers, and roms. GPUs are configured via registers; if you want to access those registers, you’d map the register BAR. If you want to write to the framebuffer (some of which may be displayed on your screen), you would map the framebuffer BAR. In this case we map the register BAR; this register mapping is then used by the driver to configure the card.
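
Once the register BAR is mapped, register access is just a read or write at an offset from the mapped base. The following is a userspace sketch under that assumption. `demo_mmio`, `demo_rreg32()` and `demo_wreg32()` are hypothetical names, and the "aperture" is ordinary memory rather than a real PCI mapping.

```c
#include <stdint.h>

/* Stand-in for a mapped register BAR. A real driver would obtain the
 * pointer by mapping the PCI resource; here it points at ordinary memory. */
struct demo_mmio {
    volatile uint32_t *regs;  /* base of the mapped aperture */
    uint32_t size;            /* aperture size in bytes */
};

/* An offset is valid if it is dword aligned and the dword fits in the aperture. */
static int demo_offset_ok(const struct demo_mmio *m, uint32_t offset)
{
    return (offset % 4 == 0) && m->size >= 4 && offset <= m->size - 4;
}

/* Read a 32-bit register; invalid offsets read as 0 in this sketch. */
uint32_t demo_rreg32(const struct demo_mmio *m, uint32_t offset)
{
    return demo_offset_ok(m, offset) ? m->regs[offset / 4] : 0;
}

/* Write a 32-bit register; invalid offsets are ignored in this sketch. */
void demo_wreg32(const struct demo_mmio *m, uint32_t offset, uint32_t value)
{
    if (demo_offset_ok(m, offset))
        m->regs[offset / 4] = value;
}
```

The `volatile` qualifier keeps the compiler from caching or reordering away accesses, which matters when the memory behind the pointer is hardware.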

vga_client_register() comes next, and is beyond the scope of this article. It’s basically a way to work around the limitations of VGA on PCI buses with multiple VGA devices.

Next up is radeon_init(). This is actually a macro defined in radeon.h that references the asic init callback we initialized in radeon_asic_init() several steps ago. The asic specific init function is called. For an RV100, it would be r100_init() defined in r100.c; for an RV770, it’s rv770_init().

That’s pretty much it for radeon_device_init(). Next let’s look at what happens in the asic specific init functions. They all follow the same pattern, although some asics may do more or less depending on the functionality. Let’s take a look at r100_init() in r100.c. First we initialize debugfs; this is a kernel debugging framework and outside the scope of this article. Next we call r100_vga_render_disable(), which disables the VGA engine on the card. The VGA engine provides VGA compatibility; since we are going to be programming the card directly, we disable it.

Following that, we set up the GPU scratch registers (radeon_scratch_init() defined in radeon_device.c). These are scratch registers used by the CP (Command Processor) to signal graphics events. In general they are used for what we call fences. A write to one of these scratch registers can be added to the command stream sent to the GPU. When it encounters that command, it writes the value specified to that scratch register. The driver can then check the value of the scratch register to determine whether that fence has come up or not. For example, if you want to know if the GPU is done rendering to a buffer, you’d insert a fence after the rendering commands. You can then check the scratch register to determine if that fence has passed (and hence the rendering is done).
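
The fence idea can be modeled with a sequence number and a single simulated scratch register. This is a sketch, not the driver's actual bookkeeping: all names are hypothetical, the GPU side is simulated by a plain function call, and the wraparound-safe comparison is an assumption about how one might compare sequence numbers.

```c
#include <stdint.h>

struct demo_fence_ctx {
    uint32_t scratch;    /* stands in for the GPU scratch register */
    uint32_t last_emit;  /* last sequence number handed out by the driver */
};

/* Driver side: allocate the next sequence number for a fence command. */
uint32_t demo_fence_emit(struct demo_fence_ctx *c)
{
    return ++c->last_emit;
}

/* GPU side (simulated): the CP reaches the fence command in the stream
 * and writes the requested value to the scratch register. */
void demo_gpu_reach_fence(struct demo_fence_ctx *c, uint32_t seq)
{
    c->scratch = seq;
}

/* Driver side: the fence has passed once the scratch value has caught up
 * with it. The unsigned difference keeps the test valid across wraparound. */
int demo_fence_signaled(const struct demo_fence_ctx *c, uint32_t seq)
{
    return (uint32_t)(c->scratch - seq) < 0x80000000u;
}
```

A fence emitted after a batch of rendering commands reports as signaled only after the simulated GPU has written that value or a later one.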

radeon_get_bios() loads the video bios from the PCI ROM BAR. The video bios contains data and command tables. The data tables define things like the number and type of connectors on the card and how those connectors are mapped to encoders, the GPIO registers and bitfields used for DDC and other i2c buses, LVDS panel information for laptops, display and engine PLL limits, etc. The command tables are used for initializing the hardware (normally done by the system bios during post, but required for things like suspend/resume and initializing secondary cards), and on systems with ATOM bios the command tables are used for setting up the displays and changing things like engine and memory clocks.

Next, we initialize the bios scratch registers (radeon_combios_initialize_bios_scratch_regs() via radeon_combios_init()). These registers are a way for the firmware on the system to communicate state to the graphics driver. They contain things like connected outputs, whether the driver or the firmware will handle things like lid or mode change events, etc.

radeon_boot_test_post_card() checks to see whether the system bios has posted the card or not. This is used to determine whether the card needs to be initialized by the driver using the bios command tables or if the system bios has already done it.

radeon_get_clock_info() gets the PLL (Phase Locked Loop, used to generate clocks) information from the bios tables. This includes the display PLLs, engine and memory PLLs and the reference clock that the PLLs use to generate their final clocks.

radeon_pm_init() initializes the power management features of the chip.

Next the MC (Memory Controller) is initialized (r100_mc_init()). The GPU has its own address space similar to the CPU. Within that address space you map VRAM and GART. The blocks on the chip (2D, 3D engines, display controllers, etc.) access these resources via the GPU’s address space. VRAM is mapped at one offset and GART at another. If you want to read from a texture located in GART memory, you’d point the texture base address at some offset in the GART aperture in the GPU’s address space. If you want to display a buffer in VRAM on your monitor, you’d point one of your crtc base addresses to an address in the VRAM aperture in the GPU’s address space. The MC init function determines how much VRAM is on the card and where to place VRAM and GART in the GPU’s address space.
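
To make the two apertures concrete, here is a sketch that lays out VRAM and GART in a GPU address space and classifies an address. The placement policy (VRAM at offset 0, GART at the next aligned address) is an assumption chosen for illustration; the text does not say how r100_mc_init() actually places them, and all names are hypothetical.

```c
#include <stdint.h>

struct demo_mc {
    uint64_t vram_start, vram_size;  /* VRAM aperture in GPU address space */
    uint64_t gtt_start, gtt_size;    /* GART aperture in GPU address space */
};

enum demo_domain { DEMO_NONE, DEMO_VRAM, DEMO_GTT };

/* Assumed policy: VRAM at 0, GART at the next address aligned to `align`
 * (which must be a power of two). */
void demo_mc_layout(struct demo_mc *mc, uint64_t vram_size,
                    uint64_t gtt_size, uint64_t align)
{
    mc->vram_start = 0;
    mc->vram_size = vram_size;
    mc->gtt_start = (vram_size + align - 1) & ~(align - 1);
    mc->gtt_size = gtt_size;
}

/* Tell which aperture, if any, a GPU address falls into. */
enum demo_domain demo_mc_classify(const struct demo_mc *mc, uint64_t gpu_addr)
{
    if (gpu_addr >= mc->vram_start && gpu_addr - mc->vram_start < mc->vram_size)
        return DEMO_VRAM;
    if (gpu_addr >= mc->gtt_start && gpu_addr - mc->gtt_start < mc->gtt_size)
        return DEMO_GTT;
    return DEMO_NONE;
}
```

A texture base address in the GART range and a crtc base address in the VRAM range are both just offsets in this one address space.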

radeon_fence_driver_init() initializes the common code used for fences. See above for more on fences.

radeon_irq_kms_init() initializes the common code used for irqs.

radeon_bo_init() initializes the memory manager.

r100_pci_gart_init() sets up the on board GART mechanism and radeon_agp_init() initializes AGP GART. This allows the GPU to access buffers in system memory. Since system memory is paged, large allocations are not contiguous. The GART provides a way to make many disparate pages look like one contiguous block by using address remapping. With AGP, the northbridge provides the address remapping, and you just point the GPU’s AGP aperture at the one provided by the northbridge. The on-board GART provides the same functionality for non-AGP systems (PCI or PCIE).
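
The remapping itself amounts to a page table lookup. The sketch below models a tiny GART with a fixed number of 4 KiB slots. The table layout, sizes, and names are hypothetical and do not reflect the hardware's real page table entry format.

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT)
#define DEMO_GART_PAGES 8
#define DEMO_UNMAPPED   UINT64_MAX

/* Entry i holds the system address of the page backing aperture slot i. */
struct demo_gart {
    uint64_t table[DEMO_GART_PAGES];
};

void demo_gart_init(struct demo_gart *g)
{
    for (size_t i = 0; i < DEMO_GART_PAGES; i++)
        g->table[i] = DEMO_UNMAPPED;
}

/* Bind n page-aligned system pages to consecutive aperture slots. */
int demo_gart_bind(struct demo_gart *g, size_t first,
                   const uint64_t *pages, size_t n)
{
    if (first > DEMO_GART_PAGES || n > DEMO_GART_PAGES - first)
        return -1;
    for (size_t i = 0; i < n; i++)
        g->table[first + i] = pages[i];
    return 0;
}

/* Translate an offset in the contiguous aperture to a system address. */
uint64_t demo_gart_translate(const struct demo_gart *g, uint64_t offset)
{
    uint64_t idx = offset >> DEMO_PAGE_SHIFT;

    if (idx >= DEMO_GART_PAGES || g->table[idx] == DEMO_UNMAPPED)
        return DEMO_UNMAPPED;
    return g->table[idx] + (offset & (DEMO_PAGE_SIZE - 1));
}
```

Three scattered system pages bound to slots 0 through 2 appear to the GPU as one contiguous 12 KiB range starting at aperture offset 0.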

Next up we have r100_set_safe_registers(). This function sets the list of registers that command buffers from userspace are allowed to access. When a userspace driver like the ddx (2D) or mesa (3D) sends commands to the GPU, the drm checks those command buffers to prevent access to unauthorized registers or memory.
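
A check of this kind can be pictured as a bitmap lookup per register write. In the sketch below the command format (pairs of register index and value) is invented purely for illustration; real command packets are more involved, and the names and register count are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_NUM_REGS 64u  /* hypothetical number of registers */

/* One bit per register; a set bit marks the register as safe for userspace. */
struct demo_safe_regs {
    uint32_t bitmap[DEMO_NUM_REGS / 32];
};

void demo_mark_safe(struct demo_safe_regs *s, uint32_t reg)
{
    if (reg < DEMO_NUM_REGS)
        s->bitmap[reg / 32] |= 1u << (reg % 32);
}

int demo_reg_is_safe(const struct demo_safe_regs *s, uint32_t reg)
{
    return reg < DEMO_NUM_REGS && ((s->bitmap[reg / 32] >> (reg % 32)) & 1u);
}

/* Invented format: the buffer is a list of (register index, value) pairs.
 * Returns 0 if every write targets a safe register, -1 otherwise. */
int demo_check_cmds(const struct demo_safe_regs *s,
                    const uint32_t *cmds, size_t ndw)
{
    if (ndw % 2 != 0)
        return -1;
    for (size_t i = 0; i < ndw; i += 2)
        if (!demo_reg_is_safe(s, cmds[i]))
            return -1;
    return 0;
}
```

The whole buffer is rejected as soon as a single write touches a register outside the list.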

Finally, r100_startup() programs the hardware with everything set up in r100_init(). It’s a separate function since it’s also called when resuming from suspend as the current hardware configuration needs to be restored in that case as well. The VRAM and GART setup is programmed in r100_mc_program() and r100_pci_gart_enable(); irqs are set up in r100_irq_set().

r100_cp_init() initializes the CP and sets up the ring buffer. The CP is the part of the chip that feeds acceleration commands to the GPU. It’s fed by a ring buffer that the driver (CPU) writes to and the GPU reads from. Besides commands, you can also write pointers to command buffers stored elsewhere in the GPU’s address space (called an indirect buffer). For example, the 3D driver might send a command buffer to the drm; after checking it, the drm would put a pointer to that command buffer on the ring, followed by a fence. When the CP gets to the pointer in the ring, it fetches the command buffer and processes the commands in it, then returns to where it left off in the ring. Buffers referenced by the command buffer are “locked” until the fence passes since the GPU is accessing them in the execution of those commands.
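
The producer/consumer relationship between the driver and the CP can be sketched with a write pointer and a read pointer. This covers only the pointer bookkeeping, with the CP simulated by a function. The names, the ring size, and the choice to keep one slot empty are assumptions for illustration, and indirect buffers are left out.

```c
#include <stdint.h>

#define DEMO_RING_DW 8u  /* ring size in dwords; must be a power of two */

struct demo_ring {
    uint32_t buf[DEMO_RING_DW];
    uint32_t wptr;  /* next slot the driver (CPU) writes */
    uint32_t rptr;  /* next slot the CP (GPU) reads */
};

/* One slot is kept empty so that wptr == rptr always means "empty". */
uint32_t demo_ring_free(const struct demo_ring *r)
{
    return (r->rptr - r->wptr - 1) & (DEMO_RING_DW - 1);
}

/* Driver side: queue one dword, or fail if the ring is full. */
int demo_ring_write(struct demo_ring *r, uint32_t dw)
{
    if (demo_ring_free(r) == 0)
        return -1;
    r->buf[r->wptr] = dw;
    r->wptr = (r->wptr + 1) & (DEMO_RING_DW - 1);
    return 0;
}

/* Simulated CP: fetch one dword, or fail if the ring is empty. */
int demo_ring_read(struct demo_ring *r, uint32_t *dw)
{
    if (r->rptr == r->wptr)
        return -1;
    *dw = r->buf[r->rptr];
    r->rptr = (r->rptr + 1) & (DEMO_RING_DW - 1);
    return 0;
}
```

The driver only ever advances `wptr` and the consumer only ever advances `rptr`, which is what lets the two sides run independently.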

r100_wb_init() initializes scratch register writeback which is a feature that lets the GPU update copies of the scratch registers in GART memory. This allows the driver (running on the CPU) to access the content of those registers without having to read them from the MMIO register aperture which requires a trip across the bus.

r100_ib_init() initializes the indirect buffers used for feeding command buffers to the CP from userspace drivers like the 3D driver.

The display side is set up in radeon_modeset_init(). First we set up the display limits and mode callbacks, then we set up the output properties (radeon_modeset_create_props()) that are exposed via xrandr properties when X is running.

Next, we initialize the crtcs in radeon_crtc_init(). crtcs (also called display controllers) are the blocks on the chip that provide the display timing and determine where in the framebuffer a particular monitor points to. A crtc provides an independent “head.” Most radeon asics have two crtcs; the new evergreen chips have six.

radeon_setup_enc_conn() sets up the connector and encoder mappings based on video bios data tables. Encoders are things like DACs for analog outputs like VGA and TV, and TMDS or LVDS encoders for things like digital DVI or LVDS panels. An encoder can be tied to one or more connectors (e.g., the TV DAC is often tied to both the S-video and a VGA port or the analog portion of a DVI-I port). The mapping is important as you need to know what encoders are in use and what they are tied to in order to program the displays properly.

radeon_hpd_init() is a macro that points to the asic specific function that initializes the HPD (Hot Plug Detect) hardware for digital monitors. HPD allows you to get an interrupt when a digital monitor is connected or disconnected. When this happens the driver will take appropriate action and generate an event which userspace apps can listen for. The app can then display a message asking the user what they want to do, etc.

Finally, radeon_fbdev_init() sets up the drm kernel fb interface. This provides a kernel fb interface on top of the drm for the console or other kernel fb apps.

When the driver is unloaded the whole process happens in reverse; this time all the *_fini() functions are called to tear down the driver.
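
The init-forward, fini-backward ordering can be sketched with a list of stages. This is an illustration only: the stage table, the trace array, and all `demo_*` names are hypothetical, and unwinding the completed stages when a later init fails is an assumption added here, not something the text describes.

```c
#include <stddef.h>

struct demo_stage {
    int (*init)(void);
    void (*fini)(void);
};

/* Trace of calls so the ordering is observable: +k for init, -k for fini. */
int demo_trace[16];
size_t demo_trace_len;

static void demo_log(int v)
{
    if (demo_trace_len < 16)
        demo_trace[demo_trace_len++] = v;
}

int demo_init1(void)     { demo_log(1); return 0; }
int demo_init2(void)     { demo_log(2); return 0; }
int demo_init_fail(void) { demo_log(9); return -1; }
void demo_fini1(void)    { demo_log(-1); }
void demo_fini2(void)    { demo_log(-2); }

/* Run init callbacks in order. If one fails, tear down the stages that
 * already succeeded, newest first, and report the error. */
int demo_load(const struct demo_stage *stages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = stages[i].init();
        if (ret != 0) {
            while (i-- > 0)
                stages[i].fini();
            return ret;
        }
    }
    return 0;
}

/* Unload: call every fini callback in reverse order of initialization. */
void demo_unload(const struct demo_stage *stages, size_t n)
{
    while (n-- > 0)
        stages[n].fini();
}
```

Loading two stages and then unloading them records the calls as 1, 2, -2, -1.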

The next set of articles will walk through the evergreen patches available here which have already been applied upstream and explain what each patch does to bring up support for evergreen chips.
