Minimizing the overall power conservation in a symmetric multiprocessor system disposed in a system-on-chip (SoC) depends on using voltage islands operated at different voltages such that similar circuits will perform at significantly different levels because of the voltage differences. Otherwise identical processor cores of a symmetric multiprocessor system are placed in various ones of the voltage islands such that they consume differing operating powers because of the voltage differences. The slack-time experienced by each of a variety of tasks during execution by the processor cores is calculated and tabulated. Thereafter, each task is scheduled for a particular processor core according to its performance level as determined by its operating voltage, and depending on the corresponding slack-time calculation for the task. Such scheduling will minimize power consumption by selecting a minimum power processor core, and still allow for complete execution in the time available.

Field of the Invention

The present invention relates to symmetric multiprocessor systems implemented with voltage islands, and more particularly to methods and circuits to conserve system-on-chip (SoC) power consumption by measuring performance and estimation for run-time execution phases to automatically adapt to varying application workloads .

Description of Related Art

Symmetric multiprocessing (SMP) connects identical processors a single shared main memory. Most common multiprocessor systems today use an SMP architecture. Any task can be assigned for execution by any processor, because the data for that task is located in shared memory. Conventional SMP operating systems can move tasks between processors to balance the work load efficiently. The many individual processors in an SMP implementation can starve for data, because memory is typically much slower than each of processors accessing it. SMP has many applications in science, industry, and business where the software is custom programmed for multithreaded processing. But, word processors, computer games, and other most consumer products are not written in a way that they can benefit from SMP. Programs running on SMP systems can perform better even when they have been written for uniprocessor systems.

Hardware interrupts usually suspend program execution, while in an SMP system the kernel hands them off to be run on an idle processor instead. Given proper operating system (OS) support, some applications like software compilers and distributed computing projects, will see speed-ups nearly multiply by the number of additional processors.

When many jobs are being processed in an SMP environment, a loss of hardware efficiency can result. Software programs have been developed to schedule jobs in various ways so that processor utilization is optimized.

A given processor can generally increase its performance if clocked at a higher rate. The highest clocking rates are limited by the semiconductor technology and the operating voltage. For a given operating voltage and semiconductor technology, there will be a maximum clock rate. If the operating voltage is increased, the clocks can be proportionately increased to step up performance, within practical limits. But, the power consumption

will go up as the square of the voltage increase,
 o

K increased performance is gained at a cost of much increased power consumption.

The prior art has used dynamic voltage scheduling based on the task execution phases computed off-line. What would be better would be to use a performance measurement and estimation unit to analyze the execution phases during run-time. That way the system could automatically adapt to varying application workloads. Static mapping of the tasks to a single processor is also not desirable. It would be best to schedule tasks dynamically onto a set of symmetric multiprocessors based on their estimated execution times . The prior art has also dealt with scheduling tasks onto a processor that supports different voltages, keeping the power consumption as the cost function. But such do not use DVS on a per task basis. Such conventional devices sort tasks by deadlines in ascending order. They are put in a reservation list of the target processing element. The slack time of the first

task is computed with both the highest and lowest voltage levels supported by the target processing element. The computation time of the task at the highest and lowest voltages is calculated. The voltage level of the target processing element is found at which the task can meet the deadline. Then DVS is used to change its voltage level and the task is scheduled onto it.

What is needed is a system in which different cores of an SoC are provided with different supply voltages. Voltage islands enable core-level power optimization for SoC design by utilizing a unique supply voltage for each core. In an SoC with homogeneous cores and voltage islands, then all tasks can be kept on one list. Each task is analyzed for its performance requirements. Then a particular processor whose supply voltage can already guarantee the performance requirement of the task is chosen to run the task. Next time, it is analyzed again, and the same task can be mapped onto a different processor having a different supply voltage that can guarantee its performance requirements .

SUMMARY OF THE INVENTION

Briefly, symmetric multiprocessor systems with voltage islands include dynamic scheduling of the application tasks onto the available processors. The voltage islands enable each processor to be supplied with different independent operating voltages and corresponding maximum clock frequencies. The difference between the available and actual processor task execution times are measured so the slack times of recurrences of these tasks can be estimated. The processing capabilities of the individual processors are cataloged according to particular operating voltages and frequencies. A dynamic scheduler chooses an appropriate processor and voltage island setting for the tasks

based on the apparent processor required. This yields optimal power consumption.

An advantage of the present invention is that a SoC implementation of a symmetric multiprocessor system is provided which will conserve power.

A further advantage of the present invention is a SoC implementation of a symmetric multiprocessor system is provided that uses voltage islands to avoid real time overheads associated with DVS power controls. A still further advantage of the present invention is that a SoC implementation of a symmetric multiprocessor system is provided that schedules tasks according to a slack-time calculator .

The above and still further objects, features, and advantages of the present invention will become apparent upon consideration of the following detailed description of specific embodiments thereof, especially when taken in conjunction with the accompanying drawings .

DETAILED DESCRIPTION OF THE INVENTION

Fig. 1 represents a symmetric multiprocessor system embodiment of the present invention, and is referred to herein by the general reference numeral 100. The symmetric multiprocessor system 100 comprises a plurality of voltage islands 102-104 disposed in a common semiconductor chip 106. These voltage islands are provided three different operating voltages, hi, mid, and lo-voltage. A plurality of identical processor cores 108-110 are respective disposed in the voltage islands 102-104, but the differing operating voltages results in processor core 108 having relatively high performance (hi-MIPS) , processor core 109 having medium performance (mid-MIPS) , and processor core 110 having relatively low performance (lo-MIPS) . All three processor cores 108-110 each have access to a shared memory 112.

Since all three processor cores 108-110 are operated at different voltage levels, their respective power consumptions also differ. The power consumption will go up as the square of the voltage increase. E.g.,
. So increased performance

K is gained, e.g., in processor core 108, at a cost of much increased power consumption. In modern battery-operated portable devices, all power consumption waste must be strictly eliminated.

An operating system (OS) 114 provides for the selection of particular software tasks 116-120 to be executed by various ones of the processor cores 108-110. A slack time calculator 122 is connected to compare how much time an assigned task 116-120 took to be executed on a corresponding processor core 108-110. A processing capabilities and task loading table 124 is constructed from data obtained by the slack time calculator 122. It catalogs the processing capabilities of each of the processor cores 108- 110. A relative measure for the processing loads imposed by tasks previously executed is entered.

During operation, the performance required by a "task-a:e" can be tabulated, like in Table 124, in terms of millions- instruction-per-second (MIPS) . For example, heuristic evidence can be gathered to show "task-d" requires a medium MIPS performance. The second chart can be indexed to show "CPU-2" would be the appropriate choice for scheduling task-d. A message is sent from table 124 to scheduler 126 so that when OS 114 wants task-d 119 executed, it will be scheduled for dispatch to processor core 109. The slack-time calculator 122 can re-verify this was a good choice after the task executes. Since the higher power-consuming processor core 108 was avoided, power was saved by using a lower powered alternative processor core.

Different processor cores of the SoC are provided with different supply voltages. The voltage islands enable core-level power optimization for SoC design by providing a unique fixed supply voltage for each processor core. SoC ' s with voltage islands and homogeneous processor cores allow all the tasks to be kept in a single list that can be applied to all the cores in the SoC. Prior art devices prepare separate task lists for each processor core. Each task is gauged according to its performance requirements .

One-by-one, the tasks are matched up to a particular processor that can guarantee execution completion within the allotted time frame because its supply voltage is high enough. Every time the task presents itself, it is again analyzed and dynamically scheduled for the most appropriate processor core. In this way the dynamic scheduling is performed. DVS techniques have inherent overheads that may be too costly to employ on a task-by-task basis. The symmetric multiprocessor system 100 instead selects processor cores whose supply voltage can produce the necessary levels of performance in given window of time.

The slack time calculator and processing capabilities and task loading table 124 provide for performance measurement and estimation to capture the execution times of tasks in an application software. The performance measurements can be trace-based or accumulated from the execution times and the deadline requirement of individual tasks. The average/median values for slack times is computed, and multiple values of slack times in a certain time window are used to characterizes each slack time per task. Such characteristic information is thereafter used to predict a future execution and slack time of the task, thus dynamically scheduling tasks to the appropriate processors. The slack time calculator and processing capabilities and task loading table further provides for resetting of measurement information for a tasks when they have been newly scheduled on a different processor core.

Fig. 2 represents a symmetric multiprocessor system embodiment of the present invention that makes each voltage island adjustable with DVS, and is referred to herein by the general reference numeral 200. The symmetric multiprocessor system 200 comprises a plurality of voltage islands 202-204 disposed in a common semiconductor chip 206. These provide for independent and controllable dynamic voltage scaling (DVS) for each voltage island. Dynamic voltage scaling (DVS) has become a universal way to adjust the operating voltage and clocks to balance a circuit's performance with its concomitant power consumption. Voltage islands have been used in conventional designs to allow portions of a chip's circuitry to be subjected to independent DVS controls. But after the DVS has changed to a new value, settling times can be as long as 200-milliseconds . So it would be best not to change the DVS settings very often, or not at all after being initialized.

A plurality of processor cores 208-210 each have access to a shared memory 212. Each is disposed in a respective one of the voltage islands such that changes in the DVS will proportionately affect each processor core's performance and power consumption. An operating system (OS) 214 provides for the selection of particular software tasks 216-220 to be executed by various ones of the processor cores 208-210. A slack time calculator 222 is

connected to compare how much time an assigned task 216-120 took to be executed on a corresponding processor core 208-110 at a particular DVS setting.

A processing capabilities and task loading table 224 is constructed from data obtained by the slack time calculator 222. It catalogs the processing capabilities of each of the processor cores 208-210 at a variety of DVS settings. A relative measure for the processing loads imposed by tasks previously executed is entered. A scheduler 226 is controlled by the OS 214. It looks for an entry in the processing capabilities and task loading table 224 for each new next task to be scheduled. If previously characterized, then it sends the task to an appropriate, available processor core 208-110. A DVS control is issued for the corresponding voltage island to a value that will minimize power consumption and still allow for complete execution in the time available.

The dynamic scheduling assigns application tasks onto the available voltage island isolated processors of the symmetric multiprocessor system. Each of the processors is supplied with different independent operating voltages/frequencies. The slack times are measured by taking the time difference between the deadline and actual execution time of the task. These are placed in an application task graph. These measurement traces allow future slack times of tasks to be estimated. The individual processor processing capabilities are statically maintained. Such dynamic scheduling basically chooses the appropriate processor for the tasks based on their slack times to yield optimal power consumption. The processing capabilities table indicates the performance in MIPS provided by the individual processors. The MIPS offered by individual processors depends on the operating voltage and frequency of the processors. The table can be statically configured by looking at the voltage sources of the processors to

then estimate the expected performance. It can also compute the average slack times of tasks by making several, on-going observations .

The dynamic scheduler maps tasks onto the available processors using static estimates and profiling results. It gets slack information for all the application tasks from the performance measurement and estimation unit. Considering the first tasks mapped on the first processor, if the measured slack time of a first task on a first processor deviates from its average slack time, then such first task can be dynamically scheduled onto a second processor that matches the performance requirement. The mapping of tasks onto other processors is based on a processing capabilities table lookup.

In general, embodiments of the present invention are such that task assignment to an available processor assumes they are symmetrical but employ different supply voltages which make them capable of providing different throughputs. The task scheduling is based on the slack times of the task. Using voltage islands and dynamically scheduling the tasks based on their estimated performance requirements, or slack periods, provides substantial power savings.

SRC=https://www.google.com.hk/patents/WO2007077516A1

Power aware dynamic scheduling in multiprocessor system employing voltage islands的更多相关文章

  1. Thermally driven workload scheduling in a heterogeneous multi-processor system on a chip

    Various embodiments of methods and systems for thermally aware scheduling of workloads in a portable ...

  2. PatentTips - Enhanced I/O Performance in a Multi-Processor System Via Interrupt Affinity Schemes

    BACKGROUND OF THE INVENTION This relates to Input/Output (I/O) performance in a host system having m ...

  3. Parallelized coherent read and writeback transaction processing system for use in a packet switched cache coherent multiprocessor system

    A multiprocessor computer system is provided having a multiplicity of sub-systems and a main memory ...

  4. Power control within a coherent multi-processing system

    Within a multi-processing system including a plurality of processor cores 4, 6operating in accordanc ...

  5. Multiprocessing system employing pending tags to maintain cache coherence

    A pending tag system and method to maintain data coherence in a processing node during pending trans ...

  6. Method and apparatus for providing total and partial store ordering for a memory in multi-processor system

    An improved memory model and implementation is disclosed. The memory model includes a Total Store Or ...

  7. Unified shader model

    https://en.wikipedia.org/wiki/Unified_shader_model In the field of 3D computer graphics, the Unified ...

  8. Extended symmetrical multiprocessor architecture

    An architecture for an extended multiprocessor (XMP) computer system is provided. The XMP computer s ...

  9. A multiprocessing system including an apparatus for optimizing spin-lock operations

    A multiprocessing system having a plurality of processing nodes interconnected by an interconnect ne ...

随机推荐

  1. 原生js大总结九

    81.ES6的Symbol的作用是什么?   ES6引入了一种新的原始数据类型Symbol,表示独一无二的值   82.ES6中字符串和数组新增了那些方法   字符串       1.字符串模板    ...

  2. Altium Designer线如何跟着原件走

  3. 编程一一C语言问题,指针函数与函数指针

    资料来源于网上: 一.指针函数:指返回值是指针的函数      类型标识符    *函数名(参数表)       int *f(x,y); 首先它是一个函数,只不过这个函数的返回值是一个地址值.函数返 ...

  4. JMeter--聚合报告之 90% Line 正确理解

    90% Line 参数正确的含义: 虽然,我的上面理解有一定的道理,显然它(90% 用户的响应时间)是错误的.那看看JMeter 官网是怎么说的? 90% Line - 90% of the samp ...

  5. JSONModel

    JSONModel 一个解析 JSON 数据的开源库,可以将 JSON 数据直接解析成自定义的 model ,其中对数据类型的检查和对数据类型的转换比较贴心.最近在项目中使用了以后觉得确实方便很多,推 ...

  6. SoC编译HEX脚本(基于RISC-V的SoC)

    SoC编译HEX脚本(基于RISC-V的SoC) 脚本使用 ./compile hello 脚本:设置RISC-V工具链riscv_set_env ############## RISC-V #### ...

  7. Android 利用an框架快速实现夜间模式的两种套路

    作者:Bgwan链接:https://zhuanlan.zhihu.com/p/22520818来源:知乎著作权归作者所有.商业转载请联系作者获得授权,非商业转载请注明出处. 网上看到过大多实现夜间模 ...

  8. C标签的使用.md

    <c:set> 设置变量 <c:set var="a" scope="request" value="${'www'}"/ ...

  9. 19、opencv和v4l2的关系

    分析如下: v4L2是针对uvc免驱usb设备的编程框架,而opencv是一种跨平台计算机视觉库,opencv不仅支持v4l2框架,还支持windows.os等操作系统上的摄像头框架 cvCreate ...

  10. [疯狂Java]JDBC:事务管理、中间点、批量更新

    1. 数据库事务的概念:     1) 事务的目的就是为了保证数据库中数据的完整性.     2) 设想一个银行转账的过程,假设分两步,第一步是A的账户-1000,第二步是B的账户+1000.这两个动 ...