Some built-in functions provided by GCC
References
https://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Other-Builtins.html#Other-Builtins
https://en.wikipedia.org/wiki/Find_first_set#CTZ
| Tool/library | Name | Type | Input type(s) | Notes | Result for zero input |
|---|---|---|---|---|---|
| POSIX.1 compliant libc, 4.3BSD libc, OS X 10.3 libc [2][21] | ffs | Library function | int | Includes glibc. POSIX does not supply the complementary log base 2 / clz. | 0 |
| FreeBSD 5.3 libc, OS X 10.4 libc [22] | ffsl, fls, flsl | Library function | int, long | fls ("find last set") computes (log base 2) + 1. | 0 |
| FreeBSD 7.1 libc [23] | ffsll, flsll | Library function | long long | | 0 |
| GCC | __builtin_ffs[l,ll,imax] | Built-in functions | unsigned int, unsigned long, unsigned long long, uintmax_t | | 0 |
| GCC 3.4.0 [24][25] | __builtin_clz[l,ll,imax], __builtin_ctz[l,ll,imax] | Built-in functions | unsigned int, unsigned long, unsigned long long, uintmax_t | | undefined |
| Visual Studio 2005 | _BitScanForward [28], _BitScanReverse [29] | Compiler intrinsics | unsigned long, unsigned __int64 | Separate return value to indicate zero input | 0 |
| Visual Studio 2008 | __lzcnt [30] | Compiler intrinsic | unsigned short, unsigned int, unsigned __int64 | Relies on x64-only lzcnt instruction | Input size in bits |
| Intel C++ Compiler | _bit_scan_forward, _bit_scan_reverse [31] | Compiler intrinsics | int | | undefined |
| NVIDIA CUDA [32] | __clz | Functions | 32-bit, 64-bit | Compiles to fewer instructions on the GeForce 400 Series | 32 |
| NVIDIA CUDA [32] | __ffs | Functions | 32-bit, 64-bit | | 0 |
| LLVM | llvm.ctlz.*, llvm.cttz.* [33] | Intrinsic | 8, 16, 32, 64, 256 | LLVM assembly language | Input size if arg 2 is 0, else undefined |
| GHC 7.10 (base 4.8), in Data.Bits | countLeadingZeros, countTrailingZeros | Library function | FiniteBits b => b | Haskell programming language | Input size in bits |
Reversing the bytes of a 32-bit value (__builtin_bswap32)
Returns x with the order of the bytes reversed; for example, 0xaabbccdd becomes 0xddccbbaa. Byte here always means exactly 8 bits.
Reversing the bytes of a 64-bit value (__builtin_bswap64)
Similar to __builtin_bswap32, except the argument and return types are 64-bit.
Branch prediction (__builtin_expect)
long __builtin_expect (long exp, long c)

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

    if (__builtin_expect (x, 0))
      foo ();

would indicate that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

    if (__builtin_expect (ptr != NULL, 1))
      foo (*ptr);

when testing pointer or floating-point values.
Prefetching (__builtin_prefetch)
void __builtin_prefetch (const void *addr, ...)

This function is used to minimize cache-miss latency by moving data into a cache before it is accessed. You can insert calls to __builtin_prefetch into code for which you know addresses of data in memory that is likely to be accessed soon. If the target supports them, data prefetch instructions will be generated. If the prefetch is done early enough before the access then the data will be in the cache by the time it is accessed.

The value of addr is the address of the memory to prefetch. There are two optional arguments, rw and locality. The value of rw is a compile-time constant one or zero; one means that the prefetch is preparing for a write to the memory address and zero, the default, means that the prefetch is preparing for a read. The value locality must be a compile-time constant integer between zero and three. A value of zero means that the data has no temporal locality, so it need not be left in the cache after the access. A value of three means that the data has a high degree of temporal locality and should be left in all levels of cache possible. Values of one and two mean, respectively, a low or moderate degree of temporal locality. The default is three.

    for (i = 0; i < n; i++)
      {
        a[i] = a[i] + b[i];
        __builtin_prefetch (&a[i+j], 1, 1);
        __builtin_prefetch (&b[i+j], 0, 1);
        /* ... */
      }

Data prefetch does not generate faults if addr is invalid, but the address expression itself must be valid. For example, a prefetch of p->next will not fault if p->next is not a valid address, but evaluation will fault if p is not a valid address.

If the target does not support data prefetch, the address expression is evaluated if it includes side effects but no other code is generated and GCC does not issue a warning.
__builtin_return_address(LEVEL)
This function returns the return address of the current function, or of one of its callers. The LEVEL argument is the number of frames to scan up the call stack. A value of 0 yields the return address of the current function, a value of 1 yields the return address of the caller of the current function, and so forth.
__builtin_alloca (https://linux.die.net/man/3/alloca)
alloca - allocate memory that is automatically freed
Synopsis
#include <alloca.h>
void *alloca(size_t size);
Description
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
Return Value
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.
Conforming to
This function is not in POSIX.1-2001.
There is evidence that the alloca() function appeared in 32V, PWB, PWB.2, 3BSD, and 4BSD. There is a man page for it in 4.3BSD. Linux uses the GNU version.
Notes
The alloca() function is machine- and compiler-dependent. For certain applications, its use can improve efficiency compared to the use of malloc(3) plus free(3). In certain cases, it can also simplify memory deallocation in applications that use longjmp(3) or siglongjmp(3). Otherwise, its use is discouraged.
Because the space allocated by alloca() is allocated within the stack frame, that space is automatically freed if the function return is jumped over by a call to longjmp(3) or siglongjmp(3).
Do not attempt to free(3) space allocated by alloca()!
Notes on the GNU version
Normally, gcc(1) translates calls to alloca() with inlined code. This is not done when either the -ansi, -std=c89, -std=c99, or the -fno-builtin option is given (and the header <alloca.h> is not included). But beware! By default the glibc version of <stdlib.h> includes <alloca.h> and that contains the line:
    #define alloca(size) __builtin_alloca (size)
with messy consequences if one has a private version of this function.
The fact that the code is inlined means that it is impossible to take the address of this function, or to change its behavior by linking with a different library.
The inlined code often consists of a single instruction adjusting the stack pointer, and does not check for stack overflow. Thus, there is no NULL error return.