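Digging into unsigned int and two's complement

This article covers two topics:

1. Sign-magnitude, ones' complement, and two's complement.

2. How short, unsigned short, int, unsigned int, long, and unsigned long are represented and converted.

1. Sign-magnitude, ones' complement, and two's complement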

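Sign-magnitude is the most intuitive representation: the most significant bit is the sign (0 for positive, 1 for negative) and the remaining bits hold the magnitude. For a 1-byte number, sign-magnitude covers [-127, 127], 255 values in total. In principle 8 bits can represent 256 values; we get only 255 because in sign-magnitude both 10000000 and 00000000 represent 0. [1]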

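Computers use two's complement rather than sign-magnitude. The reason: to simplify the hardware, a computer evaluates 1-1 as 1+(-1), so only addition needs to be implemented. With sign-magnitude, however, adding the bit patterns of 1 and -1 does not give 0, so 1+(-1) disagrees with 1-1. Ones' complement fixes the arithmetic, but 1+(-1) comes out as -0 while 1-1 is +0, so 0 ends up with two representations. Only two's complement gives a consistent result for both (all bits zero, which is its single representation of 0).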

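Definition of ones' complement: for a positive number it is the number itself; for a negative number, take its sign-magnitude form and invert every bit except the most significant one. The sign can therefore still be read from the most significant bit.

Definition of two's complement: for a positive number it is the number itself; for a negative number, start from its sign-magnitude form, keep the sign bit, invert all other bits, and finally add 1 (that is, ones' complement plus 1). Zero has a single two's complement representation: all bits zero.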

Examples:

[+1] = [00000001]_sign-magnitude = [00000001]_ones' = [00000001]_two's

[-1] = [10000001]_sign-magnitude = [11111110]_ones' = [11111111]_two's

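Because machines use two's complement, the 32-bit int type common in programming can represent [-2³¹, 2³¹-1]: the first bit is the sign bit, and two's complement can store one extra minimum value.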

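To summarize, a few points are worth noting:

(1) For the same number of bits, sign-magnitude and ones' complement share the same lower bound, while the two's complement minimum is smaller by 1.

Take 8 bits: the lower bound of both sign-magnitude and ones' complement is -(2⁷-1), written 11111111 in sign-magnitude and 10000000 in ones' complement. In two's complement, -(2⁷-1) is 10000001 (ones' complement plus 1), but two's complement can additionally use 10000000 for -2⁷. The upper bound is a positive number, which all three encode identically, so it is the same in each: 2⁷-1.

(2) In two's complement, the minimum value is encoded as a 1 in the most significant bit with all other bits 0.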

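2. Representation of short, unsigned short, int, unsigned int, long, unsigned long, and the results of mixing them

CPUs, operating systems, and compilers all come in 32-bit and 64-bit variants. What most directly determines how many bytes a type occupies, though, is the compiler and the target it compiles for. (Ultimately the compiler does, but in order for compiled code to play nicely with system libraries, most compilers match the behavior of the compiler[s] used to build the target system.[2])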

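Typical sizes of common data types [3]

32-bit compiler:

char: 1 byte
char* (i.e. a pointer): 4 bytes (a 32-bit address space has 2^32 addresses, so an address takes 32 bits, i.e. 4 bytes; the same reasoning applies to a 64-bit compiler)
short int: 2 bytes
int: 4 bytes
unsigned int: 4 bytes
float: 4 bytes
double: 8 bytes
long: 4 bytes
long long: 8 bytes
unsigned long: 4 bytes

64-bit compiler:

char: 1 byte
char* (i.e. a pointer): 8 bytes
short int: 2 bytes
int: 4 bytes
unsigned int: 4 bytes
float: 4 bytes
double: 8 bytes
long: 8 bytes (on Linux and macOS; 64-bit Windows compilers keep long at 4 bytes)
long long: 8 bytes
unsigned long: 8 bytes (again 4 bytes on 64-bit Windows)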

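To avoid problems across platforms, code often uses fixed-width types: __int8, __int16, __int32, __int64 (these are Microsoft-specific), or the standard int8_t, int16_t, int32_t, int64_t from <stdint.h>.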

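Results of mixing types

For example: unsigned int a = 3; return a * -1; What is the result?

First, when operands of different types appear in one operation, the compiler must convert them to a common type before computing. The rules for this automatic conversion are called the usual arithmetic conversions. Here is the description from MSDN [4]: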

If either operand is of type long double, the other operand is converted to type long double.
If the above condition is not met and either operand is of type double, the other operand is converted to type double.
If the above two conditions are not met and either operand is of type float, the other operand is converted to type float.
If the above three conditions are not met (none of the operands are of floating types), then integral conversions are performed on the operands as follows:
- If either operand is of type unsigned long, the other operand is converted to type unsigned long.
- If the above condition is not met and either operand is of type long and the other of type unsigned int, both operands are converted to type unsigned long.
- If the above two conditions are not met, and either operand is of type long, the other operand is converted to type long.
- If the above three conditions are not met, and either operand is of type unsigned int, the other operand is converted to type unsigned int.
- If none of the above conditions are met, both operands are converted to type int.

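With that, the question is easy to answer: -1 has type int by default, but in an operation between int and unsigned int, the int operand is automatically converted to unsigned int.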

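So what does -1 become when converted to unsigned int?

Given the discussion in Section 1, the conclusion is straightforward: the machine uses two's complement and int is 4 bytes, so -1 is stored as 0xFFFFFFFF. (The C standard defines this conversion as reduction modulo 2³², which gives the same bit pattern.)

We now read that pattern as an unsigned int, whose defining property is that the most significant bit is not a sign bit: every bit contributes to the value.

So on a 32-bit compiler, the range of unsigned int is [0, 2³²-1] and the range of int is [-2³¹, 2³¹-1] (two's complement can represent one extra minimum value).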

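When all bits of 0xFFFFFFFF count as value bits, its decimal value is 2³²-1. Multiplying by 3 gives 3·2³²-3, which clearly no longer fits in 32 bits. Unsigned arithmetic wraps modulo 2³², so only the low 32 bits are kept and the result is 0xFFFFFFFD (4294967293), a positive integer close to the upper limit of unsigned int.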

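This example comes from http://blog.sina.com.cn/s/blog_4c7fa77b01000a3m.html and is said to be a Microsoft interview question :)

The analysis above also reveals one more point:

The two's complement bits in the machine never change; when we declare them as different types (int, unsigned), the compiler reads different values out of them.

Here is an example, excerpted from [5]: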

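#include <stdio.h>

int main(void)
{
    unsigned b = -10;   /* -10 becomes 2^32 - 10 with a 32-bit unsigned */
    if (b) printf("yes\n"); else printf("no\n");   /* b is nonzero */
    int c = b;          /* the same bits read back as int */
    printf("%d\n", c);
    unsigned a = 10;
    int d = -20;
    int e = a + d;      /* d is converted to unsigned before the addition */
    printf("%d\n", e);
    return 0;
}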

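The output is:

yes

-10

-10

The reason is as stated above: what gets passed around is the two's complement bit pattern in the machine, which never changes (overflow aside); unsigned int and int only define how the compiler reads it. In the last case, a + d is computed as unsigned and yields 0xFFFFFFF6, which reads as -10 once stored in an int. (Strictly speaking, converting an out-of-range unsigned value to int is implementation-defined, but mainstream compilers simply keep the bits.)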

[5] has another very interesting problem, which I won't reproduce here; interested readers can head over and take a look :)

References:

[1] http://www.cnblogs.com/zhangziqiu/archive/2011/03/30/ComputerCode.html (A really well-written article, clear and accessible. Most of Section 1 is based on it.)

[2] http://stackoverflow.com/questions/13764892/what-determines-the-size-of-integer-in-c

[3] http://www.cnblogs.com/augellis/archive/2009/09/29/1576501.html

[4] http://msdn.microsoft.com/en-us/library/3t4w2bkb.aspx

[5] http://www.cnblogs.com/krythur/archive/2012/10/29/2744398.html