【CSAPP#0x01】ICS:Lab1 DataLab

正式上ICS了，使用教材CSAPP。
第一个Lab DataLab，助教们在CMU原版Lab基础上加以升级改进，本篇是我的受难记（悲）。

通关截图

实验分析

P1 bitNor

//P1
/* 
 * bitNor - ~(x|y) using only ~ and & 
 *   Example: bitNor(4, 5) = -6, bitNor(-1,-2) = 0
 *   Legal ops: ~ &
 */
int bitNor(int x, int y) {
  // demorgan's law: ~(x|y) == (~x & ~y)
  return (~x & ~y);
}

P2 tmax

//P2
/* 
 * tmax - return maximum two's complement integer (0x7fffffff)
 *   Legal ops: ! ~ & ^ | + << >>
 */
int tmax(void) {
  // 0x7fffffff == ~(0x80000000)
  // 0x80000000 == 1 << 31
  return ~(1 << 31);
}

P3 isTmin

//P3
/*
 * isTmin - returns 1 if x is the minimum two's complement number,
 *     and 0 otherwise 
 *   Legal ops: ! ~ & ^ | +
 */
int isTmin(int x) {
  // 1st Try:
  // Tmin = 0x80000000
  // it should be !(x+x) & !!x
  // but int overflow is an ub...
  // I have to turn it to: return !(2 + ~x + ~x + !x);
  //
  // 2nd Try:
  // Tmin-1 = 0x7fffffff
  // Tmin ^ (Tmin-1) = 0xffffffff
  // ~0xffffffff = 0
  // but minus is baned...
  // so i change it to: return !(~(x + (x + ~0)) + !x);
  // 
  // 3rd Try:
  // ~Tmin+1 == Tmin
  // Tmin ^ (~Tmin+1) == 0

  return !((x ^ (~x+1)) + !x);
}

由于一开始想到的方法涉及到编译器的undefined behavior，所以后来又想到了两种办法，只记录一下其中一种的思路：

由于~Tmin + 1 = Tmin，又a ^ a = 0，所以有Tmin ^ (~Tmin + 1) == 0。
然后再排除0的情况即可。

P4 minusOne

//P4
/*
 * minusOne - return a value of -1
 *   Legal ops: ! ~ & ^ | + << >>
 */
int minusOne(void){
  // -1 == 0x80000000
  return ~0;
}

P5 absVal

//P5
/*
 * absVal - return the absolute value of x
 *   Examples:  absVal(-10) = 10
 *              absVal(5) = 5
 *   Legal ops: ! ~ & ^ | + << >>
 */
int absVal(int x){
  // negative num: ~x+1 == x^0xFFFFFFFF + 1
  // positive num: x    == x^0x00000000 + 0
  int signmask = x>>31;
  return (x^signmask) + (signmask & 1);
}

负数绝对值: ~x+1 == x^0xFFFFFFFF + 1
正数绝对值: x == x^0x00000000 + 0
所以需要想个办法来代替正数还是负数的条件判断。

于是想到了可以用 x >> 31 创建一个mask，如果是负数就是全1，正数就是全0，用 x^mask 来代替 ~x；用 + (mask & 1) 代替 +1 。

P6 leastBitPos

//P6 
/*
 * leastBitPos - return a mask that marks the position of the least significant 1 bit.
 *   Examples: leastBitPos(12) = 4
 *             leastBitPos(-2) = 2
 *             leastBitPos(0)  = 0
 *   Legal ops: ! ~ & ^ | + << >>
 */
int leastBitPos(int x) {
  // what ~x+1 actually do is:
  // retain the LSB and ~all higher bits
  // e.g. 0000_1000b -> 1111_0111b -> 1111_1000b
  return x & (~x+1);
}

题目需要保留最低位，然后把更高的位都置零。

突然发现 ~x+1 和这题十分契合：取反加一会保持最低有效位不变，而只把更高的位取反。因此 x & (~x+1) 就可以只留下最低有效位。

P7 byteSwap

//P7
/*
 * byteSwap - swaps the nth byte and the mth byte
 *  Examples: byteSwap(0x12345678, 1, 3) = 0x56341278
 *            byteSwap(0xDEADBEEF, 0, 2) = 0xDEEFBEAD
 *  You may assume that 0 <= n <= 3, 0 <= m <= 3
 *  Legal ops: ! ~ & ^ | + << >>
 */
int byteSwap(int x, int n, int m){
  // 1. fetch two bits
  // 2. clear two bits
  // 3. assign two bits

  int real_n, real_m, n_val, m_val;

  real_n = n<<3;
  real_m = m<<3;
  n_val = (x >> real_n) & 0xFF;
  m_val = (x >> real_m) & 0xFF;
  x = x & ~(0xFF << real_m);
  x = x & ~(0xFF << real_n);
  x = x | (n_val << real_m);
  x = x | (m_val << real_n);
  
  return x;
}

思路非常简单：

提取出两个待交换的字节
清除所在位置的值
把字节交换填入进去

注意到第 n 个Byte对应第 n*8 个bit，所以为了方便位移，起手就把n和m乘以8（左移3），放在real_n和real_m里面。
后面就是前面那三步。

P8 logicalShift

//P8
/* 
 * logicalShift - shift x to the right by n, using a logical shift
 *   Can assume that 0 <= n <= 31
 *   Examples: logicalShift(0x87654321,4) = 0x08765432
 *   Legal ops: ! ~ & ^ | + << >>
 */
int logicalShift(int x, int n) {
  int mask = ~(((1 << 31) >> n) << 1);
  return (x >> n) & mask;
}

由于限制只能用算数右移，而两者区别就在于：最高位右移的时候会不会带出来一大堆1。
所以只要想办法做一个mask，把最高位都mask掉就行了。

所以这个mask就是 ~(((1 << 31) >> n) << 1)，非常直观易懂。

P9 isLessOrEqual

//P9
/* 
 * isLessOrEqual - if x <= y  then return 1, else return 0 
 *   Example: isLessOrEqual(4,5) = 1.
 *   Legal ops: ! ~ & ^ | + << >>
 */
int isLessOrEqual(int x, int y) {
  // (x <= y) == (x-y < 1) == (x + ~y < 0)
  // but I found that if number is too big(and while x*y<0), errors will happen
  // so I add the x0y and y0x so that the program can judge these special condition directly.

  int xy, isxpos, isypos, x0y, noy0x;

  xy  = ((x+~y) >> 31) & 1; // normal x <= y judge
  isxpos = !(x>>31);
  isypos = !(y>>31);
  x0y = !isxpos & isypos;   //   x < 0 <= y
  noy0x = isypos | !isxpos;   // !(y < 0 <= x)
  return (xy | x0y) & noy0x;
}

首先先想办法把小于等于这种复杂判断转换成好弄一点的判断：
(x <= y) == (x-y < 1) == (x + ~y < 0)

但是我发现这个判断在某些情况下会失效，具体来说：
当xy异号的时候，本应是正数（负数）的左式会溢出变成一个负数（正数）……

所以我就加了两道保险，x0y 用来标识 x < 0 <= y 是否成立，noy0x 用来标识 (y < 0 <= x) 是否不成立。
返回值就是 (xy | x0y) & noy0x ，其中xy是正常x<=y判断。

P10 multFiveEighths

//P10 
/* 
 * multFiveEighths - return floor(x*5/8), for 0 <= x, x is an integer 
 *   Example: multFiveEighths(10) = 6
 *   Legal ops: ! ~ & ^ | + << >>
 */
int multFiveEighths(int x) {
  // normal form: ((x<<2)+x)>>3
  // which is equal to ((x>>3)<<2)+(x>>3)
  // but when calc x>>3, the 3 least significant bits will be discard.
  // so i save them (xl3), and calc them seperately, and add the result together in the end.

  int xl3, xd8;
  xl3 = x & 0x7;  // x lowest 3 bits
  xd8 = x >> 3;   // x divided by 8
  return ((xd8<<2) + xd8) + (((xl3<<2) + xl3)>>3);
}

乘以5除以8，就是 (x<<2)+x 然后再 >>3。
但是不能这么做，因为第一步中的 x<<2 就直接溢出了……

所以改成 ((x>>3)<<2)+(x>>3)，不会溢出。
但是在前面那个括号里面，第一步x>>3直接就丢精度了……

所以我把最低的三个bit用一个变量xl3保存起来，单独做*5/8运算，然后再和((x>>3)<<2)+(x>>3)加起来，就不会溢出也不会丢精度。

P11 bitCount

//P11
/*
 * bitCount - returns count of number of 1's in word
 *   Examples: bitCount(5) = 2, bitCount(7) = 3
 *   Legal ops: ! ~ & ^ | + << >>
 */
int bitCount(int x) {
  // reference: https://en.wikipedia.org/wiki/Hamming_weight
  int mask1, mask2, mask3;

  // the simplest idea is to fetch every bit of x and add them together
  // but the point is to compress the operation into 40 ops
  // so let's do a bunch of add at the same time!

  // 1. define some masks (use 22 ops)
  mask1 = (0x55 << 8) | 0x55;   // 0x55555555
  mask1 = mask1 | (mask1 << 16);
  mask2 = (0x33 << 8) | 0x33;   // 0x33333333
  mask2 = mask2 | (mask2 << 16);
  mask3 = (0x0F << 8) | 0x0F;   // 0x0F0F0F0F
  mask3 = mask3 | (mask3 << 16);

  // // 2. do the operation
  x = (x & mask1) + ((x>>1)  & mask1);
  x = (x & mask2) + ((x>>2)  & mask2); //put count of each 4 bits into those 4 bits 
  x = (x & mask3) + ((x>>4)  & mask3); //put count of each 8 bits into those 8 bits 
  x =  x + (x >> 8 );  //put count of each 16 bits into their lowest 8 bits
  x =  x + (x >> 16);  //put count of each 32 bits into their lowest 8 bits

  return x & 0x7f;
}

这题自个想了好久没想出来，然后google一下，参考Hamming weight - Wikipedia写出来的。

如果用正常的思路去考虑这个问题，显然用一个循环，把所有的bit都加起来就可以了，但是题目限制不能用循环，也不能用大量的计算次数替代循环。
所以需要解决的是如何压缩计算的问题。

网上搜到的解决办法十分聪明，类似于二分法。
数32位的int有多少bit，也就是数2个16位的short有多少个bit，也就是数4个8位有多少bit，……，也就是数32个bit有多少个是1。
但是注意到最后一步是已经完成了的，我们已经知道了每个bit中有多少个bit是1（什么谜语）。
那么进行倒数第二步，数16个2bits里有多少个1，那么只需要把1个bit的结果和相邻的1个bit的结果做一次相加就行。这一步可以通过构造一个mask（0x55555555），再加上右移来完成。
同理进行倒数第三步，数8个4bits里有多少个1，那么只需要把2个bits的结果和相邻2个bits的结果做一次相加就行。这一步可以通过构造一个mask（0x33333333），再加上右移来完成。
……
最后把前16个bits的结果和后16个bits的结果相加，通过构造一个mask（0x0000FFFF），再加上右移就可以完成。
得到32位的int有多少bits。

此外我在参考维基百科上的方法时，发现有些地方不用mask也可以实现，还有构造mask时可以通过复用来减少操作符数量（在思考后面的bitReverse如何通过MAX ops限制的时候发现的），所以这两个优化也加到了这题的代码上。

P12 greatestBitPos

//P12
/*
 * greatestBitPos - return a mask that marks the position of the greatest significant 1 bit.
 *   Example: greatestBitPos(12) = 8
 *            greatestBitPos(-2) = -2147483648
 *            greatestBitPos(0)  = 0
 *   Legal ops: ! ~ & ^ | + << >>
 */
int greatestBitPos(int x) {
  // The quiz is to mask off all bits except the MSB
  // Remove n lower bits: x = (x >> n) << n;

  // For a uint_32: IF there is 1 in higher 16 bits, then we can throw the lower 16 bits,
  // For a uint_16: IF there is 1 in higher  8 bits, then we can throw the lower  8 bits,
  // For a uint_8:  IF there is 1 in higher  4 bits, then we can throw the lower  4 bits,
  // For a uint_4:  IF there is 1 in higher  2 bits, then we can throw the lower  2 bits,
  // For a uint_2:  IF there is 1 in higher  1 bits, then we can throw the lower  1 bits.

  int a1, a2, a3, a4, a5;
  int mask = (0xFF << 8) | 0xFF;

  a1 = (!!(x>>16)) << 4;
  x = x >> a1;
  x = x & mask;   // to tackle with extra 0xFFFFxxxx created by "Arithmetic Right Shift"

  a2 = (!!(x>>8)) << 3;
  x = x >> a2;

  a3 = (!!(x>>4)) << 2;
  x = x >> a3;

  a4 = (!!(x>>2)) << 1;
  x = x >> a4;

  a5 = (x & 2) >> 1;
  x = x >> a5;

  x <<= a1 + a2 + a3 + a4 + a5;
  return x;
}

和leastBitPos相反，要保留MSb（bit），然后把其他低位的bit全部mask掉。
最直观的思路是用一个while循环来判断最高位在哪里，但是题目限制不能用循环，也不能用大量的计算次数替代循环。
和上一题似乎很类似。

所以这题想到了把问题一步一步化简的方法：
把问题首先分解成MSb在高16位还是低16位？如果在高16位，就用一个局部变量a1记录一下，然后把高位移到低位。
这样一来，问题就变成了MSb在高8位还是低8位？
……
一步步拆下来，就可以通过局部变量a1，a2等来知道MSb的位置。

P13 bang

//P13
/* 
 * bang - Compute !x without using !
 *   Examples: bang(3) = 0, bang(0) = 1
 *   Legal ops: ~ & ^ | + << >>
 */
int bang(int x) {
  // the key is to use the sign bit to distinguish between 0 and others.
  // isneg(x): (x >> 31) & 1;
  // ispos(x): ((~x + 1) >> 31) & 1;
  // return ~(ispos | isneg) & 1;
  // and I made some optimization:
  return ~((x | (~x+1)) >> 31) & 1;
}

0和别的数的区别：
正数取反加一会变成一个负数，负数的符号位是1；而0取反加一后还是自身，符号位为0.
最后返回 ~(ispos | isneg) & 1 即可。

P14 bitReverse

//P14
/*
 * bitReverse - Reverse bits in an 32-bit integer
 *   Examples: bitReverse(0x80000004) = 0x20000001
 *             bitReverse(0x7FFFFFFF) = 0xFFFFFFFE
 *   Legal ops: ~ & ^ | + << >>
 */
int bitReverse(int x){
  // simply mask some bits and exchange them
  int mask_1, mask_2, mask_4, mask_8, mask_16;

  mask_1 = 0x55 << 8 | 0x55;
  mask_1 = mask_1 | mask_1 << 16;
  mask_2 = 0x33 << 8 | 0x33;
  mask_2 = mask_2 | mask_2 << 16;
  mask_4 = 0x0f << 8 | 0x0f;
  mask_4 = mask_4 | mask_4 << 16;
  mask_16 = 0xff << 8 | 0xff;
  mask_8 = (mask_16 << 8) ^ mask_16;

  // 24 ops
  x = ((x >> 16) & mask_16) | ( x           << 16);
  x = ((x >>  8) & mask_8)  | ((x & mask_8) <<  8);
  x = ((x >>  4) & mask_4)  | ((x & mask_4) <<  4);
  x = ((x >>  2) & mask_2)  | ((x & mask_2) <<  2);
  x = ((x >>  1) & mask_1)  | ((x & mask_1) <<  1);
  return x;
}

我的思路非常简单，就是采用二分法一步步交换。
先两个bit互换，然后2个2比特互换，……，最后两个16比特互换。

然后花了大量的时间来想怎么才能通过最大操作符数量的限制，在室友的启发下想到了在构造mask的时候可以通过复用来减少操作符数量，才终于过关。

P15 mod3

//P15
/*
 * mod3 - calculate x mod 3 without using %.
 *   Example: mod3(100) = 1
 *            mod3(-100) = -1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 90
 *   Rating: 4
 */
int mod3(int x){
  // 在偶数位的bit，会给模3的结果贡献2
  // 在奇数位的bit，会给模3的结果贡献1
  int mask2;  int mask4;  int mask16;
  int sign_bits;  int is_neg;
  int xis3;
  
  mask2 = (0x33 << 8) | 0x33;   // 0x33333333
  mask2 = mask2 | (mask2 << 16);
  mask4 = (0x0F << 8) | 0x0F;   // 0x0F0F0F0F
  mask4 = mask4 | (mask4 << 16);
  mask16 = (0xFF << 8) | 0xFF;
  
  // x = abs(x)
  sign_bits = x>>31;
  is_neg = sign_bits & 1;
  x = (x^sign_bits) + is_neg;

  // round 1
  x = ((x>>2) & mask2) + (x & mask2);
  x = ((x>>4) & mask4) + (x & mask4);
  x = (x>>8) + x;
  x = (x>>16) + (x & mask16);   // from 0x???????? to 0b?????? (0~48)

  // round 2, 3, 4
  x = (x>>4) + ((x>>2) & 3) + (x & 3);  // 0b?????? to 0b???? (0~9)
  x = ((x>>2) & 3) + (x & 3);  // 0b???? to 0b??? (0~6)
  x = (x>>2) + x & 3;  // 0b??? to 0b?? (0~3)

  // round 5
  xis3 = x & (x>>1);
  x = ~(xis3 | (xis3 << 1)) & x;

  return (x ^ sign_bits) + is_neg;
}

在偶数位的bit，会对模3结果贡献2；在奇数位的bit，会对模3结果贡献1。
回到了经典的bitCount问题，但是一开始想着分开数，后来发现可以直接加。

注意到：两个奇数位bit相加，若进位到偶数位，其对结果的贡献总数仍为2。
而两个偶数位bit相加，若进位到奇数位，其对结果的贡献总数仍为1（mod 3）。

所以在round1中（用注释标了round1），一步步把所有的bit都加起来，但是不把相邻的2个bit加起来（这样会破坏模3结果）。
这样一来，我们会得到一个6位的数。
然后我用round2、3、4来把这个6位的数压缩到3位，最后在round5当中想办法计算出最后的结果。

另外，对于负数的运算，我自己试了试发现就是相反数的运算结果取负，加了些代码来处理。

P16 float_neg

//P16
/* 
 * float_neg - Return bit-level equivalent of expression -f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representations of
 *   single-precision floating point values.
 *   When argument is NaN, return argument.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 */
unsigned float_neg(unsigned uf) {
  unsigned exponent = uf & 0x7f800000;
  if ((exponent == 0x7f800000) && !!(uf & 0x7fffff))  return uf;
  else return uf ^ 0x80000000;
}

符号位异或一个1就行，但是对于NaN（exp全为1，小数部分不全为0）直接返回自身。

P17 float_i2f

//P17
/* 
 * float_i2f - Return bit-level equivalent of expression (float) x
 *   Result is returned as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point values.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 */
unsigned float_i2f(int x) {
  unsigned sign_bit = 0, exponent = 0, significand = 0;
  unsigned s_digits;   // significant digits
  unsigned abs_x;
  unsigned carry;

  if (x & 0x80000000) {
    x = -x;
    sign_bit = 0x80000000;
  }
  abs_x = x;

  s_digits = 32;
  while (--s_digits && !(x & (1 << s_digits)))  ;
  if (!s_digits) {
    if (x)
      return sign_bit | 0x3F800000;
    else
      return 0;
  }

  exponent = (s_digits + 127) << 23;

  significand = abs_x << (32-s_digits);   // remove highest bit
  carry = significand & 0x1FF;            // part to be rounded
  significand = significand >> 9;
  if (carry >= 0x100) {
    if (carry > 0x100)                    // >100.. +1
      significand += 1;
    else
      significand += significand & 1;     // =100.. depend on significand & 1
    if (significand & 0x800000) {
      exponent += 0x00800000;
      significand &= 0x7FFFFF;
    }
  }

  // printf("%0x %0x %0x\n", sign_bit>>31, exponent>>23, significand);
  return sign_bit | exponent | significand;
}

做了比较久。整体框架如下：

取绝对值
计算最高位位置（这里把1和0的情况做了特殊处理，直接返回结果）
移除最高位、计算significant
根据舍入规则，修改significant以及exponent
将sign_bit、exponent、significand组合在一起返回

其中主要难点在于舍入部分，自己尝试的时候并没有尝试出舍去部分为0x10时，应该算进1还是不算。
然后发现是自己没有好好看书，书上说IEEE定义了四种round规则，其中默认使用向偶数舍入。所以如果significant的最低位为1，就进1补全成一个偶数，否则就不进位。

P18 float_twice

//P18
/* 
 * float_twice - Return bit-level equivalent of expression 2*f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representation of
 *   single-precision floating point values.
 *   When argument is NaN, return argument
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 */
unsigned float_twice(unsigned uf) {
  // turn uf into familiar format
  unsigned sign = uf & 0x80000000;
  unsigned exponent = uf & 0x7f800000;
  unsigned significand = uf & 0x7FFFFF;

  if (exponent == 0x7f800000) return uf;    // When argument is NaN, return argument; When argument is Inf, return Inf
  else if (exponent == 0)             // argument is denorm
    significand <<= 1;                // significand *= 2
  else exponent += 0x00800000;                 // argument is norm

  return sign | exponent | significand;
}

实验感受

如说明pdf中所言，这次实验我确实（大大地）加深了对位运算的理解，以及学到了一些算法知识（或许应该叫做分而治之的思想）。
虽然做题的时候确实是非常地折磨，但是做完了之后，还是很珍惜这次锻炼的机会，很有成就感。

但是有一个小小的吐槽，就是为什么（只有）bitReverse的操作符数量卡得如此地紧，优化的难度好高。在室友启发我之前，我想了好久好久都没有想出来这里还有什么优化的地方（确实是我的思维太局限了）。
不过硬要说的话，这一问卡了这么久也让我深深地记住了复用这个方法在优化时的含金量……

最后，十分感谢ICS的助教大大们带来全新版本的DataLab！