CSAPP Cache Lab II: Optimizing Matrix Transposition

February 5, 2026 08:00

In this part of the Cache Lab, the mission is simple yet devious: optimize matrix transposition for three specific sizes: 32x32, 64x64, and 61x67. Our primary enemy? Cache misses.

Matrix Transposition

A standard transposition swaps rows and columns directly:

void trans(int M, int N, int A[N][M], int B[M][N])
{
    int i, j, tmp;

    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            tmp = A[i][j];
            B[j][i] = tmp;
        }
    }    

}

While correct, this approach is a cache-miss nightmare because it ignores how data is actually stored in memory.

Cache Overview

To optimize effectively, we first have to understand our hardware constraints. The lab specifies a direct-mapped cache with the following parameters:

Parameter	Value
Sets (S)	32
Block Size (B)	32 bytes
Associativity (E)	1 (Direct-mapped)
Integer Size	4 bytes
Capacity per line	8 integers

We will use matrix tiling and loop unrolling to optimize the code.

32x32 Case

In this case, a row of the matrix spans 32/8 = 4 cache lines, so the 32 sets hold exactly 32/4 = 8 rows before the mapping wraps around: rows that are 8 apart conflict with each other. This makes 8x8 tiling the sweet spot.

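By processing the matrix in $8 \times 8$ blocks, we ensure that once a line of A is loaded, we use all 8 integers before it gets evicted. We also unroll the inner loop with 8 local variables, reading a whole row of A before writing anything to B, so that on diagonal blocks (where a row of A and a row of B map to the same set) the writes to B cannot evict the A line while we are still reading from it.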

int i,j,k;
int tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8;
for(i = 0; i<N; i+=8){
    for(j = 0; j<M; j+=8){
        for(k = i; k<N && k<i+8; k++) {
          // Read row from A
            tmp1 = A[k][j];
            tmp2 = A[k][j+1];
            tmp3 = A[k][j+2];
            tmp4 = A[k][j+3];
            tmp5 = A[k][j+4];
            tmp6 = A[k][j+5];
            tmp7 = A[k][j+6];
            tmp8 = A[k][j+7];

          // Write to columns of B
            B[j][k] = tmp1;
            B[j+1][k] = tmp2;
            B[j+2][k] = tmp3;
            B[j+3][k] = tmp4;
            B[j+4][k] = tmp5;
            B[j+5][k] = tmp6;
            B[j+6][k] = tmp7;
            B[j+7][k] = tmp8;
        }
    }
}

61x67 Case

Since 61 and 67 are not powers of two, the conflict misses don’t occur in a regular pattern like they do in the square matrices. This “irregularity” is actually a blessing. We can get away with simple tiling. A 16x16 block size typically yields enough performance to pass the miss-count threshold.

int BLOCK_SIZE = 16;
int i,j,k,l,tmp;
int a,b;
for(i = 0; i<N; i+=BLOCK_SIZE){
    for(j = 0; j<M; j+=BLOCK_SIZE){
        a = i+BLOCK_SIZE;
        b = j+BLOCK_SIZE;
        for(k = i; k<N && k<a; k++) {
            for(l = j; l<M && l<b; l++){
                tmp = A[k][l];
                B[l][k] = tmp;
            }
        }
    }
}

64x64 Case

This is the hardest part. In a 64x64 matrix, a row spans 8 cache lines, so the set mapping wraps around every $32/8 = 4$ rows. If we use 8x8 tiling, the bottom half of the block maps to the same sets as the top half and evicts it.

We can try a 4x4 matrix tiling first.

int BLOCK_SIZE = 4;
int i,j,k,l,tmp;
int a,b;
for(i = 0; i<N; i+=BLOCK_SIZE){
    for(j = 0; j<M; j+=BLOCK_SIZE){
        a = i+BLOCK_SIZE;
        b = j+BLOCK_SIZE;
        for(k = i; k<N && k<a; k++) {
            for(l = j; l<M && l<b; l++){
                tmp = A[k][l];
                B[l][k] = tmp;
            }
        }
    }
}

But this isn’t enough to pass the miss-count threshold.

Next we try 8x8 tiling. We work around the conflict by partitioning the $8 \times 8$ block into four $4 \times 4$ sub-blocks and using the upper-right corner of B as a “buffer” to store data temporarily.

$\text{Block } A = \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \quad \xrightarrow{\text{Transpose}} \quad \text{Block } B = \begin{pmatrix} A_{TL}^T & A_{BL}^T \\ A_{TR}^T & A_{BR}^T \end{pmatrix}$

Here are the steps:

1. Transpose $A_{TL}$ into $B_{TL}$ while simultaneously storing the transpose of $A_{TR}$ in $B_{TR}$ (as temporary storage).
2. Move the stored data from $B_{TR}$ to its final position in $B_{BL}$, while transposing $A_{BL}$ into $B_{TR}$.
3. Transpose $A_{BR}$ into $B_{BR}$.

int i, j, k;
int tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8;

// Iterate through the matrix in 8x8 blocks to improve spatial locality
for (i = 0; i < N; i += 8) {
    for (j = 0; j < M; j += 8) {
        
        /**
         * STEP 1: Handle the top half of the 8x8 block (rows i to i+3)
         */
        for (k = 0; k < 4; k++) {
            // Read 8 elements from row i+k of matrix A into registers
            tmp1 = A[i + k][j];     tmp2 = A[i + k][j + 1];
            tmp3 = A[i + k][j + 2]; tmp4 = A[i + k][j + 3]; // Top-left 4x4
            tmp5 = A[i + k][j + 4]; tmp6 = A[i + k][j + 5];
            tmp7 = A[i + k][j + 6]; tmp8 = A[i + k][j + 7]; // Top-right 4x4

            // Transpose top-left 4x4 from A directly into top-left of B
            B[j][i + k]     = tmp1;
            B[j + 1][i + k] = tmp2;
            B[j + 2][i + k] = tmp3;
            B[j + 3][i + k] = tmp4;

            // Temporarily store top-right 4x4 of A in the top-right of B
            // This avoids cache misses by using the already-loaded cache line in B
            B[j][i + k + 4]     = tmp5;
            B[j + 1][i + k + 4] = tmp6;
            B[j + 2][i + k + 4] = tmp7;
            B[j + 3][i + k + 4] = tmp8;
        }

        /**
         * STEP 2: Handle the bottom half and fix the temporary placement
         */
        for (k = 0; k < 4; k++) {
            // Read bottom-left 4x4 column-wise from A
            tmp1 = A[i + 4][j + k]; tmp2 = A[i + 5][j + k];
            tmp3 = A[i + 6][j + k]; tmp4 = A[i + 7][j + k];
            
            // Read bottom-right 4x4 column-wise from A
            tmp5 = A[i + 4][j + k + 4]; tmp6 = A[i + 5][j + k + 4];
            tmp7 = A[i + 6][j + k + 4]; tmp8 = A[i + 7][j + k + 4];

            // Retrieve the top-right elements we temporarily stored in B in Step 1
            int t1 = B[j + k][i + 4];
            int t2 = B[j + k][i + 5];
            int t3 = B[j + k][i + 6];
            int t4 = B[j + k][i + 7];

            // Move bottom-left of A into the top-right of B
            B[j + k][i + 4] = tmp1;
            B[j + k][i + 5] = tmp2;
            B[j + k][i + 6] = tmp3;
            B[j + k][i + 7] = tmp4;

            // Move the retrieved temporary values into the bottom-left of B
            B[j + k + 4][i]     = t1;
            B[j + k + 4][i + 1] = t2;
            B[j + k + 4][i + 2] = t3;
            B[j + k + 4][i + 3] = t4;

            // Place bottom-right of A into the bottom-right of B
            B[j + k + 4][i + 4] = tmp5;
            B[j + k + 4][i + 5] = tmp6;
            B[j + k + 4][i + 6] = tmp7;
            B[j + k + 4][i + 7] = tmp8;
        }
    }
}

Note: The key trick here is to write B row by row where possible (so the B line being written stays in the cache) and to use local variables, which the compiler can keep in registers, to bridge the gap between conflicting cache lines.

Conclusion

Optimizing matrix transposition is less about the math and more about mechanical sympathy—understanding the underlying hardware to write code that plays nice with the CPU’s cache.

The jump from the naive version to these optimized versions isn’t just a marginal gain; it’s often a 10x reduction in cache misses. It serves as a stark reminder that in systems programming, how you access your data is just as important as the algorithm itself.

CSAPP Cache Lab I: Let's simulate a cache memory!

Louis Aeilot's Blog

Louis C Deng

February 5, 2026 00:45

For the CSAPP Cache Lab, the students are asked to write a small C program (200~300 lines) that simulates a cache memory.

The full code is here on GitHub.

Understanding a Cache

1. The Anatomy of a Cache ($S$, $E$, $B$, $m$)

A cache can be described with the following four parameters:

$S = 2^s$ (Cache Sets): The cache is divided into sets.
$E$ (Cache Lines per set): This is the “associativity.”
- If $E=1$, it’s a direct-mapped cache. If $E>1$, it’s set-associative.
- Each line contains a valid bit, a tag, and the actual data block.
$B = 2^b$ (Block Size): The number of bytes stored in each line.
- The $b$ bits at the end of an address tell the cache the offset within that block.
$m$: The number of bits in a memory address.

2. Address Decomposition

When the CPU wants to access a 64-bit address, the cache doesn’t look at the whole number at once. It slices the address into three distinct fields:

Field	Purpose
Tag	Used to uniquely identify the memory block within a specific set. `t = m - b - s`
Set Index	Determines which set the address maps to.
Block Offset	Identifies the specific byte within the cache line.

3. The “Search and Match” Process

When our simulator receives an address (e.g., from an L or S operation in the trace file), it follows these steps:

1. Find the Set: Use the set index bits to jump to the correct set in our cache structure.
2. Search the Lines: Look through all the lines in that set.
- Hit: If a line has valid == true AND the tag matches the address tag.
- Miss: If no line matches.
3. Handle the Miss:
- Cold Start: If there is an empty line (valid == false), fill it with the new tag and set valid = true.
- Eviction: If all lines are full, we must kick one out. This is where the LRU (Least Recently Used) policy comes in: we find the line that hasn’t been touched for the longest time and replace it.

Lab Requirements

For this Lab Project, we will write a cache simulator that takes a valgrind memory trace as an input.

Input

The input looks like:

I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,8

Each line denotes one or two memory accesses. The format of each line is

[space]operation address,size

The operation field denotes the type of memory access:

“I” denotes an instruction load,
“L” a data load,
“S” a data store,
“M” a data modify (i.e., a data load followed by a data store).

Mind you: There is never a space before each “I”. There is always a space before each “M”, “L”, and “S”.

The address field specifies a 64-bit hexadecimal memory address. The size field specifies the number of bytes accessed by the operation.

CLI

Our program should take the following command line arguments:

Usage: ./csim-ref [-hv] -s <s> -E <E> -b <b> -t <tracefile>

-h: Optional help flag that prints usage info
-v: Optional verbose flag that displays trace info
-s <s>: Number of set index bits ($S = 2^s$ is the number of sets)
-E <E>: Associativity (number of lines per set)
-b <b>: Number of block bits ($B = 2^b$ is the block size)
-t <tracefile>: Name of the valgrind trace to replay

Caveats

For this lab, we ignore all Is (the instruction cache accesses).

We assume that memory accesses are aligned properly, such that a single memory access never crosses block boundaries.

The Code

We basically start from scratch, given an almost blank csim.c file to fill in. The file comes with only a main function and no header files.

Data Models

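// Headers needed by the snippets in this post
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// Data Model
char* fileName = NULL;
int set_bit = -1;
long long sets = -1;
int associativity = -1;
int block_bit = -1;
long long block_size = -1;
bool verboseMode = false;

int global_timer = 0; // For LRU

// Result counters reported at the end
int hit_count = 0;
int miss_count = 0;
int eviction_count = 0;

int memory_bit = 64; // Assuming 64-bit addresses
int tag_bit = 0; // Tag bits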

Handling Command-Line Arguments

First, we add the int argc, char** argv parameters to the main function. argc stands for argument count, while argv stands for argument values.

We use getopt to parse arguments.

void handleArgs(int argc, char** argv){
    int opt;

    while ((opt = getopt(argc, argv, "hvs:E:b:t:")) != -1) {
        switch(opt) {
            case 'h':
                printUsage(argv);
                exit(0);
            case 'v':
                verboseMode = true;
                break;
            case 't':
                fileName = optarg;
                break;
            case 's':
                set_bit = atoi(optarg);
                break;
            case 'E':
                associativity = atoi(optarg);
                break;
            case 'b':
                block_bit = atoi(optarg);
                break;
            case '?':
                printUsage(argv);
                exit(1);
            default:
                exit(1); 
        }
    }

    if(fileName == NULL || set_bit == -1 || associativity == -1 || block_bit == -1) {
        printf("Missing required command line argument\n");
        printUsage(argv);
        exit(1);
    }

    sets = 1LL << set_bit;
    block_size = 1LL << block_bit;
    
    tag_bit = memory_bit - (set_bit + block_bit);
}

getopt is declared in unistd.h, but the lab compiles with -std=c99, which hides POSIX extensions unless a feature-test macro is defined. GNU systems also provide a standalone <getopt.h> header, so we include getopt.h instead.

opt = getopt(argc, argv, "hvs:E:b:t:")

h and v: These are boolean flags that take no argument.
s:, E:, b:, and t:: These options take a value. The trailing colon tells getopt that the flag must be followed by an argument (e.g., -s 4), which getopt exposes through optarg.

After parsing the arguments, we set the initial value of our Cache Data Model.

sets = 1LL << set_bit;
block_size = 1LL << block_bit;

tag_bit = memory_bit - (set_bit + block_bit);

Initialize Cache

// Cache Line Structure
typedef struct CacheLine {
    bool valid;
    long long tag;
    /*
        Need LRU stamp to implement LRU eviction policy
    */
    int lru_counter;
} CacheLine;

CacheLine** cache = NULL;

void initCache() {
    // Initialize cache data structures
    cache = (CacheLine**) malloc(sizeof(CacheLine*) * sets);
    for(int i = 0; i<sets; i++){
        cache[i] = (CacheLine*) calloc(associativity, sizeof(CacheLine));
    }   
}

Caution: memory returned by malloc is uninitialized, so fields such as valid might contain garbage values.

So we use calloc for the lines. The calloc function (often said to stand for contiguous allocation) is similar to malloc, but it initializes the allocated memory to zero.

And don’t forget to free the allocated memory!

void freeCache() {
    // Free allocated memory for cache
    for(int i = 0; i<sets; i++) free(cache[i]);
    free(cache);
}

Handling File Input

  // Handle trace file
  FILE *traceFile = fopen(fileName, "r");
  if (traceFile == NULL) {
      printf("Error opening file: %s\n", fileName);
      exit(1);
  }
  char operation;
  long long address;
  int size;
  while (fscanf(traceFile, " %c %llx,%d", &operation, &address, &size) == 3) {
      switch (operation) {
          case 'L':
              // Handle load operation
              loadData(address, size);
              break;
          case 'S':
              // Handle store operation
              storeData(address, size);
              break;
          case 'M':
              // Handle modify operation
              modifyData(address, size);
              break;
          default:
              // Ignore other operations
              break;
      }
  }
  // Close trace file
  fclose(traceFile);

Caution:

fscanf does not skip whitespace before %c, so we add a space before %c in the format string.
Looping on !feof(traceFile) does not work correctly here. feof only returns true after a read has already tried to go past the end of the file and failed, so while (!feof(p)) runs the loop body one extra time with stale data from the last successful read. Checking the return value of fscanf avoids this.

Parsing Addresses

// Parse Line Structure
long long getTag(long long address) {
    return address >> (set_bit + block_bit);
}

long long getSetIndex(long long address) {
    long long mask = (1LL << set_bit) - 1;
    return (address >> block_bit) & mask;
}

long long getBlockOffset(long long address) {
    long long mask = (1LL << block_bit) - 1;
    return address & mask;
}

We use bit masks to parse the addresses.

Loading Cache

void loadData(long long address, int size) {
    // Simulate accessing data at the given address
    int s = getSetIndex(address);
    long long t = getTag(address);
    global_timer++;

    for (int i = 0; i < associativity; i++) {
        if (cache[s][i].valid && cache[s][i].tag == t) {
            hit_count++;
            cache[s][i].lru_counter = global_timer;
            if (verboseMode) printf(" hit");
            return;
        }
    }

    miss_count++;
    if (verboseMode) printf(" miss");

    for (int i = 0; i < associativity; i++) {
        if (!cache[s][i].valid) {
            cache[s][i].valid = true;
            cache[s][i].tag = t;
            cache[s][i].lru_counter = global_timer; 
            return;
        }
    }

    eviction_count++;
    if (verboseMode) printf(" eviction");

    int victim_index = 0;
    int min_lru = cache[s][0].lru_counter;

    for (int i = 1; i < associativity; i++) {
        if (cache[s][i].lru_counter < min_lru) {
            min_lru = cache[s][i].lru_counter;
            victim_index = i;
        }
    }

    cache[s][victim_index].tag = t;
    cache[s][victim_index].lru_counter = global_timer;
}

The code simulates the process of loading cache.

We first check if the data already exists in the cache.

If it doesn’t exist, we have to scan for blank lines to load the data.

If blank lines don’t exist, we need to evict a line using the LRU strategy. We replace the victim line with the new line.

Other Operations

void storeData(long long address, int size) {
    // Simulate storing data at the given address
    loadData(address, size);
}

void modifyData(long long address, int size) {
    // Simulate modifying data at the given address
    loadData(address, size);
    hit_count++;
    if (verboseMode) printf(" hit\n");
}

For this simulator, a store is handled exactly like a load. A modify is a load followed by a store to the same address, so the store always hits and we simply count one extra hit.

Print Summary

We are asked to output the answer using the printSummary function.

// Print Summary
printSummary(hit_count, miss_count, eviction_count);

And Voila!

                        Your simulator     Reference simulator
Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
     3 (1,1,1)       9       8       6       9       8       6  traces/yi2.trace
     3 (4,2,4)       4       5       2       4       5       2  traces/yi.trace
     3 (2,1,4)       2       3       1       2       3       1  traces/dave.trace
     3 (2,1,3)     167      71      67     167      71      67  traces/trans.trace
     3 (2,2,3)     201      37      29     201      37      29  traces/trans.trace
     3 (2,4,3)     212      26      10     212      26      10  traces/trans.trace
     3 (5,1,5)     231       7       0     231       7       0  traces/trans.trace
     6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/long.trace
    27

Summary

In this project, we moved from the theory of the memory hierarchy to the practical reality of memory management. By building this simulator, we reinforced several core concepts of computer systems.

With our simulator passing all the trace tests, we’ve effectively mirrored how a CPU “thinks” about memory. The next step is applying these insights to optimize actual code, ensuring our algorithms play nicely with the hardware we’ve just simulated.

CSAPP Bomb Lab Walkthrough

Louis Aeilot's Blog

Louis C Deng

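December 21, 2025 02:45

I finished the CSAPP Bomb Lab, so here is a walkthrough.

The Task

We are given a binary named bomb that consists of six "phases". Each phase expects a specific string on stdin. If the input matches, the phase is "defused" and the program moves on to the next one, until every phase has been defused. Otherwise the bomb "explodes" and prints "BOOM!!!".

Environment

The bomb runs on x86_64 Linux, while my machine is an ARM-based Mac (Apple Silicon).

I spent a long time setting up Docker to virtualize an x86_64 Ubuntu, only to find that gdb did not work inside it, and I did not want to keep fiddling.

It turns out educoder provides a ready-to-use environment for free, so I completed the lab there.

Link: https://www.educoder.net/paths/6g398fky

Prerequisites

This lab requires some familiarity with gdb commands.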

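1. Startup & Exit

Command	Abbreviation	Description
`gdb executable`	-	Start GDB and load the executable.
`run [args]`	`r`	Start running the program. Command-line arguments, if any, follow the command (e.g. `r input.txt`).
`quit`	`q`	Exit GDB.
`start`	-	Run the program and pause automatically at the first line of `main` (saves setting a breakpoint by hand).
`set args ...`	-	Set the program's arguments (use before `r`).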

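2. Breakpoints

Command	Abbreviation	Description	Example
`break <loc>`	`b`	Set a breakpoint. Accepts a function name, a line number, or file:line.	`b main` `b 15` `b file.c:20`
`info breakpoints`	`i b`	List all current breakpoints with their numbers (Num).	-
`delete <Num>`	`d`	Delete the breakpoint with the given number. Without a number, delete all breakpoints.	`d 1`
`disable/enable <Num>`	-	Temporarily disable or re-enable a breakpoint (its configuration is kept while disabled).	`disable 2`
`break ... if <cond>`	-	Conditional breakpoint: pause only when the condition is true (very useful).	`b 10 if i==5`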

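3. Execution Control

Command	Abbreviation	Description	Notes
`next`	`n`	Step over. Execute the next line of code.	If the line contains a function call, run the whole call without stepping into it.
`step`	`s`	Step into. Execute the next line of code.	If the line contains a function call, enter the function and debug it line by line.
`continue`	`c`	Continue running until the next breakpoint or until the program exits.	-
`finish`	-	Run until the current function returns.	Handy when you accidentally `s` into a library function you do not care about.
`until <line>`	`u`	Run until the given line number.	Often used to get out of a loop quickly.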

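4. Inspecting Data

Command	Abbreviation	Description
`print <var>`	`p`	Print the value of a variable. Expressions are supported (e.g. `p index + 1`).
`display <var>`	-	Persistent display. Print the variable's value automatically every time the program pauses (good for tracking variables in a loop).
`info locals`	-	Print all local variables in the current stack frame.
`whatis <var>`	-	Show the data type of a variable.
`ptype <struct>`	-	Show the full definition (member list) of a struct or class.
`x /nfu <addr>`	`x`	Examine memory. `n` is the count, `f` the format (x=hex, d=dec, s=str), `u` the unit (b=byte, w=word). For example, `x/10xw &array` shows the first 10 words of the array in hexadecimal.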

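5. Stack & Context

Command	Abbreviation	Description
`backtrace`	`bt`	Show the call stack: the chain of function calls from main to the current function (for example, at the point of a crash).
`frame <Num>`	`f`	Switch to the given stack frame (using the numbers shown by `bt`). After switching, `p` can inspect that function's local variables.
`list`	`l`	Show the source code around the current line.

6. A Nicer Experience: TUI Mode (Text User Interface)

layout src: splits the screen in two, with the source code and current line on top and the command window below. (Highly recommended.)
layout asm: shows the assembly code.
layout split: shows source and assembly at the same time.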

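Disassembly

We can disassemble the binary directly with objdump to read its assembly.

objdump -d bomb > bomb.asm

We can see that the phases are simply functions named phase_x().

strings

In a terminal, run:

strings bomb

This prints every run of consecutive printable (ASCII) characters found in the bomb file.

Phase 1

Let's look at phase_1 first, using disas phase_1: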

Dump of assembler code for function phase_1:
   0x0000000000400ee0 <+0>:     sub    $0x8,%rsp
   0x0000000000400ee4 <+4>:     mov    $0x402400,%esi
   0x0000000000400ee9 <+9>:     callq  0x401338 <strings_not_equal>
   0x0000000000400eee <+14>:    test   %eax,%eax
   0x0000000000400ef0 <+16>:    je     0x400ef7 <phase_1+23>
   0x0000000000400ef2 <+18>:    callq  0x40143a <explode_bomb>
   0x0000000000400ef7 <+23>:    add    $0x8,%rsp
   0x0000000000400efb <+27>:    retq   
End of assembler dump.

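sub $0x8,%rsp sets up the stack frame; we can ignore it here.

mov $0x402400,%esi followed by callq 0x401338 <strings_not_equal> appears to perform a strcmp-style comparison between our input and the string at 0x402400.

The following je 0x400ef7 <phase_1+23> is then clear: if the strings are equal (the return value is 0), the call to explode_bomb is skipped.

Set a breakpoint with b phase_1.

Then run the program with r and type anything to hit the breakpoint.

Examine the memory at 0x402400 as a string: x/s 0x402400

0x402400: "Border relations with Canada have never been better."

We have found the answer.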

Phase 2

Again, we disassemble first:

Dump of assembler code for function phase_2:
   0x0000000000400efc <+0>:     push   %rbp
   0x0000000000400efd <+1>:     push   %rbx
   0x0000000000400efe <+2>:     sub    $0x28,%rsp
   0x0000000000400f02 <+6>:     mov    %rsp,%rsi
   0x0000000000400f05 <+9>:     callq  0x40145c <read_six_numbers>
   0x0000000000400f0a <+14>:    cmpl   $0x1,(%rsp)
   0x0000000000400f0e <+18>:    je     0x400f30 <phase_2+52>
   0x0000000000400f10 <+20>:    callq  0x40143a <explode_bomb>
   0x0000000000400f15 <+25>:    jmp    0x400f30 <phase_2+52>
   0x0000000000400f17 <+27>:    mov    -0x4(%rbx),%eax
   0x0000000000400f1a <+30>:    add    %eax,%eax
   0x0000000000400f1c <+32>:    cmp    %eax,(%rbx)
   0x0000000000400f1e <+34>:    je     0x400f25 <phase_2+41>
   0x0000000000400f20 <+36>:    callq  0x40143a <explode_bomb>
   0x0000000000400f25 <+41>:    add    $0x4,%rbx
   0x0000000000400f29 <+45>:    cmp    %rbp,%rbx
   0x0000000000400f2c <+48>:    jne    0x400f17 <phase_2+27>
   0x0000000000400f2e <+50>:    jmp    0x400f3c <phase_2+64>
   0x0000000000400f30 <+52>:    lea    0x4(%rsp),%rbx
   0x0000000000400f35 <+57>:    lea    0x18(%rsp),%rbp
   0x0000000000400f3a <+62>:    jmp    0x400f17 <phase_2+27>
   0x0000000000400f3c <+64>:    add    $0x28,%rsp
   0x0000000000400f40 <+68>:    pop    %rbx
   0x0000000000400f41 <+69>:    pop    %rbp
   0x0000000000400f42 <+70>:    retq   
End of assembler dump.

At 0x0000000000400f05 <+9>: callq 0x40145c <read_six_numbers> we see a call to read_six_numbers.

We can disassemble read_six_numbers as well:

Dump of assembler code for function read_six_numbers:
   0x000000000040145c <+0>:     sub    $0x18,%rsp
   0x0000000000401460 <+4>:     mov    %rsi,%rdx
   0x0000000000401463 <+7>:     lea    0x4(%rsi),%rcx
   0x0000000000401467 <+11>:    lea    0x14(%rsi),%rax
   0x000000000040146b <+15>:    mov    %rax,0x8(%rsp)
   0x0000000000401470 <+20>:    lea    0x10(%rsi),%rax
   0x0000000000401474 <+24>:    mov    %rax,(%rsp)
   0x0000000000401478 <+28>:    lea    0xc(%rsi),%r9
   0x000000000040147c <+32>:    lea    0x8(%rsi),%r8
   0x0000000000401480 <+36>:    mov    $0x4025c3,%esi
   0x0000000000401485 <+41>:    mov    $0x0,%eax
   0x000000000040148a <+46>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x000000000040148f <+51>:    cmp    $0x5,%eax
   0x0000000000401492 <+54>:    jg     0x401499 <read_six_numbers+61>
   0x0000000000401494 <+56>:    callq  0x40143a <explode_bomb>
   0x0000000000401499 <+61>:    add    $0x18,%rsp
   0x000000000040149d <+65>:    retq   
End of assembler dump.

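One line, callq 0x400bf0 <__isoc99_sscanf@plt>, calls sscanf.

Inspecting $0x4025c3 with x/s 0x4025c3 gives %d %d %d %d %d %d, so it does read six numbers.

When a function call has more than six arguments, the extra ones are passed on the stack. We see: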

0x0000000000401460 <+4>:     mov    %rsi,%rdx
0x0000000000401463 <+7>:     lea    0x4(%rsi),%rcx
0x0000000000401467 <+11>:    lea    0x14(%rsi),%rax
0x000000000040146b <+15>:    mov    %rax,0x8(%rsp)
0x0000000000401470 <+20>:    lea    0x10(%rsi),%rax
0x0000000000401474 <+24>:    mov    %rax,(%rsp)
0x0000000000401478 <+28>:    lea    0xc(%rsi),%r9
0x000000000040147c <+32>:    lea    0x8(%rsi),%r8

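Arguments are passed in the order rdi, rsi, rdx, rcx, r8, r9. This sscanf call has eight arguments (the input string, the format string, and six pointers), so the last two are stored on the stack, with rsp being the stack pointer.

So the six numbers are stored, in order, at rsi, rsi+4, rsi+8, rsi+12, rsi+16 (0x10 = 16), and rsi+20 (0x14 = 20).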

Back to phase_2:

0x0000000000400f02 <+6>: mov %rsp,%rsi

The stack pointer is passed to read_six_numbers as an argument, so the six numbers live on the stack in phase_2's frame.

0x0000000000400f0a <+14>:    cmpl   $0x1,(%rsp)
0x0000000000400f0e <+18>:    je     0x400f30 <phase_2+52>
0x0000000000400f10 <+20>:    callq  0x40143a <explode_bomb>

This checks whether the element at the top of the stack, i.e. the first number, is 1.

Execution then jumps to 0x400f30.

0x0000000000400f17 <+27>:    mov    -0x4(%rbx),%eax
0x0000000000400f1a <+30>:    add    %eax,%eax
0x0000000000400f1c <+32>:    cmp    %eax,(%rbx)
0x0000000000400f1e <+34>:    je     0x400f25 <phase_2+41>
0x0000000000400f20 <+36>:    callq  0x40143a <explode_bomb>
0x0000000000400f25 <+41>:    add    $0x4,%rbx
0x0000000000400f29 <+45>:    cmp    %rbp,%rbx
0x0000000000400f2c <+48>:    jne    0x400f17 <phase_2+27>
0x0000000000400f2e <+50>:    jmp    0x400f3c <phase_2+64>
0x0000000000400f30 <+52>:    lea    0x4(%rsp),%rbx
0x0000000000400f35 <+57>:    lea    0x18(%rsp),%rbp
0x0000000000400f3a <+62>:    jmp    0x400f17 <phase_2+27>

This is clearly a loop that walks through the six numbers in turn (advancing four bytes each time, exactly the size of an int).

0x0000000000400f1a <+30>:    add    %eax,%eax
0x0000000000400f1c <+32>:    cmp    %eax,(%rbx)
0x0000000000400f1e <+34>:    je     0x400f25 <phase_2+41>

Each of the six numbers is twice the previous one.

So the answer is: 1 2 4 8 16 32

We can also translate the code into C:

for (int i = 1; i < 6; i++) {
    // mov -0x4(%rbx), %eax 
    int previous = num[i-1];
    // add %eax, %eax
    int expected = previous + previous; 
    // cmp %eax, (%rbx)
    if (num[i] != expected) {
        explode_bomb();
    }
}

Phase 3

Disassemble:

Dump of assembler code for function phase_3:
   0x0000000000400f43 <+0>:     sub    $0x18,%rsp
   0x0000000000400f47 <+4>:     lea    0xc(%rsp),%rcx
   0x0000000000400f4c <+9>:     lea    0x8(%rsp),%rdx
   0x0000000000400f51 <+14>:    mov    $0x4025cf,%esi
   0x0000000000400f56 <+19>:    mov    $0x0,%eax
   0x0000000000400f5b <+24>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x0000000000400f60 <+29>:    cmp    $0x1,%eax
   0x0000000000400f63 <+32>:    jg     0x400f6a <phase_3+39>
   0x0000000000400f65 <+34>:    callq  0x40143a <explode_bomb>
   0x0000000000400f6a <+39>:    cmpl   $0x7,0x8(%rsp)
   0x0000000000400f6f <+44>:    ja     0x400fad <phase_3+106>
   0x0000000000400f71 <+46>:    mov    0x8(%rsp),%eax
   0x0000000000400f75 <+50>:    jmpq   *0x402470(,%rax,8)
   0x0000000000400f7c <+57>:    mov    $0xcf,%eax
   0x0000000000400f81 <+62>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f83 <+64>:    mov    $0x2c3,%eax
   0x0000000000400f88 <+69>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f8a <+71>:    mov    $0x100,%eax
   0x0000000000400f8f <+76>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f91 <+78>:    mov    $0x185,%eax
   0x0000000000400f96 <+83>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f98 <+85>:    mov    $0xce,%eax
   0x0000000000400f9d <+90>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f9f <+92>:    mov    $0x2aa,%eax
   0x0000000000400fa4 <+97>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400fa6 <+99>:    mov    $0x147,%eax
   0x0000000000400fab <+104>:   jmp    0x400fbe <phase_3+123>
   0x0000000000400fad <+106>:   callq  0x40143a <explode_bomb>
   0x0000000000400fb2 <+111>:   mov    $0x0,%eax
   0x0000000000400fb7 <+116>:   jmp    0x400fbe <phase_3+123>
   0x0000000000400fb9 <+118>:   mov    $0x137,%eax
   0x0000000000400fbe <+123>:   cmp    0xc(%rsp),%eax
   0x0000000000400fc2 <+127>:   je     0x400fc9 <phase_3+134>
   0x0000000000400fc4 <+129>:   callq  0x40143a <explode_bomb>
   0x0000000000400fc9 <+134>:   add    $0x18,%rsp
   0x0000000000400fcd <+138>:   retq

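It looks a bit complicated, but we notice sscanf.

Inspecting 0x4025cf with x/s 0x4025cf gives %d %d, so the input appears to be two integers.

0x0000000000400f47 <+4>: lea 0xc(%rsp),%rcx
0x0000000000400f4c <+9>: lea 0x8(%rsp),%rdx

The two integers are stored at rsp+8 and rsp+0xc, respectively.

0x0000000000400f6a <+39>: cmpl $0x7,0x8(%rsp)
0x0000000000400f6f <+44>: ja 0x400fad <phase_3+106>

This checks the first number: if it is greater than 7, the bomb explodes. Because ja is an unsigned comparison, negative values fail as well.

0x0000000000400f71 <+46>: mov 0x8(%rsp),%eax
0x0000000000400f75 <+50>: jmpq *0x402470(,%rax,8)

The first integer is loaded into eax. This is clearly a switch jump table: the jump target is read from 0x402470 + 8*rax.

eax and rax are really the same register: eax is its low 32 bits and rax is the full 64 bits. This is a historical artifact; there are also ax, ah, and al, all kept for backward compatibility.

Let's read 10 words with x/10x 0x402470:
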
0x402470:       0x00400f7c      0x00000000      0x00400fb9      0x00000000
0x402480:       0x00400f83      0x00000000      0x00400f8a      0x00000000
0x402490:       0x00400f91      0x00000000

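This is the jump table of the switch statement, and its entries match addresses in the disassembly.

Any entry leads to a valid answer. For example, 0 corresponds to 0x00400f7c: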

0x0000000000400f7c <+57>:    mov    $0xcf,%eax
0x0000000000400f81 <+62>:    jmp    0x400fbe <phase_3+123>
...
0x0000000000400fbe <+123>:   cmp    0xc(%rsp),%eax
0x0000000000400fc2 <+127>:   je     0x400fc9 <phase_3+134>
0x0000000000400fc4 <+129>:   callq  0x40143a <explode_bomb>

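The second number is compared with eax; if they are equal, the phase is defused.

This gives the second number: 0xcf = 207.

So the answer is 0 207.

In fact, the answer is not unique: as the code shows, each switch case has its own correct value for the second integer.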

Phase 4

Disassemble:

Dump of assembler code for function phase_4:
   0x000000000040100c <+0>:     sub    $0x18,%rsp
   0x0000000000401010 <+4>:     lea    0xc(%rsp),%rcx
   0x0000000000401015 <+9>:     lea    0x8(%rsp),%rdx
   0x000000000040101a <+14>:    mov    $0x4025cf,%esi
   0x000000000040101f <+19>:    mov    $0x0,%eax
   0x0000000000401024 <+24>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x0000000000401029 <+29>:    cmp    $0x2,%eax
   0x000000000040102c <+32>:    jne    0x401035 <phase_4+41>
   0x000000000040102e <+34>:    cmpl   $0xe,0x8(%rsp)
   0x0000000000401033 <+39>:    jbe    0x40103a <phase_4+46>
   0x0000000000401035 <+41>:    callq  0x40143a <explode_bomb>
   0x000000000040103a <+46>:    mov    $0xe,%edx
   0x000000000040103f <+51>:    mov    $0x0,%esi
   0x0000000000401044 <+56>:    mov    0x8(%rsp),%edi
   0x0000000000401048 <+60>:    callq  0x400fce <func4>
   0x000000000040104d <+65>:    test   %eax,%eax
   0x000000000040104f <+67>:    jne    0x401058 <phase_4+76>
   0x0000000000401051 <+69>:    cmpl   $0x0,0xc(%rsp)
   0x0000000000401056 <+74>:    je     0x40105d <phase_4+81>
   0x0000000000401058 <+76>:    callq  0x40143a <explode_bomb>
   0x000000000040105d <+81>:    add    $0x18,%rsp
   0x0000000000401061 <+85>:    retq   
End of assembler dump.

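Once again we see sscanf.

Reading 0x4025cf gives %d %d again, so two numbers are read; their addresses are passed in rdx and rcx, which means they are stored at rsp+8 and rsp+0xc.

Reading on, jbe 0x40103a requires the first number to be at most 14 (an unsigned comparison).
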
0x000000000040103a <+46>:    mov    $0xe,%edx
0x000000000040103f <+51>:    mov    $0x0,%esi
0x0000000000401044 <+56>:    mov    0x8(%rsp),%edi

This is clearly argument passing for the call to func4.

Let's not rush into func4 yet and keep reading.

0x000000000040104d <+65>:    test   %eax,%eax
0x000000000040104f <+67>:    jne    0x401058 <phase_4+76>
...
0x0000000000401058 <+76>:    callq  0x40143a <explode_bomb>

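Recall that eax holds the function's return value here; the code requires the return value to be 0.

0x0000000000401051 <+69>: cmpl $0x0,0xc(%rsp)
0x0000000000401056 <+74>: je 0x40105d <phase_4+81>

This requires the second number read to be 0, which gives us half of the answer.

Next, let's look at func4: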

Dump of assembler code for function func4:
   0x0000000000400fce <+0>:     sub    $0x8,%rsp
   0x0000000000400fd2 <+4>:     mov    %edx,%eax
   0x0000000000400fd4 <+6>:     sub    %esi,%eax
   0x0000000000400fd6 <+8>:     mov    %eax,%ecx
   0x0000000000400fd8 <+10>:    shr    $0x1f,%ecx
   0x0000000000400fdb <+13>:    add    %ecx,%eax
   0x0000000000400fdd <+15>:    sar    %eax
   0x0000000000400fdf <+17>:    lea    (%rax,%rsi,1),%ecx
   0x0000000000400fe2 <+20>:    cmp    %edi,%ecx
   0x0000000000400fe4 <+22>:    jle    0x400ff2 <func4+36>
   0x0000000000400fe6 <+24>:    lea    -0x1(%rcx),%edx
   0x0000000000400fe9 <+27>:    callq  0x400fce <func4>
   0x0000000000400fee <+32>:    add    %eax,%eax
   0x0000000000400ff0 <+34>:    jmp    0x401007 <func4+57>
   0x0000000000400ff2 <+36>:    mov    $0x0,%eax
   0x0000000000400ff7 <+41>:    cmp    %edi,%ecx
   0x0000000000400ff9 <+43>:    jge    0x401007 <func4+57>
   0x0000000000400ffb <+45>:    lea    0x1(%rcx),%esi
   0x0000000000400ffe <+48>:    callq  0x400fce <func4>
   0x0000000000401003 <+53>:    lea    0x1(%rax,%rax,1),%eax
   0x0000000000401007 <+57>:    add    $0x8,%rsp
   0x000000000040100b <+61>:    retq   
End of assembler dump.

This code is recursive, so let's translate it into C by hand:

// Called as func4(a, 0, 14): a = %edi, l = %esi, r = %edx
int func4(int a, int l, int r){
    int mid = l + (r - l) / 2;   // shr/add/sar: halve, rounding toward zero
    if(mid <= a){
        if(mid==a){
            return 0;
        }
        l = mid + 1;
        return 2*func4(a, l, r) + 1;
    }else{
        r = mid - 1;
        return 2*func4(a, l, r);
    }
}

This is a binary search. With l = 0 and r = 14 the first midpoint is 7, so a = 7 returns 0 right away.

That gives the final answer 7 0

0x0000000000400fd2 <+4>:     mov    %edx,%eax
0x0000000000400fd4 <+6>:     sub    %esi,%eax
0x0000000000400fd6 <+8>:     mov    %eax,%ecx
0x0000000000400fd8 <+10>:    shr    $0x1f,%ecx
0x0000000000400fdb <+13>:    add    %ecx,%eax
0x0000000000400fdd <+15>:    sar    %eax
0x0000000000400fdf <+17>:    lea    (%rax,%rsi,1),%ecx

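This snippet computes mid, which is easy to follow, but one question remains: what is shr $0x1f,%ecx doing?

Bias

Integer division must round toward zero: down for positive numbers, up for negative numbers. Division by a power of two can be replaced by a right shift.

For two's-complement numbers, however, a plain right shift can round the wrong way.

A right shift discards the low-order bits, which amounts to rounding down.

$x = \underbrace{\sum_{i=k}^{w-1} x_i 2^i}_{\text{high part}} + \underbrace{\sum_{i=0}^{k-1} x_i 2^i}_{\text{low part}}$

(For two's complement the top bit $x_{w-1}$ carries weight $-2^{w-1}$, which does not change the argument.)

When we compute x >> k, every weight in the high part is divided by $2^k$ and becomes the integer result, while the low part (the remainder) is simply dropped.

For a negative number this rounds down, but we need it to round up.

So we need to introduce a bias.

$\text{For integers } x \text{ and } y\ (y>0),\quad \lceil x/y \rceil = \lfloor (x+y-1)/y \rfloor$

So (x+(1<<k)-1)>>k yields $\lceil x/2^k \rceil$

That is exactly what the following two lines do. Here k = 1, so the bias is (1<<1)-1 = 1, and shr $0x1f extracts the sign bit, which is 1 only when the value is negative:

0x0000000000400fd8 <+10>:    shr    $0x1f,%ecx
0x0000000000400fdb <+13>:    add    %ecx,%eax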

Phase 5

Let's start by looking at the code with disas

Dump of assembler code for function phase_5:
   0x0000000000401062 <+0>:     push   %rbx
   0x0000000000401063 <+1>:     sub    $0x20,%rsp
   0x0000000000401067 <+5>:     mov    %rdi,%rbx
   0x000000000040106a <+8>:     mov    %fs:0x28,%rax
   0x0000000000401073 <+17>:    mov    %rax,0x18(%rsp)
   0x0000000000401078 <+22>:    xor    %eax,%eax
   0x000000000040107a <+24>:    callq  0x40131b <string_length>
   0x000000000040107f <+29>:    cmp    $0x6,%eax
   0x0000000000401082 <+32>:    je     0x4010d2 <phase_5+112>
   0x0000000000401084 <+34>:    callq  0x40143a <explode_bomb>
   0x0000000000401089 <+39>:    jmp    0x4010d2 <phase_5+112>
   0x000000000040108b <+41>:    movzbl (%rbx,%rax,1),%ecx
   0x000000000040108f <+45>:    mov    %cl,(%rsp)
   0x0000000000401092 <+48>:    mov    (%rsp),%rdx
   0x0000000000401096 <+52>:    and    $0xf,%edx
   0x0000000000401099 <+55>:    movzbl 0x4024b0(%rdx),%edx
   0x00000000004010a0 <+62>:    mov    %dl,0x10(%rsp,%rax,1)
   0x00000000004010a4 <+66>:    add    $0x1,%rax
   0x00000000004010a8 <+70>:    cmp    $0x6,%rax
   0x00000000004010ac <+74>:    jne    0x40108b <phase_5+41>
   0x00000000004010ae <+76>:    movb   $0x0,0x16(%rsp)
   0x00000000004010b3 <+81>:    mov    $0x40245e,%esi
   0x00000000004010b8 <+86>:    lea    0x10(%rsp),%rdi
   0x00000000004010bd <+91>:    callq  0x401338 <strings_not_equal>
   0x00000000004010c2 <+96>:    test   %eax,%eax
   0x00000000004010c4 <+98>:    je     0x4010d9 <phase_5+119>
   0x00000000004010c6 <+100>:   callq  0x40143a <explode_bomb>
   0x00000000004010cb <+105>:   nopl   0x0(%rax,%rax,1)
   0x00000000004010d0 <+110>:   jmp    0x4010d9 <phase_5+119>
   0x00000000004010d2 <+112>:   mov    $0x0,%eax
   0x00000000004010d7 <+117>:   jmp    0x40108b <phase_5+41>
   0x00000000004010d9 <+119>:   mov    0x18(%rsp),%rax
   0x00000000004010de <+124>:   xor    %fs:0x28,%rax
   0x00000000004010e7 <+133>:   je     0x4010ee <phase_5+140>
   0x00000000004010e9 <+135>:   callq  0x400b30 <__stack_chk_fail@plt>
   0x00000000004010ee <+140>:   add    $0x20,%rsp
   0x00000000004010f2 <+144>:   pop    %rbx
   0x00000000004010f3 <+145>:   retq   
End of assembler dump.

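Two memory addresses stand out in this code: 0x4024b0 and 0x40245e

Let's read them:

0x4024b0 <array.3449>:  "maduiersnfotvbylSo you think you can stop the bomb with ctrl-c, do you?"
0x40245e:       "flyers"

The first one, array.3449, is a character array that we will call a[]. Only its first 16 bytes, maduiersnfotvbyl, can be reached by a 4-bit index; the rest of the output is most likely the next string in memory, printed because the array has no terminating NUL.

The code above can be split into sections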

0x0000000000401062 <+0>:     push   %rbx
0x0000000000401063 <+1>:     sub    $0x20,%rsp
0x0000000000401067 <+5>:     mov    %rdi,%rbx
0x000000000040106a <+8>:     mov    %fs:0x28,%rax
0x0000000000401073 <+17>:    mov    %rax,0x18(%rsp)
0x0000000000401078 <+22>:    xor    %eax,%eax
0x000000000040107a <+24>:    callq  0x40131b <string_length>
0x000000000040107f <+29>:    cmp    $0x6,%eax
0x0000000000401082 <+32>:    je     0x4010d2 <phase_5+112>
0x0000000000401084 <+34>:    callq  0x40143a <explode_bomb>
0x0000000000401089 <+39>:    jmp    0x4010d2 <phase_5+112>

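This is the initialization part. It reserves stack space, stores the stack canary (%fs:0x28) at 0x18(%rsp), keeps the pointer to the input string in %rbx, and checks that the input is exactly 6 characters long; otherwise the bomb explodes.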

0x00000000004010d2 <+112>:   mov    $0x0,%eax
0x00000000004010d7 <+117>:   jmp    0x40108b <phase_5+41>
...
0x000000000040108b <+41>:    movzbl (%rbx,%rax,1),%ecx
0x000000000040108f <+45>:    mov    %cl,(%rsp)
0x0000000000401092 <+48>:    mov    (%rsp),%rdx
0x0000000000401096 <+52>:    and    $0xf,%edx
0x0000000000401099 <+55>:    movzbl 0x4024b0(%rdx),%edx
0x00000000004010a0 <+62>:    mov    %dl,0x10(%rsp,%rax,1)
0x00000000004010a4 <+66>:    add    $0x1,%rax
0x00000000004010a8 <+70>:    cmp    $0x6,%rax
0x00000000004010ac <+74>:    jne    0x40108b <phase_5+41>

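This is a for loop that runs 6 times. Each iteration takes the low 4 bits of the current input character, a number from 0 to 15 that we call i, and stores a[i] into the corresponding slot of the buffer at 0x10(%rsp).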

0x00000000004010ae <+76>:    movb   $0x0,0x16(%rsp)
0x00000000004010b3 <+81>:    mov    $0x40245e,%esi
0x00000000004010b8 <+86>:    lea    0x10(%rsp),%rdi
0x00000000004010bd <+91>:    callq  0x401338 <strings_not_equal>
0x00000000004010c2 <+96>:    test   %eax,%eax
0x00000000004010c4 <+98>:    je     0x4010d9 <phase_5+119>
0x00000000004010c6 <+100>:   callq  0x40143a <explode_bomb>
0x00000000004010cb <+105>:   nopl   0x0(%rax,%rax,1)
0x00000000004010d0 <+110>:   jmp    0x4010d9 <phase_5+119>
...
0x00000000004010d9 <+119>:   mov    0x18(%rsp),%rax
0x00000000004010de <+124>:   xor    %fs:0x28,%rax
0x00000000004010e7 <+133>:   je     0x4010ee <phase_5+140>
0x00000000004010e9 <+135>:   callq  0x400b30 <__stack_chk_fail@plt>
0x00000000004010ee <+140>:   add    $0x20,%rsp
0x00000000004010f2 <+144>:   pop    %rbx
0x00000000004010f3 <+145>:   retq

The only valuable fragment here is

0x00000000004010ae <+76>:    movb   $0x0,0x16(%rsp)
0x00000000004010b3 <+81>:    mov    $0x40245e,%esi
0x00000000004010b8 <+86>:    lea    0x10(%rsp),%rdi
0x00000000004010bd <+91>:    callq  0x401338 <strings_not_equal>
0x00000000004010c2 <+96>:    test   %eax,%eax
0x00000000004010c4 <+98>:    je     0x4010d9 <phase_5+119>
0x00000000004010c6 <+100>:   callq  0x40143a <explode_bomb>
0x00000000004010cb <+105>:   nopl   0x0(%rax,%rax,1)
0x00000000004010d0 <+110>:   jmp    0x4010d9 <phase_5+119>

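This is a string comparison: the buffer is NUL-terminated (movb $0x0,0x16(%rsp)) and then compared with flyers.

It is not hard to see that this phase is a table lookup: the program takes each input character modulo 16 and uses the result as an index into the long string (maduiers…). For the looked-up characters to spell flyers, we find the index of each letter of flyers in the table and then build an input string in which every character's ASCII code modulo 16 equals that index.

The process can be summarized as: Input Char -> ASCII Hex -> AND 0xF (low 4 bits) -> Table Index -> Lookup Table Char -> Target “flyers”

This gives the answer ionefg or IONEFG

There are other valid answers as well, which are left for the reader to discover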

Phase 6

First, the code

0x00000000004010f4 <+0>:     push   %r14
0x00000000004010f6 <+2>:     push   %r13
0x00000000004010f8 <+4>:     push   %r12
0x00000000004010fa <+6>:     push   %rbp
0x00000000004010fb <+7>:     push   %rbx
0x00000000004010fc <+8>:     sub    $0x50,%rsp
0x0000000000401100 <+12>:    mov    %rsp,%r13
0x0000000000401103 <+15>:    mov    %rsp,%rsi
0x0000000000401106 <+18>:    callq  0x40145c <read_six_numbers>
0x000000000040110b <+23>:    mov    %rsp,%r14
0x000000000040110e <+26>:    mov    $0x0,%r12d
0x0000000000401114 <+32>:    mov    %r13,%rbp
0x0000000000401117 <+35>:    mov    0x0(%r13),%eax
0x000000000040111b <+39>:    sub    $0x1,%eax
0x000000000040111e <+42>:    cmp    $0x5,%eax
0x0000000000401121 <+45>:    jbe    0x401128 <phase_6+52>
0x0000000000401123 <+47>:    callq  0x40143a <explode_bomb>
0x0000000000401128 <+52>:    add    $0x1,%r12d
0x000000000040112c <+56>:    cmp    $0x6,%r12d
0x0000000000401130 <+60>:    je     0x401153 <phase_6+95>
0x0000000000401132 <+62>:    mov    %r12d,%ebx
0x0000000000401135 <+65>:    movslq %ebx,%rax
0x0000000000401138 <+68>:    mov    (%rsp,%rax,4),%eax
0x000000000040113b <+71>:    cmp    %eax,0x0(%rbp)
0x000000000040113e <+74>:    jne    0x401145 <phase_6+81>
0x0000000000401140 <+76>:    callq  0x40143a <explode_bomb>
0x0000000000401145 <+81>:    add    $0x1,%ebx
0x0000000000401148 <+84>:    cmp    $0x5,%ebx
0x000000000040114b <+87>:    jle    0x401135 <phase_6+65>
0x000000000040114d <+89>:    add    $0x4,%r13
0x0000000000401151 <+93>:    jmp    0x401114 <phase_6+32>
0x0000000000401153 <+95>:    lea    0x18(%rsp),%rsi
0x0000000000401158 <+100>:   mov    %r14,%rax
0x000000000040115b <+103>:   mov    $0x7,%ecx
0x0000000000401160 <+108>:   mov    %ecx,%edx
0x0000000000401162 <+110>:   sub    (%rax),%edx
0x0000000000401164 <+112>:   mov    %edx,(%rax)
0x0000000000401166 <+114>:   add    $0x4,%rax
0x000000000040116a <+118>:   cmp    %rsi,%rax
0x000000000040116d <+121>:   jne    0x401160 <phase_6+108>
0x000000000040116f <+123>:   mov    $0x0,%esi
0x0000000000401174 <+128>:   jmp    0x401197 <phase_6+163>
0x0000000000401176 <+130>:   mov    0x8(%rdx),%rdx
0x000000000040117a <+134>:   add    $0x1,%eax
0x000000000040117d <+137>:   cmp    %ecx,%eax
0x000000000040117f <+139>:   jne    0x401176 <phase_6+130>
0x0000000000401181 <+141>:   jmp    0x401188 <phase_6+148>
0x0000000000401183 <+143>:   mov    $0x6032d0,%edx
0x0000000000401188 <+148>:   mov    %rdx,0x20(%rsp,%rsi,2)
0x000000000040118d <+153>:   add    $0x4,%rsi
0x0000000000401191 <+157>:   cmp    $0x18,%rsi
0x0000000000401195 <+161>:   je     0x4011ab <phase_6+183>
0x0000000000401197 <+163>:   mov    (%rsp,%rsi,1),%ecx
0x000000000040119a <+166>:   cmp    $0x1,%ecx
0x000000000040119d <+169>:   jle    0x401183 <phase_6+143>
0x000000000040119f <+171>:   mov    $0x1,%eax
0x00000000004011a4 <+176>:   mov    $0x6032d0,%edx
0x00000000004011a9 <+181>:   jmp    0x401176 <phase_6+130>
0x00000000004011ab <+183>:   mov    0x20(%rsp),%rbx
0x00000000004011b0 <+188>:   lea    0x28(%rsp),%rax
0x00000000004011b5 <+193>:   lea    0x50(%rsp),%rsi
0x00000000004011ba <+198>:   mov    %rbx,%rcx
0x00000000004011bd <+201>:   mov    (%rax),%rdx
0x00000000004011c0 <+204>:   mov    %rdx,0x8(%rcx)
0x00000000004011c4 <+208>:   add    $0x8,%rax
0x00000000004011c8 <+212>:   cmp    %rsi,%rax
0x00000000004011cb <+215>:   je     0x4011d2 <phase_6+222>
0x00000000004011cd <+217>:   mov    %rdx,%rcx
0x00000000004011d0 <+220>:   jmp    0x4011bd <phase_6+201>
0x00000000004011d2 <+222>:   movq   $0x0,0x8(%rdx)
0x00000000004011da <+230>:   mov    $0x5,%ebp
0x00000000004011df <+235>:   mov    0x8(%rbx),%rax
0x00000000004011e3 <+239>:   mov    (%rax),%eax
0x00000000004011e5 <+241>:   cmp    %eax,(%rbx)
0x00000000004011e7 <+243>:   jge    0x4011ee <phase_6+250>
0x00000000004011e9 <+245>:   callq  0x40143a <explode_bomb>
0x00000000004011ee <+250>:   mov    0x8(%rbx),%rbx
0x00000000004011f2 <+254>:   sub    $0x1,%ebp
0x00000000004011f5 <+257>:   jne    0x4011df <phase_6+235>
0x00000000004011f7 <+259>:   add    $0x50,%rsp
0x00000000004011fb <+263>:   pop    %rbx
0x00000000004011fc <+264>:   pop    %rbp
0x00000000004011fd <+265>:   pop    %r12
0x00000000004011ff <+267>:   pop    %r13
0x0000000000401201 <+269>:   pop    %r14
0x0000000000401203 <+271>:   retq

Let's take it piece by piece:

0x00000000004010f4 <+0>:     push   %r14
0x00000000004010f6 <+2>:     push   %r13
0x00000000004010f8 <+4>:     push   %r12
0x00000000004010fa <+6>:     push   %rbp
0x00000000004010fb <+7>:     push   %rbx
0x00000000004010fc <+8>:     sub    $0x50,%rsp
0x0000000000401100 <+12>:    mov    %rsp,%r13
0x0000000000401103 <+15>:    mov    %rsp,%rsi

This part sets up the stack frame.

0x0000000000401106 <+18>:    callq  0x40145c <read_six_numbers>

Six numbers are read here. As we saw in Phase 2, they are stored in an array that starts at %rsp.

0x000000000040110b <+23>:    mov    %rsp,%r14
0x000000000040110e <+26>:    mov    $0x0,%r12d
0x0000000000401114 <+32>:    mov    %r13,%rbp
0x0000000000401117 <+35>:    mov    0x0(%r13),%eax
0x000000000040111b <+39>:    sub    $0x1,%eax
0x000000000040111e <+42>:    cmp    $0x5,%eax
0x0000000000401121 <+45>:    jbe    0x401128 <phase_6+52>
0x0000000000401123 <+47>:    callq  0x40143a <explode_bomb>
0x0000000000401128 <+52>:    add    $0x1,%r12d
0x000000000040112c <+56>:    cmp    $0x6,%r12d
0x0000000000401130 <+60>:    je     0x401153 <phase_6+95>
0x0000000000401132 <+62>:    mov    %r12d,%ebx
0x0000000000401135 <+65>:    movslq %ebx,%rax
0x0000000000401138 <+68>:    mov    (%rsp,%rax,4),%eax
0x000000000040113b <+71>:    cmp    %eax,0x0(%rbp)
0x000000000040113e <+74>:    jne    0x401145 <phase_6+81>
0x0000000000401140 <+76>:    callq  0x40143a <explode_bomb>
0x0000000000401145 <+81>:    add    $0x1,%ebx
0x0000000000401148 <+84>:    cmp    $0x5,%ebx
0x000000000040114b <+87>:    jle    0x401135 <phase_6+65>
0x000000000040114d <+89>:    add    $0x4,%r13
0x0000000000401151 <+93>:    jmp    0x401114 <phase_6+32>

This code forms a typical nested loop: the outer loop is counted by %r12d and the inner loop is controlled by %ebx.

0x0000000000401117 <+35>:    mov    0x0(%r13),%eax
0x000000000040111b <+39>:    sub    $0x1,%eax
0x000000000040111e <+42>:    cmp    $0x5,%eax
...
0x000000000040114d <+89>:    add    $0x4,%r13
0x0000000000401151 <+93>:    jmp    0x401114 <phase_6+32>

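Start with the outer loop: it walks the input array through the pointer %r13, and its first job is a bounds check. Because x - 1 is compared with 5 using the unsigned jbe, every number must satisfy 1 <= x <= 6.

Now the inner loop: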

0x0000000000401132 <+62>:    mov    %r12d,%ebx
0x0000000000401135 <+65>:    movslq %ebx,%rax
0x0000000000401138 <+68>:    mov    (%rsp,%rax,4),%eax
0x000000000040113b <+71>:    cmp    %eax,0x0(%rbp)
0x000000000040113e <+74>:    jne    0x401145 <phase_6+81>
0x0000000000401140 <+76>:    callq  0x40143a <explode_bomb>
0x0000000000401145 <+81>:    add    $0x1,%ebx
0x0000000000401148 <+84>:    cmp    $0x5,%ebx
0x000000000040114b <+87>:    jle    0x401135 <phase_6+65>

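Starting from the element after the current outer one, it walks through each remaining number in the array (int, 4 bytes, hence (%rsp,%rax,4) to fetch the element) and checks whether it equals the outer number.

So this loop verifies that all input numbers are pairwise distinct.

To sum up, the nested loop checks that our input consists of six distinct numbers, each between 1 and 6, that is, a permutation of 1 to 6.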

0x0000000000401153 <+95>:    lea    0x18(%rsp),%rsi
0x0000000000401158 <+100>:   mov    %r14,%rax
0x000000000040115b <+103>:   mov    $0x7,%ecx
0x0000000000401160 <+108>:   mov    %ecx,%edx
0x0000000000401162 <+110>:   sub    (%rax),%edx
0x0000000000401164 <+112>:   mov    %edx,(%rax)
0x0000000000401166 <+114>:   add    $0x4,%rax
0x000000000040116a <+118>:   cmp    %rsi,%rax
0x000000000040116d <+121>:   jne    0x401160 <phase_6+108>

Here is another loop. As noted above, %r14 equals %rsp, the stack pointer. The loop goes over every number x and reassigns it as x = 7-x

0x000000000040116f <+123>:   mov    $0x0,%esi
0x0000000000401174 <+128>:   jmp    0x401197 <phase_6+163>
0x0000000000401176 <+130>:   mov    0x8(%rdx),%rdx
0x000000000040117a <+134>:   add    $0x1,%eax
0x000000000040117d <+137>:   cmp    %ecx,%eax
0x000000000040117f <+139>:   jne    0x401176 <phase_6+130>
0x0000000000401181 <+141>:   jmp    0x401188 <phase_6+148>
0x0000000000401183 <+143>:   mov    $0x6032d0,%edx
0x0000000000401188 <+148>:   mov    %rdx,0x20(%rsp,%rsi,2)
0x000000000040118d <+153>:   add    $0x4,%rsi
0x0000000000401191 <+157>:   cmp    $0x18,%rsi
0x0000000000401195 <+161>:   je     0x4011ab <phase_6+183>
0x0000000000401197 <+163>:   mov    (%rsp,%rsi,1),%ecx
0x000000000040119a <+166>:   cmp    $0x1,%ecx
0x000000000040119d <+169>:   jle    0x401183 <phase_6+143>
0x000000000040119f <+171>:   mov    $0x1,%eax
0x00000000004011a4 <+176>:   mov    $0x6032d0,%edx
0x00000000004011a9 <+181>:   jmp    0x401176 <phase_6+130>

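It first reads an input element x. If x is less than or equal to 1, %edx is set to 0x6032d0 and that address (%rdx) is stored into a temporary array starting at 0x20(%rsp); the loop then moves on to the next element until the whole array has been processed (0x18 = 24 = 4*6).

If x is greater than 1, %eax is set to 1 and %edx to 0x6032d0, and then mov 0x8(%rdx),%rdx is executed x-1 times before the resulting address is stored.

This looks like a linked list, and the memory address 0x6032d0 shows up, so let's take a look: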

(gdb) x/12xg 0x6032d0
0x6032d0 <node1>:       0x000000010000014c      0x00000000006032e0
0x6032e0 <node2>:       0x00000002000000a8      0x00000000006032f0
0x6032f0 <node3>:       0x000000030000039c      0x0000000000603300
0x603300 <node4>:       0x00000004000002b3      0x0000000000603310
0x603310 <node5>:       0x00000005000001dd      0x0000000000603320
0x603320 <node6>:       0x00000006000001bb      0x0000000000000000

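Note that on a 64-bit system a pointer takes 8 bytes (64 bits).

This is clearly a linked list, and 0x8(%rdx) is the next pointer.

So the operation above produces an array: if the i-th number in the (already transformed) input array is x, the i-th array entry holds the address of the x-th list node.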

0x00000000004011ab <+183>:   mov    0x20(%rsp),%rbx
0x00000000004011b0 <+188>:   lea    0x28(%rsp),%rax
0x00000000004011b5 <+193>:   lea    0x50(%rsp),%rsi

This is some initialization. %rsi is the end pointer that marks where the loop stops. From 0x20 to 0x50 is exactly 6*8=48 bytes.

0x00000000004011ba <+198>:   mov    %rbx,%rcx
0x00000000004011bd <+201>:   mov    (%rax),%rdx
0x00000000004011c0 <+204>:   mov    %rdx,0x8(%rcx)
0x00000000004011c4 <+208>:   add    $0x8,%rax
0x00000000004011c8 <+212>:   cmp    %rsi,%rax
0x00000000004011cb <+215>:   je     0x4011d2 <phase_6+222>
0x00000000004011cd <+217>:   mov    %rdx,%rcx
0x00000000004011d0 <+220>:   jmp    0x4011bd <phase_6+201>

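This walks the array of node addresses we just built. Writing it in C may make it easier to follow.

Node *current = node_ptrs[0]; // initial value of %rbx and %rcx
int i = 1; // corresponds to %rax pointing at node_ptrs[1]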

while (i < 6) {
    Node *next_node = node_ptrs[i]; // mov (%rax), %rdx
    current->next = next_node;      // mov %rdx, 0x8(%rcx)
    current = next_node;            // mov %rdx, %rcx
    i++;                            // add $0x8, %rax
}

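This loop rewires the linked list.

0x00000000004011d2 <+222>:   movq   $0x0,0x8(%rdx)

This instruction sets the next pointer of the last node to NULL so that the list is properly terminated.

Next comes yet another loop: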

0x00000000004011da <+230>:   mov    $0x5,%ebp
0x00000000004011df <+235>:   mov    0x8(%rbx),%rax
0x00000000004011e3 <+239>:   mov    (%rax),%eax
0x00000000004011e5 <+241>:   cmp    %eax,(%rbx)
0x00000000004011e7 <+243>:   jge    0x4011ee <phase_6+250>
0x00000000004011e9 <+245>:   callq  0x40143a <explode_bomb>
0x00000000004011ee <+250>:   mov    0x8(%rbx),%rbx
0x00000000004011f2 <+254>:   sub    $0x1,%ebp
0x00000000004011f5 <+257>:   jne    0x4011df <phase_6+235>

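It walks the list and makes sure it is in descending order: each node's value must be greater than or equal to the next node's value.

At this point we can work out the answer: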

(gdb) x/12xg 0x6032d0
0x6032d0 <node1>:       0x000000010000014c      0x00000000006032e0
0x6032e0 <node2>:       0x00000002000000a8      0x00000000006032f0
0x6032f0 <node3>:       0x000000030000039c      0x0000000000603300
0x603300 <node4>:       0x00000004000002b3      0x0000000000603310
0x603310 <node5>:       0x00000005000001dd      0x0000000000603320
0x603320 <node6>:       0x00000006000001bb      0x0000000000000000

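We only need the node indices in descending order of value. Note that the value is an int, so only the low 4 bytes of each 8-byte word count (the high 4 bytes hold the node index). The values are 0x14c, 0xa8, 0x39c, 0x2b3, 0x1dd, and 0x1bb, which gives 3 4 5 6 1 2

But remember that the input went through the 7-x step (see above), so we adjust the answer to 4 3 2 1 6 5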

The last phase is a bit involved: it neatly combines nested-loop validation, an array mapping, and linked-list relinking.

Hidden Phase

/* Hmm...  Six phases must be more secure than one phase! */
input = read_line();             /* Get input                   */
phase_1(input);                  /* Run the phase               */
phase_defused();                 /* Drat!  They figured it out!
      * Let me know how they did it. */
printf("Phase 1 defused. How about the next one?\n");

/* The second phase is harder.  No one will ever figure out
 * how to defuse this... */
input = read_line();
phase_2(input);
phase_defused();
printf("That's number 2.  Keep going!\n");

/* I guess this is too easy so far.  Some more complex code will
 * confuse people. */
input = read_line();
phase_3(input);
phase_defused();
printf("Halfway there!\n");

/* Oh yeah?  Well, how good is your math?  Try on this saucy problem! */
input = read_line();
phase_4(input);
phase_defused();
printf("So you got that one.  Try this one.\n");

/* Round and 'round in memory we go, where we stop, the bomb blows! */
input = read_line();
phase_5(input);
phase_defused();
printf("Good work!  On to the next...\n");

/* This phase will never be used, since no one will get past the
 * earlier ones.  But just in case, make this one extra hard. */
input = read_line();
phase_6(input);
phase_defused();

In the bomb code, phase_defused runs after every phase. Let's take a look:

Dump of assembler code for function phase_defused:
   0x00000000004015c4 <+0>:     sub    $0x78,%rsp
   0x00000000004015c8 <+4>:     mov    %fs:0x28,%rax
   0x00000000004015d1 <+13>:    mov    %rax,0x68(%rsp)
   0x00000000004015d6 <+18>:    xor    %eax,%eax
   0x00000000004015d8 <+20>:    cmpl   $0x6,0x202181(%rip)        # 0x603760 <num_input_strings>
   0x00000000004015df <+27>:    jne    0x40163f <phase_defused+123>
   0x00000000004015e1 <+29>:    lea    0x10(%rsp),%r8
   0x00000000004015e6 <+34>:    lea    0xc(%rsp),%rcx
   0x00000000004015eb <+39>:    lea    0x8(%rsp),%rdx
   0x00000000004015f0 <+44>:    mov    $0x402619,%esi
   0x00000000004015f5 <+49>:    mov    $0x603870,%edi
   0x00000000004015fa <+54>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x00000000004015ff <+59>:    cmp    $0x3,%eax
   0x0000000000401602 <+62>:    jne    0x401635 <phase_defused+113>
   0x0000000000401604 <+64>:    mov    $0x402622,%esi
   0x0000000000401609 <+69>:    lea    0x10(%rsp),%rdi
   0x000000000040160e <+74>:    callq  0x401338 <strings_not_equal>
   0x0000000000401613 <+79>:    test   %eax,%eax
   0x0000000000401615 <+81>:    jne    0x401635 <phase_defused+113>
   0x0000000000401617 <+83>:    mov    $0x4024f8,%edi
   0x000000000040161c <+88>:    callq  0x400b10 <puts@plt>
   0x0000000000401621 <+93>:    mov    $0x402520,%edi
   0x0000000000401626 <+98>:    callq  0x400b10 <puts@plt>
   0x000000000040162b <+103>:   mov    $0x0,%eax
   0x0000000000401630 <+108>:   callq  0x401242 <secret_phase>
   0x0000000000401635 <+113>:   mov    $0x402558,%edi
   0x000000000040163a <+118>:   callq  0x400b10 <puts@plt>
   0x000000000040163f <+123>:   mov    0x68(%rsp),%rax
   0x0000000000401644 <+128>:   xor    %fs:0x28,%rax
   0x000000000040164d <+137>:   je     0x401654 <phase_defused+144>
   0x000000000040164f <+139>:   callq  0x400b30 <__stack_chk_fail@plt>
   0x0000000000401654 <+144>:   add    $0x78,%rsp
   0x0000000000401658 <+148>:   retq

0x00000000004015d8 <+20>:    cmpl   $0x6,0x202181(%rip)        # 0x603760 <num_input_strings>

This requires all six phases to be passed before secret_phase can be entered.

We can set a conditional breakpoint: b phase_defused if num_input_strings == 6

Note:

0x0000000000401630 <+108>:   callq  0x401242 <secret_phase>

There are quite a few memory addresses here, among them:

(gdb) x/s 0x402619
0x402619:       "%d %d %s"
(gdb) x/s 0x603870
0x603870 <input_strings+240>:   "7 0"
(gdb) x/s 0x402622
0x402622:       "DrEvil"

It checks whether the Phase 4 input is followed by the string DrEvil; if so, we enter the hidden phase!

Now let's look at the code of the hidden phase:

Dump of assembler code for function secret_phase:
   0x0000000000401242 <+0>:     push   %rbx
   0x0000000000401243 <+1>:     callq  0x40149e <read_line>
   0x0000000000401248 <+6>:     mov    $0xa,%edx
   0x000000000040124d <+11>:    mov    $0x0,%esi
   0x0000000000401252 <+16>:    mov    %rax,%rdi
   0x0000000000401255 <+19>:    callq  0x400bd0 <strtol@plt>
   0x000000000040125a <+24>:    mov    %rax,%rbx
   0x000000000040125d <+27>:    lea    -0x1(%rax),%eax
   0x0000000000401260 <+30>:    cmp    $0x3e8,%eax
   0x0000000000401265 <+35>:    jbe    0x40126c <secret_phase+42>
   0x0000000000401267 <+37>:    callq  0x40143a <explode_bomb>
   0x000000000040126c <+42>:    mov    %ebx,%esi
   0x000000000040126e <+44>:    mov    $0x6030f0,%edi
   0x0000000000401273 <+49>:    callq  0x401204 <fun7>
   0x0000000000401278 <+54>:    cmp    $0x2,%eax
   0x000000000040127b <+57>:    je     0x401282 <secret_phase+64>
   0x000000000040127d <+59>:    callq  0x40143a <explode_bomb>
   0x0000000000401282 <+64>:    mov    $0x402438,%edi
   0x0000000000401287 <+69>:    callq  0x400b10 <puts@plt>
   0x000000000040128c <+74>:    callq  0x4015c4 <phase_defused>
   0x0000000000401291 <+79>:    pop    %rbx
   0x0000000000401292 <+80>:    retq   
End of assembler dump.

Seeing strtol, we know an integer is read here.

0x000000000040125a <+24>:    mov    %rax,%rbx
0x000000000040125d <+27>:    lea    -0x1(%rax),%eax
0x0000000000401260 <+30>:    cmp    $0x3e8,%eax
0x0000000000401265 <+35>:    jbe    0x40126c <secret_phase+42>
0x0000000000401267 <+37>:    callq  0x40143a <explode_bomb>

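The integer must be at most 1001: the code compares x - 1 with 0x3e8 = 1000. Note that jbe is an unsigned jump, so it implicitly enforces a lower bound too. The strict input constraint is therefore an integer in [1, 1001].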


0x000000000040126c <+42>:    mov    %ebx,%esi
0x000000000040126e <+44>:    mov    $0x6030f0,%edi
0x0000000000401273 <+49>:    callq  0x401204 <fun7>

Arguments are set up and fun7 is called.

0x0000000000401278 <+54>:    cmp    $0x2,%eax
0x000000000040127b <+57>:    je     0x401282 <secret_phase+64>
0x000000000040127d <+59>:    callq  0x40143a <explode_bomb>
0x0000000000401282 <+64>:    mov    $0x402438,%edi

Here fun7 must return 2.

Next, let's look at fun7, which I have split into sections by hand

Dump of assembler code for function fun7:
   0x0000000000401204 <+0>:     sub    $0x8,%rsp
   0x0000000000401208 <+4>:     test   %rdi,%rdi
   0x000000000040120b <+7>:     je     0x401238 <fun7+52>
   
   0x000000000040120d <+9>:     mov    (%rdi),%edx
   0x000000000040120f <+11>:    cmp    %esi,%edx
   0x0000000000401211 <+13>:    jle    0x401220 <fun7+28>
   
   0x0000000000401213 <+15>:    mov    0x8(%rdi),%rdi
   0x0000000000401217 <+19>:    callq  0x401204 <fun7>
   0x000000000040121c <+24>:    add    %eax,%eax
   0x000000000040121e <+26>:    jmp    0x40123d <fun7+57>
   
   0x0000000000401220 <+28>:    mov    $0x0,%eax
   0x0000000000401225 <+33>:    cmp    %esi,%edx
   0x0000000000401227 <+35>:    je     0x40123d <fun7+57>
   0x0000000000401229 <+37>:    mov    0x10(%rdi),%rdi
   0x000000000040122d <+41>:    callq  0x401204 <fun7>
   0x0000000000401232 <+46>:    lea    0x1(%rax,%rax,1),%eax
   0x0000000000401236 <+50>:    jmp    0x40123d <fun7+57>
   
   0x0000000000401238 <+52>:    mov    $0xffffffff,%eax
   
   0x000000000040123d <+57>:    add    $0x8,%rsp
   0x0000000000401241 <+61>:    retq   
End of assembler dump.

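It follows one of the two pointers stored after the value at rdi and recurses, which looks a bit like a binary tree. Let's look at the initial argument: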

(gdb) x/60xg 0x6030f0
0x6030f0 <n1>:  0x0000000000000024      0x0000000000603110
0x603100 <n1+16>:       0x0000000000603130      0x0000000000000000
0x603110 <n21>: 0x0000000000000008      0x0000000000603190
0x603120 <n21+16>:      0x0000000000603150      0x0000000000000000
0x603130 <n22>: 0x0000000000000032      0x0000000000603170
0x603140 <n22+16>:      0x00000000006031b0      0x0000000000000000
0x603150 <n32>: 0x0000000000000016      0x0000000000603270
0x603160 <n32+16>:      0x0000000000603230      0x0000000000000000
0x603170 <n33>: 0x000000000000002d      0x00000000006031d0
0x603180 <n33+16>:      0x0000000000603290      0x0000000000000000
0x603190 <n31>: 0x0000000000000006      0x00000000006031f0
0x6031a0 <n31+16>:      0x0000000000603250      0x0000000000000000
0x6031b0 <n34>: 0x000000000000006b      0x0000000000603210
0x6031c0 <n34+16>:      0x00000000006032b0      0x0000000000000000
0x6031d0 <n45>: 0x0000000000000028      0x0000000000000000
0x6031e0 <n45+16>:      0x0000000000000000      0x0000000000000000
0x6031f0 <n41>: 0x0000000000000001      0x0000000000000000
0x603200 <n41+16>:      0x0000000000000000      0x0000000000000000
0x603210 <n47>: 0x0000000000000063      0x0000000000000000
0x603220 <n47+16>:      0x0000000000000000      0x0000000000000000
0x603230 <n44>: 0x0000000000000023      0x0000000000000000
0x603240 <n44+16>:      0x0000000000000000      0x0000000000000000
0x603250 <n42>: 0x0000000000000007      0x0000000000000000
0x603260 <n42+16>:      0x0000000000000000      0x0000000000000000
0x603270 <n43>: 0x0000000000000014      0x0000000000000000
0x603280 <n43+16>:      0x0000000000000000      0x0000000000000000
0x603290 <n46>: 0x000000000000002f      0x0000000000000000
0x6032a0 <n46+16>:      0x0000000000000000      0x0000000000000000
0x6032b0 <n48>: 0x00000000000003e9      0x0000000000000000
0x6032c0 <n48+16>:      0x0000000000000000      0x0000000000000000

It really is a binary tree! (I found the 60 by trial and error.)

fun7 takes rdi and esi as arguments

0x0000000000401208 <+4>:     test   %rdi,%rdi
0x000000000040120b <+7>:     je     0x401238 <fun7+52>
...
0x0000000000401238 <+52>:    mov    $0xffffffff,%eax
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

If we walk past a leaf and reach a NULL pointer, it returns 0xffffffff (that is, -1) right away.

0x000000000040120d <+9>:     mov    (%rdi),%edx
0x000000000040120f <+11>:    cmp    %esi,%edx
0x0000000000401211 <+13>:    jle    0x401220 <fun7+28>

It looks at the current node's value. If the value is greater than esi:

0x0000000000401213 <+15>:    mov    0x8(%rdi),%rdi
0x0000000000401217 <+19>:    callq  0x401204 <fun7>
0x000000000040121c <+24>:    add    %eax,%eax
0x000000000040121e <+26>:    jmp    0x40123d <fun7+57>
...
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

It visits the left child and doubles the return value.

If the current node's value equals esi:

0x0000000000401220 <+28>:    mov    $0x0,%eax
0x0000000000401225 <+33>:    cmp    %esi,%edx
0x0000000000401227 <+35>:    je     0x40123d <fun7+57>
...
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

It returns 0 right away.

Otherwise, it visits the right child:

0x0000000000401229 <+37>:    mov    0x10(%rdi),%rdi
0x000000000040122d <+41>:    callq  0x401204 <fun7>
0x0000000000401232 <+46>:    lea    0x1(%rax,%rax,1),%eax
0x0000000000401236 <+50>:    jmp    0x40123d <fun7+57>
...
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

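The return value is doubled and incremented by one.

We can translate the code above into C:

struct Node {
    long value;            // 8-byte slot at offset 0; only the low 32 bits are read
    struct Node *left;     // offset 0x8
    struct Node *right;    // offset 0x10
};

int fun7(struct Node *node, int target_val) {
    // 1. The node is NULL
    if (node == NULL) {
        return -1; // mov $0xffffffff, %eax
    }

    int current_val = node->value; // mov (%rdi), %edx

    // 2. Current value > target (target_val < current_val)
    // Assembly: cmp %esi, %edx -> jle (skips this part) -> otherwise run it
    if (current_val > target_val) {
        // Recurse into the left child (offset 0x8)
        // callq fun7, then add %eax, %eax
        return 2 * fun7(node->left, target_val);
    }
    
    // 3. Current value == target
    // Assembly: cmp %esi, %edx -> je (jumps to the return with 0)
    if (current_val == target_val) {
        return 0; // target found, return 0
    }

    // 4. Current value < target
    // Assembly: this is the only remaining case
    // Recurse into the right child (offset 0x10)
    // callq fun7, then lea 0x1(%rax,%rax,1) -> 2*rax + 1
    return 2 * fun7(node->right, target_val) + 1;
}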

Now let's look at the structure of the binary tree, based on:

(gdb) x/60xg 0x6030f0
0x6030f0 <n1>:  0x0000000000000024      0x0000000000603110
0x603100 <n1+16>:       0x0000000000603130      0x0000000000000000
0x603110 <n21>: 0x0000000000000008      0x0000000000603190
0x603120 <n21+16>:      0x0000000000603150      0x0000000000000000
0x603130 <n22>: 0x0000000000000032      0x0000000000603170
0x603140 <n22+16>:      0x00000000006031b0      0x0000000000000000
0x603150 <n32>: 0x0000000000000016      0x0000000000603270
0x603160 <n32+16>:      0x0000000000603230      0x0000000000000000
0x603170 <n33>: 0x000000000000002d      0x00000000006031d0
0x603180 <n33+16>:      0x0000000000603290      0x0000000000000000
0x603190 <n31>: 0x0000000000000006      0x00000000006031f0
0x6031a0 <n31+16>:      0x0000000000603250      0x0000000000000000
0x6031b0 <n34>: 0x000000000000006b      0x0000000000603210
0x6031c0 <n34+16>:      0x00000000006032b0      0x0000000000000000
0x6031d0 <n45>: 0x0000000000000028      0x0000000000000000
0x6031e0 <n45+16>:      0x0000000000000000      0x0000000000000000
0x6031f0 <n41>: 0x0000000000000001      0x0000000000000000
0x603200 <n41+16>:      0x0000000000000000      0x0000000000000000
0x603210 <n47>: 0x0000000000000063      0x0000000000000000
0x603220 <n47+16>:      0x0000000000000000      0x0000000000000000
0x603230 <n44>: 0x0000000000000023      0x0000000000000000
0x603240 <n44+16>:      0x0000000000000000      0x0000000000000000
0x603250 <n42>: 0x0000000000000007      0x0000000000000000
0x603260 <n42+16>:      0x0000000000000000      0x0000000000000000
0x603270 <n43>: 0x0000000000000014      0x0000000000000000
0x603280 <n43+16>:      0x0000000000000000      0x0000000000000000
0x603290 <n46>: 0x000000000000002f      0x0000000000000000
0x6032a0 <n46+16>:      0x0000000000000000      0x0000000000000000
0x6032b0 <n48>: 0x00000000000003e9      0x0000000000000000
0x6032c0 <n48+16>:      0x0000000000000000      0x0000000000000000

graph TD
    N1((36)) --> N21((8))
    N1 --> N22((50))

    N21 --> N31((6))
    N21 --> N32((22))

    N22 --> N33((45))
    N22 --> N34((107))

    N31 --> N41((1))
    N31 --> N42((7))

    N32 --> N43((20))
    N32 --> N44((35))

    N33 --> N45((40))
    N33 --> N46((47))

    N34 --> N47((99))
    N34 --> N48((1001))

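The final return value must be 2, and 2 = 2 * (2 * 0 + 1).

Reading that from the outermost call inward: go left first (36 -> 8), then right (8 -> 22), and the search stops on a match, which returns 0.

So the answer is 22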
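As a sanity check, the search can be replayed in plain C. This is a minimal sketch and not the bomb's source: the node layout (value at offset 0, left child at 0x8, right child at 0x10) is inferred from the dump above, and `fun7_on_sample_tree` is a hypothetical helper name.

```c
#include <stddef.h>

/* Node layout inferred from the gdb dump: value, left pointer, right pointer. */
typedef struct node {
    long value;
    struct node *left;
    struct node *right;
} node;

/* Reconstruction of fun7: 0 on a match, doubled result when going left,
 * doubled-plus-one when going right, -1 when falling off the tree. */
int fun7(const node *n, long target) {
    if (n == NULL) return -1;
    if (target < n->value) return 2 * fun7(n->left, target);
    if (target == n->value) return 0;
    return 2 * fun7(n->right, target) + 1;
}

/* Builds the tree from the dump (values converted from hex) and searches it. */
int fun7_on_sample_tree(long target) {
    static node n41 = {1, NULL, NULL},  n42 = {7, NULL, NULL};
    static node n43 = {20, NULL, NULL}, n44 = {35, NULL, NULL};
    static node n45 = {40, NULL, NULL}, n46 = {47, NULL, NULL};
    static node n47 = {99, NULL, NULL}, n48 = {1001, NULL, NULL};
    static node n31 = {6, &n41, &n42},  n32 = {22, &n43, &n44};
    static node n33 = {45, &n45, &n46}, n34 = {107, &n47, &n48};
    static node n21 = {8, &n31, &n32},  n22 = {50, &n33, &n34};
    static node n1  = {36, &n21, &n22};
    return fun7(&n1, target);
}
```

With this reconstruction, `fun7_on_sample_tree(22)` evaluates to 2. Under the same assumptions 20 also evaluates to 2 (left, right, then left onto a match), so it may be an alternative input.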

Summary

So the final answers are:

Border relations with Canada have never been better.
1 2 4 8 16 32
0 207
7 0 DrEvil
ionefg
4 3 2 1 6 5
22

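To finish, I had an AI generate a short wrap-up:

The CSAPP Bomb Lab is a true classic. It is not only an in-depth exercise in assembly language (x86-64), but also a puzzle game of logical deduction.

Looking back over the whole defusal, we moved from simple to complex:

Basic control flow: from the string comparison in Phase 1 to the loops and on-stack array handling in Phase 2.
Advanced control flow: Phase 3 showed how a switch statement is implemented with a jump table, and Phase 4 used recursion to give us a deeper understanding of how stack frames grow and are torn down, along with binary search.
Data manipulation: the bit operations and character-array index mapping in Phase 5 tested our feel for pointers and memory addressing.
Data structures: the linked-list reordering in Phase 6 and the binary search tree (BST) in the secret phase showed what high-level data structures look like at the assembly level (a pointer is just an address).

Throughout, gdb was the most powerful weapon. Fluency with setting breakpoints, inspecting registers (i r), and examining memory (x/) is the key to clearing the lab. We also came to appreciate the "cleverness" of compiler optimizations (such as using lea for arithmetic, or an unsigned comparison to merge the lower- and upper-bound checks) and the mapping between C and machine code.

When the terminal finally prints “Congratulations! You’ve defused the bomb!”, all the table lookups, calculations, and stack analysis feel worth it. I hope this walkthrough helps you understand low-level computer systems. Happy Hacking!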

CSAPP Data Lab Walkthrough

Louis Aeilot's Blog

Louis C Deng

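December 2, 2025 02:45

I finished the first CSAPP lab a while ago, so here is a write-up. (Honestly, this post has been overdue for a long time.)

The CS:APP Data Lab uses a series of bit-manipulation puzzles to build an understanding of the low-level representation of integers and floating-point numbers (two's complement and IEEE 754 in particular). Each puzzle must implement a specific mathematical or logical function under strict limits on which operators may be used and how many.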

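Name	Description	Rating	Max ops
`bitXor(x, y)`	Implement `x ^ y` (XOR) using only `&` and `~`.	1	14
`tmin()`	Return the smallest two's complement integer.	1	4
`isTmax(x)`	Return True only if `x` is the largest two's complement integer.	1	10
`allOddBits(x)`	Return True only if all odd-numbered bits of `x` are 1.	2	12
`negate(x)`	Return `-x` without using the `-` operator.	2	5
`isAsciiDigit(x)`	Return True if `0x30 <= x <= 0x39` (i.e. an ASCII digit character).	3	15
`conditional(x, y, z)`	Equivalent to `x ? y : z` (the ternary operator).	3	16
`isLessOrEqual(x, y)`	Return True if `x <= y`, otherwise False.	3	24
`logicalNeg(x)`	Compute `!x` (logical NOT) without using the `!` operator.	4	12
`howManyBits(x)`	The minimum number of bits needed to represent `x` in two's complement.	4	90
`floatScale2(uf)`	For a floating-point argument `f`, return the bit-level equivalent of `2 * f`.	4	30
`floatFloat2Int(uf)`	For a floating-point argument `f`, return the bit-level equivalent of `(int)f`.	4	30
`floatPower2(x)`	For an integer `x`, return the bit-level equivalent of `2.0^x`.	4	30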

bitXor

This puzzle asks us to implement ^ (XOR) using only ~ (NOT) and & (AND).

int bitXor(int x, int y) {
  return ~((~(x&~y))&(~((~x)&y)));
}

By De Morgan's law, ~(x&y) = (~x)|(~y), so | can be built from ~ and &: a|b = ~(~a & ~b).

XOR can be written as x^y = (~x & y) | (x & ~y). Applying De Morgan's law to the outer | gives the final answer x^y = ~((~(x&~y))&(~((~x)&y))).

tmin

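This one is simple: return the smallest two's complement integer. Recall that in two's complement the most significant bit carries negative weight, so we just set the sign bit to 1 and leave every other bit 0.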

int tmin(void) {
  return 1<<31;
}

isTmax

Decide whether x is the largest two's complement integer. If it is, return 1; otherwise return 0.

int isTmax(int x) {
  int map = x + 1;
  int res = ~(map + x);
  return !res & (!!map);
}

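The largest two's complement integer has a useful property: adding one turns it into the smallest one: 0x7fffffff -> 0x80000000

The largest plus the smallest equals 0xffffffff, i.e. -1, and its bitwise complement is 0. (We aim for 0 here so that a logical NOT can produce the 0/1 return value.)

So ~(x+x+1) is 0 when x is Tmax, and !(~(x+x+1)) is then 1.

But -1 + 0 also equals -1: if x = -1, then x + 1 = 0 and ~(x+x+1) is 0 as well. This is a corner case.

So we additionally AND the result with !!(x+1) to get the final answer. (If x = -1, !!(x+1) = 0; in every other case it is 1.)

This gives the final answer !(~(x+x+1)) & (!!(x+1))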

allOddBits

Return 1 only if every odd-numbered bit of x is 1

int allOddBits(int x) {
  int a = 0xAA;
  int b = (a<<8) + (a<<16) + (a <<24) + a;
  int bm = ~b+1;
  return !((x&b)+bm);
}

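We build a mask for the odd-numbered bits. Start from 0xAA = 0b10101010; shifting and adding gives a + (a<<8) + (a<<16) + (a<<24) = 0xAAAAAAAA = b

Then x&b picks out all the odd-numbered bits, but we still need to reduce that to a 0/1 answer

bm = ~b + 1 gives -b (complement plus one is two's complement negation). If every odd bit of x is set, then x&b equals b and (x&b) + (-b) = 0; applying logical NOT yields the answer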

negate

This puzzle asks us to compute -x without using the - operator

int negate(int x) {
  return ~x+1;
}

Very simple; it follows from the definition of two's complement. Complement and add one to negate.

isAsciiDigit

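Return True if 0x30 <= x <= 0x39 (i.e. x is an ASCII digit character).

Operators such as <= are not allowed in this puzzle, so the idea is to subtract and then inspect the sign bit of the difference.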

int isAsciiDigit(int x) {
    int ge_30 = !((x + (~0x30 + 1)) >> 31);     
    int le_39 = !((0x39 + (~x + 1)) >> 31); 
    return ge_30 & le_39; 
}

conditional

Implement the ternary operator (x ? y : z) with bitwise operations

int conditional(int x, int y, int z) {
  int xb = !(!x);
  int M = ~xb + 1;
  return (M&y) | (~M&z);
}

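We can use a logical mask

First use !(!x) to turn x into 0/1; call it xb

Then ~xb + 1 maps 0 -> 0 and 1 -> -1 = 0xffffffff (a mask that selects every bit)

So (M&y) | (~M&z) is the final answer.

If x is nonzero, then xb = 1, M = 0xffffffff, and ~M = 0, so y is selected; otherwise z is selected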

isLessOrEqual

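int isLessOrEqual(int x, int y) {
  int sx = (x >> 31) & 1, sy = (y >> 31) & 1; // sign bits of x and y
  int same = !(sx ^ sy);                       // 1 if the signs match
  // Signs differ: x <= y iff x < 0. Signs match: y - x cannot overflow.
  return ((!same) & sx) | (same & !((y + (~x + 1)) >> 31));
}

Checking the sign bit of y - x is the core idea: x <= y is the negation of x > y, i.e. of y - x < 0. However, y - x can overflow when x and y have opposite signs (for example x = Tmin, y = 0), so that case is decided from the sign bits alone.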

logicalNeg

Compute !x (logical NOT) without using the ! operator

int logicalNeg(int x) {
  return ((x>>31) | ((~x+1)>>31))+1;
}
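
The expression works because x | -x has its sign bit set exactly when x is nonzero: 0 is the only value for which neither x nor -x is negative (Tmin is its own negation and is already negative). An arithmetic right shift by 31 then gives -1 for nonzero x and 0 for zero, and adding 1 produces the 0/1 result. Below is a minimal sketch of the same idea outside the lab's rules; `logical_neg_sketch` is a hypothetical name, the negation is done in unsigned arithmetic to avoid signed overflow, and a 32-bit int with arithmetic right shift is assumed.

```c
/* Returns 1 when x == 0 and 0 otherwise, without using '!'.
 * Assumes a 32-bit int and arithmetic right shift of negative values. */
int logical_neg_sketch(int x) {
    int neg = (int)(~(unsigned)x + 1u);   /* -x, computed without signed overflow */
    int sign = (x >> 31) | (neg >> 31);   /* -1 if x != 0, 0 if x == 0 */
    return sign + 1;
}
```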

howManyBits

Compute the minimum number of bits needed to represent x in two's complement

int howManyBits(int x) {
  int fg = x>>31;
  x = ((~fg) & x) | (fg &(~x));
  int h16 = !!(x >> 16) << 4;
  x >>= h16;
  int h8 = !!(x>>8) << 3;
  x >>= h8;
  int h4 = !!(x>>4) << 2;
  x >>= h4;
  int h2 = !!(x>>2) << 1;
  x>>=h2;
  int h1 = !!(x>>1);
  x>>=h1;
  int h0 = x;
  return h0 + h1 + h2 + h4 + h8 + h16 + 1;
}

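The idea is to set aside the sign bit and then locate the highest significant bit among the rest.

To simplify the computation, we turn negative numbers into non-negative ones by flipping their bits, so that all we need is the position of the highest 1

((~fg) & x) | (fg & (~x)) is a conditional complement, equivalent to x = (x < 0) ? ~x : x

If fg is 0 (x is non-negative): the expression becomes (All_1 & x) | (0 & ~x) -> x. Unchanged.
If fg is -1 (x is negative): the expression becomes (0 & x) | (All_1 & ~x) -> ~x. Bitwise complement.

A reminder: the right shift on a two's complement value here is an arithmetic shift, so shifting a negative number right by 31 gives a value with every bit set to 1, i.e. -1.

Next comes a binary search over the bits:

The logic is divide and conquer. We have 32 bits to check, and we proceed like a binary search:

Check the upper 16 bits:
- x >> 16: if this is nonzero, the highest 1 lies in the upper 16 bits (bits 16-31).
- !!(...): converts the result to 0 or 1. If the upper 16 bits contain a 1, the result is 1; otherwise 0.
- << 4: if the upper 16 bits contain a 1, we need at least 16 bits, i.e. 1 << 4 = 16.
- h16: this is the base we found (0 or 16).
- x >>= h16: the key step. If the upper 16 bits contain a 1, we shift x right by 16, discarding the lower 16 bits, and the remaining checks only look at what used to be the upper half. If the upper 16 bits are all 0, x is unchanged and we go on to check the original lower 16 bits.
Check the upper 8 bits (within the remaining 16 bits):

Same logic. If the upper 8 of the remaining bits contain a 1, then h8 = 8 and x is shifted right by 8.

And so on:

h4: checks the upper 4 of the remaining 8 bits.
h2: checks the upper 2 of the remaining 4 bits.
h1: checks the upper 1 of the remaining 2 bits.
h0 = x: the last remaining bit.

Finally, we sum h16 + … + h0. Note that two's complement needs a sign bit, so we add 1 to the result.

This gives the answer: h0 + h1 + h2 + h4 + h8 + h16 + 1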
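
To check the bit-twiddling version against something obviously correct, a straightforward reference helps. The sketch below is not a legal lab answer (it uses a loop, a comparison, and unsigned arithmetic), `how_many_bits_ref` is a hypothetical name, and a 32-bit int is assumed.

```c
/* Reference implementation: flip negative inputs, count the bits up to the
 * highest 1, then add one for the sign bit. */
int how_many_bits_ref(int x) {
    unsigned v = (x < 0) ? ~(unsigned)x : (unsigned)x;  /* same conditional flip */
    int bits = 1;                                       /* the sign bit */
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

For example, 12 is 01100 in two's complement (5 bits) and -5 is 1011 (4 bits), which matches the values this reference returns.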

floatScale2

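For a floating-point argument f, return the bit-level equivalent of 2 * f

IEEE 754

Let's first review the bit-level representation of floating-point numbers, i.e. IEEE 754, using float as the example

The bits of a float fall into three fields:

Sign (s): 1 bit [31] -> sign bit
Exponent (exp): 8 bits [30:23] -> exponent field
Fraction (frac): 23 bits [22:0] -> fraction (mantissa)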

int sign = (uf >> 31) & 0x1;
int exp  = (uf >> 23) & 0xFF;
int frac = uf & 0x7FFFFF;

A floating-point value is interpreted in one of three ways:

Case A: Denormalized

Condition: exp == 0
Value: $V = (-1)^s \times M \times 2^{1-Bias}$
- Here $M = 0.frac$ (no implicit leading 1)

Case B: Normalized

Condition: exp != 0 and exp != 255
Value: $V = (-1)^s \times M \times 2^{exp-Bias}$
- Here $M = 1.frac$ (with an implicit leading 1)
- Bias = 127

Case C: Special Values

Condition: exp == 255 (all ones)
Kinds:
- frac == 0: Infinity
- frac != 0: NaN (Not a Number)
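
The three cases can be sketched as a small classifier. This is illustrative only: `float_kind`, `classify_bits`, and `float_to_bits` are hypothetical names, and the code assumes float is a 32-bit IEEE 754 single, which is what the lab assumes as well.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    KIND_DENORMALIZED,  /* exp == 0 (this includes +0.0 and -0.0) */
    KIND_NORMALIZED,    /* 0 < exp < 255 */
    KIND_INFINITY,      /* exp == 255, frac == 0 */
    KIND_NAN            /* exp == 255, frac != 0 */
} float_kind;

/* Classifies a raw 32-bit pattern using the exp and frac fields. */
float_kind classify_bits(uint32_t uf) {
    uint32_t exp = (uf >> 23) & 0xFF;
    uint32_t frac = uf & 0x7FFFFF;
    if (exp == 0) return KIND_DENORMALIZED;
    if (exp == 0xFF) return frac == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMALIZED;
}

/* Copies the bytes of a float into an integer without aliasing problems. */
uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For instance, 1.0f has the bit pattern 0x3F800000 (exp = 127, frac = 0), so it is classified as normalized.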

Now for the puzzle itself: all it takes is a careful case analysis.

unsigned floatScale2(unsigned uf) {
  unsigned s = uf >> 31;
  unsigned exp = (uf >> 23) & 0xFF;
  unsigned ff = uf & 0x7fffff;

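  // Special values
  // If the exponent field is all ones (exp == 255), the value is NaN or Infinity
  // Rule: NaN * 2 = NaN, Inf * 2 = Inf, so return the argument unchanged
  if (exp == 0xFF) {
    return uf;
  }

  // Denormalized numbers
  // If the exponent field is 0, the value is denormalized and very close to 0
  if (exp == 0) {
    // Doubling a denormalized number: shift the fraction left by one
    ff <<= 1;
    
    // Check whether the fraction overflowed (denormalized -> normalized)
    // If ff now exceeds the largest 23-bit value (0x7fffff),
    // bit 23 became 1, and that 1 should "carry" into the exponent
    if (ff > 0x7fffff) {
      ff -= 0x800000; // Drop the overflowed bit (it is now the implicit 1)
      exp += 1;       // Exponent goes from 0 to 1 (the value is now normalized)
    }
  } 
  // Normalized numbers
  else {
    // Doubling a normalized number: just add 1 to the exponent
    exp += 1;
    
    // Check for exponent overflow
    // If the exponent became 255, the value is too large and becomes Infinity
    if (exp == 0xFF) {
      ff = 0; // Infinity is defined as exp=255 with frac=0
    }
  }

  return (s << 31) | (exp << 23) | (ff);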
}

floatFloat2Int

For a floating-point argument f, return the bit-level equivalent of (int)f

int floatFloat2Int(unsigned uf) {
  unsigned s = uf >> 31;
  unsigned exp = (uf >> 23) & 0xFF;
  unsigned ff = uf & 0x7fffff;

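  // Special case: NaN or Infinity
  // The exponent field is all ones. Per the puzzle statement, out-of-range values return 0x80000000u
  if (exp == 0xFF) {
    return 0x80000000u;
  }

  // Denormalized numbers
  // The exponent field is all zeros, so the magnitude is tiny (0.xxxx * 2^-126) and converts to 0
  if (exp == 0) {
    return 0;
  }

  // Compute the true exponent E
  // The bias is 127, so E = exp - Bias
  int E = (int)exp - 127;

  // Values with magnitude below 1
  // If the true exponent is negative (e.g. 2^-1, 2^-2), the value is 0.xxxx
  // Casting to int truncates toward zero, so the result is 0
  if (E < 0) return 0;

  // Restore the implicit 1
  // The true significand of a normalized number is 1.fffff...
  // We set bit 23 by hand to stand for that implicit integer part "1"
  ff = ff | (1 << 23);

  // Overflow
  // If E >= 31, the magnitude is >= 2^31
  // The largest int is 2^31 - 1.
  // Whether it is a positive overflow, or a negative value that is exactly TMin (-2^31) or smaller,
  // the puzzle's rule is to return TMin (0x80000000)
  if (E >= 31) {
    return 0x80000000u;
  }

  // Align by shifting
  // ff now looks like [1].[xxxxxx]... (with the 1 at bit 23)
  // Read as an integer, that is 1.xxxxx * 2^23
  // What we actually need is 1.xxxxx * 2^E
  if (E < 23) {
    // Case A: small exponent (e.g. E = 20)
    // We need to move the binary point right by 20 places.
    // But ff has its leading 1 at bit 23, so we shift right to drop the extra fraction bits.
    // Shift amount = 23 - 20 = 3
    ff = ff >> (23 - E);
  } else {
    // Case B: large exponent (e.g. E = 30)
    // We need to move the binary point right by 30 places.
    // The leading 1 is only at bit 23, which is not enough, so we shift left and pad with zeros.
    // Shift amount = 30 - 23 = 7
    ff = ff << (E - 23);
  }

  // Apply the sign
  // If the original value was negative, negate (complement plus one, i.e. -ff)
  if (s) return -ff;
  
  // The original value was positive; return as is
  return ff;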
}

floatPower2

For an integer x, return the bit-level equivalent of 2.0^x. For this puzzle it is enough to work out a few boundary values.

unsigned floatPower2(int x) {
    // 1. Underflow
    // The smallest denormalized number is 2^(-149).
    // Derivation: Min Denorm = 2^(1-Bias) * 2^(-23) = 2^(-126) * 2^(-23) = 2^(-149)
    // If x is smaller than that, the value is too small to represent, so return 0.0
    if (x < -149)
        return 0;

    // 2. Denormalized numbers
    // Range: [-149, -127]
    // The exponent field (exp) of a denormalized number is all zeros, and its value is M * 2^(-126)
    // We need to build 2^x.
    // Equation: 2^x = (1 << shift) * 2^(-23) * 2^(-126)  <-- (1<<shift)*2^-23 is the fraction part
    //           2^x = 2^shift * 2^(-149)
    //           x = shift - 149
    //           shift = x + 149
    // So we place a 1 shifted left by (x + 149) in the fraction field
    else if (x < -126)
        return 1 << (x + 149);

    // 3. Normalized numbers
    // Range: [-126, 127]
    // The value of a normalized number is 1.0 * 2^(exp - Bias)
    // We need 2^x, so the fraction stays 0 (i.e. 1.0) and only the exponent field is set.
    // Equation: x = exp - Bias
    //           exp = x + Bias
    //           exp = x + 127
    // Shift the computed exp into the exponent field (bits 23-30)
    else if (x <= 127)
        return (x + 127) << 23;

    // 4. Overflow
    // Range: x > 127
    // The largest power of 2 a single-precision float can represent is 2^127.
    // Beyond that, return positive infinity (+Infinity).
    // +Inf is encoded as sign bit 0, exponent all ones (0xFF), fraction all zeros.
    else
        return (0xFF) << 23;
}

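Wrap-up

The Data Lab gave me a deep understanding of how integers (two's complement) and floating-point numbers (IEEE 754) are represented at the binary level. Implementing complex logic, arithmetic, comparison, and type-conversion operations with an extremely restricted set of operators (such as ~, &, |, ^, +, <<, >>) is what really made the bit-manipulation techniques stick.

My code lives in aeilot/CSAPP-Labs.

A Chat About Bit Masks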

Louis Aeilot's Blog

Louis C Deng

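October 21, 2025 07:45

A mask is a bit-manipulation technique: a specific value (the mask) is combined with a target value using $\mathtt{\&}$ (AND), $\mathtt{|}$ (OR), or $\mathtt{\wedge}$ (XOR) to operate on, extract, or test one or more bits of the target precisely and in bulk.

Basic concepts

Masks exploit the properties of bitwise operations: by setting particular bits of the mask to 1 or 0, we control what happens to the corresponding bits of the target. Concretely, a mask can extract, clear, toggle, or set selected bits.

Extracting bits

With AND ( $\mathtt{\&}$ ) and a mask, we can extract the bits at specific positions of the target. For example, suppose we have the 8-bit binary number 10101100 and want to extract bit 2 (the third bit from the right, counting from 0). We can use the mask 00000100:

  10101100  (target)
& 00000100  (mask)
------------
  00000100  (result)

The result 00000100 shows that bit 2 is 1.

The same trick extracts several bits at once; for example, the mask 00001111 extracts the low 4 bits of a number.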

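Clearing bits

With AND ( $\mathtt{\&}$ ) and a mask, we can clear the bits at specific positions of the target. For example, suppose we have the 8-bit binary number 10101100 and want to clear bit 2 (the third bit from the right). We can use the mask 11111011:

  10101100  (target)
& 11111011  (mask)
------------
  10101000  (result)

The result 10101000 shows that bit 2 has been cleared to 0.

Clearing is just not extracting certain bits, lol

Toggling bits

With XOR ( $\mathtt{\wedge}$ ) and a mask, we can toggle the bits at specific positions of the target. For example, suppose we have the 8-bit binary number 10101100 and want to toggle bit 2 (the third bit from the right). We can use the mask 00000100:

  10101100  (target)
^ 00000100  (mask)
------------
  10101000  (result)

The result 10101000 shows that bit 2 has been toggled.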

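Setting bits

With OR ( $\mathtt{|}$ ) and a mask, we can set the bits at specific positions of the target. For example, suppose we have the 8-bit binary number 10101000 and want to set bit 2 (the third bit from the right) to 1. We can use the mask 00000100:

  10101000  (target)
| 00000100  (mask)
------------
  10101100  (result)

The result 10101100 shows that bit 2 has been set to 1.

Constructing masks

Constructing the right mask is the key to using this technique.

Single bit: $\mathtt{1 \ll n}$
1. $\mathtt{1 \ll 5}$ ( $\mathtt{00100000}$ ) is the mask for bit 5.
Contiguous low bits: $\mathtt{(1 \ll n) - 1}$
1. $\mathtt{(1 \ll 8) - 1}$ ( $\mathtt{0xFF}$ ) is the mask for the low 8 bits.
All-ones mask: $\mathtt{\sim 0}$ (i.e. $-1$ )
1. $\mathtt{0xFFFFFFFF}$ (assuming 32 bits)
All-zeros mask: $\mathtt{0}$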
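
The four operations and the mask constructors can be collected into a few helpers. This is a minimal sketch with hypothetical function names, using fixed-width unsigned integers so that every shift is well defined.

```c
#include <stdint.h>

/* Mask with only bit n set (n must be in 0..31). */
uint32_t bit_mask(unsigned n) { return UINT32_C(1) << n; }

/* Mask covering the low n bits (n in 0..32); n == 32 is handled
 * separately because shifting a 32-bit value by 32 is undefined. */
uint32_t low_bits_mask(unsigned n) {
    return n >= 32 ? UINT32_MAX : (UINT32_C(1) << n) - 1;
}

uint32_t extract_bits(uint32_t value, uint32_t mask) { return value & mask; }
uint32_t clear_bits(uint32_t value, uint32_t mask)   { return value & ~mask; }
uint32_t toggle_bits(uint32_t value, uint32_t mask)  { return value ^ mask; }
uint32_t set_bits(uint32_t value, uint32_t mask)     { return value | mask; }
```

The mask 00000100 from the worked examples is `bit_mask(2)`, and 10101100 is 0xAC, so `clear_bits(0xAC, bit_mask(2))` gives 0xA8 (10101000).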

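Conditional masks

One puzzle in the CSAPP Data Lab asks us to implement the ternary operator x ? y : z with bitwise operations. A conditional mask does the job.

int conditional(int x, int y, int z) {
  int mask = !!x;          // mask is 1 if x is nonzero, otherwise 0
  mask = ~mask + 1;       // mask is 0xFFFFFFFF if x is nonzero, otherwise 0x0
  return (y & mask) | (z & ~mask);
}

The logic of this code is:

Compute mask = !!x: if x is nonzero, mask is 1; otherwise it is 0.
Apply mask = ~mask + 1 to turn mask into all ones (0xFFFFFFFF) or all zeros (0x0).
Return (y & mask) | (z & ~mask): if x is nonzero the result is y, otherwise z.

Summary

Masks are a powerful bit-manipulation technique for operating on and testing specific bits of a value precisely.

With well-constructed masks we can implement bit operations such as extracting, clearing, toggling, and setting bits efficiently. In day-to-day programming, fluency with masks helps us write more efficient and more concise code.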

Integer Overflow and Undefined Behavior

Louis Aeilot's Blog

Louis C Deng

October 14, 2025 06:45

While doing the CSAPP Data Lab, I ran into some trouble with integer overflow.

The problem

/*
 * isTmax - returns 1 if x is the maximum, two's complement number, 
 *     and 0 otherwise 
 *   Legal ops: ! ~ & ^ | +
 *   Max ops: 10
 *   Rating: 1
 */

int isTmax(int x) {
  return 2;
}

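The task is to decide, using only the operators ! ~ & ^ | +, whether a number is the largest two's complement value (in the int range), i.e. 0x7fffffff. If it is, return 1; otherwise return 0.

Approach

Since shifts are not allowed (many people would reach straight for (1<<31) - 1), we can exploit the special behavior of integer overflow.

Specifically, 0x7fffffff + 1 = 0x80000000: the sign flips.

And 0x80000000 + 0x80000000 = 0 (the carry out of bit 31 is discarded)

So x = 0x7fffffff satisfies x + 1 + x + 1 = 0

For any other number, write y = x + k with k nonzero; then y + 1 + y + 1 = 2*k (modulo 2^32)

At this point we notice that y = -1 also gives y + 1 + y + 1 = 0, so it must be excluded

In all other cases the sum is nonzero, and the logical operators treat any nonzero value as true (1)

It is then easy to write the following code: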

int isTmax(int x) {
  int p1 = x+1;
  int p2 = p1 + p1;
  return !(p2) & !!(p1);
}

Spotting the problem

On my local machine (macOS, Apple clang version 17.0.0 (clang-1700.3.19.1), Target: arm64-apple-darwin25.0.0), this code runs without any problem when compiled with clang main.c.

However, looking at the Makefile that CSAPP provides, we find

#
# Makefile that builds btest and other helper programs for the CS:APP data lab
# 
CC = gcc
CFLAGS = -O -Wall
LIBS = -lm

all: btest fshow ishow

btest: btest.c bits.c decl.c tests.c btest.h bits.h
	$(CC) $(CFLAGS) $(LIBS) -o btest bits.c btest.c decl.c tests.c

fshow: fshow.c
	$(CC) $(CFLAGS) -o fshow fshow.c

ishow: ishow.c
	$(CC) $(CFLAGS) -o ishow ishow.c

# Forces a recompile. Used by the driver program. 
btestexplicit:
	$(CC) $(CFLAGS) $(LIBS) -o btest bits.c btest.c decl.c tests.c 

clean:
	rm -f *.o btest fshow ishow *~

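Note that the compiler is invoked with the -O flag, i.e. O1 optimization.

Running the code built this way returns 0 for 0x7fffffff. My suspicion is that the optimizer assumes undefined behavior (signed integer overflow) never happens and optimizes !p2 away; the form p1 + p1 is simple enough to be an easy target.

Undefined behavior

Undefined behavior (UB), as defined by cppreference:

undefined behavior - There are no restrictions on the behavior of the program.

Signed integer overflow is a common kind of undefined behavior.

Because correct C++ programs are free of undefined behavior, compilers may produce unexpected results when a program that actually has UB is compiled with optimization enabled.

In other words, compiler optimizations can produce unexpected results for code that contains undefined behavior

cppreference gives an example involving integer overflow: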

int foo(int x)
{
    return x + 1 > x; // either true or UB due to signed overflow
}

After compilation, however, it becomes

foo(int):
        mov     eax, 1
        ret

That is, it returns 1 no matter what

Inspecting the faulty code

We use gcc -S to emit the compiled assembly

_Z6isTmaxi:
.LFB2:
        .cfi_startproc
        endbr64
        movl    $0, %eax
        ret
        .cfi_endproc

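We can see that the compiler simply turned the function's return value into a constant 0 regardless of the input, which matches our guess about the cause.

Fix

We can try to construct a more complex expression that is harder for simple rules to match, so that it slips past O1-level optimization.

The core idea is unchanged: we still exploit Tmax + 1 = Tmin. Let's look at how Tmax and Tmin relate in binary:

Tmax = 0x7fffffff = 0111...1111
Tmin = 0x80000000 = 1000...0000

A very interesting property is that Tmax + Tmin = -1 (0xffffffff).

  0111 1111 ... 1111  (Tmax)
+ 1000 0000 ... 0000  (Tmin)
-------------------------
  1111 1111 ... 1111  (-1)

Based on this observation we can design a new check: if x is Tmax, then x + (x+1) should be -1, and its complement ~(-1) is 0.

We can write the revised code as follows: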

int isTmax(int x) {
  int map = x + 1;
  int res = ~(map + x);
  return !res & (!!map);
}

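The logic of this code is:

Compute map = x + 1. For x = Tmax this still overflows a signed integer, and map becomes Tmin. This is still undefined behavior (UB).
Compute res = ~(map + x). If x is Tmax, this is ~(Tmin + Tmax), i.e. ~(-1), which is 0.
return !res & (!!map). !res is !0, i.e. 1. The !!map part is the same as in the previous version: it excludes x = -1 (where map is 0, !!map is 0, and the function returns 0).

Under -O optimization, this code may produce the correct result.

Why does this only "possibly" work?

We have to be clear-eyed about this: the new version does not actually fix the undefined behavior. It merely happens to slip past the particular optimization strategy of the current compiler version.

Complexity of the code pattern: p1 + p1 ((x+1)+(x+1)) is a very simple, direct pattern, and an optimizer can easily apply a rule like "if p1 is nonzero, then p1+p1 is nonzero too". In contrast, ~((x+1)+x) mixes addition with a bitwise operation; the pattern is more complex and may not trigger any of the compiler's existing UB-based shortcuts.
Opportunism of optimization: a compiler does not try to exhaust every mathematical possibility; it applies a set of known, efficient patterns. Our new code just happens not to be on the "blacklist" of those common patterns.

So the revised version is only a more convincing disguise. It works in a particular environment, but its behavior is not guaranteed by the C standard, and it could break at any time with a different compiler or a future GCC release.

Conclusion: how to deal with undefined behavior properly

Through this tiny isTmax function we get a glimpse of how dangerous undefined behavior in C can be, and how powerful modern compiler optimization is. As developers, we should take away the following:

Do not rely on undefined behavior: never write code that depends on UB, even if it "seems to run on your machine". Robust code comes from strict adherence to the language standard, not from luck.
Trust the compiler, but verify: compilers are very smart and optimize strictly according to the language specification. When optimized code does not behave the way your intuition expects, the first thing to suspect is that your own code has crossed the UB line.
Make good use of tools:
- Always enable compiler warnings (-Wall -Wextra) and treat warnings as errors (-Werror); this catches many latent problems.
- Use runtime checkers such as the UndefinedBehaviorSanitizer (UBSan) in GCC/Clang. Just add -fsanitize=undefined at compile time and it will pinpoint UB such as signed integer overflow while the program runs, which makes it invaluable for debugging this kind of problem.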
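
Outside the lab's restrictions, one way to keep the original p1 + p1 idea while staying within defined behavior is unsigned arithmetic, which wraps modulo 2^32 by definition. The sketch below is not a legal Data Lab answer (the integer puzzles forbid casts and unsigned types), `is_tmax_unsigned` is a hypothetical name, and a 32-bit int is assumed.

```c
/* Same idea as the first attempt, but every addition is done on unsigned
 * values, so wraparound is well defined and cannot be optimized away. */
int is_tmax_unsigned(int x) {
    unsigned ux = (unsigned)x;
    unsigned p1 = ux + 1u;      /* 0x80000000 when x is Tmax, 0 when x is -1 */
    unsigned p2 = p1 + p1;      /* wraps to 0 only for those two inputs */
    return !p2 & !!p1;          /* !!p1 rules out x == -1 */
}
```

Because nothing here overflows a signed type, the result should not depend on the optimization level.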

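As for this CSAPP Data Lab puzzle, its whole point is to make us dance "in shackles", so that we come to understand integer representation, arithmetic, and compiler behavior in depth. In real engineering work, the safest and clearest way to write the code is always the first choice.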

CSAPP Cache Lab II: Optimizing Matrix Transposition

Louis Aeilot's Blog

Louis C Deng

February 5, 2026 08:00

In this part of the Cache Lab, the mission is simple yet devious: optimize matrix transposition for three specific sizes: 32x32, 64x64, and 61x67. Our primary enemy? Cache misses.

Matrix Transposition

A standard transposition swaps rows and columns directly:

void trans(int M, int N, int A[N][M], int B[M][N])
{
    int i, j, tmp;

    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            tmp = A[i][j];
            B[j][i] = tmp;
        }
    }    

}

While correct, this approach is a cache-miss nightmare because it ignores how data is actually stored in memory.

Cache Overview

To optimize effectively, we first have to understand our hardware constraints. The lab specifies a direct-mapped cache with the following parameters:

Parameter	Value
Sets (S)	32
Block Size (B)	32 bytes
Associativity (E)	1 (Direct-mapped)
Integer Size	4 bytes
Capacity per line	8 integers

We will use matrix tiling and loop unrolling to optimize the code.

32x32 Case

In this case, one row of the matrix (32 ints) spans 32/8 = 4 cache sets, so the 32 sets hold exactly 32/4 = 8 rows, and rows that are 8 apart map to the same sets and conflict. This makes 8x8 tiling the sweet spot.
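
The row-to-set mapping can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the matrix starts at a block-aligned address `base`, an int is 4 bytes, and `set_index` is a hypothetical helper; the real arrays in the lab may start at a different offset.

```c
#include <stdint.h>

/* Cache set that holds element (row, col) of an int matrix with `cols`
 * columns, for the lab's cache: 32-byte blocks (b = 5) and 32 sets (s = 5). */
unsigned set_index(uintptr_t base, int cols, int row, int col) {
    uintptr_t addr = base + ((uintptr_t)row * (uintptr_t)cols + (uintptr_t)col) * 4u;
    return (unsigned)((addr >> 5) & 31u);
}
```

With cols = 32, rows 0 and 8 both start in set 0 while rows 1 through 7 start in sets 4, 8, ..., 28, which is why an 8-row tile fits without conflicts. Passing cols = 64 instead shows the mapping wrapping after only 4 rows.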

int i,j,k;
int tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8;
for(i = 0; i<N; i+=8){
    for(j = 0; j<M; j+=8){
        for(k = i; k<N && k<i+8; k++) {
          // Read row from A
            tmp1 = A[k][j];
            tmp2 = A[k][j+1];
            tmp3 = A[k][j+2];
            tmp4 = A[k][j+3];
            tmp5 = A[k][j+4];
            tmp6 = A[k][j+5];
            tmp7 = A[k][j+6];
            tmp8 = A[k][j+7];

          // Write to columns of B
            B[j][k] = tmp1;
            B[j+1][k] = tmp2;
            B[j+2][k] = tmp3;
            B[j+3][k] = tmp4;
            B[j+4][k] = tmp5;
            B[j+5][k] = tmp6;
            B[j+6][k] = tmp7;
            B[j+7][k] = tmp8;
        }
    }
}

61x67 Case

int BLOCK_SIZE = 16;
int i,j,k,l,tmp;
int a,b;
for(i = 0; i<N; i+=BLOCK_SIZE){
    for(j = 0; j<M; j+=BLOCK_SIZE){
        a = i+BLOCK_SIZE;
        b = j+BLOCK_SIZE;
        for(k = i; k<N && k<a; k++) {
            for(l = j; l<M && l<b; l++){
                tmp = A[k][l];
                B[l][k] = tmp;
            }
        }
    }
}

64x64 Case

This is the hardest part. In a 64x64 matrix, one row spans 8 sets, so rows that are only $32/8 = 4$ apart map to the same sets and conflict. If we use plain 8x8 tiling, the bottom half of the block will evict the top half.

We can try a 4x4 matrix tiling first.

int BLOCK_SIZE = 4;
int i,j,k,l,tmp;
int a,b;
for(i = 0; i<N; i+=BLOCK_SIZE){
    for(j = 0; j<M; j+=BLOCK_SIZE){
        a = i+BLOCK_SIZE;
        b = j+BLOCK_SIZE;
        for(k = i; k<N && k<a; k++) {
            for(l = j; l<M && l<b; l++){
                tmp = A[k][l];
                B[l][k] = tmp;
            }
        }
    }
}

But this isn’t enough to pass the miss-count threshold.

So we go back to 8x8 tiling, with a twist: we partition each $8 \times 8$ block into four $4 \times 4$ sub-blocks and use the upper-right corner of B as a “buffer” to store data temporarily.

Here are the steps:

Transpose $A_{TL}$ into $B_{TL}$ while simultaneously moving $A_{TR}$ into $B_{TR}$ (as temporary storage).
Move the stored $A_{TR}$ from $B_{TR}$ to its final position in $B_{BL}$, while transposing $A_{BL}$ into $B_{TR}$.
Transpose $A_{BR}$ into $B_{BR}$ (the code does this inside the same loop as the previous step).

int i, j, k;
int tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8;

// Iterate through the matrix in 8x8 blocks to improve spatial locality
for (i = 0; i < N; i += 8) {
    for (j = 0; j < M; j += 8) {
        
        /**
         * STEP 1: Handle the top half of the 8x8 block (rows i to i+3)
         */
        for (k = 0; k < 4; k++) {
            // Read 8 elements from row i+k of matrix A into registers
            tmp1 = A[i + k][j];     tmp2 = A[i + k][j + 1];
            tmp3 = A[i + k][j + 2]; tmp4 = A[i + k][j + 3]; // Top-left 4x4
            tmp5 = A[i + k][j + 4]; tmp6 = A[i + k][j + 5];
            tmp7 = A[i + k][j + 6]; tmp8 = A[i + k][j + 7]; // Top-right 4x4

            // Transpose top-left 4x4 from A directly into top-left of B
            B[j][i + k]     = tmp1;
            B[j + 1][i + k] = tmp2;
            B[j + 2][i + k] = tmp3;
            B[j + 3][i + k] = tmp4;

            // Temporarily store top-right 4x4 of A in the top-right of B
            // This avoids cache misses by using the already-loaded cache line in B
            B[j][i + k + 4]     = tmp5;
            B[j + 1][i + k + 4] = tmp6;
            B[j + 2][i + k + 4] = tmp7;
            B[j + 3][i + k + 4] = tmp8;
        }

        /**
         * STEP 2: Handle the bottom half and fix the temporary placement
         */
        for (k = 0; k < 4; k++) {
            // Read bottom-left 4x4 column-wise from A
            tmp1 = A[i + 4][j + k]; tmp2 = A[i + 5][j + k];
            tmp3 = A[i + 6][j + k]; tmp4 = A[i + 7][j + k];
            
            // Read bottom-right 4x4 column-wise from A
            tmp5 = A[i + 4][j + k + 4]; tmp6 = A[i + 5][j + k + 4];
            tmp7 = A[i + 6][j + k + 4]; tmp8 = A[i + 7][j + k + 4];

            // Retrieve the top-right elements we temporarily stored in B in Step 1
            int t1 = B[j + k][i + 4];
            int t2 = B[j + k][i + 5];
            int t3 = B[j + k][i + 6];
            int t4 = B[j + k][i + 7];

            // Move bottom-left of A into the top-right of B
            B[j + k][i + 4] = tmp1;
            B[j + k][i + 5] = tmp2;
            B[j + k][i + 6] = tmp3;
            B[j + k][i + 7] = tmp4;

            // Move the retrieved temporary values into the bottom-left of B
            B[j + k + 4][i]     = t1;
            B[j + k + 4][i + 1] = t2;
            B[j + k + 4][i + 2] = t3;
            B[j + k + 4][i + 3] = t4;

            // Place bottom-right of A into the bottom-right of B
            B[j + k + 4][i + 4] = tmp5;
            B[j + k + 4][i + 5] = tmp6;
            B[j + k + 4][i + 6] = tmp7;
            B[j + k + 4][i + 7] = tmp8;
        }
    }
}

Note: The key trick here is to finish with each row of B while its cache line is still resident (Step 2 reads and writes B one row at a time) and to use local variables, which live in registers, to bridge the gap between conflicting cache lines.

Conclusion

Optimizing matrix transposition is less about the math and more about mechanical sympathy—understanding the underlying hardware to write code that plays nice with the CPU’s cache.

CSAPP Cache Lab I: Let's simulate a cache memory!

Louis Aeilot's Blog

Louis C Deng

February 5, 2026 00:45

For the CSAPP Cache Lab, the students are asked to write a small C program (200~300 lines) that simulates a cache memory.

The full code is here on GitHub.

Understanding a Cache

1. The Anatomy of a Cache ( $S$ , $E$ , $B$ , $m$ )

A cache can be described with the following four parameters:

$S = 2^s$ (Cache Sets): The cache is divided into sets.
$E$ (Cache Lines per set): This is the “associativity.”
- If $E=1$ , it’s a direct-mapped cache. If $E>1$ , it’s set-associative.
- Each line contains a valid bit, a tag, and the actual data block.
$B = 2^b$ (Block Size): The number of bytes stored in each line.
- The $b$ bits at the end of an address tell the cache the offset within that block.
$m$ : The bits of the machine memory address.

2. Address Decomposition

When the CPU wants to access a 64-bit address, the cache doesn’t look at the whole number at once. It slices the address into three distinct fields:

Field	Purpose
Tag	Used to uniquely identify the memory block within a specific set. `t = m - b - s`
Set Index	Determines which set the address maps to.
Block Offset	Identifies the specific byte within the cache line.

3. The “Search and Match” Process

When our simulator receives an address (e.g., from an L or S operation in the trace file), it follows these steps:

Find the Set: Use the set index bits to jump to the correct set in our cache structure.
Search the Lines: Look through all the lines in that set.

Hit: If a line has valid == true AND the tag matches the address tag.
Miss: If no line matches.

Handle the Miss:

Cold Start: If there is an empty line (valid == false), fill it with the new tag and set valid = true.
Eviction: If all lines are full, we must kick one out. This is where the LRU (Least Recently Used) policy comes in: we find the line that hasn’t been touched for the longest time and replace it.

Lab Requirements

For this Lab Project, we will write a cache simulator that takes a valgrind memory trace as an input.

Input

The input looks like:

I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,8

Each line denotes one or two memory accesses. The format of each line is

[space]operation address,size

The operation field denotes the type of memory access:

“I” denotes an instruction load,
“L” a data load,
“S” a data store, and
“M” a data modify (i.e., a data load followed by a data store).

Mind you: There is never a space before each “I”. There is always a space before each “M”, “L”, and “S”.

The address field specifies a 64-bit hexadecimal memory address. The size field specifies the number of bytes accessed by the operation.

CLI

Our program should take the following command line arguments:

Usage: ./csim-ref [-hv] -s <s> -E <E> -b <b> -t <tracefile>

-h: Optional help flag that prints usage info
-v: Optional verbose flag that displays trace info
-s <s>: Number of set index bits (S = 2^s is the number of sets)
-E <E>: Associativity (number of lines per set)
-b <b>: Number of block bits (B = 2^b is the block size)
-t <tracefile>: Name of the valgrind trace to replay

Caveats

For this lab, we ignore all Is (the instruction cache accesses).

We assume that memory accesses are aligned properly, such that a single memory access never crosses block boundaries.

The Codes

We basically start from scratch, given an almost blank csim.c file to fill in. The file comes with only a main function and no header files.

Data Models

// Data Model
char* fileName = NULL;
int set_bit = -1;
long long sets = -1;
int associativity = -1;
int block_bit = -1;
long long block_size = -1;
bool verboseMode = false;

int global_timer = 0; // For LRU

int memory_bit = 64; // Assuming 64-bit addresses
int tag_bit = 0; // Tag bits

Handling Command-Line Arguments

First, we add the int argc, char** argv parameters to the main function. argc stands for argument count, while argv stands for argument values.

We use getopt to parse arguments.

void handleArgs(int argc, char** argv){
    int opt;

    while ((opt = getopt(argc, argv, "hvs:E:b:t:")) != -1) {
        switch(opt) {
            case 'h':
                printUsage(argv);
                exit(0);
            case 'v':
                verboseMode = true;
                break;
            case 't':
                fileName = optarg;
                break;
            case 's':
                set_bit = atoi(optarg);
                break;
            case 'E':
                associativity = atoi(optarg);
                break;
            case 'b':
                block_bit = atoi(optarg);
                break;
            case '?':
                printUsage(argv);
                exit(1);
            default:
                exit(1); 
        }
    }

    if(fileName == NULL || set_bit == -1 || associativity == -1 || block_bit == -1) {
        printf("Missing required command line argument");
        printUsage(argv);
        exit(1);
    }

    sets = 1LL << set_bit;
    block_size = 1LL << block_bit;
    
    tag_bit = memory_bit - (set_bit + block_bit);
}

getopt is declared in unistd.h, but the lab compiles with -std=c99, which hides POSIX extensions. GNU systems also provide a standalone <getopt.h> header, so we include getopt.h instead.

opt = getopt(argc, argv, "hvs:E:b:t:")

h and v: These are boolean flags that take no argument.
s:, E:, b:, and t:: These options take an argument. The colon tells getopt that the flag must be followed by a value (e.g., -s 4).

After parsing the arguments, we set the initial value of our Cache Data Model.

sets = 1LL << set_bit;
block_size = 1LL << block_bit;

tag_bit = memory_bit - (set_bit + block_bit);

Initialize Cache

// Cache Line Structure
typedef struct CacheLine {
    bool valid;
    long long tag;
    /*
        Need LRU stamp to implement LRU eviction policy
    */
    int lru_counter;
} CacheLine;

CacheLine** cache = NULL;

void initCache() {
    // Initialize cache data structures
    cache = (CacheLine**) malloc(sizeof(CacheLine*) * sets);
    for(int i = 0; i<sets; i++){
        cache[i] = (CacheLine*) calloc(associativity, sizeof(CacheLine));
    }   
}

Caution: memory returned by malloc is uninitialized, so the valid bits could contain garbage values unless we set them ourselves.

So we use calloc for the lines. The calloc function (often read as "contiguous allocation") is similar to malloc, but it initializes the allocated memory to zero.

And don’t forget to free the allocated memory!

void freeCache() {
    // Free allocated memory for cache
    for(int i = 0; i<sets; i++) free(cache[i]);
    free(cache);
}

Handling File Input

  // Handle trace file
  FILE *traceFile = fopen(fileName, "r");
  if (traceFile == NULL) {
      printf("Error opening file: %s\n", fileName);
      exit(1);
  }
  char operation;
  long long address;
  int size;
  while (fscanf(traceFile, " %c %llx,%d", &operation, &address, &size) == 3) {
      switch (operation) {
          case 'L':
              // Handle load operation
              loadData(address, size);
              break;
          case 'S':
              // Handle store operation
              storeData(address, size);
              break;
          case 'M':
              // Handle modify operation
              modifyData(address, size);
              break;
          default:
              // Ignore other operations
              break;
      }
  }
  // Close trace file
  fclose(traceFile);

Caution:

fscanf does not skip whitespace before %c, so we add a space before %c in the format string.
!feof(traceFile) does not work correctly as a loop condition here. feof only returns true after a read has already tried to go past the end of the file and failed, so a loop such as while (!feof(p)) runs its body one extra time, reusing stale values from the last successful read.

Parsing Addresses

// Parse Line Structure
long long getTag(long long address) {
    return address >> (set_bit + block_bit);
}

long long getSetIndex(long long address) {
    long long mask = (1LL << set_bit) - 1;
    return (address >> block_bit) & mask;
}

long long getBlockOffset(long long address) {
    long long mask = (1LL << block_bit) - 1;
    return address & mask;
}

We use bit masks to parse the addresses.
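
As a worked example, here is the same decomposition with s and b passed in explicitly instead of read from globals. This is a sketch: `addr_parts` and `split_address` are hypothetical names, unsigned 64-bit values are used so the shifts are well defined, and s + b is assumed to be less than 64.

```c
#include <stdint.h>

typedef struct {
    uint64_t tag;     /* everything above the set index */
    uint64_t set;     /* s bits above the block offset */
    uint64_t offset;  /* low b bits */
} addr_parts;

/* Splits an address into tag, set index, and block offset. */
addr_parts split_address(uint64_t address, int s, int b) {
    addr_parts p;
    p.offset = address & ((UINT64_C(1) << b) - 1);
    p.set    = (address >> b) & ((UINT64_C(1) << s) - 1);
    p.tag    = address >> (s + b);
    return p;
}
```

For the trace address 0x7ff0005c8 with s = 5 and b = 5, the offset is 8, the set index is 14, and the tag is 0x1ffc001.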

Loading Cache

void loadData(long long address, int size) {
    // Simulate accessing data at the given address
    int s = getSetIndex(address);
    long long t = getTag(address);
    global_timer++;

    for (int i = 0; i < associativity; i++) {
        if (cache[s][i].valid && cache[s][i].tag == t) {
            hit_count++;
            cache[s][i].lru_counter = global_timer;
            if (verboseMode) printf(" hit");
            return;
        }
    }

    miss_count++;
    if (verboseMode) printf(" miss");

    for (int i = 0; i < associativity; i++) {
        if (!cache[s][i].valid) {
            cache[s][i].valid = true;
            cache[s][i].tag = t;
            cache[s][i].lru_counter = global_timer; 
            return;
        }
    }

    eviction_count++;
    if (verboseMode) printf(" eviction");

    int victim_index = 0;
    int min_lru = cache[s][0].lru_counter;

    for (int i = 1; i < associativity; i++) {
        if (cache[s][i].lru_counter < min_lru) {
            min_lru = cache[s][i].lru_counter;
            victim_index = i;
        }
    }

    cache[s][victim_index].tag = t;
    cache[s][victim_index].lru_counter = global_timer;
}

The code simulates the process of loading cache.

We first check if the data already exists in the cache.

If it doesn’t exist, we have to scan for blank lines to load the data.

If blank lines don’t exist, we need to evict a line using the LRU strategy. We replace the victim line with the new line.

Other Operations

void storeData(long long address, int size) {
    // Simulate storing data at the given address
    loadData(address, size);
}

void modifyData(long long address, int size) {
    // Simulate modifying data at the given address
    loadData(address, size);
    hit_count++;
    if (verboseMode) printf(" hit\n");
}

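For this simulator, a store behaves exactly like a load. A modify is a load followed by a store to the same address: the load may hit or miss, and the store that follows always hits, which is why modifyData adds one extra hit.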

Print Summary

We are asked to output the answer using the printSummary function.

// Print Summary
printSummary(hit_count, miss_count, eviction_count);

And Voila!

                        Your simulator     Reference simulator
Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
     3 (1,1,1)       9       8       6       9       8       6  traces/yi2.trace
     3 (4,2,4)       4       5       2       4       5       2  traces/yi.trace
     3 (2,1,4)       2       3       1       2       3       1  traces/dave.trace
     3 (2,1,3)     167      71      67     167      71      67  traces/trans.trace
     3 (2,2,3)     201      37      29     201      37      29  traces/trans.trace
     3 (2,4,3)     212      26      10     212      26      10  traces/trans.trace
     3 (5,1,5)     231       7       0     231       7       0  traces/trans.trace
     6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/long.trace
    27

Summary

In this project, we moved from the theory of the memory hierarchy to the practical reality of memory management. By building this simulator, we reinforced several core concepts of computer systems.

CSAPP Bomb Lab Walkthrough

Louis Aeilot's Blog

Louis C Deng

December 21, 2025 02:45

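I finished the CSAPP Bomb Lab, so here is a walkthrough.

Requirements

Environment

The bomb runs on x86_64 Linux, while my machine is an ARM-based Mac running macOS (Apple Silicon).

I spent a long time with Docker trying to virtualize an x86_64 Ubuntu, only to find that gdb did not work inside it, and I did not want to keep fiddling.

I then found that educoder offers a ready-made environment that is free to use, so I completed the lab there.

Link: https://www.educoder.net/paths/6g398fky

Prerequisites

This lab requires familiarity with a few gdb commands.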

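1. Startup & Exit

Command	Abbrev.	Description
`gdb executable`	-	Start GDB and load the executable.
`run [args]`	`r`	Start running the program. Command-line arguments, if any, follow the command (e.g. `r input.txt`).
`quit`	`q`	Exit GDB.
`start`	-	Run the program and pause automatically at the first line of `main` (saves setting a breakpoint by hand).
`set args ...`	-	Set the run-time arguments (use before `r`).

2. Breakpoints

Command	Abbrev.	Description	Example
`break <loc>`	`b`	Set a breakpoint. Accepts a function name, a line number, or file:line.	`b main` `b 15` `b file.c:20`
`info breakpoints`	`i b`	List all current breakpoints and their numbers (Num).	-
`delete <Num>`	`d`	Delete the breakpoint with the given number. Without a number, delete all.	`d 1`
`disable/enable <Num>`	-	Temporarily disable or enable a breakpoint (its settings are kept but it does not trigger).	`disable 2`
`break ... if <cond>`	-	Conditional breakpoint: pause only when the condition is true (very useful).	`b 10 if i==5`

3. Execution Control

Command	Abbrev.	Description	Difference
`next`	`n`	Step over. Execute the next line of code.	If the line contains a function call, do not enter the function; run it to completion.
`step`	`s`	Step into. Execute the next line of code.	If the line contains a function call, enter the function and debug it line by line.
`continue`	`c`	Continue running until the next breakpoint or the end of the program.	-
`finish`	-	Run until the current function returns.	Handy when you accidentally `s` into a library function you do not care about.
`until <line>`	`u`	Run until the given line number.	Often used to get out of a loop quickly.

4. Inspection

Command	Abbrev.	Description
`print <var>`	`p`	Print the value of a variable. Expressions are supported (e.g. `p index + 1`).
`display <var>`	-	Persistent display: print the variable automatically every time the program stops (good for tracking variables in a loop).
`info locals`	-	Print all local variables in the current stack frame.
`whatis <var>`	-	Show the data type of a variable.
`ptype <struct>`	-	Show the definition of a struct or class (its member list).
`x /nfu <addr>`	`x`	Examine memory. `n` is the count, `f` is the format (x=hex, d=dec, s=str), `u` is the unit (b=byte, w=4-byte word, g=8-byte giant word). Example: `x/10xw &array` (show the first 10 words of the array in hex).

5. Stack & Context

Command	Abbrev.	Description
`backtrace`	`bt`	Show the call stack: the chain of function calls from main to the current function (useful after a crash).
`frame <Num>`	`f`	Switch to the given stack frame (using the numbers shown by `bt`). After switching, `p` shows that function's local variables.
`list`	`l`	Show the source code around the current line.

6. A Better Experience: TUI Mode (Text User Interface)

layout src: splits the screen in two, with the source code and current line on top and the command window below. (Highly recommended)
layout asm: shows the assembly code.
layout split: shows source and assembly together.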

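Disassembly

We can disassemble the binary directly with objdump and read the assembly.

objdump -d bomb > bomb.asm

We can see that each phase is a function, phase_x().

strings

In the terminal, run:

strings bomb

This prints every run of consecutive printable (ASCII) characters in the bomb file.

Phase 1

Let's first see what phase_1 looks like: disas phase_1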

Dump of assembler code for function phase_1:
   0x0000000000400ee0 <+0>:     sub    $0x8,%rsp
   0x0000000000400ee4 <+4>:     mov    $0x402400,%esi
   0x0000000000400ee9 <+9>:     callq  0x401338 <strings_not_equal>
   0x0000000000400eee <+14>:    test   %eax,%eax
   0x0000000000400ef0 <+16>:    je     0x400ef7 <phase_1+23>
   0x0000000000400ef2 <+18>:    callq  0x40143a <explode_bomb>
   0x0000000000400ef7 <+23>:    add    $0x8,%rsp
   0x0000000000400efb <+27>:    retq   
End of assembler dump.

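sub $0x8,%rsp sets up the stack frame; we can ignore it here.

mov $0x402400,%esi followed by callq 0x401338 <strings_not_equal> passes an address as the second argument and compares our input with the string stored there, much like strcmp.

The next instruction, je 0x400ef7 <phase_1+23>, is then clear: if the strings are equal (return value 0), we jump past explode_bomb.

Set a breakpoint: b phase_1

Then run the program with r and type anything as input to hit the breakpoint.

Examine the memory at 0x402400 as a string: x/s 0x402400

0x402400: "Border relations with Canada have never been better."

We have found the answer.

Phase 2

Again, disassemble first: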

Dump of assembler code for function phase_2:
   0x0000000000400efc <+0>:     push   %rbp
   0x0000000000400efd <+1>:     push   %rbx
   0x0000000000400efe <+2>:     sub    $0x28,%rsp
   0x0000000000400f02 <+6>:     mov    %rsp,%rsi
   0x0000000000400f05 <+9>:     callq  0x40145c <read_six_numbers>
   0x0000000000400f0a <+14>:    cmpl   $0x1,(%rsp)
   0x0000000000400f0e <+18>:    je     0x400f30 <phase_2+52>
   0x0000000000400f10 <+20>:    callq  0x40143a <explode_bomb>
   0x0000000000400f15 <+25>:    jmp    0x400f30 <phase_2+52>
   0x0000000000400f17 <+27>:    mov    -0x4(%rbx),%eax
   0x0000000000400f1a <+30>:    add    %eax,%eax
   0x0000000000400f1c <+32>:    cmp    %eax,(%rbx)
   0x0000000000400f1e <+34>:    je     0x400f25 <phase_2+41>
   0x0000000000400f20 <+36>:    callq  0x40143a <explode_bomb>
   0x0000000000400f25 <+41>:    add    $0x4,%rbx
   0x0000000000400f29 <+45>:    cmp    %rbp,%rbx
   0x0000000000400f2c <+48>:    jne    0x400f17 <phase_2+27>
   0x0000000000400f2e <+50>:    jmp    0x400f3c <phase_2+64>
   0x0000000000400f30 <+52>:    lea    0x4(%rsp),%rbx
   0x0000000000400f35 <+57>:    lea    0x18(%rsp),%rbp
   0x0000000000400f3a <+62>:    jmp    0x400f17 <phase_2+27>
   0x0000000000400f3c <+64>:    add    $0x28,%rsp
   0x0000000000400f40 <+68>:    pop    %rbx
   0x0000000000400f41 <+69>:    pop    %rbp
   0x0000000000400f42 <+70>:    retq   
End of assembler dump.

At 0x0000000000400f05 <+9> we see callq 0x40145c <read_six_numbers>, a call to read_six_numbers.

We can disassemble read_six_numbers:

Dump of assembler code for function read_six_numbers:
   0x000000000040145c <+0>:     sub    $0x18,%rsp
   0x0000000000401460 <+4>:     mov    %rsi,%rdx
   0x0000000000401463 <+7>:     lea    0x4(%rsi),%rcx
   0x0000000000401467 <+11>:    lea    0x14(%rsi),%rax
   0x000000000040146b <+15>:    mov    %rax,0x8(%rsp)
   0x0000000000401470 <+20>:    lea    0x10(%rsi),%rax
   0x0000000000401474 <+24>:    mov    %rax,(%rsp)
   0x0000000000401478 <+28>:    lea    0xc(%rsi),%r9
   0x000000000040147c <+32>:    lea    0x8(%rsi),%r8
   0x0000000000401480 <+36>:    mov    $0x4025c3,%esi
   0x0000000000401485 <+41>:    mov    $0x0,%eax
   0x000000000040148a <+46>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x000000000040148f <+51>:    cmp    $0x5,%eax
   0x0000000000401492 <+54>:    jg     0x401499 <read_six_numbers+61>
   0x0000000000401494 <+56>:    callq  0x40143a <explode_bomb>
   0x0000000000401499 <+61>:    add    $0x18,%rsp
   0x000000000040149d <+65>:    retq   
End of assembler dump.

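One line, callq 0x400bf0 <__isoc99_sscanf@plt>, calls sscanf.

Looking at $0x4025c3 with x/s 0x4025c3 gives %d %d %d %d %d %d, so it does read six numbers.

When a function call has more than six arguments, the extra ones are passed on the stack. We see: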

0x0000000000401460 <+4>:     mov    %rsi,%rdx
0x0000000000401463 <+7>:     lea    0x4(%rsi),%rcx
0x0000000000401467 <+11>:    lea    0x14(%rsi),%rax
0x000000000040146b <+15>:    mov    %rax,0x8(%rsp)
0x0000000000401470 <+20>:    lea    0x10(%rsi),%rax
0x0000000000401474 <+24>:    mov    %rax,(%rsp)
0x0000000000401478 <+28>:    lea    0xc(%rsi),%r9
0x000000000040147c <+32>:    lea    0x8(%rsi),%r8

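Integer arguments are passed in the order rdi, rsi, rdx, rcx, r8, r9. Here sscanf receives eight arguments (the input string, the format string, and six pointers), which is more than six, so the last two pointers are stored on the stack at (%rsp) and 0x8(%rsp); rsp is the stack pointer.

So the six numbers are stored, in order, at rsi, rsi+4, rsi+8, rsi+12, rsi+16 (0x10 = 16), and rsi+20 (0x14 = 20), where rsi is the value passed in by the caller.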
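The argument layout is easier to see when the call is written out in C. This is a sketch reconstructed from the assembly and the format string, not the bomb's source; `read_six_numbers_sketch` is a hypothetical name, and it returns the sscanf count instead of calling explode_bomb.

```c
#include <stdio.h>

/* Sketch of read_six_numbers: returns how many integers sscanf parsed.
 * The real function explodes the bomb when that count is 5 or less. */
int read_six_numbers_sketch(const char *input, int *num) {
    /* Eight arguments in total: input (rdi), format (rsi), four pointers in
     * rdx, rcx, r8, r9, and the last two pointers passed on the stack. */
    return sscanf(input, "%d %d %d %d %d %d",
                  &num[0], &num[1], &num[2], &num[3], &num[4], &num[5]);
}
```

Because `num` is a plain int array, consecutive elements sit 4 bytes apart, which matches the offsets 0, 4, 8, 12, 16, and 20 seen in the lea instructions.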

Back to phase_2:

0x0000000000400f02 <+6>: mov %rsp,%rsi

The stack pointer is passed to read_six_numbers as an argument, so the six numbers live on the stack in phase_2's frame.

0x0000000000400f0a <+14>:    cmpl   $0x1,(%rsp)
0x0000000000400f0e <+18>:    je     0x400f30 <phase_2+52>
0x0000000000400f10 <+20>:    callq  0x40143a <explode_bomb>

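This checks whether the value at the top of the stack, i.e. the first number, is 1.

Execution then jumps to 0x400f30, which points rbx at the second number (rsp+4) and rbp at the end of the array (rsp+0x18) before entering the loop: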

0x0000000000400f17 <+27>:    mov    -0x4(%rbx),%eax
0x0000000000400f1a <+30>:    add    %eax,%eax
0x0000000000400f1c <+32>:    cmp    %eax,(%rbx)
0x0000000000400f1e <+34>:    je     0x400f25 <phase_2+41>
0x0000000000400f20 <+36>:    callq  0x40143a <explode_bomb>
0x0000000000400f25 <+41>:    add    $0x4,%rbx
0x0000000000400f29 <+45>:    cmp    %rbp,%rbx
0x0000000000400f2c <+48>:    jne    0x400f17 <phase_2+27>
0x0000000000400f2e <+50>:    jmp    0x400f3c <phase_2+64>
0x0000000000400f30 <+52>:    lea    0x4(%rsp),%rbx
0x0000000000400f35 <+57>:    lea    0x18(%rsp),%rbp
0x0000000000400f3a <+62>:    jmp    0x400f17 <phase_2+27>

This is clearly a loop that walks through the six numbers (advancing 4 bytes each time, exactly the size of an int).

0x0000000000400f1a <+30>:    add    %eax,%eax
0x0000000000400f1c <+32>:    cmp    %eax,(%rbx)
0x0000000000400f1e <+34>:    je     0x400f25 <phase_2+41>

Each of the six numbers must be twice the previous one.

So the answer is: 1 2 4 8 16 32

We can also translate the code into C:

for (int i = 1; i < 6; i++) {
    // mov -0x4(%rbx), %eax 
    int previous = num[i-1];
    // add %eax, %eax
    int expected = previous + previous; 
    // cmp %eax, (%rbx)
    if (num[i] != expected) {
        explode_bomb();
    }
}

Phase 3

Disassembly:

Dump of assembler code for function phase_3:
   0x0000000000400f43 <+0>:     sub    $0x18,%rsp
   0x0000000000400f47 <+4>:     lea    0xc(%rsp),%rcx
   0x0000000000400f4c <+9>:     lea    0x8(%rsp),%rdx
   0x0000000000400f51 <+14>:    mov    $0x4025cf,%esi
   0x0000000000400f56 <+19>:    mov    $0x0,%eax
   0x0000000000400f5b <+24>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x0000000000400f60 <+29>:    cmp    $0x1,%eax
   0x0000000000400f63 <+32>:    jg     0x400f6a <phase_3+39>
   0x0000000000400f65 <+34>:    callq  0x40143a <explode_bomb>
   0x0000000000400f6a <+39>:    cmpl   $0x7,0x8(%rsp)
   0x0000000000400f6f <+44>:    ja     0x400fad <phase_3+106>
   0x0000000000400f71 <+46>:    mov    0x8(%rsp),%eax
   0x0000000000400f75 <+50>:    jmpq   *0x402470(,%rax,8)
   0x0000000000400f7c <+57>:    mov    $0xcf,%eax
   0x0000000000400f81 <+62>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f83 <+64>:    mov    $0x2c3,%eax
   0x0000000000400f88 <+69>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f8a <+71>:    mov    $0x100,%eax
   0x0000000000400f8f <+76>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f91 <+78>:    mov    $0x185,%eax
   0x0000000000400f96 <+83>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f98 <+85>:    mov    $0xce,%eax
   0x0000000000400f9d <+90>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400f9f <+92>:    mov    $0x2aa,%eax
   0x0000000000400fa4 <+97>:    jmp    0x400fbe <phase_3+123>
   0x0000000000400fa6 <+99>:    mov    $0x147,%eax
   0x0000000000400fab <+104>:   jmp    0x400fbe <phase_3+123>
   0x0000000000400fad <+106>:   callq  0x40143a <explode_bomb>
   0x0000000000400fb2 <+111>:   mov    $0x0,%eax
   0x0000000000400fb7 <+116>:   jmp    0x400fbe <phase_3+123>
   0x0000000000400fb9 <+118>:   mov    $0x137,%eax
   0x0000000000400fbe <+123>:   cmp    0xc(%rsp),%eax
   0x0000000000400fc2 <+127>:   je     0x400fc9 <phase_3+134>
   0x0000000000400fc4 <+129>:   callq  0x40143a <explode_bomb>
   0x0000000000400fc9 <+134>:   add    $0x18,%rsp
   0x0000000000400fcd <+138>:   retq

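It looks a bit complicated, but we notice the sscanf call.

Looking at 0x4025cf with x/s 0x4025cf gives %d %d, so the input appears to be two integers.

0x0000000000400f47 <+4>:     lea    0xc(%rsp),%rcx
0x0000000000400f4c <+9>:     lea    0x8(%rsp),%rdx

The two integers are stored at rsp+8 and rsp+0xc, in that order.

0x0000000000400f6a <+39>:    cmpl   $0x7,0x8(%rsp)
0x0000000000400f6f <+44>:    ja     0x400fad <phase_3+106>

This checks the first number: if it is greater than 7, the bomb explodes. Because ja is an unsigned comparison, negative values are rejected as well.

0x0000000000400f71 <+46>:    mov    0x8(%rsp),%eax
0x0000000000400f75 <+50>:    jmpq   *0x402470(,%rax,8)

The first integer is loaded into eax. This is clearly the jump table of a switch statement: the target address is read from 0x402470 + 8*rax.

Let's read 10 words with x/10x 0x402470: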

0x402470:       0x00400f7c      0x00000000      0x00400fb9      0x00000000
0x402480:       0x00400f83      0x00000000      0x00400f8a      0x00000000
0x402490:       0x00400f91      0x00000000

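This is the jump table of the switch statement, and its entries match addresses in the assembly above. Each entry is 8 bytes, so these ten words cover cases 0 to 4.

Any case leads to a valid answer. For example, 0 maps to 0x00400f7c: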

0x0000000000400f7c <+57>:    mov    $0xcf,%eax
0x0000000000400f81 <+62>:    jmp    0x400fbe <phase_3+123>
...
0x0000000000400fbe <+123>:   cmp    0xc(%rsp),%eax
0x0000000000400fc2 <+127>:   je     0x400fc9 <phase_3+134>
0x0000000000400fc4 <+129>:   callq  0x40143a <explode_bomb>

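The second number is compared with eax; if they are equal, the phase is defused.

So the second number is 0xcf = 207.

The answer is therefore 0 207.

In fact the answer is not unique: as the code shows, each switch case has its own correct value for the second integer.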
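The cases can be collected into a small C switch. This is a sketch read off the jump table and the mov instructions above, not the bomb's source; `phase3_expected` is a hypothetical name. Only the five entries that appear in the x/10x dump are filled in, since the targets of entries 5 to 7 were not printed.

```c
/* Returns the second number that matches a given first number, or -1
 * when the case is not covered by the five table entries dumped above. */
int phase3_expected(unsigned int first) {
    switch (first) {
    case 0:  return 0xcf;   /* target 0x400f7c: 207 */
    case 1:  return 0x137;  /* target 0x400fb9: 311 */
    case 2:  return 0x2c3;  /* target 0x400f83: 707 */
    case 3:  return 0x100;  /* target 0x400f8a: 256 */
    case 4:  return 0x185;  /* target 0x400f91: 389 */
    default: return -1;     /* entries 5 to 7 were not dumped */
    }
}
```

Under this reading, inputs such as 1 311 or 3 256 should work just as well as 0 207.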

Phase 4

Disassembly:

Dump of assembler code for function phase_4:
   0x000000000040100c <+0>:     sub    $0x18,%rsp
   0x0000000000401010 <+4>:     lea    0xc(%rsp),%rcx
   0x0000000000401015 <+9>:     lea    0x8(%rsp),%rdx
   0x000000000040101a <+14>:    mov    $0x4025cf,%esi
   0x000000000040101f <+19>:    mov    $0x0,%eax
   0x0000000000401024 <+24>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x0000000000401029 <+29>:    cmp    $0x2,%eax
   0x000000000040102c <+32>:    jne    0x401035 <phase_4+41>
   0x000000000040102e <+34>:    cmpl   $0xe,0x8(%rsp)
   0x0000000000401033 <+39>:    jbe    0x40103a <phase_4+46>
   0x0000000000401035 <+41>:    callq  0x40143a <explode_bomb>
   0x000000000040103a <+46>:    mov    $0xe,%edx
   0x000000000040103f <+51>:    mov    $0x0,%esi
   0x0000000000401044 <+56>:    mov    0x8(%rsp),%edi
   0x0000000000401048 <+60>:    callq  0x400fce <func4>
   0x000000000040104d <+65>:    test   %eax,%eax
   0x000000000040104f <+67>:    jne    0x401058 <phase_4+76>
   0x0000000000401051 <+69>:    cmpl   $0x0,0xc(%rsp)
   0x0000000000401056 <+74>:    je     0x40105d <phase_4+81>
   0x0000000000401058 <+76>:    callq  0x40143a <explode_bomb>
   0x000000000040105d <+81>:    add    $0x18,%rsp
   0x0000000000401061 <+85>:    retq   
End of assembler dump.

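Again we see sscanf.

Reading 0x4025cf gives %d %d, so this phase also reads two numbers; rdx and rcx hold the addresses they are written to, rsp+8 and rsp+0xc.

Reading on, jbe 0x40103a requires the first number to be <= 14 (an unsigned comparison, so negative values fail too).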

0x000000000040103a <+46>:    mov    $0xe,%edx
0x000000000040103f <+51>:    mov    $0x0,%esi
0x0000000000401044 <+56>:    mov    0x8(%rsp),%edi

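This is clearly argument setup for the call to func4: edi is the first number, esi is 0, and edx is 14.

Let's not look at func4 yet and keep reading.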

0x000000000040104d <+65>:    test   %eax,%eax
0x000000000040104f <+67>:    jne    0x401058 <phase_4+76>
...
0x0000000000401058 <+76>:    callq  0x40143a <explode_bomb>

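Recall that eax holds a function's return value; here the return value must be 0.

0x0000000000401051 <+69>:    cmpl   $0x0,0xc(%rsp)
0x0000000000401056 <+74>:    je     0x40105d <phase_4+81>

This requires the second number we read to be 0, which gives us half of the answer.

Now let's look at func4: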

Dump of assembler code for function func4:
   0x0000000000400fce <+0>:     sub    $0x8,%rsp
   0x0000000000400fd2 <+4>:     mov    %edx,%eax
   0x0000000000400fd4 <+6>:     sub    %esi,%eax
   0x0000000000400fd6 <+8>:     mov    %eax,%ecx
   0x0000000000400fd8 <+10>:    shr    $0x1f,%ecx
   0x0000000000400fdb <+13>:    add    %ecx,%eax
   0x0000000000400fdd <+15>:    sar    %eax
   0x0000000000400fdf <+17>:    lea    (%rax,%rsi,1),%ecx
   0x0000000000400fe2 <+20>:    cmp    %edi,%ecx
   0x0000000000400fe4 <+22>:    jle    0x400ff2 <func4+36>
   0x0000000000400fe6 <+24>:    lea    -0x1(%rcx),%edx
   0x0000000000400fe9 <+27>:    callq  0x400fce <func4>
   0x0000000000400fee <+32>:    add    %eax,%eax
   0x0000000000400ff0 <+34>:    jmp    0x401007 <func4+57>
   0x0000000000400ff2 <+36>:    mov    $0x0,%eax
   0x0000000000400ff7 <+41>:    cmp    %edi,%ecx
   0x0000000000400ff9 <+43>:    jge    0x401007 <func4+57>
   0x0000000000400ffb <+45>:    lea    0x1(%rcx),%esi
   0x0000000000400ffe <+48>:    callq  0x400fce <func4>
   0x0000000000401003 <+53>:    lea    0x1(%rax,%rax,1),%eax
   0x0000000000401007 <+57>:    add    $0x8,%rsp
   0x000000000040100b <+61>:    retq   
End of assembler dump.

This code is recursive. We can translate it into C by hand:

// Called as func4(a, 0, 14): edi = a, esi = l = 0, edx = r = 14
int func4(int a, int l, int r){
    // (r - l) / 2 rounds toward zero, like the shr/add/sar sequence
    int mid = l + (r - l) / 2;
    if(mid <= a){
        if(mid==a){
            return 0;
        }
        l = mid + 1;
        return 2*func4(a, l, r) + 1;
    }else{
        r = mid - 1;
        return 2*func4(a, l, r);
    }
}

This is binary search. With a = 7 the first midpoint already equals a, so the function returns 0 immediately.

The final answer is 7 0.
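To see whether 7 is the only first number that works, the function can simply be run over the whole allowed range 0..14. The sketch below restates func4 following the assembly's control flow; `count_phase4_answers` is a hypothetical helper added for illustration.

```c
/* func4 following the assembly: the midpoint uses a division by 2 that
 * rounds toward zero, matching the shr/add/sar sequence. */
int func4(int a, int l, int r) {
    int mid = l + (r - l) / 2;
    if (mid > a) {
        return 2 * func4(a, l, mid - 1);      /* search the left half */
    }
    if (mid < a) {
        return 2 * func4(a, mid + 1, r) + 1;  /* search the right half */
    }
    return 0;                                 /* mid == a */
}

/* Hypothetical helper: counts the first numbers in 0..14 for which
 * func4(a, 0, 14) returns 0, as phase_4 requires. */
int count_phase4_answers(void) {
    int count = 0;
    for (int a = 0; a <= 14; a++) {
        if (func4(a, 0, 14) == 0) {
            count++;
        }
    }
    return count;
}
```

The result is 0 only when the search never takes the right branch, so besides 7 the values 3, 1, and 0 also pass, giving four valid first numbers.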

0x0000000000400fd2 <+4>:     mov    %edx,%eax
0x0000000000400fd4 <+6>:     sub    %esi,%eax
0x0000000000400fd6 <+8>:     mov    %eax,%ecx
0x0000000000400fd8 <+10>:    shr    $0x1f,%ecx
0x0000000000400fdb <+13>:    add    %ecx,%eax
0x0000000000400fdd <+15>:    sar    %eax
0x0000000000400fdf <+17>:    lea    (%rax,%rsi,1),%ecx

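This code computes mid, which is easy to follow, but one question remains: what is shr $0x1f,%ecx doing?

Bias

Integer division must round toward zero: for positive numbers that means rounding down, and for negative numbers rounding up. Division by a power of two can be replaced by a right shift.

However, an arithmetic right shift of a negative two's-complement number rounds the wrong way.

A right shift discards the lowest bits, which amounts to rounding down (toward negative infinity).

$x = \underbrace{\sum_{i=k}^{w-1} x_i 2^i}_{\text{high part}} + \underbrace{\sum_{i=0}^{k-1} x_i 2^i}_{\text{low part}}$

(In two's complement the top bit $x_{w-1}$ carries weight $-2^{w-1}$; the split into a high part and a low part is unchanged.)

When we compute x >> k, the weights of the high part are all divided by $2^k$ and form the integer result, while the low part (the remainder) is simply discarded.

For a negative number this rounds down, but we need it to round up.

So we introduce a bias.

$\text{For integers } x \text{ and } y\ (y>0),\ \lceil x/y \rceil = \lfloor (x+y-1)/y \rfloor$

Hence (x+(1<<k)-1)>>k yields $\lceil x/2^k \rceil$.

That is what the following two lines do. Here k = 1, so the bias (1<<k)-1 is 1, and it must be added only when the value is negative. shr $0x1f,%ecx is a logical right shift by 31 that leaves just the sign bit in ecx (1 for a negative value, 0 otherwise), and add %ecx,%eax adds that bias before sar %eax halves the result.

0x0000000000400fd8 <+10>:    shr    $0x1f,%ecx
0x0000000000400fdb <+13>:    add    %ecx,%eax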
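The same three instructions can be mirrored in C to check the rounding behavior. This is a sketch under two stated assumptions: int is 32 bits wide, and >> on a negative int is an arithmetic shift (true for GCC and Clang on x86-64, but implementation-defined in the C standard). `half_toward_zero` is a hypothetical name.

```c
/* Halve x with rounding toward zero, the way the shr/add/sar sequence does.
 * Assumes a 32-bit int and an arithmetic right shift for negative values. */
int half_toward_zero(int x) {
    int bias = (int)((unsigned int)x >> 31); /* shr $0x1f: 1 if x < 0, else 0 */
    return (x + bias) >> 1;                  /* add the bias, then sar by 1 */
}
```

For example, -3 becomes (-3 + 1) >> 1 = -1, the same as -3 / 2 in C, whereas a plain arithmetic shift of -3 would give -2.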

Phase 5

First, disas to see the code:

Dump of assembler code for function phase_5:
   0x0000000000401062 <+0>:     push   %rbx
   0x0000000000401063 <+1>:     sub    $0x20,%rsp
   0x0000000000401067 <+5>:     mov    %rdi,%rbx
   0x000000000040106a <+8>:     mov    %fs:0x28,%rax
   0x0000000000401073 <+17>:    mov    %rax,0x18(%rsp)
   0x0000000000401078 <+22>:    xor    %eax,%eax
   0x000000000040107a <+24>:    callq  0x40131b <string_length>
   0x000000000040107f <+29>:    cmp    $0x6,%eax
   0x0000000000401082 <+32>:    je     0x4010d2 <phase_5+112>
   0x0000000000401084 <+34>:    callq  0x40143a <explode_bomb>
   0x0000000000401089 <+39>:    jmp    0x4010d2 <phase_5+112>
   0x000000000040108b <+41>:    movzbl (%rbx,%rax,1),%ecx
   0x000000000040108f <+45>:    mov    %cl,(%rsp)
   0x0000000000401092 <+48>:    mov    (%rsp),%rdx
   0x0000000000401096 <+52>:    and    $0xf,%edx
   0x0000000000401099 <+55>:    movzbl 0x4024b0(%rdx),%edx
   0x00000000004010a0 <+62>:    mov    %dl,0x10(%rsp,%rax,1)
   0x00000000004010a4 <+66>:    add    $0x1,%rax
   0x00000000004010a8 <+70>:    cmp    $0x6,%rax
   0x00000000004010ac <+74>:    jne    0x40108b <phase_5+41>
   0x00000000004010ae <+76>:    movb   $0x0,0x16(%rsp)
   0x00000000004010b3 <+81>:    mov    $0x40245e,%esi
   0x00000000004010b8 <+86>:    lea    0x10(%rsp),%rdi
   0x00000000004010bd <+91>:    callq  0x401338 <strings_not_equal>
   0x00000000004010c2 <+96>:    test   %eax,%eax
   0x00000000004010c4 <+98>:    je     0x4010d9 <phase_5+119>
   0x00000000004010c6 <+100>:   callq  0x40143a <explode_bomb>
   0x00000000004010cb <+105>:   nopl   0x0(%rax,%rax,1)
   0x00000000004010d0 <+110>:   jmp    0x4010d9 <phase_5+119>
   0x00000000004010d2 <+112>:   mov    $0x0,%eax
   0x00000000004010d7 <+117>:   jmp    0x40108b <phase_5+41>
   0x00000000004010d9 <+119>:   mov    0x18(%rsp),%rax
   0x00000000004010de <+124>:   xor    %fs:0x28,%rax
   0x00000000004010e7 <+133>:   je     0x4010ee <phase_5+140>
   0x00000000004010e9 <+135>:   callq  0x400b30 <__stack_chk_fail@plt>
   0x00000000004010ee <+140>:   add    $0x20,%rsp
   0x00000000004010f2 <+144>:   pop    %rbx
   0x00000000004010f3 <+145>:   retq   
End of assembler dump.

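Two memory addresses stand out in this code: 0x4024b0 and 0x40245e.

Let's read them:

0x4024b0 <array.3449>: "maduiersnfotvbylSo you think you can stop the bomb with ctrl-c, do you?"
0x40245e: "flyers"

The first one, array.3449, is a string; we will call it a[]. Since the index used below is masked to 0..15, only its first 16 characters, "maduiersnfotvbyl", are ever used.

The code above can be split into sections: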

0x0000000000401062 <+0>:     push   %rbx
0x0000000000401063 <+1>:     sub    $0x20,%rsp
0x0000000000401067 <+5>:     mov    %rdi,%rbx
0x000000000040106a <+8>:     mov    %fs:0x28,%rax
0x0000000000401073 <+17>:    mov    %rax,0x18(%rsp)
0x0000000000401078 <+22>:    xor    %eax,%eax
0x000000000040107a <+24>:    callq  0x40131b <string_length>
0x000000000040107f <+29>:    cmp    $0x6,%eax
0x0000000000401082 <+32>:    je     0x4010d2 <phase_5+112>
0x0000000000401084 <+34>:    callq  0x40143a <explode_bomb>
0x0000000000401089 <+39>:    jmp    0x4010d2 <phase_5+112>

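This is the setup. Stack space is reserved, the value at %fs:0x28 (the stack canary) is saved at 0x18(%rsp), and the pointer to the input string is kept in rbx. The input must have length 6; otherwise the bomb explodes.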

0x00000000004010d2 <+112>:   mov    $0x0,%eax
0x00000000004010d7 <+117>:   jmp    0x40108b <phase_5+41>
...
0x000000000040108b <+41>:    movzbl (%rbx,%rax,1),%ecx
0x000000000040108f <+45>:    mov    %cl,(%rsp)
0x0000000000401092 <+48>:    mov    (%rsp),%rdx
0x0000000000401096 <+52>:    and    $0xf,%edx
0x0000000000401099 <+55>:    movzbl 0x4024b0(%rdx),%edx
0x00000000004010a0 <+62>:    mov    %dl,0x10(%rsp,%rax,1)
0x00000000004010a4 <+66>:    add    $0x1,%rax
0x00000000004010a8 <+70>:    cmp    $0x6,%rax
0x00000000004010ac <+74>:    jne    0x40108b <phase_5+41>

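This is a for loop that runs 6 times. It takes the low four bits of each input character, a number from 0 to 15 that we call i, and stores a[i] at the corresponding position on the stack (starting at 0x10(%rsp)).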

0x00000000004010ae <+76>:    movb   $0x0,0x16(%rsp)
0x00000000004010b3 <+81>:    mov    $0x40245e,%esi
0x00000000004010b8 <+86>:    lea    0x10(%rsp),%rdi
0x00000000004010bd <+91>:    callq  0x401338 <strings_not_equal>
0x00000000004010c2 <+96>:    test   %eax,%eax
0x00000000004010c4 <+98>:    je     0x4010d9 <phase_5+119>
0x00000000004010c6 <+100>:   callq  0x40143a <explode_bomb>
0x00000000004010cb <+105>:   nopl   0x0(%rax,%rax,1)
0x00000000004010d0 <+110>:   jmp    0x4010d9 <phase_5+119>
...
0x00000000004010d9 <+119>:   mov    0x18(%rsp),%rax
0x00000000004010de <+124>:   xor    %fs:0x28,%rax
0x00000000004010e7 <+133>:   je     0x4010ee <phase_5+140>
0x00000000004010e9 <+135>:   callq  0x400b30 <__stack_chk_fail@plt>
0x00000000004010ee <+140>:   add    $0x20,%rsp
0x00000000004010f2 <+144>:   pop    %rbx
0x00000000004010f3 <+145>:   retq

The only part that matters here is:

0x00000000004010ae <+76>:    movb   $0x0,0x16(%rsp)
0x00000000004010b3 <+81>:    mov    $0x40245e,%esi
0x00000000004010b8 <+86>:    lea    0x10(%rsp),%rdi
0x00000000004010bd <+91>:    callq  0x401338 <strings_not_equal>
0x00000000004010c2 <+96>:    test   %eax,%eax
0x00000000004010c4 <+98>:    je     0x4010d9 <phase_5+119>
0x00000000004010c6 <+100>:   callq  0x40143a <explode_bomb>
0x00000000004010cb <+105>:   nopl   0x0(%rax,%rax,1)
0x00000000004010d0 <+110>:   jmp    0x4010d9 <phase_5+119>

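This null-terminates the six transformed characters and compares them with the string at 0x40245e, "flyers".

The process can be summarized as: Input Char -> ASCII Hex -> AND 0xF (low 4 bits) -> Table Index -> Lookup Table Char -> Target "flyers"

So we get the answer ionefg, or IONEFG.

There are other answers as well, left for the reader to discover.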
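The whole transformation fits in a few lines of C, which makes it easy to test candidate inputs. This is a sketch of the logic described above, not the bomb's source; `phase5_accepts` is a hypothetical name, and the table holds only the 16 characters that the masked index can reach.

```c
#include <string.h>

/* Applies the per-character table lookup and compares the result with
 * "flyers". Returns 1 if the input would defuse the phase, else 0. */
int phase5_accepts(const char *input) {
    static const char table[] = "maduiersnfotvbyl"; /* first 16 bytes at 0x4024b0 */
    char out[7];

    if (strlen(input) != 6) {
        return 0;                    /* string_length must return 6 */
    }
    for (int i = 0; i < 6; i++) {
        out[i] = table[(unsigned char)input[i] & 0xf];  /* and $0xf, then lookup */
    }
    out[6] = '\0';                   /* movb $0x0,0x16(%rsp) */
    return strcmp(out, "flyers") == 0;
}
```

The required indices are 9, 15, 14, 5, 6, 7, so any six characters whose low nibbles match that sequence will do.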

Phase 6

First, the code:

0x00000000004010f4 <+0>:     push   %r14
0x00000000004010f6 <+2>:     push   %r13
0x00000000004010f8 <+4>:     push   %r12
0x00000000004010fa <+6>:     push   %rbp
0x00000000004010fb <+7>:     push   %rbx
0x00000000004010fc <+8>:     sub    $0x50,%rsp
0x0000000000401100 <+12>:    mov    %rsp,%r13
0x0000000000401103 <+15>:    mov    %rsp,%rsi
0x0000000000401106 <+18>:    callq  0x40145c <read_six_numbers>
0x000000000040110b <+23>:    mov    %rsp,%r14
0x000000000040110e <+26>:    mov    $0x0,%r12d
0x0000000000401114 <+32>:    mov    %r13,%rbp
0x0000000000401117 <+35>:    mov    0x0(%r13),%eax
0x000000000040111b <+39>:    sub    $0x1,%eax
0x000000000040111e <+42>:    cmp    $0x5,%eax
0x0000000000401121 <+45>:    jbe    0x401128 <phase_6+52>
0x0000000000401123 <+47>:    callq  0x40143a <explode_bomb>
0x0000000000401128 <+52>:    add    $0x1,%r12d
0x000000000040112c <+56>:    cmp    $0x6,%r12d
0x0000000000401130 <+60>:    je     0x401153 <phase_6+95>
0x0000000000401132 <+62>:    mov    %r12d,%ebx
0x0000000000401135 <+65>:    movslq %ebx,%rax
0x0000000000401138 <+68>:    mov    (%rsp,%rax,4),%eax
0x000000000040113b <+71>:    cmp    %eax,0x0(%rbp)
0x000000000040113e <+74>:    jne    0x401145 <phase_6+81>
0x0000000000401140 <+76>:    callq  0x40143a <explode_bomb>
0x0000000000401145 <+81>:    add    $0x1,%ebx
0x0000000000401148 <+84>:    cmp    $0x5,%ebx
0x000000000040114b <+87>:    jle    0x401135 <phase_6+65>
0x000000000040114d <+89>:    add    $0x4,%r13
0x0000000000401151 <+93>:    jmp    0x401114 <phase_6+32>
0x0000000000401153 <+95>:    lea    0x18(%rsp),%rsi
0x0000000000401158 <+100>:   mov    %r14,%rax
0x000000000040115b <+103>:   mov    $0x7,%ecx
0x0000000000401160 <+108>:   mov    %ecx,%edx
0x0000000000401162 <+110>:   sub    (%rax),%edx
0x0000000000401164 <+112>:   mov    %edx,(%rax)
0x0000000000401166 <+114>:   add    $0x4,%rax
0x000000000040116a <+118>:   cmp    %rsi,%rax
0x000000000040116d <+121>:   jne    0x401160 <phase_6+108>
0x000000000040116f <+123>:   mov    $0x0,%esi
0x0000000000401174 <+128>:   jmp    0x401197 <phase_6+163>
0x0000000000401176 <+130>:   mov    0x8(%rdx),%rdx
0x000000000040117a <+134>:   add    $0x1,%eax
0x000000000040117d <+137>:   cmp    %ecx,%eax
0x000000000040117f <+139>:   jne    0x401176 <phase_6+130>
0x0000000000401181 <+141>:   jmp    0x401188 <phase_6+148>
0x0000000000401183 <+143>:   mov    $0x6032d0,%edx
0x0000000000401188 <+148>:   mov    %rdx,0x20(%rsp,%rsi,2)
0x000000000040118d <+153>:   add    $0x4,%rsi
0x0000000000401191 <+157>:   cmp    $0x18,%rsi
0x0000000000401195 <+161>:   je     0x4011ab <phase_6+183>
0x0000000000401197 <+163>:   mov    (%rsp,%rsi,1),%ecx
0x000000000040119a <+166>:   cmp    $0x1,%ecx
0x000000000040119d <+169>:   jle    0x401183 <phase_6+143>
0x000000000040119f <+171>:   mov    $0x1,%eax
0x00000000004011a4 <+176>:   mov    $0x6032d0,%edx
0x00000000004011a9 <+181>:   jmp    0x401176 <phase_6+130>
0x00000000004011ab <+183>:   mov    0x20(%rsp),%rbx
0x00000000004011b0 <+188>:   lea    0x28(%rsp),%rax
0x00000000004011b5 <+193>:   lea    0x50(%rsp),%rsi
0x00000000004011ba <+198>:   mov    %rbx,%rcx
0x00000000004011bd <+201>:   mov    (%rax),%rdx
0x00000000004011c0 <+204>:   mov    %rdx,0x8(%rcx)
0x00000000004011c4 <+208>:   add    $0x8,%rax
0x00000000004011c8 <+212>:   cmp    %rsi,%rax
0x00000000004011cb <+215>:   je     0x4011d2 <phase_6+222>
0x00000000004011cd <+217>:   mov    %rdx,%rcx
0x00000000004011d0 <+220>:   jmp    0x4011bd <phase_6+201>
0x00000000004011d2 <+222>:   movq   $0x0,0x8(%rdx)
0x00000000004011da <+230>:   mov    $0x5,%ebp
0x00000000004011df <+235>:   mov    0x8(%rbx),%rax
0x00000000004011e3 <+239>:   mov    (%rax),%eax
0x00000000004011e5 <+241>:   cmp    %eax,(%rbx)
0x00000000004011e7 <+243>:   jge    0x4011ee <phase_6+250>
0x00000000004011e9 <+245>:   callq  0x40143a <explode_bomb>
0x00000000004011ee <+250>:   mov    0x8(%rbx),%rbx
0x00000000004011f2 <+254>:   sub    $0x1,%ebp
0x00000000004011f5 <+257>:   jne    0x4011df <phase_6+235>
0x00000000004011f7 <+259>:   add    $0x50,%rsp
0x00000000004011fb <+263>:   pop    %rbx
0x00000000004011fc <+264>:   pop    %rbp
0x00000000004011fd <+265>:   pop    %r12
0x00000000004011ff <+267>:   pop    %r13
0x0000000000401201 <+269>:   pop    %r14
0x0000000000401203 <+271>:   retq

Let's take it piece by piece:

0x00000000004010f4 <+0>:     push   %r14
0x00000000004010f6 <+2>:     push   %r13
0x00000000004010f8 <+4>:     push   %r12
0x00000000004010fa <+6>:     push   %rbp
0x00000000004010fb <+7>:     push   %rbx
0x00000000004010fc <+8>:     sub    $0x50,%rsp
0x0000000000401100 <+12>:    mov    %rsp,%r13
0x0000000000401103 <+15>:    mov    %rsp,%rsi

This part sets up the stack frame.

0x0000000000401106 <+18>: callq 0x40145c <read_six_numbers>

This reads six numbers. As we saw in Phase 2, they are stored in an array starting at rsp.

0x000000000040110b <+23>:    mov    %rsp,%r14
0x000000000040110e <+26>:    mov    $0x0,%r12d
0x0000000000401114 <+32>:    mov    %r13,%rbp
0x0000000000401117 <+35>:    mov    0x0(%r13),%eax
0x000000000040111b <+39>:    sub    $0x1,%eax
0x000000000040111e <+42>:    cmp    $0x5,%eax
0x0000000000401121 <+45>:    jbe    0x401128 <phase_6+52>
0x0000000000401123 <+47>:    callq  0x40143a <explode_bomb>
0x0000000000401128 <+52>:    add    $0x1,%r12d
0x000000000040112c <+56>:    cmp    $0x6,%r12d
0x0000000000401130 <+60>:    je     0x401153 <phase_6+95>
0x0000000000401132 <+62>:    mov    %r12d,%ebx
0x0000000000401135 <+65>:    movslq %ebx,%rax
0x0000000000401138 <+68>:    mov    (%rsp,%rax,4),%eax
0x000000000040113b <+71>:    cmp    %eax,0x0(%rbp)
0x000000000040113e <+74>:    jne    0x401145 <phase_6+81>
0x0000000000401140 <+76>:    callq  0x40143a <explode_bomb>
0x0000000000401145 <+81>:    add    $0x1,%ebx
0x0000000000401148 <+84>:    cmp    $0x5,%ebx
0x000000000040114b <+87>:    jle    0x401135 <phase_6+65>
0x000000000040114d <+89>:    add    $0x4,%r13
0x0000000000401151 <+93>:    jmp    0x401114 <phase_6+32>

This code forms a typical nested loop: the outer loop is counted by %r12d, and the inner loop is controlled by %ebx.

0x0000000000401117 <+35>:    mov    0x0(%r13),%eax
0x000000000040111b <+39>:    sub    $0x1,%eax
0x000000000040111e <+42>:    cmp    $0x5,%eax
...
0x000000000040114d <+89>:    add    $0x4,%r13
0x0000000000401151 <+93>:    jmp    0x401114 <phase_6+32>

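First the outer loop: it walks the input array through the pointer %r13, and its first job is a bounds check. After subtracting 1, the unsigned comparison jbe against 5 ensures that every number lies between 1 and 6.

Now the inner loop: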

0x0000000000401132 <+62>:    mov    %r12d,%ebx
0x0000000000401135 <+65>:    movslq %ebx,%rax
0x0000000000401138 <+68>:    mov    (%rsp,%rax,4),%eax
0x000000000040113b <+71>:    cmp    %eax,0x0(%rbp)
0x000000000040113e <+74>:    jne    0x401145 <phase_6+81>
0x0000000000401140 <+76>:    callq  0x40143a <explode_bomb>
0x0000000000401145 <+81>:    add    $0x1,%ebx
0x0000000000401148 <+84>:    cmp    $0x5,%ebx
0x000000000040114b <+87>:    jle    0x401135 <phase_6+65>

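Starting from the element after the current outer one, it visits every later number in the array (an int is 4 bytes, so (%rsp,%rax,4) fetches the number at index rax) and checks whether it equals the outer number; if it does, the bomb explodes.

So this loop checks that the input numbers are pairwise distinct.

In summary, the nested loop checks that our input consists of six distinct numbers, each between 1 and 6.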
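Written in C, the nested loop looks roughly like the sketch below. It is reconstructed from the assembly, with a return value in place of the explode_bomb calls; `phase6_input_ok` is a hypothetical name.

```c
/* Returns 1 when the six inputs are distinct values in 1..6, else 0. */
int phase6_input_ok(const int num[6]) {
    for (int i = 0; i < 6; i++) {
        /* sub $1 / cmp $5 / jbe: the unsigned compare also rejects 0
         * and negative numbers, which wrap around to large values. */
        if ((unsigned int)num[i] - 1u > 5u) {
            return 0;
        }
        /* Inner loop: compare num[i] with every later element. */
        for (int j = i + 1; j < 6; j++) {
            if (num[i] == num[j]) {
                return 0;
            }
        }
    }
    return 1;
}
```

Six distinct values in 1..6 means the input must be some permutation of 1 2 3 4 5 6, so only the order remains to be found.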

0x0000000000401153 <+95>:    lea    0x18(%rsp),%rsi
0x0000000000401158 <+100>:   mov    %r14,%rax
0x000000000040115b <+103>:   mov    $0x7,%ecx
0x0000000000401160 <+108>:   mov    %ecx,%edx
0x0000000000401162 <+110>:   sub    (%rax),%edx
0x0000000000401164 <+112>:   mov    %edx,(%rax)
0x0000000000401166 <+114>:   add    $0x4,%rax
0x000000000040116a <+118>:   cmp    %rsi,%rax
0x000000000040116d <+121>:   jne    0x401160 <phase_6+108>

Here is another loop. As noted above, r14 equals rsp, the stack pointer. The loop visits each number x and reassigns it as x = 7-x.

0x000000000040116f <+123>:   mov    $0x0,%esi
0x0000000000401174 <+128>:   jmp    0x401197 <phase_6+163>
0x0000000000401176 <+130>:   mov    0x8(%rdx),%rdx
0x000000000040117a <+134>:   add    $0x1,%eax
0x000000000040117d <+137>:   cmp    %ecx,%eax
0x000000000040117f <+139>:   jne    0x401176 <phase_6+130>
0x0000000000401181 <+141>:   jmp    0x401188 <phase_6+148>
0x0000000000401183 <+143>:   mov    $0x6032d0,%edx
0x0000000000401188 <+148>:   mov    %rdx,0x20(%rsp,%rsi,2)
0x000000000040118d <+153>:   add    $0x4,%rsi
0x0000000000401191 <+157>:   cmp    $0x18,%rsi
0x0000000000401195 <+161>:   je     0x4011ab <phase_6+183>
0x0000000000401197 <+163>:   mov    (%rsp,%rsi,1),%ecx
0x000000000040119a <+166>:   cmp    $0x1,%ecx
0x000000000040119d <+169>:   jle    0x401183 <phase_6+143>
0x000000000040119f <+171>:   mov    $0x1,%eax
0x00000000004011a4 <+176>:   mov    $0x6032d0,%edx
0x00000000004011a9 <+181>:   jmp    0x401176 <phase_6+130>

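If an element x is greater than 1, eax is set to 1 and edx to 0x6032d0, and then mov 0x8(%rdx),%rdx is executed x-1 times. Otherwise edx is simply 0x6032d0.

This looks like a linked list, and a memory address, 0x6032d0, appears. Let's take a look: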

(gdb) x/12xg 0x6032d0
0x6032d0 <node1>:       0x000000010000014c      0x00000000006032e0
0x6032e0 <node2>:       0x00000002000000a8      0x00000000006032f0
0x6032f0 <node3>:       0x000000030000039c      0x0000000000603300
0x603300 <node4>:       0x00000004000002b3      0x0000000000603310
0x603310 <node5>:       0x00000005000001dd      0x0000000000603320
0x603320 <node6>:       0x00000006000001bb      0x0000000000000000

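Note that on a 64-bit system a pointer occupies 8 bytes (64 bits).

It is clearly a linked list, and 0x8(%rdx) is the next pointer.

So the operation above builds an array of node pointers at rsp+0x20: if the i-th number of the input array is x, the i-th entry of the new array is the address of the x-th node in the list.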

0x00000000004011ab <+183>:   mov    0x20(%rsp),%rbx
0x00000000004011b0 <+188>:   lea    0x28(%rsp),%rax
0x00000000004011b5 <+193>:   lea    0x50(%rsp),%rsi

This is initialization. rsi is the end pointer that marks where the loop stops. From 0x20 to 0x50 is exactly 6*8 = 48 bytes.

0x00000000004011ba <+198>:   mov    %rbx,%rcx
0x00000000004011bd <+201>:   mov    (%rax),%rdx
0x00000000004011c0 <+204>:   mov    %rdx,0x8(%rcx)
0x00000000004011c4 <+208>:   add    $0x8,%rax
0x00000000004011c8 <+212>:   cmp    %rsi,%rax
0x00000000004011cb <+215>:   je     0x4011d2 <phase_6+222>
0x00000000004011cd <+217>:   mov    %rdx,%rcx
0x00000000004011d0 <+220>:   jmp    0x4011bd <phase_6+201>

This walks the array of node addresses we just built. It may be easier to follow in C:

Node *current = node_ptrs[0]; // initializes %rbx and %rcx
int i = 1; // %rax points to node_ptrs[1]

while (i < 6) {
    Node *next_node = node_ptrs[i]; // mov (%rax), %rdx
    current->next = next_node;      // mov %rdx, 0x8(%rcx)
    current = next_node;            // mov %rdx, %rcx
    i++;                            // add $0x8, %rax
}

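This loop rewrites the list's links so that the nodes follow the order of the array.

0x00000000004011d2 <+222>: movq $0x0,0x8(%rdx)

This instruction sets the last node's next to NULL so that the list is properly terminated.

Then comes one more loop: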

0x00000000004011da <+230>:   mov    $0x5,%ebp
0x00000000004011df <+235>:   mov    0x8(%rbx),%rax
0x00000000004011e3 <+239>:   mov    (%rax),%eax
0x00000000004011e5 <+241>:   cmp    %eax,(%rbx)
0x00000000004011e7 <+243>:   jge    0x4011ee <phase_6+250>
0x00000000004011e9 <+245>:   callq  0x40143a <explode_bomb>
0x00000000004011ee <+250>:   mov    0x8(%rbx),%rbx
0x00000000004011f2 <+254>:   sub    $0x1,%ebp
0x00000000004011f5 <+257>:   jne    0x4011df <phase_6+235>

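It walks the list and checks that the values are in descending order: each node's value must be greater than or equal to its successor's.

At this point we can work out the answer: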

(gdb) x/12xg 0x6032d0
0x6032d0 <node1>:       0x000000010000014c      0x00000000006032e0
0x6032e0 <node2>:       0x00000002000000a8      0x00000000006032f0
0x6032f0 <node3>:       0x000000030000039c      0x0000000000603300
0x603300 <node4>:       0x00000004000002b3      0x0000000000603310
0x603310 <node5>:       0x00000005000001dd      0x0000000000603320
0x603320 <node6>:       0x00000006000001bb      0x0000000000000000

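Find the node indices in descending order of value. Note that the value is an int, so only the low 4 bytes of each node's first quadword count (the high 4 bytes hold the node index). That gives 3 4 5 6 1 2.

But remember that the input went through the 7-x transform (see above), so the answer to type in is 4 3 2 1 6 5.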
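
The derivation can be double-checked with a small sketch. The node values are the low 32 bits read from the gdb dump; `NODE_VALUES` and `solve_phase6` are names made up here for illustration and are not part of the bomb.

#include <stddef.h>

/* Node values copied from the gdb dump (low 4 bytes of node1..node6). */
static const int NODE_VALUES[6] = {0x14c, 0x0a8, 0x39c, 0x2b3, 0x1dd, 0x1bb};

/* Hypothetical helper: fills answer[0..5] with the six numbers to type in.
 * Step 1: order the 1-based node indices by descending node value.
 * Step 2: undo the bomb's 7 - x transform on each index. */
void solve_phase6(int answer[6]) {
    int order[6] = {1, 2, 3, 4, 5, 6};
    size_t i, j;

    /* Simple exchange sort, largest value first. */
    for (i = 0; i < 6; i++) {
        for (j = i + 1; j < 6; j++) {
            if (NODE_VALUES[order[j] - 1] > NODE_VALUES[order[i] - 1]) {
                int tmp = order[i];
                order[i] = order[j];
                order[j] = tmp;
            }
        }
    }
    for (i = 0; i < 6; i++) {
        answer[i] = 7 - order[i];
    }
}

Calling solve_phase6 on a six-element array yields the same sequence as the manual derivation, assuming the node values in your bomb match the dump shown here.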

The last phase is somewhat involved: it combines nested-loop validation, an array mapping transform, and linked-list relinking.

Secret Phase

/* Hmm...  Six phases must be more secure than one phase! */
input = read_line();             /* Get input                   */
phase_1(input);                  /* Run the phase               */
phase_defused();                 /* Drat!  They figured it out!
      * Let me know how they did it. */
printf("Phase 1 defused. How about the next one?\n");

/* The second phase is harder.  No one will ever figure out
 * how to defuse this... */
input = read_line();
phase_2(input);
phase_defused();
printf("That's number 2.  Keep going!\n");

/* I guess this is too easy so far.  Some more complex code will
 * confuse people. */
input = read_line();
phase_3(input);
phase_defused();
printf("Halfway there!\n");

/* Oh yeah?  Well, how good is your math?  Try on this saucy problem! */
input = read_line();
phase_4(input);
phase_defused();
printf("So you got that one.  Try this one.\n");

/* Round and 'round in memory we go, where we stop, the bomb blows! */
input = read_line();
phase_5(input);
phase_defused();
printf("Good work!  On to the next...\n");

/* This phase will never be used, since no one will get past the
 * earlier ones.  But just in case, make this one extra hard. */
input = read_line();
phase_6(input);
phase_defused();

In the bomb's source, phase_defused runs after every phase. Let's take a look:

Dump of assembler code for function phase_defused:
   0x00000000004015c4 <+0>:     sub    $0x78,%rsp
   0x00000000004015c8 <+4>:     mov    %fs:0x28,%rax
   0x00000000004015d1 <+13>:    mov    %rax,0x68(%rsp)
   0x00000000004015d6 <+18>:    xor    %eax,%eax
   0x00000000004015d8 <+20>:    cmpl   $0x6,0x202181(%rip)        # 0x603760 <num_input_strings>
   0x00000000004015df <+27>:    jne    0x40163f <phase_defused+123>
   0x00000000004015e1 <+29>:    lea    0x10(%rsp),%r8
   0x00000000004015e6 <+34>:    lea    0xc(%rsp),%rcx
   0x00000000004015eb <+39>:    lea    0x8(%rsp),%rdx
   0x00000000004015f0 <+44>:    mov    $0x402619,%esi
   0x00000000004015f5 <+49>:    mov    $0x603870,%edi
   0x00000000004015fa <+54>:    callq  0x400bf0 <__isoc99_sscanf@plt>
   0x00000000004015ff <+59>:    cmp    $0x3,%eax
   0x0000000000401602 <+62>:    jne    0x401635 <phase_defused+113>
   0x0000000000401604 <+64>:    mov    $0x402622,%esi
   0x0000000000401609 <+69>:    lea    0x10(%rsp),%rdi
   0x000000000040160e <+74>:    callq  0x401338 <strings_not_equal>
   0x0000000000401613 <+79>:    test   %eax,%eax
   0x0000000000401615 <+81>:    jne    0x401635 <phase_defused+113>
   0x0000000000401617 <+83>:    mov    $0x4024f8,%edi
   0x000000000040161c <+88>:    callq  0x400b10 <puts@plt>
   0x0000000000401621 <+93>:    mov    $0x402520,%edi
   0x0000000000401626 <+98>:    callq  0x400b10 <puts@plt>
   0x000000000040162b <+103>:   mov    $0x0,%eax
   0x0000000000401630 <+108>:   callq  0x401242 <secret_phase>
   0x0000000000401635 <+113>:   mov    $0x402558,%edi
   0x000000000040163a <+118>:   callq  0x400b10 <puts@plt>
   0x000000000040163f <+123>:   mov    0x68(%rsp),%rax
   0x0000000000401644 <+128>:   xor    %fs:0x28,%rax
   0x000000000040164d <+137>:   je     0x401654 <phase_defused+144>
   0x000000000040164f <+139>:   callq  0x400b30 <__stack_chk_fail@plt>
   0x0000000000401654 <+144>:   add    $0x78,%rsp
   0x0000000000401658 <+148>:   retq

0x00000000004015d8 <+20>:    cmpl   $0x6,0x202181(%rip)        # 0x603760 <num_input_strings>

This requires all six phases to be passed before secret_phase can be reached.

We can set a conditional breakpoint: b phase_defused if num_input_strings == 6

Notice:

0x0000000000401630 <+108>:   callq  0x401242 <secret_phase>

There are quite a few memory addresses here. Among them:

(gdb) x/s 0x402619
0x402619:       "%d %d %s"
(gdb) x/s 0x603870
0x603870 <input_strings+240>:   "7 0"
(gdb) x/s 0x402622
0x402622:       "DrEvil"

It checks whether the Phase 4 input is followed by the string DrEvil. If so, we enter the secret phase!

Now let's look at the secret phase's code:

Dump of assembler code for function secret_phase:
   0x0000000000401242 <+0>:     push   %rbx
   0x0000000000401243 <+1>:     callq  0x40149e <read_line>
   0x0000000000401248 <+6>:     mov    $0xa,%edx
   0x000000000040124d <+11>:    mov    $0x0,%esi
   0x0000000000401252 <+16>:    mov    %rax,%rdi
   0x0000000000401255 <+19>:    callq  0x400bd0 <strtol@plt>
   0x000000000040125a <+24>:    mov    %rax,%rbx
   0x000000000040125d <+27>:    lea    -0x1(%rax),%eax
   0x0000000000401260 <+30>:    cmp    $0x3e8,%eax
   0x0000000000401265 <+35>:    jbe    0x40126c <secret_phase+42>
   0x0000000000401267 <+37>:    callq  0x40143a <explode_bomb>
   0x000000000040126c <+42>:    mov    %ebx,%esi
   0x000000000040126e <+44>:    mov    $0x6030f0,%edi
   0x0000000000401273 <+49>:    callq  0x401204 <fun7>
   0x0000000000401278 <+54>:    cmp    $0x2,%eax
   0x000000000040127b <+57>:    je     0x401282 <secret_phase+64>
   0x000000000040127d <+59>:    callq  0x40143a <explode_bomb>
   0x0000000000401282 <+64>:    mov    $0x402438,%edi
   0x0000000000401287 <+69>:    callq  0x400b10 <puts@plt>
   0x000000000040128c <+74>:    callq  0x4015c4 <phase_defused>
   0x0000000000401291 <+79>:    pop    %rbx
   0x0000000000401292 <+80>:    retq   
End of assembler dump.

Seeing strtol, we know an integer is read here.

0x000000000040125a <+24>:    mov    %rax,%rbx
0x000000000040125d <+27>:    lea    -0x1(%rax),%eax
0x0000000000401260 <+30>:    cmp    $0x3e8,%eax
0x0000000000401265 <+35>:    jbe    0x40126c <secret_phase+42>
0x0000000000401267 <+37>:    callq  0x40143a <explode_bomb>

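The integer read must be at most 1001. Note that jbe is an unsigned comparison, so it implicitly enforces a lower bound as well: after the subtraction, an input of 0 or below wraps around to a huge unsigned value. The strict input range is therefore the integers in [1, 1001].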
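
The subtract-then-compare-unsigned idiom can be sketched in C as follows. `in_secret_range` is a hypothetical name, and the sketch assumes a 32-bit unsigned int, matching the %eax arithmetic in the disassembly.

/* Mirrors: lea -0x1(%rax),%eax ; cmp $0x3e8,%eax ; jbe
 * Subtract 1 in 32-bit unsigned arithmetic, then compare against 1000.
 * Inputs <= 0 wrap to very large unsigned values and fail the test. */
int in_secret_range(long parsed) {
    unsigned int shifted = (unsigned int)parsed - 1u;
    return shifted <= 0x3e8u;
}

A single unsigned comparison thus checks both bounds at once, which is why compilers often emit this pattern for range tests.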

0x000000000040126c <+42>:    mov    %ebx,%esi
0x000000000040126e <+44>:    mov    $0x6030f0,%edi
0x0000000000401273 <+49>:    callq  0x401204 <fun7>

Pass the arguments and call fun7.

0x0000000000401278 <+54>:    cmp    $0x2,%eax
0x000000000040127b <+57>:    je     0x401282 <secret_phase+64>
0x000000000040127d <+59>:    callq  0x40143a <explode_bomb>
0x0000000000401282 <+64>:    mov    $0x402438,%edi

fun7 must return 2.

Next let's look at fun7, split into sections by hand:

Dump of assembler code for function fun7:
   0x0000000000401204 <+0>:     sub    $0x8,%rsp
   0x0000000000401208 <+4>:     test   %rdi,%rdi
   0x000000000040120b <+7>:     je     0x401238 <fun7+52>
   
   0x000000000040120d <+9>:     mov    (%rdi),%edx
   0x000000000040120f <+11>:    cmp    %esi,%edx
   0x0000000000401211 <+13>:    jle    0x401220 <fun7+28>
   
   0x0000000000401213 <+15>:    mov    0x8(%rdi),%rdi
   0x0000000000401217 <+19>:    callq  0x401204 <fun7>
   0x000000000040121c <+24>:    add    %eax,%eax
   0x000000000040121e <+26>:    jmp    0x40123d <fun7+57>
   
   0x0000000000401220 <+28>:    mov    $0x0,%eax
   0x0000000000401225 <+33>:    cmp    %esi,%edx
   0x0000000000401227 <+35>:    je     0x40123d <fun7+57>
   0x0000000000401229 <+37>:    mov    0x10(%rdi),%rdi
   0x000000000040122d <+41>:    callq  0x401204 <fun7>
   0x0000000000401232 <+46>:    lea    0x1(%rax,%rax,1),%eax
   0x0000000000401236 <+50>:    jmp    0x40123d <fun7+57>
   
   0x0000000000401238 <+52>:    mov    $0xffffffff,%eax
   
   0x000000000040123d <+57>:    add    $0x8,%rsp
   0x0000000000401241 <+61>:    retq   
End of assembler dump.

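It follows the two pointers stored after the value at rdi and recurses, which looks like a binary tree. Let's check the initial argument: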

(gdb) x/60xg 0x6030f0
0x6030f0 <n1>:  0x0000000000000024      0x0000000000603110
0x603100 <n1+16>:       0x0000000000603130      0x0000000000000000
0x603110 <n21>: 0x0000000000000008      0x0000000000603190
0x603120 <n21+16>:      0x0000000000603150      0x0000000000000000
0x603130 <n22>: 0x0000000000000032      0x0000000000603170
0x603140 <n22+16>:      0x00000000006031b0      0x0000000000000000
0x603150 <n32>: 0x0000000000000016      0x0000000000603270
0x603160 <n32+16>:      0x0000000000603230      0x0000000000000000
0x603170 <n33>: 0x000000000000002d      0x00000000006031d0
0x603180 <n33+16>:      0x0000000000603290      0x0000000000000000
0x603190 <n31>: 0x0000000000000006      0x00000000006031f0
0x6031a0 <n31+16>:      0x0000000000603250      0x0000000000000000
0x6031b0 <n34>: 0x000000000000006b      0x0000000000603210
0x6031c0 <n34+16>:      0x00000000006032b0      0x0000000000000000
0x6031d0 <n45>: 0x0000000000000028      0x0000000000000000
0x6031e0 <n45+16>:      0x0000000000000000      0x0000000000000000
0x6031f0 <n41>: 0x0000000000000001      0x0000000000000000
0x603200 <n41+16>:      0x0000000000000000      0x0000000000000000
0x603210 <n47>: 0x0000000000000063      0x0000000000000000
0x603220 <n47+16>:      0x0000000000000000      0x0000000000000000
0x603230 <n44>: 0x0000000000000023      0x0000000000000000
0x603240 <n44+16>:      0x0000000000000000      0x0000000000000000
0x603250 <n42>: 0x0000000000000007      0x0000000000000000
0x603260 <n42+16>:      0x0000000000000000      0x0000000000000000
0x603270 <n43>: 0x0000000000000014      0x0000000000000000
0x603280 <n43+16>:      0x0000000000000000      0x0000000000000000
0x603290 <n46>: 0x000000000000002f      0x0000000000000000
0x6032a0 <n46+16>:      0x0000000000000000      0x0000000000000000
0x6032b0 <n48>: 0x00000000000003e9      0x0000000000000000
0x6032c0 <n48+16>:      0x0000000000000000      0x0000000000000000

It really is a binary tree! (The 60 was found by trial and error.)

fun7 takes its arguments in rdi and esi.

0x0000000000401208 <+4>:     test   %rdi,%rdi
0x000000000040120b <+7>:     je     0x401238 <fun7+52>
...
0x0000000000401238 <+52>:    mov    $0xffffffff,%eax
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

If the node pointer is NULL (we walked past a leaf), return 0xffffffff, i.e. -1.

0x000000000040120d <+9>:     mov    (%rdi),%edx
0x000000000040120f <+11>:    cmp    %esi,%edx
0x0000000000401211 <+13>:    jle    0x401220 <fun7+28>

Load the current node's value. If it is greater than esi:

0x0000000000401213 <+15>:    mov    0x8(%rdi),%rdi
0x0000000000401217 <+19>:    callq  0x401204 <fun7>
0x000000000040121c <+24>:    add    %eax,%eax
0x000000000040121e <+26>:    jmp    0x40123d <fun7+57>
...
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

Visit the left child and double the return value.

If the current node's value equals esi:

0x0000000000401220 <+28>:    mov    $0x0,%eax
0x0000000000401225 <+33>:    cmp    %esi,%edx
0x0000000000401227 <+35>:    je     0x40123d <fun7+57>
...
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

Return 0 directly.

Otherwise, visit the right child:

0x0000000000401229 <+37>:    mov    0x10(%rdi),%rdi
0x000000000040122d <+41>:    callq  0x401204 <fun7>
0x0000000000401232 <+46>:    lea    0x1(%rax,%rax,1),%eax
0x0000000000401236 <+50>:    jmp    0x40123d <fun7+57>
...
0x000000000040123d <+57>:    add    $0x8,%rsp
0x0000000000401241 <+61>:    retq

Double the return value and add one.

We can translate the code above into C:

int fun7(struct Node *node, int target_val) {
    // 1. Null node
    if (node == NULL) {
        return -1; // mov $0xffffffff, %eax
    }

    int current_val = node->value; // mov (%rdi), %edx

    // 2. Current value > target (target_val < current_val)
    // Assembly: cmp %esi, %edx -> jle skips this branch, otherwise we fall into it
    if (current_val > target_val) {
        // Recurse into the left child (offset 0x8)
        // callq fun7, then add %eax, %eax
        return 2 * fun7(node->left, target_val);
    }
    
    // 3. Current value == target
    // Assembly: cmp %esi, %edx -> je (jump to return 0)
    if (current_val == target_val) {
        return 0; // found the target
    }

    // 4. Current value < target (the only case left)
    // Recurse into the right child (offset 0x10)
    // callq fun7, then lea 0x1(%rax,%rax,1) -> 2*rax + 1
    return 2 * fun7(node->right, target_val) + 1;
}

Now let's look at the structure of the binary tree, based on:

(gdb) x/60xg 0x6030f0
0x6030f0 <n1>:  0x0000000000000024      0x0000000000603110
0x603100 <n1+16>:       0x0000000000603130      0x0000000000000000
0x603110 <n21>: 0x0000000000000008      0x0000000000603190
0x603120 <n21+16>:      0x0000000000603150      0x0000000000000000
0x603130 <n22>: 0x0000000000000032      0x0000000000603170
0x603140 <n22+16>:      0x00000000006031b0      0x0000000000000000
0x603150 <n32>: 0x0000000000000016      0x0000000000603270
0x603160 <n32+16>:      0x0000000000603230      0x0000000000000000
0x603170 <n33>: 0x000000000000002d      0x00000000006031d0
0x603180 <n33+16>:      0x0000000000603290      0x0000000000000000
0x603190 <n31>: 0x0000000000000006      0x00000000006031f0
0x6031a0 <n31+16>:      0x0000000000603250      0x0000000000000000
0x6031b0 <n34>: 0x000000000000006b      0x0000000000603210
0x6031c0 <n34+16>:      0x00000000006032b0      0x0000000000000000
0x6031d0 <n45>: 0x0000000000000028      0x0000000000000000
0x6031e0 <n45+16>:      0x0000000000000000      0x0000000000000000
0x6031f0 <n41>: 0x0000000000000001      0x0000000000000000
0x603200 <n41+16>:      0x0000000000000000      0x0000000000000000
0x603210 <n47>: 0x0000000000000063      0x0000000000000000
0x603220 <n47+16>:      0x0000000000000000      0x0000000000000000
0x603230 <n44>: 0x0000000000000023      0x0000000000000000
0x603240 <n44+16>:      0x0000000000000000      0x0000000000000000
0x603250 <n42>: 0x0000000000000007      0x0000000000000000
0x603260 <n42+16>:      0x0000000000000000      0x0000000000000000
0x603270 <n43>: 0x0000000000000014      0x0000000000000000
0x603280 <n43+16>:      0x0000000000000000      0x0000000000000000
0x603290 <n46>: 0x000000000000002f      0x0000000000000000
0x6032a0 <n46+16>:      0x0000000000000000      0x0000000000000000
0x6032b0 <n48>: 0x00000000000003e9      0x0000000000000000
0x6032c0 <n48+16>:      0x0000000000000000      0x0000000000000000

graph TD
    N1((36)) --> N21((8))
    N1 --> N22((50))

    N21 --> N31((6))
    N21 --> N32((22))

    N22 --> N33((45))
    N22 --> N34((107))

    N31 --> N41((1))
    N31 --> N42((7))

    N32 --> N43((20))
    N32 --> N44((35))

    N33 --> N45((40))
    N33 --> N46((47))

    N34 --> N47((99))
    N34 --> N48((1001))

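The final result must be 2, and 2 = 2 * (2 * 0 + 1).

So we go left first (doubling), then right (doubling plus one), and find the target there (returning 0).

That gives the answer 22.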
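
As a sanity check, the tree from the gdb dump can be rebuilt in C and fed to a copy of the translated function. This is only a sketch: `TreeNode`, `fun7_sketch`, and `tree_root` are illustrative names, and the values are the ones shown in the dump above.

#include <stddef.h>

struct TreeNode {
    int value;
    struct TreeNode *left;
    struct TreeNode *right;
};

/* Level 4 (leaves), then levels 3, 2, and the root, as in the dump. */
static struct TreeNode n41 = {1, NULL, NULL}, n42 = {7, NULL, NULL},
                       n43 = {20, NULL, NULL}, n44 = {35, NULL, NULL},
                       n45 = {40, NULL, NULL}, n46 = {47, NULL, NULL},
                       n47 = {99, NULL, NULL}, n48 = {1001, NULL, NULL};
static struct TreeNode n31 = {6, &n41, &n42}, n32 = {22, &n43, &n44},
                       n33 = {45, &n45, &n46}, n34 = {107, &n47, &n48};
static struct TreeNode n21 = {8, &n31, &n32}, n22 = {50, &n33, &n34};
static struct TreeNode n1 = {36, &n21, &n22};

/* Same logic as the C translation of fun7. */
int fun7_sketch(const struct TreeNode *node, int target) {
    if (node == NULL) return -1;
    if (node->value > target) return 2 * fun7_sketch(node->left, target);
    if (node->value == target) return 0;
    return 2 * fun7_sketch(node->right, target) + 1;
}

/* Accessor so callers do not depend on the static node names. */
const struct TreeNode *tree_root(void) { return &n1; }

With this tree, fun7_sketch(tree_root(), 22) evaluates to 2. Under the same translation 20 also evaluates to 2, because the extra left step from 22 to 20 only doubles a 0.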

Summary

So the final answers are:

Border relations with Canada have never been better.
1 2 4 8 16 32
0 207
7 0 DrEvil
ionefg
4 3 2 1 6 5
22

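Finally, a short wrap-up generated by AI:

CSAPP Bomb Lab is a classic: it is both a deep exercise in x86-64 assembly and a puzzle game of logical deduction.

Looking back over the defusal, we moved from simple to complex:

Basic control flow: from the string comparison in Phase 1 to the loop and on-stack array handling in Phase 2.
Advanced control flow: Phase 3 shows how a switch statement is implemented with a jump table, and Phase 4 uses recursion to give a closer look at how stack frames grow and unwind, along with binary search.
Data manipulation: Phase 5's bit operations and character-array index mapping test your feel for pointers and memory addressing.
Data structures: Phase 6's linked-list reordering and the secret phase's binary search tree (BST) show what higher-level data structures look like at the assembly level (a pointer is just an address).

CSAPP Data Lab Walkthrough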

Louis Aeilot's Blog

Louis C Deng

2025年12月2日 02:45

I finished CSAPP's first lab a while ago, so here is a write-up. (This post has actually been overdue for quite some time.)

Function (Name)	Description	Rating	Max ops
bitXor(x, y)	Implement x ^ y (XOR) using only & and ~.	1	14
tmin()	Return the smallest two's complement integer.	1	4
isTmax(x)	Return True only if x is the largest two's complement integer.	1	10
allOddBits(x)	Return True only if all odd-numbered bits of x are 1.	2	12
negate(x)	Return -x without using the - operator.	2	5
isAsciiDigit(x)	Return True if 0x30 <= x <= 0x39 (i.e. an ASCII digit character).	3	15
conditional(x, y, z)	Same as x ? y : z (the ternary operator).	3	16
isLessOrEqual(x, y)	Return True if x <= y, otherwise False.	3	24
logicalNeg(x)	Compute !x (logical NOT) without using the ! operator.	4	12
howManyBits(x)	Minimum number of bits needed to represent x in two's complement.	4	90
floatScale2(uf)	For floating-point argument f, return the bit-level equivalent of 2 * f.	4	30
floatFloat2Int(uf)	For floating-point argument f, return the bit-level equivalent of (int)f.	4	30
floatPower2(x)	For integer x, return the bit-level equivalent of 2.0^x.	4	30

bitXor

This puzzle asks us to implement ^ (XOR) using only ~ (NOT) and & (AND).

int bitXor(int x, int y) {
  return ~((~(x&~y))&(~((~x)&y)));
}

By De Morgan's law, ~(x&y) = (~x)|(~y), and likewise a|b = ~(~a & ~b), so | can be built from ~ and &.

XOR can be written as x^y = (~x & y) | (x & ~y). Applying De Morgan's law to the | gives the final answer x^y = ~((~(x&~y))&(~((~x)&y))).
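
The two steps of that derivation can be spelled out with named intermediates. This is a sketch for reading purposes; `bit_xor_nand` is a made-up name, and the operation count is higher than it would be in a one-line answer.

/* XOR built only from ~ and &, following the derivation step by step. */
int bit_xor_nand(int x, int y) {
    int only_x = x & ~y;            /* bits set in x but not in y */
    int only_y = ~x & y;            /* bits set in y but not in x */
    return ~(~only_x & ~only_y);    /* De Morgan: only_x | only_y */
}

For example, bit_xor_nand(4, 5) gives 1, matching the example in the lab's bitXor header comment.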

tmin

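This one is easy: return the smallest two's complement integer. Recall that in two's complement the most significant bit carries negative weight, so we just set the sign bit to 1 and leave the rest 0.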

int tmin(void) {
  return 1<<31;
}

isTmax

Determine whether x is the largest two's complement integer: return 1 if so, 0 otherwise.

int isTmax(int x) {
  int map = x + 1;
  int res = ~(map + x);
  return !res & (!!map);
}

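The largest two's complement value has a useful property: adding one turns it into the smallest, 0x7fffffff -> 0x80000000.

Tmax plus Tmin equals 0xffffffff, i.e. -1, and complementing that gives 0. (We aim for 0 here so that a logical NOT can turn it into the 0/1 return value.)

So !(~(x+x+1)) does most of the work.

But -1 + 0 is also -1: when x = -1, ~(x+x+1) is likewise 0. That is a corner case.

So we additionally AND the result with !!(x+1), which is 0 when x = -1 and 1 in every other case.

That gives the final answer: !(~(x+x+1)) & (!!(x+1))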

allOddBits

Return 1 only if all odd-numbered bits of x are 1.

int allOddBits(int x) {
  int a = 0xAA;
  int b = (a<<8) + (a<<16) + (a <<24) + a;
  int bm = ~b+1;
  return !((x&b)+bm);
}

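We build an odd-bit mask from 0xAA = 0b10101010: shifting and adding gives a + (a<<8) + (a<<16) + (a<<24) = 0xAAAAAAAA = b.

x&b then extracts all the odd-numbered bits, but we still need a 0/1 answer.

bm = ~b + 1 is -b (complement plus one is two's complement negation). If every odd bit is set, x&b equals b and (x&b) + (-b) = 0; a logical NOT then yields the answer.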

negate

This puzzle asks us to compute -x without using the - operator.

int negate(int x) {
  return ~x+1;
}

Very simple, and it follows straight from the definition of two's complement: complement and add one to negate.

isAsciiDigit

Return 1 if 0x30 <= x <= 0x39 (i.e. x is an ASCII digit character).

We cannot use operators like <= here, so the idea is to subtract and then look at the sign bit.

int isAsciiDigit(int x) {
    int ge_30 = !((x + (~0x30 + 1)) >> 31);     
    int le_39 = !((0x39 + (~x + 1)) >> 31); 
    return ge_30 & le_39; 
}

conditional

Implement the ternary operator (x ? y : z) with bit operations.

int conditional(int x, int y, int z) {
  int xb = !(!x);
  int M = ~xb + 1;
  return (M&y) | (~M&z);
}

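We can use a logical mask.

First use !(!x) to turn x into 0/1; call it xb.

Then M = ~xb + 1 maps 0 -> 0 and 1 -> -1 = 0xffffffff (a mask that selects every bit).

So (M&y) | (~M&z) is the final answer.

If x is nonzero, M = 0xffffffff and ~M = 0, so y is selected; otherwise z is selected.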

isLessOrEqual

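int isLessOrEqual(int x, int y) {
  int sx = (x >> 31) & 1, sy = (y >> 31) & 1;
  int neg = ((y + (~x + 1)) >> 31) & 1;  /* sign bit of y - x */
  return (sx & !sy) | (!(sx ^ sy) & !neg);
}

When x and y have the same sign, y - x cannot overflow, so x <= y exactly when the sign bit of y - x is 0 (that is, the negation of x > y). When the signs differ the subtraction can overflow (for example x = 0x80000000 with a positive y), so that case is decided from the signs alone: x <= y only if x is the negative one.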

logicalNeg

Compute !x (logical NOT) without using the ! operator.

int logicalNeg(int x) {
  return ((x>>31) | ((~x+1)>>31))+1;
}

howManyBits

Compute the minimum number of bits needed to represent x in two's complement.

int howManyBits(int x) {
  int fg = x>>31;
  x = ((~fg) & x) | (fg &(~x));
  int h16 = !!(x >> 16) << 4;
  x >>= h16;
  int h8 = !!(x>>8) << 3;
  x >>= h8;
  int h4 = !!(x>>4) << 2;
  x >>= h4;
  int h2 = !!(x>>2) << 1;
  x>>=h2;
  int h1 = !!(x>>1);
  x>>=h1;
  int h0 = x;
  return h0 + h1 + h2 + h4 + h8 + h16 + 1;
}

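For this puzzle, we set the sign bit aside and then find the highest significant bit below it.

To simplify things, we turn a negative number into a non-negative one by complementing it, so that we only need to locate the highest 1 bit.

((~fg) & x) | (fg & (~x)) is a conditional complement, equivalent to x = (x < 0) ? ~x : x

If fg is 0 (non-negative): the expression becomes (All_1 & x) | (0 & ~x) -> x. Unchanged.
If fg is -1 (negative): the expression becomes (0 & x) | (All_1 & ~x) -> ~x. Bitwise complement.

A reminder: the right shift on a signed int here is an arithmetic shift, so shifting a negative number right by 31 yields all ones, i.e. -1.

Next comes a binary search over the bits:

The idea is divide and conquer. We have 32 bits to examine and halve the range each time, just like binary search:

Check the high 16 bits:
- x >> 16: if nonzero, the highest 1 lies in the high 16 bits (bits 16-31).
- !!(...): normalizes the result to 0 or 1. It is 1 if any of the high 16 bits is set, otherwise 0.
- << 4: if the high 16 bits are nonzero, we need at least 16 bits, and 1 << 4 = 16.
- h16: the base found at this step (0 or 16).
- x >>= h16: the key step. If the high 16 bits are nonzero, we shift x right by 16, discard the low 16 bits, and keep examining only what used to be the high half. If the high 16 bits are all 0, x is unchanged and we go on to examine the original low 16 bits.
Check the high 8 bits (within the remaining 16-bit range):

Same logic. If the high 8 bits of the remaining part are nonzero, h8 = 8 and x is shifted right by 8.

And so on:

h4: check the high 4 bits of the remaining 8 bits.
h2: check the high 2 bits of the remaining 4 bits.
h1: check the high bit of the remaining 2 bits.
h0 = x: the last remaining bit.

Finally, we sum h16 + ... + h0. Two's complement also needs a sign bit, so the result gets one more +1.

The answer: h0 + h1 + h2 + h4 + h8 + h16 + 1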
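
For cross-checking, a plain loop version is handy. Note that this is a different technique (it uses `if` and `while`, which the lab does not allow), so it is only a reference sketch; `how_many_bits_reference` is a made-up name, and a 32-bit int is assumed.

/* Reference only: counts bits with a loop instead of the branch-free search. */
int how_many_bits_reference(int x) {
    int bits = 1;              /* one bit is always needed for the sign */
    if (x < 0) {
        x = ~x;                /* same normalization as the conditional complement */
    }
    while (x != 0) {           /* count positions up to the highest 1 */
        bits++;
        x >>= 1;
    }
    return bits;
}

Comparing it with the branch-free version on the examples from the lab's header comment (12, 298, -5, 0, -1, 0x80000000) is a quick way to catch an off-by-one in the binary search.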

floatScale2

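For a floating-point argument f, return the bit-level equivalent of 2 * f.

IEEE 754

Let's first review the bit-level representation of floating-point numbers, IEEE 754, using float as the example.

A float has three bit fields:

Sign (s): 1 bit [31] -> sign bit
Exponent (exp): 8 bits [30:23] -> exponent
Fraction (frac): 23 bits [22:0] -> fraction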

int sign = (uf >> 31) & 0x1;
int exp  = (uf >> 23) & 0xFF;
int frac = uf & 0x7FFFFF;

A float is interpreted in one of three ways:

Case A: Denormalized

Feature: exp == 0
Value: $V = (-1)^s \times M \times 2^{1-Bias}$
- Here $M = 0.frac$ (no implicit leading 1)

Case B: Normalized

Feature: exp != 0 and exp != 255
Value: $V = (-1)^s \times M \times 2^{exp-Bias}$
- Here $M = 1.frac$ (with an implicit leading 1)
- Bias = 127

Case C: Special Values

Feature: exp == 255 (all ones)
Kinds:
- frac == 0: Infinity
- frac != 0: NaN (Not a Number)
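
To see the three fields on real values, the bit pattern of a float can be copied out with memcpy and split with the same masks. A minimal sketch, assuming float is a 32-bit IEEE 754 single; `FloatFields` and `decode_float` are names invented for this example.

#include <stdint.h>
#include <string.h>

struct FloatFields {
    uint32_t sign;  /* 1 bit   */
    uint32_t exp;   /* 8 bits  */
    uint32_t frac;  /* 23 bits */
};

/* Split a float into its sign, exponent, and fraction fields. */
struct FloatFields decode_float(float f) {
    uint32_t bits;
    struct FloatFields out;

    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bit pattern */
    out.sign = bits >> 31;
    out.exp = (bits >> 23) & 0xFF;
    out.frac = bits & 0x7FFFFF;
    return out;
}

For instance, 1.0f decodes to sign 0, exp 127, frac 0 (a normalized value with E = exp - Bias = 0), and 1.5f has frac 0x400000 because only the 0.5 bit of the fraction is set.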

Now for the puzzle itself: it only takes a careful case analysis.

unsigned floatScale2(unsigned uf) {
  unsigned s = uf >> 31;
  unsigned exp = (uf >> 23) & 0xFF;
  unsigned ff = uf & 0x7fffff;

  // Special values
  // An all-ones exponent (exp == 255) means NaN or Infinity
  // Rule: NaN * 2 = NaN and Inf * 2 = Inf, so return the argument unchanged
  if (exp == 0xFF) {
    return uf;
  }

  // Denormalized
  // A zero exponent means a denormalized number, very close to 0
  if (exp == 0) {
    // Doubling a denormalized number: shift the fraction left by one
    ff <<= 1;
    
    // Check for fraction overflow (the transition from denormalized to normalized)
    // If ff now exceeds the largest 23-bit value (0x7fffff),
    // bit 23 became 1, and that 1 should carry into the exponent
    if (ff > 0x7fffff) {
      ff -= 0x800000; // drop the overflowed bit (it is now the implicit 1)
      exp += 1;       // exponent goes from 0 to 1 (now normalized)
    }
  } 
  // Normalized
  else {
    // Doubling a normalized number: add 1 to the exponent
    exp += 1;
    
    // Check for exponent overflow
    // If the exponent reaches 255, the value is too large and becomes Infinity
    if (exp == 0xFF) {
      ff = 0; // Infinity is exp == 255 with frac == 0
    }
  }

  return (s << 31) | (exp << 23) | (ff);
}

floatFloat2Int

For a floating-point argument f, return the bit-level equivalent of (int)f.

int floatFloat2Int(unsigned uf) {
  unsigned s = uf >> 31;
  unsigned exp = (uf >> 23) & 0xFF;
  unsigned ff = uf & 0x7fffff;

  // Special cases: NaN or Infinity
  // The exponent is all ones. Per the puzzle, out-of-range returns TMin (0x80000000)
  if (exp == 0xFF) {
    return 0x80000000u;
  }

  // Denormalized
  // With a zero exponent the value is tiny (0.xxxx * 2^-126), so the int is 0
  if (exp == 0) {
    return 0;
  }

  // Compute the true exponent E
  // Bias is 127. E = exp - Bias
  int E = (int)exp - 127;

  // Values smaller than 1
  // If the true exponent is negative (e.g. 2^-1, 2^-2) the value is 0.xxxx
  // Casting to int truncates toward zero, giving 0
  if (E < 0) return 0;

  // Restore the implicit 1
  // A normalized significand really has the form 1.fffff...
  // We set bit 23 by hand to stand for that implicit integer part "1"
  ff = ff | (1 << 23);

  // Overflow
  // If E >= 31, the magnitude is >= 2^31
  // The largest int is 2^31 - 1.
  // Whether it is a positive overflow or a negative value equal to TMin (-2^31) or below,
  // the puzzle's rule is to return TMin (0x80000000)
  if (E >= 31) {
    return 0x80000000u;
  }

  // Bit shifting to align
  // ff now looks like [1].[xxxxxx]... (the 1 sits at bit 23)
  // Read as an integer register, that is 1.xxxxx * 2^23
  // What we actually need is 1.xxxxx * 2^E
  if (E < 23) {
    // Case A: small exponent (e.g. E = 20)
    // The binary point must move right by 20 places.
    // ff is currently aligned at bit 23, so shift right and drop the extra fraction bits.
    // Shift amount = 23 - 20 = 3
    ff = ff >> (23 - E);
  } else {
    // Case B: large exponent (e.g. E = 30)
    // The binary point must move right by 30 places.
    // Bit 23 is not far enough, so shift left and fill with zeros.
    // Shift amount = 30 - 23 = 7
    ff = ff << (E - 23);
  }

  // Sign
  // If the original was negative, negate (complement plus one, i.e. -ff)
  if (s) return -ff;
  
  // The original was positive: return as is
  return ff;
}

floatPower2

For an integer x, return the bit-level equivalent of 2.0^x. For this puzzle, it is enough to work out a few boundary values.

unsigned floatPower2(int x) {
    // 1. Underflow
    // The smallest denormalized number is 2^(-149).
    // Derivation: Min Denorm = 2^(1-Bias) * 2^(-23) = 2^(-126) * 2^(-23) = 2^(-149)
    // If x is smaller still, the value is too small to represent: return 0.0
    if (x < -149)
        return 0;

    // 2. Denormalized
    // Range: [-149, -127]
    // A denormalized number has exp all zeros and value M * 2^(-126)
    // We need to build 2^x.
    // Equation: 2^x = (1 << shift) * 2^(-23) * 2^(-126)  <-- (1<<shift)*2^-23 is the fraction part
    //           2^x = 2^shift * 2^(-149)
    //           x = shift - 149
    //           shift = x + 149
    // So we place a 1 shifted left by (x + 149) in the fraction field
    else if (x < -126)
        return 1 << (x + 149);

    // 3. Normalized
    // Range: [-126, 127]
    // A normalized number has value 1.0 * 2^(exp - Bias)
    // We need 2^x: keep the fraction at 0 (i.e. 1.0) and set only the exponent.
    // Equation: x = exp - Bias
    //           exp = x + Bias
    //           exp = x + 127
    // Move the computed exp into the exponent field (bits 23-30)
    else if (x <= 127)
        return (x + 127) << 23;

    // 4. Overflow
    // Range: x > 127
    // The largest power of 2 a single-precision float can hold is 2^127.
    // Beyond that, return positive infinity (+Infinity).
    // +Inf is encoded as sign 0, exponent all ones (0xFF), fraction all zeros.
    else
        return (0xFF) << 23;
}

Wrap-up

My code lives at aeilot/CSAPP-Labs.

A Chat About Bit Masks

Louis Aeilot's Blog

Louis C Deng

2025年10月21日 07:45

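A mask is a bit-manipulation technique: a specific value (the mask) is combined with a target value using $\mathtt{\&}$ (AND), $\mathtt{|}$ (OR), or $\mathtt{\wedge}$ (XOR) in order to modify, extract, or test one or more bits of the target precisely and in bulk.

Basic Concepts

Extracting Bits

  10101100  (target)
& 00000100  (mask)
------------
  00000100  (result)

The result 00000100 shows that bit 2 (the third bit from the right, counting from bit 0) is 1.

The same trick extracts several bits at once: to get the low 4 bits of a number, use the mask 00001111.

Clearing Bits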

  10101100  (target)
& 11111011  (mask)
------------
  10101000  (result)

The result 10101000 shows that bit 2 has been cleared to 0.

Clearing is just not extracting certain bits, lol.

Toggling Bits

  10101100  (target)
^ 00000100  (mask)
------------
  10101000  (result)

The result 10101000 shows that bit 2 has been flipped.

Setting Bits

  10101000  (target)
| 00000100  (mask)
------------
  10101100  (result)

The result 10101100 shows that bit 2 has been set to 1.

Constructing Masks

Building the right mask is the key to using the technique.

Single bit: $\mathtt{1 \ll n}$
- $\mathtt{1 \ll 5}$ ( $\mathtt{00100000}$ ) is the mask for bit 5.
Consecutive low bits: $\mathtt{(1 \ll n) - 1}$
- $\mathtt{(1 \ll 8) - 1}$ ( $\mathtt{0xFF}$ ) is the mask for the low 8 bits.
All-ones mask: $\mathtt{\sim 0}$ (i.e. $-1$ )
- $\mathtt{0xFFFFFFFF}$ (assuming 32 bits)
All-zeros mask: $\mathtt{0}$
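
The four basic operations and the mask constructors can be collected into a few one-line helpers. This is a sketch using unsigned 32-bit values to keep the shifts well defined; all function names are made up for the example, and n is assumed to be in the range 0..31.

#include <stdint.h>

/* Mask constructors (n must be in 0..31). */
uint32_t bit_mask(unsigned n)      { return (uint32_t)1 << n; }
uint32_t low_bits_mask(unsigned n) { return ((uint32_t)1 << n) - 1; }

/* The four basic operations from the sections above. */
uint32_t extract_bits(uint32_t value, uint32_t mask) { return value & mask; }
uint32_t clear_bits(uint32_t value, uint32_t mask)   { return value & ~mask; }
uint32_t toggle_bits(uint32_t value, uint32_t mask)  { return value ^ mask; }
uint32_t set_bits(uint32_t value, uint32_t mask)     { return value | mask; }

With value 0xAC (10101100) and bit_mask(2), extract_bits returns 0x04, while clear_bits and toggle_bits both return 0xA8 (10101000), matching the worked examples. Note that clear_bits takes the mask of bits to remove and inverts it internally, whereas the worked example shows the already inverted mask 11111011.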

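Conditional Masks

In CSAPP Data Lab, one puzzle asks us to implement the ternary operator x ? y : z with bit operations. A conditional mask does the job.

int conditional(int x, int y, int z) {
  int mask = !!x;          // mask is 1 if x is nonzero, otherwise 0
  mask = ~mask + 1;       // mask is 0xFFFFFFFF if x is nonzero, otherwise 0x0
  return (y & mask) | (z & ~mask);
}

The logic of this code is:

Compute mask = !!x: mask is 1 if x is nonzero, otherwise 0.
With mask = ~mask + 1, turn mask into all ones (0xFFFFFFFF) or all zeros (0x0).
Return (y & mask) | (z & ~mask): the result is y if x is nonzero, otherwise z.

Summary

Masks are a powerful bit-manipulation technique for operating on and testing specific bits of data with precision.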

Integer Overflow and Undefined Behavior

Louis Aeilot's Blog

Louis C Deng

2025年10月14日 06:45

While working on CSAPP Data Lab, I ran into a problem involving integer overflow.

The Puzzle

/*
 * isTmax - returns 1 if x is the maximum, two's complement number, 
 *     and 0 otherwise 
 *   Legal ops: ! ~ & ^ | +
 *   Max ops: 10
 *   Rating: 1
 */

int isTmax(int x) {
  return 2;
}

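The puzzle asks us to use only the operators ! ~ & ^ | + to decide whether a number is the largest two's complement value (for int), i.e. 0x7fffffff. Return 1 if it is, 0 otherwise.

Approach

Since shifts are not allowed (many people would just write (1<<31) - 1), we can look at the special behavior of integer overflow.

Specifically, 0x7fffffff + 1 = 0x80000000: the sign flips.

And 0x80000000 + 0x80000000 = 0

So x = 0x7fffffff satisfies x + 1 + x + 1 = 0

For any other number y = x + k with k nonzero, we get y + 1 + y + 1 = 2*k (modulo 2^32)

Here we find that y = -1 also gives y + 1 + y + 1 = 0, so it has to be excluded

In all other cases the sum is nonzero, and a nonzero value becomes 1 when treated as a boolean

It is then not hard to write the following code: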

int isTmax(int x) {
  int p1 = x+1;
  int p2 = p1 + p1;
  return !(p2) & !!(p1);
}

Finding the Problem

On my machine (macOS, Apple clang version 17.0.0 (clang-1700.3.19.1), Target: arm64-apple-darwin25.0.0), compiled with clang main.c, this code runs without any problem.

However, looking at the Makefile that CSAPP provides, we find

#
# Makefile that builds btest and other helper programs for the CS:APP data lab
# 
CC = gcc
CFLAGS = -O -Wall
LIBS = -lm

all: btest fshow ishow

btest: btest.c bits.c decl.c tests.c btest.h bits.h
	$(CC) $(CFLAGS) $(LIBS) -o btest bits.c btest.c decl.c tests.c

fshow: fshow.c
	$(CC) $(CFLAGS) -o fshow fshow.c

ishow: ishow.c
	$(CC) $(CFLAGS) -o ishow ishow.c

# Forces a recompile. Used by the driver program. 
btestexplicit:
	$(CC) $(CFLAGS) $(LIBS) -o btest bits.c btest.c decl.c tests.c 

clean:
	rm -f *.o btest fshow ishow *~

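Note that the compiler is invoked with the -O flag, i.e. O1 optimization.

Undefined Behavior

Undefined behavior (UB), as defined by cppreference:

undefined behavior - There are no restrictions on the behavior of the program.

Signed integer overflow is a common kind of undefined behavior.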

Because correct C++ programs are free of undefined behavior, compilers may produce unexpected results when a program that actually has UB is compiled with optimization enabled.

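In other words, compiler optimizations can produce unexpected results when the program has undefined behavior.

cppreference gives an example with integer overflow: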

int foo(int x)
{
    return x + 1 > x; // either true or UB due to signed overflow
}

After compilation, however, it becomes

foo(int):
        mov     eax, 1
        ret

That is, it returns 1 no matter what.

Inspecting the Broken Code

We dump the compiled assembly with gcc -S:

_Z6isTmaxi:
.LFB2:
	.cfi_startproc
	endbr64
	movl	$0, %eax
	ret
	.cfi_endproc

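We can see that the compiler turned the function into one that returns 0 regardless of the input, which matches our suspected cause.

The Fix

We can try to build a more complex expression that simple rewrite rules are less likely to match, and so slip past O1-level optimization.

The core idea is unchanged: we still rely on Tmax + 1 = Tmin. Let's look at how Tmax and Tmin relate in binary: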

Tmax = 0x7fffffff = 0111...1111
Tmin = 0x80000000 = 1000...0000

An interesting property is that Tmax + Tmin = -1 (0xffffffff).

  0111 1111 ... 1111  (Tmax)
+ 1000 0000 ... 0000  (Tmin)
-------------------------
  1111 1111 ... 1111  (-1)

Based on this observation we can design a new check: if x is Tmax, then x + (x+1) should be -1, and complementing it, ~(-1), gives 0.

We can write the revised code as follows:

int isTmax(int x) {
  int map = x + 1;
  int res = ~(map + x);
  return !res & (!!map);
}

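The logic of this code is:

Compute map = x + 1. For x = Tmax this is still a signed overflow, and map becomes Tmin. It remains undefined behavior (UB).
Compute res = ~(map + x). If x is Tmax this is ~(Tmin + Tmax) = ~(-1), i.e. 0.
return !res & (!!map). !res is !0, i.e. 1. As in the previous version, !!map excludes x = -1 (map is 0, !!map is 0, and the function returns 0).

This code may produce the correct result under -O.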

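Why does this only "possibly" work?

We have to be clear-eyed here: the new version does not fundamentally fix the undefined behavior. It merely happens to sidestep a particular optimization in the current compiler version.

Complexity of the code pattern: p1 + p1, i.e. (x+1)+(x+1), is a very simple and direct pattern, and the optimizer can easily apply a rule like "if p1 is nonzero, then p1 + p1 is nonzero too". ~((x+1)+x) mixes addition with bitwise operations; the pattern is more complex and may not trigger any of the compiler's existing UB-based shortcuts.
Opportunistic optimization: compiler optimization does not try to exhaust every mathematical possibility; it applies a set of known, efficient patterns. Our new code simply happens not to be on the "blacklist" of common patterns.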

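Conclusion: How to Deal with Undefined Behavior

Through this small isTmax function we get a glimpse of how dangerous undefined behavior in C is and how powerful modern compiler optimization can be. As developers, we should take away the following:

Do not rely on undefined behavior: never write code that depends on UB, even if it "seems to work on your machine". Robust code comes from strictly following the language standard, not from luck.
Trust the compiler, but verify: the compiler is very clever and optimizes strictly according to the language specification. When optimized code behaves against your intuition, the first thing to suspect is that your own code has crossed a UB line.
Make good use of tools:
- Always enable compiler warnings (-Wall -Wextra) and treat warnings as errors (-Werror); this catches many latent problems.
- Use runtime checkers such as GCC/Clang's UndefinedBehaviorSanitizer (UBSan). Just add -fsanitize=undefined at compile time and it will pinpoint UB such as signed integer overflow while the program runs, which makes it invaluable for debugging this kind of problem.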
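
For comparison, here is a sketch of the same check done in unsigned arithmetic, where wraparound is defined by the C standard. It is not a legal Data Lab answer (the lab forbids casts and unsigned types in the integer puzzles); it only illustrates how the identical idea can be written without UB. `is_tmax_unsigned` is a made-up name, and a 32-bit int is assumed.

/* UB-free variant: all additions wrap modulo 2^32 by definition. */
int is_tmax_unsigned(int x) {
    unsigned int ux = (unsigned int)x;
    unsigned int next = ux + 1u;      /* 0x80000000 when x is Tmax, 0 when x is -1 */
    unsigned int sum = next + ux;     /* 0xffffffff when x is Tmax or -1 */
    return !(~sum) & !!next;          /* !!next rules out x == -1 */
}

Because nothing here overflows a signed type, the optimizer has no undefined case to exploit, and the result should not depend on the optimization level.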

CSAPP DataLab Solutions

Claude's Blog

Claude Ray

2019年10月2日 23:19

DataLab

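I recently started reading Chapter 2 of CS:APP3e, but just reading and doing the end-of-chapter exercises got dull, so I pulled DataLab forward for practice. These are not necessarily the best solutions; I am just recording my thinking while it is fresh.

If you are the kind of reader who wants to borrow answers before finishing the lab, please stop and stick to doing it on your own. As the course authors put it, Don't cheat, even the act of searching is cheating.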

bitXor

/* 
 * bitXor - x^y using only ~ and & 
 *   Example: bitXor(4, 5) = 1
 *   Legal ops: ~ &
 *   Max ops: 14
 *   Rating: 1
 */
int bitXor(int x, int y) {
  return ~(~(x & ~y) & ~(~x & y));
}

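A simple formula is (x & ~y) | (~x & y), but the puzzle allows only ~ and &. In other words, it tests implementing | with ~ and &, much like building logic gates from AND, OR, and NOT.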

tmin

/* 
 * tmin - return minimum two's complement integer 
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 4
 *   Rating: 1
 */
int tmin(void) {
  return 1 << 31;
}

This puzzle just computes 0x80000000. A basic shift does it; no need to overcomplicate.

isTmax

/*
 * isTmax - returns 1 if x is the maximum, two's complement number,
 *     and 0 otherwise 
 *   Legal ops: ! ~ & ^ | +
 *   Max ops: 10
 *   Rating: 1
 */
int isTmax(int x) {
  return !(~(1 << 31) ^ x);
}

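We already know how to get TMIN, and TMAX can be written as ~TMIN, so the puzzle mainly comes down to testing two numbers for equality, which is a job for ^.

Correction

Thanks to @nerrons for pointing this out.

The solution above ignores the operator restrictions (shifts are not legal here), so it does not meet the requirements. A different approach: since TMAX + 1 gives TMIN, if x is TMAX then x + 1 + x is 0xffffffff, and its complement is 0.

Written just like that it fails the checker, because 0xffffffff has the same property: x + 1 + x is again 0xffffffff. That case must be excluded.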

int isTmax(int x) {
  return !(~((x + 1) + x) | !(x + 1));
}

allOddBits

/* 
 * allOddBits - return 1 if all odd-numbered bits in word set to 1
 *   where bits are numbered from 0 (least significant) to 31 (most significant)
 *   Examples allOddBits(0xFFFFFFFD) = 0, allOddBits(0xAAAAAAAA) = 1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 12
 *   Rating: 2
 */
int allOddBits(int x) {
  int odd = (0xAA << 24) + (0xAA << 16) + (0xAA << 8) + 0xAA;
  return !((x & odd) ^ odd);
}

First build 0xAAAAAAAA, use & to extract all the odd-numbered bits, then test for equality against the constructed constant.

negate

/* 
 * negate - return -x 
 *   Example: negate(1) = -1.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 5
 *   Rating: 2
 */
int negate(int x) {
  return ~x + 1;
}

With a solid grasp of binary, the answer is immediate.

isAsciiDigit

/* 
 * isAsciiDigit - return 1 if 0x30 <= x <= 0x39 (ASCII codes for characters '0' to '9')
 *   Example: isAsciiDigit(0x35) = 1.
 *            isAsciiDigit(0x3a) = 0.
 *            isAsciiDigit(0x05) = 0.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 15
 *   Rating: 3
 */
int isAsciiDigit(int x) {
  /* (x - 0x30 >= 0) && (0x39 - x) >=0 */
  int TMIN = 1 << 31;
  return !((x + ~0x30 + 1) & TMIN) & !((0x39 + ~x + 1) & TMIN);
}

The main idea can be expressed logically as (x - 0x30 >= 0) && (0x39 - x) >= 0. The new concept here is how to test whether a value is less than 0.

conditional

/* 
 * conditional - same as x ? y : z 
 *   Example: conditional(2,4,5) = 4
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 16
 *   Rating: 3
 */
int conditional(int x, int y, int z) {
  int f = ~(!x) + 1;
  int of = ~f;
  return ((f ^ y) & of) | ((of ^ z) & f);
}

Here I use ~(!x) + 1 to build a boolean-like representation of x: if x is true the expression is 0, otherwise it is ~0.

isLessOrEqual

/* 
 * isLessOrEqual - if x <= y  then return 1, else return 0 
 *   Example: isLessOrEqual(4,5) = 1.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 24
 *   Rating: 3
 */
int isLessOrEqual(int x, int y) {
  /* (y >=0 && x <0) || ((x * y >= 0) && (y + (-x) >= 0)) */
  int signX = (x >> 31) & 1;
  int signY = (y >> 31) & 1;
  int signXSubY = ((y + ~x + 1) >> 31) & 1;
  return (signX & ~signY) | (!(signX ^ signY) & !signXSubY);
}

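The core is checking y + (-x) >= 0. When I first attempted this I got annoyed by the 0x80000000 boundary case, so I folded it into the conditions.

Specifically, that version returned 0 when y equals TMIN and 1 when x equals TMIN. It also returned 1 when x is negative and y is positive, and 0 when x is positive and y is negative.

That was overthinking it and used rather a lot of operations; with the ops limit at 24, I worried it would not pass dlc's check. So I spent more time coming up with a simpler method, which in logical terms is (y >=0 && x <0) || ((x * y >= 0) && (y + (-x) >= 0)). That said, I later ran the first method on Linux and dlc reported no error.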

logicalNeg

/* 
 * logicalNeg - implement the ! operator, using all of 
 *              the legal operators except !
 *   Examples: logicalNeg(3) = 0, logicalNeg(0) = 1
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 12
 *   Rating: 4 
 */
int logicalNeg(int x) {
  int sign = (x >> 31) & 1;
  int TMAX = ~(1 << 31);
  return (sign ^ 1) & ((((x + TMAX) >> 31) & 1) ^ 1);
}

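When x is negative the result is 0. Otherwise, check whether x + TMAX overflows into a negative number: that happens for every x > 0, so only x = 0 yields 1.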

howManyBits

/* howManyBits - return the minimum number of bits required to represent x in
 *             two's complement
 *  Examples: howManyBits(12) = 5
 *            howManyBits(298) = 10
 *            howManyBits(-5) = 4
 *            howManyBits(0)  = 1
 *            howManyBits(-1) = 1
 *            howManyBits(0x80000000) = 32
 *  Legal ops: ! ~ & ^ | + << >>
 *  Max ops: 90
 *  Rating: 4
 */
int howManyBits(int x) {
  int sign = (x >> 31) & 1;
  int f = ~(!sign) + 1;
  int of = ~f;
  /*
   * NOTing x to remove the effect of the sign bit.
   * x = x < 0 ? ~x : x
   */
  x = ((f ^ ~x) & of) | ((of ^ x) & f);
  /*
   * We need to get the index of the highest bit 1.
   * Easy to find that if it's even-numbered, `n` will lose the length of 1.
   * But the odd-numbered won't.
   * So let's left shift 1 (for the first 1) to fix this.
   */
  x |= (x << 1);
  int n = 0;
  // Get index with bisection.
  n += (!!(x & (~0 << (n + 16)))) << 4;
  n += (!!(x & (~0 << (n + 8)))) << 3;
  n += (!!(x & (~0 << (n + 4)))) << 2;
  n += (!!(x & (~0 << (n + 2)))) << 1;
  n += !!(x & (~0 << (n + 1)));
  // Add one more for the sign bit.
  return n + 1;
}

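Here I reuse the conditional trick to eliminate the negative case, so that only non-negative integers need to be handled. The bit count can be found by bisecting for the highest 1 bit, but a few rounds of testing show that the bisection comes up one bit short.

This only happens at even-numbered positions; odd-numbered ones are unaffected. To work around it, I use the brute-force x |= (x << 1) to move the highest 1 bit one position to the left.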
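A loop-based reference is handy for cross-checking the bisection against the examples in the puzzle header. This is only a testing aid under the assumption of a 32-bit int; it uses a loop and a comparison, which the lab itself does not allow, and the name `how_many_bits_ref` is invented here.

```c
/* Hypothetical reference: count the bits needed for x in two's complement. */
int how_many_bits_ref(int x) {
    /* Fold negatives onto non-negatives, as the solution does with ~x. */
    unsigned u = (unsigned)(x < 0 ? ~x : x);
    int n = 1;                       /* one bit for the sign */
    while (u) {                      /* one more bit per remaining magnitude bit */
        n++;
        u >>= 1;
    }
    return n;
}
```

Comparing this function with howManyBits over a range of inputs is a quick way to spot an off-by-one in the bisection.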

floatScale2

/* 
 * floatScale2 - Return bit-level equivalent of expression 2*f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representation of
 *   single-precision floating point values.
 *   When argument is NaN, return argument
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned floatScale2(unsigned uf) {
  int exp = (uf >> 23) & 0xFF;
  // Special
  if (exp == 0xFF)
    return uf;
  // Denormalized
  if (exp == 0)
    return ((uf & 0x007fffff) << 1) | (uf & (1 << 31));
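  // Normalized: doubling the largest exponent overflows to infinity
  if (exp == 0xFE)
    return (uf & (1 << 31)) | 0x7f800000;
  // Normalized: otherwise add one to the exponent field
  return uf + (1 << 23);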
}

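Only the exponent field needs to be extracted; the value does not even have to be fully decomposed. Once INF, NaN, and denormalized inputs are handled, a normalized value is doubled by adding one to its exponent field.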
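One way to sanity-check a bit-level doubling is to compare it with real float arithmetic. The sketch below assumes that float is IEEE 754 single precision, that unsigned is 32 bits wide, and that denormals are not flushed to zero; the helper names are hypothetical.

```c
#include <string.h>

/* Reinterpret a float's bytes as an unsigned int. */
unsigned float_to_bits(float f) {
    unsigned u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret an unsigned int's bytes as a float. */
float bits_to_float(unsigned u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical reference: double the value with ordinary float multiplication. */
unsigned scale2_reference(unsigned uf) {
    return float_to_bits(2.0f * bits_to_float(uf));
}
```

Inputs worth comparing include a denormal whose fraction shifts into the exponent field (0x00400000) and the largest finite value (0x7f7fffff), whose double is infinity. NaN inputs are best left out of such a comparison, since the hardware may change the NaN payload.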

floatFloat2Int

/* 
 * floatFloat2Int - Return bit-level equivalent of expression (int) f
 *   for floating point argument f.
 *   Argument is passed as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point value.
 *   Anything out of range (including NaN and infinity) should return
 *   0x80000000u.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
int floatFloat2Int(unsigned uf) {
  int TMIN = 1 << 31;
  int exp = ((uf >> 23) & 0xFF) - 127;
  // Out of range: a magnitude of 2^31 or more does not fit in int
  if (exp >= 31)
    return TMIN;
  if (exp < 0)
    return 0;
  int frac = (uf & 0x007fffff) | 0x00800000;
  // Left shift or right shift
  int f = (exp > 23) ? (frac << (exp - 23)) : (frac >> (23 - exp));
  // Sign
  return (uf & TMIN) ? -f : f;
}

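First split the single-precision float into its exponent and fraction, and subtract the bias of 127 from the exponent so that the edge cases can be ruled out. When the exponent is too large for the value to fit in 32-bit two's complement, return 0x80000000 as the problem requires; when it is below 0, the magnitude is less than 1 and the result is 0.

Then use whether the exponent is greater than 23 to locate the binary point. If it is greater, all fraction bits lie to the left of the point and the fraction must be shifted left; if it is smaller, it must be shifted right. Finally, apply the sign.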

floatPower2

/* 
 * floatPower2 - Return bit-level equivalent of the expression 2.0^x
 *   (2.0 raised to the power x) for any 32-bit integer x.
 *
 *   The unsigned value that is returned should have the identical bit
 *   representation as the single-precision floating-point number 2.0^x.
 *   If the result is too small to be represented as a denorm, return
 *   0. If too large, return +INF.
 * 
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. Also if, while 
 *   Max ops: 30 
 *   Rating: 4
 */
unsigned floatPower2(int x) {
  int exp = x + 127;
  // 0
  if (exp <= 0)
    return 0;
  // INF
  if (exp >= 0xFF)
    return 0x7f800000;
  return exp << 23;
}

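Adding 127 gives the biased exponent; results outside the representable range return 0 or INF. Because the fraction bits are all zero, it is enough to shift the exponent into place.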
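The puzzle header also mentions denormalized results, which the version above maps to 0. A possible variation, offered only as a sketch, covers them as well: powers from 2^-149 up to 2^-127 are denormals with a single fraction bit set. The function name `float_power2_denorm` is hypothetical, and comparing x directly avoids computing x + 127 for very large x.

```c
/* Hypothetical variant of floatPower2 that also produces denormals. */
unsigned float_power2_denorm(int x) {
    if (x < -149)
        return 0u;                   /* smaller than the smallest denormal, 2^-149 */
    if (x < -126)
        return 1u << (x + 149);      /* denormal: exactly one fraction bit set */
    if (x > 127)
        return 0x7f800000u;          /* too large: +INF */
    return (unsigned)(x + 127) << 23; /* normalized: exponent field only */
}
```

For example, x = -127 lands on fraction bit 22, and x = -126 is the smallest normalized power of two.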

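Summary

macOS can no longer run the 32-bit dlc checker, but you can run the logic tests first and then run dlc on a Linux machine once everything is written.

I thought I had learned this material well enough in school, and the textbook exercises and the first few puzzles were indeed easy. Further in, though, I got stuck in many places and found that my grasp of floating point had gone fuzzy. I worked with the book open the whole time and needed almost two days to finish. I had better not skip this chapter after all, so back to reading.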

CSAPP-Bomb Lab

AiDaiP

January 26, 2019 08:00

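When I first saw this thing I had a bold idea: just open it in IDA Pro and press F5.

I would rather starve, die out on the street, or jump off right here than use IDA Pro.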

phase_1

08048b20 <phase_1>:
 8048b20:	55                   	push   %ebp
 8048b21:	89 e5                	mov    %esp,%ebp
 8048b23:	83 ec 08             	sub    $0x8,%esp
 8048b26:	8b 45 08             	mov    0x8(%ebp),%eax
 8048b29:	83 c4 f8             	add    $0xfffffff8,%esp
 8048b2c:	68 c0 97 04 08       	push   $0x80497c0
 8048b31:	50                   	push   %eax
 8048b32:	e8 f9 04 00 00       	call   8049030 <strings_not_equal>
 # compare the string at 0x80497c0 with the input string
 8048b37:	83 c4 10             	add    $0x10,%esp
 8048b3a:	85 c0                	test   %eax,%eax
 # if equal, jump to 8048b43
 8048b3c:	74 05                	je     8048b43 <phase_1+0x23>
 # if not equal, explode
 8048b3e:	e8 b9 09 00 00       	call   80494fc <explode_bomb>
 8048b43:	89 ec                	mov    %ebp,%esp
 8048b45:	5d                   	pop    %ebp
 8048b46:	c3                   	ret    
 8048b47:	90                   	nop

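strings_not_equal is a function that checks whether two strings differ. The two push instructions before the call pass its arguments: the string at 0x80497c0 is "Public speaking is very easy.", and eax should hold the input string.

If the strings are equal, execution jumps to 8048b43; otherwise explode_bomb is called. The answer is: Public speaking is very easy.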

phase_2

08048b48 <phase_2>:
 8048b48:	55                   	push   %ebp
 8048b49:	89 e5                	mov    %esp,%ebp
 8048b4b:	83 ec 20             	sub    $0x20,%esp
 8048b4e:	56                   	push   %esi
 8048b4f:	53                   	push   %ebx
 8048b50:	8b 55 08             	mov    0x8(%ebp),%edx
 8048b53:	83 c4 f8             	add    $0xfffffff8,%esp
 8048b56:	8d 45 e8             	lea    -0x18(%ebp),%eax
 8048b59:	50                   	push   %eax
 8048b5a:	52                   	push   %edx
 8048b5b:	e8 78 04 00 00       	call   8048fd8 <read_six_numbers>
 # read six numbers
 # -0x18(%ebp) is a[0], -0x14(%ebp) is a[1], -0x10(%ebp) is a[2], -0xC(%ebp) is a[3], -0x8(%ebp) is a[4], -0x4(%ebp) is a[5]
 8048b60:	83 c4 10             	add    $0x10,%esp
 8048b63:	83 7d e8 01          	cmpl   $0x1,-0x18(%ebp)
 # compare a[0] with 1
 8048b67:	74 05                	je     8048b6e <phase_2+0x26>
 # if it equals 1, jump to 8048b6e
 8048b69:	e8 8e 09 00 00       	call   80494fc <explode_bomb>
 # if it is not 1, explode
 8048b6e:	bb 01 00 00 00       	mov    $0x1,%ebx
 # ebx = 1
 8048b73:	8d 75 e8             	lea    -0x18(%ebp),%esi
 8048b76:	8d 43 01             	lea    0x1(%ebx),%eax
 # esi = &a[0], eax = ebx + 1
 8048b79:	0f af 44 9e fc       	imul   -0x4(%esi,%ebx,4),%eax
 # eax = eax * a[ebx-1], the value stored at (esi + ebx*4) - 0x4
 8048b7e:	39 04 9e             	cmp    %eax,(%esi,%ebx,4)
 # compare eax with a[ebx], the value stored at (esi + ebx*4)
 8048b81:	74 05                	je     8048b88 <phase_2+0x40>
 # if equal, continue the loop
 8048b83:	e8 74 09 00 00       	call   80494fc <explode_bomb>
 # if not equal, explode
 8048b88:	43                   	inc    %ebx
 8048b89:	83 fb 05             	cmp    $0x5,%ebx
 8048b8c:	7e e8                	jle    8048b76 <phase_2+0x2e>
 # ebx is incremented on every pass; the loop repeats while ebx <= 5 and exits once ebx reaches 6
 8048b8e:	8d 65 d8             	lea    -0x28(%ebp),%esp
 8048b91:	5b                   	pop    %ebx
 8048b92:	5e                   	pop    %esi
 8048b93:	89 ec                	mov    %ebp,%esp
 8048b95:	5d                   	pop    %ebp
 8048b96:	c3                   	ret    
 8048b97:	90                   	nop

The input is six numbers. The code first checks that the first number is 1 and then enters a loop:

for(i=1;i<=5;i++)
{
    if(a[i]!=a[i-1]*(i+1))
        explode_bomb();
}

The answer is 1 2 6 24 120 720
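The recurrence in that loop can be evaluated directly to confirm the six values. This is a small sketch; the function name `phase2_term` is hypothetical.

```c
/* i-th accepted value (0-based): a[0] = 1 and a[i] = a[i-1] * (i + 1). */
int phase2_term(int i) {
    int value = 1;
    int k;
    for (k = 1; k <= i; k++)
        value *= k + 1;              /* multiply by the next factor, as imul does */
    return value;
}
```

Evaluating `phase2_term(0)` through `phase2_term(5)` reproduces the sequence above, which is simply the factorials of 1 through 6.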

phase_3

08048b98 <phase_3>:
 8048b98:	55                   	push   %ebp
 8048b99:	89 e5                	mov    %esp,%ebp
 8048b9b:	83 ec 14             	sub    $0x14,%esp
 8048b9e:	53                   	push   %ebx
 8048b9f:	8b 55 08             	mov    0x8(%ebp),%edx
 8048ba2:	83 c4 f4             	add    $0xfffffff4,%esp
 8048ba5:	8d 45 fc             	lea    -0x4(%ebp),%eax
 8048ba8:	50                   	push   %eax
 8048ba9:	8d 45 fb             	lea    -0x5(%ebp),%eax
 8048bac:	50                   	push   %eax
 8048bad:	8d 45 f4             	lea    -0xc(%ebp),%eax
 8048bb0:	50                   	push   %eax
 8048bb1:	68 de 97 04 08       	push   $0x80497de
 8048bb6:	52                   	push   %edx
 8048bb7:	e8 a4 fc ff ff       	call   8048860 <sscanf@plt>
 8048bbc:	83 c4 20             	add    $0x20,%esp
 8048bbf:	83 f8 02             	cmp    $0x2,%eax
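 # eax is the return value of sscanf; if it is <= 2 the bomb explodes. A valid %d %c %d input returns 3 and jumps to 8048bc9
 # -0xc(%ebp) is the first number, -0x5(%ebp) is the letter, -0x4(%ebp) is the second number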
 8048bc2:	7f 05                	jg     8048bc9 <phase_3+0x31>
 8048bc4:	e8 33 09 00 00       	call   80494fc <explode_bomb>
 8048bc9:	83 7d f4 07          	cmpl   $0x7,-0xc(%ebp)
 # compare the first number with 7; if it is above 7 (unsigned, so negative values too), jump to 8048c88 and explode
 8048bcd:	0f 87 b5 00 00 00    	ja     8048c88 <phase_3+0xf0>
 8048bd3:	8b 45 f4             	mov    -0xc(%ebp),%eax
 8048bd6:	ff 24 85 e8 97 04 08 	jmp    *0x80497e8(,%eax,4)
 8048bdd:	8d 76 00             	lea    0x0(%esi),%esi
 8048be0:	b3 71                	mov    $0x71,%bl
 # bl = 0x71
 8048be2:	81 7d fc 09 03 00 00 	cmpl   $0x309,-0x4(%ebp)
 # compare the second number with 0x309 (777)
 8048be9:	0f 84 a0 00 00 00    	je     8048c8f <phase_3+0xf7>
 # if equal, jump to 8048c8f to check the letter; otherwise explode
 8048bef:	e8 08 09 00 00       	call   80494fc <explode_bomb>
 8048bf4:	e9 96 00 00 00       	jmp    8048c8f <phase_3+0xf7>
 8048bf9:	8d b4 26 00 00 00 00 	lea    0x0(%esi,%eiz,1),%esi
 8048c00:	b3 62                	mov    $0x62,%bl
 8048c02:	81 7d fc d6 00 00 00 	cmpl   $0xd6,-0x4(%ebp)
 8048c09:	0f 84 80 00 00 00    	je     8048c8f <phase_3+0xf7>
 8048c0f:	e8 e8 08 00 00       	call   80494fc <explode_bomb>
 8048c14:	eb 79                	jmp    8048c8f <phase_3+0xf7>
 8048c16:	b3 62                	mov    $0x62,%bl
 8048c18:	81 7d fc f3 02 00 00 	cmpl   $0x2f3,-0x4(%ebp)
 8048c1f:	74 6e                	je     8048c8f <phase_3+0xf7>
 8048c21:	e8 d6 08 00 00       	call   80494fc <explode_bomb>
 8048c26:	eb 67                	jmp    8048c8f <phase_3+0xf7>
 8048c28:	b3 6b                	mov    $0x6b,%bl
 8048c2a:	81 7d fc fb 00 00 00 	cmpl   $0xfb,-0x4(%ebp)
 8048c31:	74 5c                	je     8048c8f <phase_3+0xf7>
 8048c33:	e8 c4 08 00 00       	call   80494fc <explode_bomb>
 8048c38:	eb 55                	jmp    8048c8f <phase_3+0xf7>
 8048c3a:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
 8048c40:	b3 6f                	mov    $0x6f,%bl
 8048c42:	81 7d fc a0 00 00 00 	cmpl   $0xa0,-0x4(%ebp)
 8048c49:	74 44                	je     8048c8f <phase_3+0xf7>
 8048c4b:	e8 ac 08 00 00       	call   80494fc <explode_bomb>
 8048c50:	eb 3d                	jmp    8048c8f <phase_3+0xf7>
 8048c52:	b3 74                	mov    $0x74,%bl
 8048c54:	81 7d fc ca 01 00 00 	cmpl   $0x1ca,-0x4(%ebp)
 8048c5b:	74 32                	je     8048c8f <phase_3+0xf7>
 8048c5d:	e8 9a 08 00 00       	call   80494fc <explode_bomb>
 8048c62:	eb 2b                	jmp    8048c8f <phase_3+0xf7>
 8048c64:	b3 76                	mov    $0x76,%bl
 8048c66:	81 7d fc 0c 03 00 00 	cmpl   $0x30c,-0x4(%ebp)
 8048c6d:	74 20                	je     8048c8f <phase_3+0xf7>
 8048c6f:	e8 88 08 00 00       	call   80494fc <explode_bomb>
 8048c74:	eb 19                	jmp    8048c8f <phase_3+0xf7>
 8048c76:	b3 62                	mov    $0x62,%bl
 8048c78:	81 7d fc 0c 02 00 00 	cmpl   $0x20c,-0x4(%ebp)
 8048c7f:	74 0e                	je     8048c8f <phase_3+0xf7>
 8048c81:	e8 76 08 00 00       	call   80494fc <explode_bomb>
 8048c86:	eb 07                	jmp    8048c8f <phase_3+0xf7>
 8048c88:	b3 78                	mov    $0x78,%bl
 8048c8a:	e8 6d 08 00 00       	call   80494fc <explode_bomb>
 8048c8f:	3a 5d fb             	cmp    -0x5(%ebp),%bl
 # compare the ASCII code of the input letter with bl
 8048c92:	74 05                	je     8048c99 <phase_3+0x101>
 # if not equal, explode
 8048c94:	e8 63 08 00 00       	call   80494fc <explode_bomb>
 8048c99:	8b 5d e8             	mov    -0x18(%ebp),%ebx
 8048c9c:	89 ec                	mov    %ebp,%esp
 8048c9e:	5d                   	pop    %ebp
 8048c9f:	c3                   	ret

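sscanf is called, so first take a look at 0x80497de.

It is %d %c %d, so the input should be "number letter number".

The run of cmpl and je instructions after 8048bc9 looks like a switch statement.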

Answers

First number	0	1	2	3	4	5	6	7
Letter	q	b	b	k	o	t	v	b
Second number	777	214	755	251	160	458	780	524
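The switch can be modelled as two lookup tables indexed by the first number. This is only a sketch: it assumes, as the answer table does, that the jump table entries follow the order of the case blocks in the listing, and the name `phase3_accepts` is hypothetical.

```c
/* Hypothetical model of phase_3: 1 if the triple would pass, 0 otherwise. */
int phase3_accepts(int first, char letter, int second) {
    /* Values loaded into bl and compared by cmpl in each case block. */
    static const char letters[8] = {0x71, 0x62, 0x62, 0x6b, 0x6f, 0x74, 0x76, 0x62};
    static const int numbers[8] = {0x309, 0xd6, 0x2f3, 0xfb, 0xa0, 0x1ca, 0x30c, 0x20c};

    if ((unsigned)first > 7)         /* mirrors the unsigned ja check */
        return 0;
    return numbers[first] == second && letters[first] == letter;
}
```

Writing the constants in hex keeps them easy to compare with the disassembly, while the calls use the decimal and character forms a player would type.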

phase_4

08048ce0 <phase_4>:
 8048ce0:	55                   	push   %ebp
 8048ce1:	89 e5                	mov    %esp,%ebp
 8048ce3:	83 ec 18             	sub    $0x18,%esp
 8048ce6:	8b 55 08             	mov    0x8(%ebp),%edx
 8048ce9:	83 c4 fc             	add    $0xfffffffc,%esp
 8048cec:	8d 45 fc             	lea    -0x4(%ebp),%eax
 8048cef:	50                   	push   %eax
 8048cf0:	68 08 98 04 08       	push   $0x8049808
 8048cf5:	52                   	push   %edx
 8048cf6:	e8 65 fb ff ff       	call   8048860 <sscanf@plt>
 8048cfb:	83 c4 10             	add    $0x10,%esp
 8048cfe:	83 f8 01             	cmp    $0x1,%eax
 # eax is the return value of sscanf; a valid input returns 1, and anything else jumps to 8048d09 and explodes
 8048d01:	75 06                	jne    8048d09 <phase_4+0x29>
 8048d03:	83 7d fc 00          	cmpl   $0x0,-0x4(%ebp)
 # compare the input number with 0; if it is <= 0, explode
 8048d07:	7f 05                	jg     8048d0e <phase_4+0x2e>
 8048d09:	e8 ee 07 00 00       	call   80494fc <explode_bomb>
 8048d0e:	83 c4 f4             	add    $0xfffffff4,%esp
 8048d11:	8b 45 fc             	mov    -0x4(%ebp),%eax
 # eax = -0x4(%ebp); eax is the argument passed to func4
 8048d14:	50                   	push   %eax
 8048d15:	e8 86 ff ff ff       	call   8048ca0 <func4>
 8048d1a:	83 c4 10             	add    $0x10,%esp
 8048d1d:	83 f8 37             	cmp    $0x37,%eax
 # eax is the return value of func4; if eax != 0x37 (55), explode
 8048d20:	74 05                	je     8048d27 <phase_4+0x47>
 8048d22:	e8 d5 07 00 00       	call   80494fc <explode_bomb>
 8048d27:	89 ec                	mov    %ebp,%esp
 8048d29:	5d                   	pop    %ebp
 8048d2a:	c3                   	ret    
 8048d2b:	90                   	nop

sscanf is called again, so take a look at 0x8049808.

It is %d, so the input should be a single number.

func4 is called, so take a look at it.

8048ca0 <func4>:
 8048ca0:	55                   	push   %ebp
 8048ca1:	89 e5                	mov    %esp,%ebp
 8048ca3:	83 ec 10             	sub    $0x10,%esp
 8048ca6:	56                   	push   %esi
 8048ca7:	53                   	push   %ebx
 8048ca8:	8b 5d 08             	mov    0x8(%ebp),%ebx
 8048cab:	83 fb 01             	cmp    $0x1,%ebx
 # compare ebx with 1; if it is <= 1, jump to 8048cd0
 8048cae:	7e 20                	jle    8048cd0 <func4+0x30>
 8048cb0:	83 c4 f4             	add    $0xfffffff4,%esp
 8048cb3:	8d 43 ff             	lea    -0x1(%ebx),%eax
 # eax = ebx - 1
 8048cb6:	50                   	push   %eax
 8048cb7:	e8 e4 ff ff ff       	call   8048ca0 <func4>
 8048cbc:	89 c6                	mov    %eax,%esi
 # esi = eax
 8048cbe:	83 c4 f4             	add    $0xfffffff4,%esp
 8048cc1:	8d 43 fe             	lea    -0x2(%ebx),%eax
 # eax = ebx - 2
 8048cc4:	50                   	push   %eax
 8048cc5:	e8 d6 ff ff ff       	call   8048ca0 <func4>
 8048cca:	01 f0                	add    %esi,%eax
 # eax = esi + eax
 8048ccc:	eb 07                	jmp    8048cd5 <func4+0x35>
 8048cce:	89 f6                	mov    %esi,%esi
 8048cd0:	b8 01 00 00 00       	mov    $0x1,%eax
 # eax = 1; if ebx <= 1, func4 returns 1
 8048cd5:	8d 65 e8             	lea    -0x18(%ebp),%esp
 8048cd8:	5b                   	pop    %ebx
 8048cd9:	5e                   	pop    %esi
 8048cda:	89 ec                	mov    %ebp,%esp
 8048cdc:	5d                   	pop    %ebp
 8048cdd:	c3                   	ret    
 8048cde:	89 f6                	mov    %esi,%esi

func4 calls func4 inside itself, so this is a recursion:

int func4(int x)
{
    if(x <= 1)
        return 1;
    else
        return func4(x - 1) + func4(x - 2);
}

Time to brute-force it:

#include<stdio.h>
int func4(int x)
{
    if(x <= 1)
        return 1;
    else
        return func4(x - 1) + func4(x - 2);
}
  
int main()
{
	int a = 0,i = 0;
	while(a != 55)
	{
		i++;
		a = func4(i);
	}
	printf("%d",i);
}

The answer is 9

phase_5

08048d2c <phase_5>:
 8048d2c:	55                   	push   %ebp
 8048d2d:	89 e5                	mov    %esp,%ebp
 8048d2f:	83 ec 10             	sub    $0x10,%esp
 8048d32:	56                   	push   %esi
 8048d33:	53                   	push   %ebx
 8048d34:	8b 5d 08             	mov    0x8(%ebp),%ebx
 8048d37:	83 c4 f4             	add    $0xfffffff4,%esp
 8048d3a:	53                   	push   %ebx
 8048d3b:	e8 d8 02 00 00       	call   8049018 <string_length>
 8048d40:	83 c4 10             	add    $0x10,%esp
 8048d43:	83 f8 06             	cmp    $0x6,%eax
 # the input is a string; compare its length with 6 and explode if it is not 6. ebx is the input string
 8048d46:	74 05                	je     8048d4d <phase_5+0x21>
 8048d48:	e8 af 07 00 00       	call   80494fc <explode_bomb>
   
 8048d4d:	31 d2                	xor    %edx,%edx
 # edx = 0
 8048d4f:	8d 4d f8             	lea    -0x8(%ebp),%ecx
 8048d52:	be 20 b2 04 08       	mov    $0x804b220,%esi
 # esi points to the string isrveawhobpnutfg
 8048d57:	8a 04 1a             	mov    (%edx,%ebx,1),%al
 # al = the byte at (edx + ebx*1), i.e. input[edx]
 8048d5a:	24 0f                	and    $0xf,%al
 # al = al & 0xf
 8048d5c:	0f be c0             	movsbl %al,%eax
 # eax = al (sign-extended)
 8048d5f:	8a 04 30             	mov    (%eax,%esi,1),%al
 # al = the byte at (eax + esi*1), i.e. table[eax]
 8048d62:	88 04 0a             	mov    %al,(%edx,%ecx,1)
 # store al at (edx + ecx*1), i.e. buf[edx] = al
 8048d65:	42                   	inc    %edx
 # edx = edx + 1
 8048d66:	83 fa 05             	cmp    $0x5,%edx
 # while edx <= 5, continue the loop
 8048d69:	7e ec                	jle    8048d57 <phase_5+0x2b>
   
 8048d6b:	c6 45 fe 00          	movb   $0x0,-0x2(%ebp)
 8048d6f:	83 c4 f8             	add    $0xfffffff8,%esp
   
 8048d72:	68 0b 98 04 08       	push   $0x804980b
 8048d77:	8d 45 f8             	lea    -0x8(%ebp),%eax
 8048d7a:	50                   	push   %eax
 8048d7b:	e8 b0 02 00 00       	call   8049030 <strings_not_equal>
 8048d80:	83 c4 10             	add    $0x10,%esp
 8048d83:	85 c0                	test   %eax,%eax
 8048d85:	74 05                	je     8048d8c <phase_5+0x60>
 8048d87:	e8 70 07 00 00       	call   80494fc <explode_bomb>
 8048d8c:	8d 65 e8             	lea    -0x18(%ebp),%esp
 8048d8f:	5b                   	pop    %ebx
 8048d90:	5e                   	pop    %esi
 8048d91:	89 ec                	mov    %ebp,%esp
 8048d93:	5d                   	pop    %ebp
 8048d94:	c3                   	ret    
 8048d95:	8d 76 00             	lea    0x0(%esi),%esi

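Before the loop starts, 0x804b220 is loaded into esi; the string at 0x804b220 is isrveawhobpnutfg.

strings_not_equal is called at the end, so take a look at 804980b:

giants

The string produced by the loop is compared with giants.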

int i;
char a[7], b[17] = "isrveawhobpnutfg", c[7]; /* c holds the 6-character input */
for(i=0;i<=5;i++)
{
    int x=(int)(c[i]&0xf);   /* the low 4 bits select an index into b */
    a[i]=b[x];
}

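After & 0xf, the six values should be 0x0f 0x00 0x05 0x0b 0x0d 0x01.

The high four bits can be anything as long as the low four bits stay the same, so there are many valid answers.

One of them is opukma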
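A short checker makes it easy to try candidate strings. It is a sketch that mirrors the loop above; the helper names `phase5_map` and `phase5_accepts` are hypothetical, and ASCII input is assumed.

```c
#include <string.h>

/* Map one input character the way the loop does: low 4 bits index the table. */
char phase5_map(char c) {
    static const char table[] = "isrveawhobpnutfg";
    return table[c & 0xf];
}

/* 1 if a 6-character input maps onto "giants", 0 otherwise. */
int phase5_accepts(const char *input) {
    static const char target[] = "giants";
    int i;
    if (strlen(input) != 6)
        return 0;
    for (i = 0; i < 6; i++)
        if (phase5_map(input[i]) != target[i])
            return 0;
    return 1;
}
```

Both opukma and opekma pass, since u and e share the low nibble 5, while typing giants itself does not.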

phase_6

08048d98 <phase_6>:
 8048d98:	55                   	push   %ebp
 8048d99:	89 e5                	mov    %esp,%ebp
 8048d9b:	83 ec 4c             	sub    $0x4c,%esp
 8048d9e:	57                   	push   %edi
 8048d9f:	56                   	push   %esi
 8048da0:	53                   	push   %ebx
 8048da1:	8b 55 08             	mov    0x8(%ebp),%edx
 8048da4:	c7 45 cc 6c b2 04 08 	movl   $0x804b26c,-0x34(%ebp)
 8048dab:	83 c4 f8             	add    $0xfffffff8,%esp
 8048dae:	8d 45 e8             	lea    -0x18(%ebp),%eax
 8048db1:	50                   	push   %eax
 8048db2:	52                   	push   %edx
 8048db3:	e8 20 02 00 00       	call   8048fd8 <read_six_numbers>
 # the input is six numbers; -0x18(%ebp) is the address of the first one
 8048db8:	31 ff                	xor    %edi,%edi
 # edi = 0
 8048dba:	83 c4 10             	add    $0x10,%esp
 8048dbd:	8d 76 00             	lea    0x0(%esi),%esi
 8048dc0:	8d 45 e8             	lea    -0x18(%ebp),%eax
 # eax is the address of the first array element
 8048dc3:	8b 04 b8             	mov    (%eax,%edi,4),%eax
 # eax = the value at (eax + edi*4), i.e. a[edi]
 8048dc6:	48                   	dec    %eax
 # eax = eax - 1
 8048dc7:	83 f8 05             	cmp    $0x5,%eax
 # compare eax with 5 (unsigned); above 5 explodes, so each a[i] must be in 1..6
 8048dca:	76 05                	jbe    8048dd1 <phase_6+0x39>
 8048dcc:	e8 2b 07 00 00       	call   80494fc <explode_bomb>
   
 8048dd1:	8d 5f 01             	lea    0x1(%edi),%ebx
 # ebx = edi + 1
   
 8048dd4:	83 fb 05             	cmp    $0x5,%ebx
 8048dd7:	7f 23                	jg     8048dfc <phase_6+0x64>
 # compare ebx with 5; if ebx > 5, jump to 8048dfc and skip the inner loop
 8048dd9:	8d 04 bd 00 00 00 00 	lea    0x0(,%edi,4),%eax
 8048de0:	89 45 c8             	mov    %eax,-0x38(%ebp)
 # -0x38(%ebp) = eax (edi*4)
 8048de3:	8d 75 e8             	lea    -0x18(%ebp),%esi
 # esi is the address of the first element of the input array
 8048de6:	8b 55 c8             	mov    -0x38(%ebp),%edx
 # edx = -0x38(%ebp)
 8048de9:	8b 04 32             	mov    (%edx,%esi,1),%eax
 8048dec:	3b 04 9e             	cmp    (%esi,%ebx,4),%eax
 8048def:	75 05                	jne    8048df6 <phase_6+0x5e>
 8048df1:	e8 06 07 00 00       	call   80494fc <explode_bomb>
 # if a[edi] (at edx + esi*1) != a[ebx] (at esi + ebx*4), jump to 8048df6; if they are equal, explode
 8048df6:	43                   	inc    %ebx
 # ebx = ebx + 1
 8048df7:	83 fb 05             	cmp    $0x5,%ebx
 8048dfa:	7e ea                	jle    8048de6 <phase_6+0x4e>
 8048dfc:	47                   	inc    %edi
 8048dfd:	83 ff 05             	cmp    $0x5,%edi
 8048e00:	7e be                	jle    8048dc0 <phase_6+0x28>
 # end of the first loop
 8048e02:	31 ff                	xor    %edi,%edi
 # edi = 0
 8048e04:	8d 4d e8             	lea    -0x18(%ebp),%ecx
 # ecx is the address of the first array element
 8048e07:	8d 45 d0             	lea    -0x30(%ebp),%eax
 # eax is the address -0x30(%ebp)
 8048e0a:	89 45 c4             	mov    %eax,-0x3c(%ebp)
 8048e0d:	8d 76 00             	lea    0x0(%esi),%esi
 8048e10:	8b 75 cc             	mov    -0x34(%ebp),%esi
 # esi = the value stored at -0x34(%ebp), i.e. the address of node1
 8048e13:	bb 01 00 00 00       	mov    $0x1,%ebx
 # ebx = 1
 8048e18:	8d 04 bd 00 00 00 00 	lea    0x0(,%edi,4),%eax
 # eax = edi*4
 8048e1f:	89 c2                	mov    %eax,%edx
 # edx = eax
 8048e21:	3b 1c 08             	cmp    (%eax,%ecx,1),%ebx
 8048e24:	7d 12                	jge    8048e38 <phase_6+0xa0>
 # if ebx is less than a[edi] (at eax + ecx*1), enter the inner loop
 8048e26:	8b 04 0a             	mov    (%edx,%ecx,1),%eax
 8048e29:	8d b4 26 00 00 00 00 	lea    0x0(%esi,%eiz,1),%esi
 8048e30:	8b 76 08             	mov    0x8(%esi),%esi
 # esi = the value at (esi + 0x8), i.e. the next node
 8048e33:	43                   	inc    %ebx
 # ebx = ebx + 1
 8048e34:	39 c3                	cmp    %eax,%ebx
 8048e36:	7c f8                	jl     8048e30 <phase_6+0x98>
 # while ebx is less than eax, continue the loop
 8048e38:	8b 55 c4             	mov    -0x3c(%ebp),%edx
 # edx = -0x3c(%ebp)
 8048e3b:	89 34 ba             	mov    %esi,(%edx,%edi,4)
 # store esi at (edx + edi*4), i.e. s[edi] = esi
 8048e3e:	47                   	inc    %edi
 8048e3f:	83 ff 05             	cmp    $0x5,%edi
 8048e42:	7e cc                	jle    8048e10 <phase_6+0x78>
 # while edi <= 5, continue the loop
 8048e44:	8b 75 d0             	mov    -0x30(%ebp),%esi
 8048e47:	89 75 cc             	mov    %esi,-0x34(%ebp)
 8048e4a:	bf 01 00 00 00       	mov    $0x1,%edi
 # edi = 1
 8048e4f:	8d 55 d0             	lea    -0x30(%ebp),%edx
 8048e52:	8b 04 ba             	mov    (%edx,%edi,4),%eax
 8048e55:	89 46 08             	mov    %eax,0x8(%esi)
 # store eax at (esi + 0x8), i.e. link the current node to s[edi]
 8048e58:	89 c6                	mov    %eax,%esi
 # esi = eax
 8048e5a:	47                   	inc    %edi
 8048e5b:	83 ff 05             	cmp    $0x5,%edi
 8048e5e:	7e f2                	jle    8048e52 <phase_6+0xba>
 # while edi <= 5, continue the loop
 8048e60:	c7 46 08 00 00 00 00 	movl   $0x0,0x8(%esi)
 8048e67:	8b 75 cc             	mov    -0x34(%ebp),%esi
 8048e6a:	31 ff                	xor    %edi,%edi
 # edi = 0
 8048e6c:	8d 74 26 00          	lea    0x0(%esi,%eiz,1),%esi
 8048e70:	8b 56 08             	mov    0x8(%esi),%edx
 # edx = the value at (esi + 0x8), i.e. the next node
 8048e73:	8b 06                	mov    (%esi),%eax
 # eax = the value at (esi), i.e. the current node's value
 8048e75:	3b 02                	cmp    (%edx),%eax
 8048e77:	7d 05                	jge    8048e7e <phase_6+0xe6>
 # if eax >= the next node's value (at edx), jump to 8048e7e; otherwise explode
 8048e79:	e8 7e 06 00 00       	call   80494fc <explode_bomb>
 8048e7e:	8b 76 08             	mov    0x8(%esi),%esi
 8048e81:	47                   	inc    %edi
 # edi++
 8048e82:	83 ff 04             	cmp    $0x4,%edi
 8048e85:	7e e9                	jle    8048e70 <phase_6+0xd8>
 # while edi <= 4, continue the loop
 8048e87:	8d 65 a8             	lea    -0x58(%ebp),%esp
 8048e8a:	5b                   	pop    %ebx
 8048e8b:	5e                   	pop    %esi
 8048e8c:	5f                   	pop    %edi
 8048e8d:	89 ec                	mov    %ebp,%esp
 8048e8f:	5d                   	pop    %ebp
 8048e90:	c3                   	ret    
 8048e91:	8d 76 00             	lea    0x0(%esi),%esi

Near the start there is the address 804b26c, which is <node1>:

node1:0xfd
node2:0x2d5
node3:0x12d
node4:0x3e5
node5:0xd4
node6:0x1b0

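The first loop is a nested loop. Each iteration first checks that the current element minus 1 is at most 5 (an unsigned comparison), then checks that no later element equals it. So every element must be between 1 and 6, and all six must be distinct.

The second loop should be ordering the nodes according to the input and storing them in a new array.

mov 0x8(%esi),%esi should be following the pointer to the next node.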

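for(i=0; i<=5; i++)
{
    if ((unsigned)(a[i]-1) > 5)   /* each a[i] must be in 1..6 */
        explode_bomb();
    for (j=i+1; j<=5; j++)
    {
        if (a[i]== a[j])
            explode_bomb();
    }
}
for(i=0;i<=5;i++)
{
    node = &node1;                /* restart from the head for every a[i] */
    for(j=1;j<a[i];j++)
        node = node->next;
    s[i]=node;
}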

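The third loop relinks the nodes in the order given by that new array.

The fourth loop checks whether the reordered nodes satisfy the condition, which is clearly descending order.

The answer is 4 2 6 3 1 5

All done, and I never had to eat my words.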
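The answer can be derived mechanically by ranking the node values from the listing in descending order. The sketch below hard-codes those six values; the names `node_value` and `phase6_node_at` are hypothetical.

```c
/* Node values read from the dump above; index 0 is node1. */
static const int node_value[6] = {0xfd, 0x2d5, 0x12d, 0x3e5, 0xd4, 0x1b0};

/* Return the 1-based node number whose value ranks `rank` from the top
 * (rank 0 is the largest), or -1 if rank is out of range. */
int phase6_node_at(int rank) {
    int i, j;
    for (i = 0; i < 6; i++) {
        int larger = 0;
        for (j = 0; j < 6; j++)
            if (node_value[j] > node_value[i])
                larger++;            /* count nodes that must come before node i+1 */
        if (larger == rank)
            return i + 1;
    }
    return -1;
}
```

Reading off ranks 0 through 5 gives the input order that leaves the list sorted from largest to smallest.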
