---title: 2020q3 Project (cache-lab) tags: sysprog---# 2020q3: cache-lab contributed by < `TsundereChen` >> GitHub Repo: [TsundereChen/csapp-cache-lab](https://github .com/TsundereChen/csapp-cache-lab)> [README](https://csapp.cs.cmu.edu/3e/README-cachelab)## Files```- cahcelab.[c,h] Lab Assistant files - csim.c Cache Simulator - csim-ref Executable reference cache simulator - Driver.py Driver, run test-csim and test-trans - Makefile - test-csim Cache simulator test tool - test-trans.c Tool for test transposition function-tracegen.c helper for test-trans-trans.c Transpose function-traces/trace files used by test-csim.c```## Environment* Valgrind must be installed on the system## Part A - Downloading the program argument - [getopt.h](https://www.gnu.org/software/libc/manual/html_node/Using-Getopt.html) - Create arguments according to csim```clike=void parseArg( int argc , char * argv[]) { if (argc == 1) { printf("Usage: ./csim [-hv] -s-MI-B-T\n"); exit(1); }; int opt; while ((opt = getopt(argc, argv, "hvs:E:b:t:")) != -1) { switch (option) { case 'h': printf("Gebruik: ./csim [-hv] -s-MI-B-T\n"); printf("Options:\n"); printf(" -h Print this help message.\n"); printf(" -v Optional extended flag.\n"); printf(" -sNumber of index bits set.\n"); printf(" -EThe number of lines in the set.\n"); printf(" -bBlock offset bits.\n"); printf(" -tTrace file.\n"); printf("\n"); printf("Examples:\n"); printf(" linux> ./csim -s 4 -E 1 -b 4 -ttraces/yi.trace \ n"); printf(" linux> ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace\n"); output(1); case 'v': full = true; break; 's' case: setIndexBit = atoi(optarg); break; 'E' case: linePerSet = atoi(optarg); break; 'b' case: blockOffsetBit = atoi(optarg); gap; 't' case: traceFile = fopen(optarg, "r"); pause; default: printf("Usage: ./csim [-hv] -s-MI-B-T\n"); exit(1); } }}``### Cache-struct - The cache looks like this - Below you will see find the structure line or cache - Part A of this exercise doesn't require us to modify the data in the cache, we just need to check that the correct bit is 1 and that the tag matches input``clike=typedef struct cacheData { int valid; int lru; uint64_t tag; } cacheData;```- Input - input files are stored in the `traces/` folder - all input data have the same format, e.g. ```clike= L 10.1 M 20 , 1 L 22.1 S 18.1 ,``` - Four instructions - I (load instruction) - Not used in this lab - L (load) - M (change) - Load + save - Two steps - S (save) - Use `sscanf` to read input - We must extract this information from the `` - Tag - Cacheset - Offset - Since we already know the size of the cache set and the size of the offset, we can use a bitwise operation to extract this information ``` clike=while(fgets(buffer, sizeof(buffer) , trace file ) != NULL ){ counter++; if(buffer[0] == 'I') continue; else{ sscanf(buffer, "%c%lx,%d", &operation, &address, &size); unsigned int tag = parseTag(setIndexBit, blockOffsetBit, address); unsigned int cacheset = parseCacheSet(setIndexBit, blockOffsetBit, address); unsigned int offset = parseOffset(setIndexBit, blockOffsetBit, address); switch(operation){ case 'L': checkCache(tag, cache set, offset, counter); pause; case 'M': checkCache(tag, cache set, offset, counter); checkCache(tag, cache set, offset, counter); pause; "S" case: checkCache(tag, cache set, offset, counter); pause; } } }```- Hit/Miss cache - If the correct bit is 1 and the tag matches, it is a match - If there is no match, it is a match. - We need to search the cache and see if there is an invalid cache - If so, we activate this cache and enter the tag information - If there is no match and all cache lines are valid - Then it's a ban, you need to find the least used cache cache, then replace it. - By counting the execution line, we can achieve LRU - The smaller the LRU, the less often it is used``` clike=void checkCache(int tag, int cacheSet, int offset, int counter) { for(int i = 0; i < linePerSet; i++){ if(cache[cacheSet][i].valid == 1 && cache[cacheSet][i].tag == tag){ cache[cacheSet][i].lru = counter; hits++; if(full) printf("HIT\n"); give; } } misses +; if(full) printf("ERROR\n"); int n_lru = 0; for(int i = 0; i < linePerSet; i++){ if(cache[cacheSet][i].valid == 0){ cache[cacheSet][i].valid = 1; cache[cacheSet][i].tag = tag; cache[cacheSet][i].lru = counter; give; } else if(cache[cacheSet][n_lru].lru > cache[cacheSet][i].lru) n_lru = i; } evictions++; if(verbose) printf("EXECTION!\n"); cache[cacheSet][n_lru].valid = 1; cache[cacheSet][n_lru].tag = tag; cache[cacheSet][n_lru].lru = counter; return;};```## Part B- You need to write a matrix transpose function - Three test cases - 32 x 32 - 64 x 64 - 61 x 67 - s = 5, E = 1, b = 5 - Cache size is 32*2 - Cache block size is 32 bytes - Can hold 8 integers at a time ### 32 x 32 - Instead of processing one integer at a time as the system is also fetching nearby data at the same time, it can store more numbers at once process integers, use cache and generate faster results. - Since the cache block size is 32 bytes and can hold 8 integers, we move 8 integers at a time. ```clike=// Case 1 for ( int i = 0; i < N; i += 8) { for ( int j = 0; j < M; j += 8) { for (int k = i; k < i + 8; k++) { int t0 = A[k][j]; int t1 = A[k][j+1]; int t2 = A[k][j+2]; int t3 = A[k][j+3]; int t4 = A[k][j+4]; int t5 = A[k][j+5]; int t6 = A[k][j+6]; int t7 = A[k][j+7]; B[j][k] = t0; B[j + 1] [k] = t1; B[j + 2] [k] = t2; B[j + 3] [k] = t3; B[j + 4][k] = t4; B[j + 5][k] = t5; B[j + 6] [k] = t6; B[j + 7] [k] = t7; } } }```:::warningWhy doesn't the following code use the cache?```clike=for (int i = 0; i < N; i += 8) { for (int j = 0; j < M ;j += 8) { for (int k = i; k < i + 8; k++) { B[j + 0][k] = A[k][j]; B[j + 1] [k] = A[k] [j+1]; B[j + 2] [k] = A[k] [j+2]; B[j + 3] [k] = A[k] [j+3]; B[j + 4][k] = A[k][j+4]; B[j + 5][k] = A[k][j+5]; B[j + 6] [k] = A[k] [j+6]; B[j + 7] [k] = A[k] [j+7]; } }}```:::### 64x64- See blog [深入理解计算机系统 (CS:APP) - 高速缓存实验 Cache Lab 解析](https://billc.io/2019/05/ csapp - cachelab /)- Process using 4 blocks.```clike=int t0, t1, t2, t3, t4, t5, t6, t7; for (int i = 0; i < N; i += 8) { for (int j = 0; j < M; j += 8) { for (int k = i; k < i + 4; k++) { t0 = A[k][j]; t1 = A[k][j+1]; t2 = A[k][j+2]; t3 = A[k][j+3]; t4 = A[k][j+4]; t5 = A[k][j+5]; t6 = A[k][j+6]; t7 = A[k][j+7]; B[j][k] = t0; B[j + 1] [k] = t1; B[j + 2] [k] = t2; B[j + 3] [k] = t3; B[j + 0][k + 4] = t7; B[j + 1] [k + 4] = t6; B[j + 2][k+4] = t5; B[j + 3] [k + 4] = t4; } for (int h = 0; h < 4; h++) { t0 = A[i + 4][j + 3 - h]; t1 = A[i + 5][j + 3 - h]; t2 = A[i + 6][j + 3 - h]; t3 = A[i + 7][j + 3 - h]; t4 = A[i + 4][j + 4 + h]; t5 = A[i + 5][j + 4 + h]; t6 = A[i + 6][j + 4 + h]; t7 = A[i + 7][j + 4 + h]; B[j + 4 + h] [i + 0] = B[j + 3 - h] [i + 4]; B[j + 4 + h] [i + 1] = B[j + 3 - h] [i + 5]; B[j + 4 + h] [i + 2] = B[j + 3 - h] [i + 6]; B[j + 4 + h] [i + 3] = B[j + 3 - h] [i + 7]; B[j + 3 - h][i + 4] = t0; B[j + 3 - h] [i + 5] = t1; B[j + 3 - h] [i + 6] = t2; B[j + 3 - h] [i + 7] = t3; B[j + 4 + h] [i + 4] = t4; B[j + 4 + h] [i + 5] = t5; B[j + 4 + h] [i + 6] = t6; B[j + 4 + h] [i + 7] = t7; } }}```### 61 x 67- Due to the method used in `32 x 32` and the change of steps, we can conclude that processing 16 integers at a time works best```clike= for ( int i = 0; i < N; i += 16) { for (int j = 0; j < M; j += 16) { for (int k = i; k < i + 16 && k < N ; k++ ) { for (int l = j; l#define MATRIX_SIZE 8void transpose(){ int A[MATRIX_SIZE][MATRIX_SIZE], B[MATRIX_SIZE][MATRIX_SIZE]; for(int i = 0; i < MATRIX_SIZE; i++){ for(int j = 0; j < MATRIX_SIZE; j++){ A[i][j] = (i+1) * (j+1); } }; for(int j = 0; j < ROZMIAR_MACIERZY; j++){ for(int i = 0; i < ROZMIAR_ MATRIX; i++){ int t0 = A[i][j]; B[j][i] = t0; } } return;}``` - Konfiguracja z suchymi pamięciami podręcznymi - 32 elementy, 4 elementy bezpośrednio w kaart gebracht Hitpercentage: 96,62%  - 32-items Volledig associatief uit 4 worden Hitpercentage: 97,16%  - 32-invoer 4-woorden 2-weg set associatief Hitpercentage: 97, 08%  - Haal 4 variabelen tegelijk op```clike=void transpose(){ int A[MATRIX_SIZE][MATRIX_SIZE], B[ MATRIX_SIZE][MATRIX_SIZE]; for(int i = 0; i < MATRIX_SIZE; i++){ for(int j = 0; j < MATRIX_SIZE; j++){ A[i][j] = (i+1) * (j+1); } }; for(int j = 0; j < MATRIX_SIZE; j+=4){ for(int i = 0; i < MATRIX_SIZE; i++){ int t0 = A[i][j]; int t1 = A[i][j+1]; int t2 = A[i][j+2]; int t3 = A[i][j+3]; B[j][i] = t0; B[j+1][i] = t1; B[j+2][i] = t2; B[j+3][i] = t3; } } return;}``` - Konfiguracja z suchymi pamięciami podręcznymi - 32 elementy, 4 elementy bezpośrednio w kaart gebracht Hitpercentage: 93,81%  - 32-items Volledig associatief uit 4 worden Hitpercentage: 96,54%  - 32-invoer 4-woorden 2-weg set associatief Hitpercentage: 96, %  - 32-elementy, 4-woorden, łączny procent stowarzyszeń Hitpercentage: 95,34%  - 32-elementy, 4 słowa, 2-weg zestaw skojarzeń Procent trafień: 95,86%  | | Bezpośrednio w kaart gebracht | Stowarzyszenie Volledig | Stowarzyszenie 2-wegset || --------- | --------- | ------------------ | ---------------------------------- || 1 tegelijk | 96,62% | 97,16% | 97,08% || 4 tegelijk | 93,81% | 96,54% | 96,72% || 8 tegelijk | 94,2% | 95,34% | 95,86% || 4 tegelijk, drie voor lus | 95,56% | 96,08% | 96,42% |#### N = 32- Metoda fikcyjna - Konfiguracja z suchymi pamięciami podręcznymi - 32 elementy w 4 drewnianych bezpośrednio w kaart gebracht Hitpercentage: 93,81%  - 32 elementy, 4 drewniane, volledig associatief Hitpercentage: 95,55%  - 32 elementy, 4 drewniane 2-weg set associatief Hitpercentage: 94,75 %  - Haal 4 variabelen tegelijk op - Konfiguracja z suchymi pamięciami podręcznymi - 32 vermeldingen i 4 drewniane bezpośrednie w kaart gebracht Procent trafień: 89,87%  - 32 vermeldingen, volledig associatief, 4 woorden Hitpercentage: 91,19%  — 32 elementy, 4 drewniane, 2 zestawy skojarzeń Hitpercentage: 91,19%  — Posiada 8 zmiennych tegelijk op - Konfiguracja z suchymi pamięciami podręcznymi - 32 - 4-drewniane bezpośrednie w kaart gebracht Hitpercentage: 93,36%  - 32-inzendingen 4-woorden volledig associatief Hitpercentage: 96,32%  - 32-items, 4 woorden, 2-weg set associatief Hitpercentage: 94,86% ! [n32_2-way_set_associative](https://i.imgur.com/agCjGC8. png) - 4 różne opcje tegelijk op, suche lus - Konfiguracja z suchymi pamięciami podręcznymi - 32 elementy i 4 drewniane bezpośrednie w kaart gebracht Hitpercentage: 86,15%! [n32_direct_mapped] (https://i.imgur.com/pixiqBd.png) - 32 vermeldingen, 4 woorden, volledig associatief Hitpercentage: 94,36%! [n32_fully_associative](https://i.imgur.com/T0ZjHk5.png) - 32 vermeldingen, 4 drewniane, 2-weg set associatief Hitpercentage: 94,03%  | | Bezpośrednio w kaart gebracht | Stowarzyszenie Volledig | Stowarzyszenie 2-wegset || --------- | --------- | ------------------ | ---------------------------------- || 1 tegelijk | 93,81% | 95,55% | 94,75% || 4 tegelijk | 89,87% | 91,19% | 91,19% || 8 tegelijk | 93,36% | 96,32% | 94,86% || 4 tegelijk, drie voor lus | 86,15% | 94,36% | 94,03% || 8 tegelijk, drie voor lus | 90,11% | 94,03% | 90,11% |#### N = 64| | Bezpośrednio w kaart gebracht | Stowarzyszenie Volledig | Stowarzyszenie 2-wegset || --------- | --------- | ------------------ | ---------------------------------- || 1 tegelijk | 90,88% | 92,17% | 92,13% || 4 tegelijk | 89,75% | 95,54% | 91,09% || 8 tegelijk | 89,65% | 95,44% | 90,78% || 4 tegelijk, drie voor lus | 88,57% | 94,22% | 89,94% || 8 tegelijk, drie voor lus | 90,90% | 94,58% | 92,05% |## Referentie- [CS:APP 3/e cachelab](https://hackmd.io/@pSnFKx_GTlmTWXn4A8lpKw/SkeA19jWN?type=view)- [CSAPP:Lab4-Cache Lab](https:// zhuanlan.zhihu.com/p/142942823)
apply
z
By clicking below, you consent to ourTerms of Service.
Log in with Facebook Subscribe via Twitter Register via GitHub Sign in to Dropbox