Machine Problem 4: Cache Simulation and Optimization (2023)

Overview

This lab will help you understand the effect of caching on the performance of C programs.

The lab consists of two parts. In the first part, you will write a small C program (about 200-300 lines) that simulates the behavior of a cache. In the second part, you will optimize a small matrix transpose function to minimize the number of cache misses.

Reference trace files

You will do your work for this machine problem in 04-cachemap.

The traces subfolder contains a collection of reference trace files, which we will use to evaluate the correctness of the cache simulator you write in Part A. The trace files are generated by a Linux program called valgrind. For example, typing

linux> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l

on the command line runs the executable program ls -l, captures a trace of each of its memory accesses in the order they occur, and prints them on stdout.

Valgrind memory traces have the following form:

I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,8

Each line indicates one or two memory accesses. The format of each line is

[space]operation address,size

The operation field denotes the type of memory access: I denotes an instruction load, L a data load, S a data store, and M a data modify (i.e., a data load followed by a data store). There is never a space before an I. There is always a space before each M, L, and S. The address field specifies a 64-bit hexadecimal memory address. The size field specifies the number of bytes accessed by the operation.
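
As an illustration of the format above, a line can be split into its three fields with sscanf. This is only a sketch: the trace_access type and the parse_trace_line function are hypothetical names, not part of the lab handout.

```c
#include <stdio.h>

/* Hypothetical record type for one parsed trace line. */
typedef struct {
    char op;                 /* 'I', 'L', 'S', or 'M' */
    unsigned long long addr; /* address, parsed from hexadecimal */
    int size;                /* number of bytes accessed */
} trace_access;

/*
 * Parse one line such as " M 0421c7f0,4" or "I 0400d7d4,8".
 * Returns 1 on success and 0 if the line does not match the format.
 * The leading blank in the format string skips the optional space
 * that precedes M, L, and S.
 */
int parse_trace_line(const char *line, trace_access *out)
{
    return sscanf(line, " %c %llx,%d", &out->op, &out->addr, &out->size) == 3;
}
```

Because the format string starts with a blank, the same call handles both the I lines (no leading space) and the M, L, and S lines (one leading space). A simulator that ignores instruction accesses can then simply skip records whose op is 'I'.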

Part A: Writing a cache simulator

In Part A you will write a cache simulator in "csim.c" that takes a valgrind memory trace as input, simulates the hit/miss behavior of a cache on that trace, and outputs the total number of hits, misses, and evictions.

We have provided you with the binary executable of a reference cache simulator, called csim-ref, which simulates the behavior of a cache of arbitrary size and associativity on a valgrind trace file. It uses the LRU (least recently used) replacement policy when choosing which cache line to evict.

The reference simulator takes the following command-line arguments:

Usage: ./csim-ref [-hv] -s <s> -E <E> -b <b> -t <tracefile>
  • -h: Optional help flag that prints usage information

  • -v: Optional verbose flag that displays trace information

  • -s <s>: Number of set index bits ($S = 2^s$ is the number of sets)

  • -E <E>: Associativity (number of lines per set)

  • -b <b>: Number of block bits ($B = 2^b$ is the block size)

  • -t <tracefile>: Name of the valgrind trace to replay

The command-line arguments are based on the notation (s, E, and b) from CS:APP. For example:

linux> ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3

The same example in verbose mode:

linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3
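
The counts in this example can be reproduced with a small LRU model. The sketch below is one possible way to organize such a simulator, not the reference implementation: the cache and cache_line types and the cache_init, cache_access, and cache_modify functions are hypothetical names, and LRU order is tracked with a simple access counter.

```c
#include <stdlib.h>

/* One cache line: valid bit, tag, and a timestamp used for LRU ordering. */
typedef struct {
    int valid;
    unsigned long long tag;
    unsigned long long last_used;
} cache_line;

/* Hypothetical simulator state; all names here are illustrative. */
typedef struct {
    int s, E, b;
    cache_line *lines;         /* 2^s sets of E lines, stored contiguously */
    unsigned long long clock;  /* incremented on every access */
    int hits, misses, evictions;
} cache;

/* Allocate an empty cache; returns 0 on success, -1 if allocation fails. */
int cache_init(cache *c, int s, int E, int b)
{
    c->s = s; c->E = E; c->b = b;
    c->clock = 0;
    c->hits = c->misses = c->evictions = 0;
    c->lines = calloc(((size_t)1 << s) * (size_t)E, sizeof(cache_line));
    return c->lines ? 0 : -1;
}

/* Simulate one access to addr (loads and stores are treated alike). */
void cache_access(cache *c, unsigned long long addr)
{
    unsigned long long set_index = (addr >> c->b) & ((1ULL << c->s) - 1);
    unsigned long long tag = addr >> (c->s + c->b);
    cache_line *set = c->lines + set_index * (size_t)c->E;
    cache_line *victim = &set[0];

    c->clock++;
    for (int i = 0; i < c->E; i++) {
        if (set[i].valid && set[i].tag == tag) {  /* hit */
            set[i].last_used = c->clock;
            c->hits++;
            return;
        }
        /* Prefer an empty line; otherwise remember the least recently used. */
        if (victim->valid &&
            (!set[i].valid || set[i].last_used < victim->last_used))
            victim = &set[i];
    }
    c->misses++;
    if (victim->valid)
        c->evictions++;
    victim->valid = 1;
    victim->tag = tag;
    victim->last_used = c->clock;
}

/* A modify (M) is a load followed by a store to the same address. */
void cache_modify(cache *c, unsigned long long addr)
{
    cache_access(c, addr);
    cache_access(c, addr);
}
```

Feeding this model the seven accesses listed in the verbose output (L 10, M 20, L 22, S 18, L 110, L 210, M 12, with addresses read as hexadecimal) and s = 4, E = 1, b = 4 gives 4 hits, 5 misses, and 3 evictions.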

Your task in Part A is to fill in the "csim.c" file so that it takes the same command-line arguments and produces output identical to that of the reference simulator. Note that this file is almost completely empty. You will have to write it from scratch.

Part A Program Rules

  • Your csim.c file must compile without warnings to receive credit.

  • Your simulator must work correctly for arbitrary s, E, and b. This means you will need to allocate storage for your simulator's data structures dynamically, using the malloc function.

  • In this lab, we are interested only in data cache performance, so your simulator should ignore all instruction cache accesses (lines starting with I). Recall that valgrind always puts I in the first column (with no leading space), and M, L, and S in the second column (with a leading space). This should help you parse the trace.

  • To receive credit for Part A, you must call the function printSummary, with the total number of hits, misses, and evictions, at the end of your main function:

    printSummary(hit_count, miss_count, eviction_count);
  • For the purposes of this lab, assume that memory accesses are properly aligned, so that a single memory access never crosses block boundaries. This assumption allows you to ignore the request sizes in the valgrind traces.

Part B: Matrix transposition optimization

In Part B you will write a transpose function in trans.c that causes as few cache misses as possible.

Let $A$ denote a matrix and $A_{ij}$ denote the component in the $i$th row and $j$th column. The transpose of $A$, denoted $A^T$, is a matrix such that $A_{ij} = A^T_{ji}$.

To help you get started, we have provided an example transpose function in "trans.c" that computes the transpose of the N × M matrix A and stores the result in the M × N matrix B:

char trans_desc[] = "Simple row-wise scan transpose";
void trans(int M, int N, int A[N][M], int B[M][N])

The example transpose function is correct but inefficient, because its access pattern results in a relatively large number of cache misses.
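
For reference, a row-wise scan transpose can be sketched as follows. The body is an assumption that matches the signature given above; the code shipped in trans.c may differ in detail.

```c
/* Row-wise scan transpose: read A row by row, write B column by column. */
void trans(int M, int N, int A[N][M], int B[M][N])
{
    int i, j, tmp;

    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            tmp = A[i][j];   /* consecutive reads are adjacent in memory */
            B[j][i] = tmp;   /* consecutive writes are N ints apart */
        }
    }
}
```

Reads of A walk along a row, but consecutive writes to B are a full row of B apart, so for large matrices each write tends to land in a different cache block.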

Your task in Part B is to write a similar function, called transpose_submit, that minimizes the number of cache misses across matrices of different sizes:

char transpose_submit_desc[] = "Transpose submission";
void transpose_submit(int M, int N, int A[N][M], int B[M][N]);

Do not change the description string ("Transpose submission") for your transpose_submit function. The autograder looks for this string to determine which transpose function to evaluate for credit.

Programming rules for Part B

  • Your code in trans.c must compile without warnings to receive credit.

  • You may define at most 12 local variables of type int per transpose function.

  • You may not sidestep the previous rule by using variables of type long or by using bit tricks to store more than one value in a single variable.

  • Your transpose function should not use recursion.

  • If you choose to use helper functions, you shouldn't have more than 12 local variables on the stack at any one time between your helper functions and the top-level transpose function. For example, if your transpose declares 8 variables and then calls a function that uses 4 variables that calls another function that uses 2, you have 14 variables on the stack and you're breaking the rule.

  • Your transpose function may not modify array A. However, you can do whatever you want with the contents of array B.

  • You are NOT allowed to define any arrays in your code or to use any variant of malloc.

Evaluation

This section describes how your work will be assessed. The full score of this lab is 53 points:

  • Part A: 27 points

  • Part B: 26 points

Part A evaluation

In part A, we will run the cache simulator using various parameters and cache traces. There are eight test cases, each worth 3 points, except for the last case, which is worth 6 points:

linux> ./csim -s 1 -E 1 -b 1 -t traces/yi2.trace
linux> ./csim -s 4 -E 2 -b 4 -t traces/yi.trace
linux> ./csim -s 2 -E 1 -b 4 -t traces/dave.trace
linux> ./csim -s 2 -E 1 -b 3 -t traces/trans.trace
linux> ./csim -s 2 -E 2 -b 3 -t traces/trans.trace
linux> ./csim -s 2 -E 4 -b 3 -t traces/trans.trace
linux> ./csim -s 5 -E 1 -b 5 -t traces/trans.trace
linux> ./csim -s 5 -E 1 -b 5 -t traces/long.trace

You can use the reference simulator csim-ref to obtain the correct answer for each of these test cases. When debugging, use the -v option for a detailed record of each hit and miss.

For each test case, reporting the correct number of cache hits, misses, and evictions earns full credit for that test case. Each of your reported numbers of hits, misses, and evictions is worth 1/3 of the credit for that test case. This means that if a given test case is worth 3 points and your simulator reports the correct number of hits and misses but the wrong number of evictions, you will receive 2 points.
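
The partial-credit rule amounts to a short calculation. The function below is a hypothetical helper written for illustration (it is not taken from the autograder) and assumes that each test case is worth a multiple of 3 points, as is true of the cases listed above.

```c
/*
 * Points for one test case: each count that matches the reference
 * earns one third of the value of the case.
 */
int case_points(int worth,
                int hits, int misses, int evictions,
                int ref_hits, int ref_misses, int ref_evictions)
{
    int correct = (hits == ref_hits)
                + (misses == ref_misses)
                + (evictions == ref_evictions);
    return worth / 3 * correct;
}
```

For a 3-point case with correct hits and misses but a wrong eviction count, two of the three comparisons succeed, so the function returns 2.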

Part B evaluation

For Part B, we evaluate the correctness and performance of your transpose_submit function on three output matrices of different sizes:

  • 32 × 32 (M=32, N=32)

  • 64 × 64 (M=64, N=64)

  • 61 × 67 (M=61, N=67)

Performance (26 points)

For each matrix size, the performance of your transpose_submit function is evaluated by using valgrind to extract the address trace for your function, and then using the reference simulator to replay that trace on a cache with parameters (s=5, E=1, b=5).
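
To get a feel for these parameters: s = 5 and b = 5 with E = 1 describe a direct-mapped cache of 32 sets with one 32-byte line each, 1024 bytes in total. The sketch below computes which set an array element maps to. It assumes a 4-byte int and a matrix that starts at address 0; both are illustrative assumptions, since the real addresses depend on where the arrays are placed in memory. The function names are hypothetical.

```c
/* Set index of a byte address in a cache with s set-index bits and b block bits. */
unsigned set_index(unsigned long long addr, int s, int b)
{
    return (unsigned)((addr >> b) & ((1ULL << s) - 1));
}

/* Byte offset of element [i][j] in a row-major int matrix with `cols` columns. */
unsigned long long elem_offset(int i, int j, int cols)
{
    return ((unsigned long long)i * cols + j) * sizeof(int);
}
```

Under these assumptions, a block holds 8 ints, a row of a 32-column matrix spans 4 sets, and rows that are 8 apart map to the same sets. In a 64-column matrix, rows only 4 apart already collide.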

Your performance score for each matrix size scales linearly with the number of misses, m, up to a certain threshold:

  • 32 × 32: 8 points if m < 300, 0 points if m > 600

  • 64 × 64: 8 points if m < 1300, 0 points if m > 2000

  • 61 × 67: 10 points if m < 2000, 0 points if m > 3000

Your code must be correct and adhere to the programming rules to receive any performance points for a particular size. Your code only needs to be correct for these three cases, and you can optimize it specifically for them. In particular, it is perfectly fine for your function to explicitly check the input sizes and implement separate code optimized for each case.
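
One way to read the linear scaling is as an interpolation between the two thresholds. The function below is a sketch under that assumption; the exact formula used by the grading scripts is not stated here, and perf_points is a hypothetical name.

```c
/*
 * Linear score: full points at or below `lo` misses, zero at or above `hi`,
 * and a straight line in between.
 */
double perf_points(int misses, int lo, int hi, double max_points)
{
    if (misses <= lo)
        return max_points;
    if (misses >= hi)
        return 0.0;
    return max_points * (hi - misses) / (double)(hi - lo);
}
```

Under this reading, 450 misses on the 32 × 32 case (halfway between 300 and 600) would earn 4 of the 8 points.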

Working on the lab

Work on Part A

We have provided you with an autograding program, called test-csim, that tests the correctness of your cache simulator on the reference traces. Remember to compile your simulator before running the test:

linux> make
linux> ./test-csim
                        Your simulator     Reference simulator
Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
     3 (1,1,1)       9       8       6       9       8       6  traces/yi2.trace
     3 (4,2,4)       4       5       2       4       5       2  traces/yi.trace
     3 (2,1,4)       2       3       1       2       3       1  traces/dave.trace
     3 (2,1,3)     167      71      67     167      71      67  traces/trans.trace
     3 (2,2,3)     201      37      29     201      37      29  traces/trans.trace
     3 (2,4,3)     212      26      10     212      26      10  traces/trans.trace
     3 (5,1,5)     231       7       0     231       7       0  traces/trans.trace
     6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/long.trace
    27

For each test, it shows the number of points scored, cache parameters, input trace file, and a comparison of simulator and reference simulator results.

Here are some tips and suggestions for working on Part A:

  • Do your initial debugging on the small traces, such as traces/dave.trace.

  • The reference simulator takes an optional -v argument that enables verbose output, displaying the hits, misses, and evictions that occur as a result of each memory access. You are not required to implement this feature in your csim.c code, but we strongly recommend that you do so. It will help you debug by allowing you to compare the behavior of your simulator directly with that of the reference simulator on the reference trace files.

  • We recommend using the getopt function to parse your command-line arguments. You will need the following header files:

    #include <getopt.h>
    #include <stdlib.h>
    #include <unistd.h>

    See "man 3 getopt" for details.

  • Each data load (L) or store (S) operation can cause at most one cache miss (assuming stores use a write-allocate policy). A data modify operation (M) is treated as a load followed by a store to the same address. Therefore, an M operation can result in two cache hits, or in a miss and a hit plus a possible eviction.
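
As an illustration of the getopt tip above, the options of the simulator could be parsed along the following lines. This is a sketch, not the reference simulator's code: the sim_options type and the parse_options function are hypothetical names, and the validation shown is only one reasonable choice.

```c
#include <getopt.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the parsed options. */
typedef struct {
    int verbose;
    int s, E, b;
    const char *trace_path;
} sim_options;

/*
 * Returns 0 on success, and -1 if -h or an unknown option was given
 * or a required option is missing. The caller would print the usage
 * message in the -1 case.
 */
int parse_options(int argc, char *argv[], sim_options *opt)
{
    int c;

    opt->verbose = 0;
    opt->s = opt->E = opt->b = -1;
    opt->trace_path = NULL;

    /* A colon after a letter means that option takes an argument. */
    while ((c = getopt(argc, argv, "hvs:E:b:t:")) != -1) {
        switch (c) {
        case 'v': opt->verbose = 1;         break;
        case 's': opt->s = atoi(optarg);    break;
        case 'E': opt->E = atoi(optarg);    break;
        case 'b': opt->b = atoi(optarg);    break;
        case 't': opt->trace_path = optarg; break;
        default:  return -1;  /* covers -h and unknown options */
        }
    }
    if (opt->s < 0 || opt->E <= 0 || opt->b < 0 || opt->trace_path == NULL)
        return -1;
    return 0;
}
```

A main function would call parse_options(argc, argv, &opt) first and print the usage message when it returns -1.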

Work on Part B

We have provided you with an autograding program, called test-trans.c, that tests the correctness and performance of each of the transpose functions you have registered with the autograder.

You can register up to 100 versions of the transpose function in your trans.c file. Each transpose version has the following form:

/* Header comment */
char trans_simple_desc[] = "A simple transpose";
void trans_simple(int M, int N, int A[N][M], int B[M][N])
{
    /* your transpose code here */
}

Register a particular transpose function with the autograder by making a call of the form:

registerTransFunction(trans_simple, trans_simple_desc);

in the registerFunctions routine in "trans.c". At runtime, the autograder will evaluate each registered transpose function and print the results. Of course, one of the registered functions must be the transpose_submit function that you are submitting for credit:

registerTransFunction(transpose_submit, transpose_submit_desc);

See the default trans.c function for an example of how this works.

The autograder takes the matrix size as input. It uses valgrind to generate a trace of each registered transpose function. It then evaluates each trace by running the reference simulator on a cache with parameters (s=5, E=1, b=5).

For example, to test your registered transpose functions on a 32 × 32 matrix, rebuild test-trans, and then run it with the appropriate values for M and N:

linux> make
linux> ./test-trans -M 32 -N 32
Step 1: Evaluating registered transpose funcs for correctness:
func 0 (Transpose submission): correctness: 1
func 1 (Simple row-wise scan transpose): correctness: 1
func 2 (column-wise scan transpose): correctness: 1
func 3 (using a zig-zag access pattern): correctness: 1
Step 2: Generating memory traces for registered transpose funcs.
Step 3: Evaluating performance of registered transpose funcs (s=5, E=1, b=5)
func 0 (Transpose submission): hits:1766, misses:287, evictions:255
func 1 (Simple row-wise scan transpose): hits:870, misses:1183, evictions:1151
func 2 (column-wise scan transpose): hits:870, misses:1183, evictions:1151
func 3 (using a zig-zag access pattern): hits:1076, misses:977, evictions:945
Summary for official submission (func 0): correctness=1 misses=287

In this example, we have registered four different transpose functions in trans.c. The test-trans program tests each of the registered functions, displays the results for each, and extracts the results for the official submission.

Here are some tips and suggestions for working on Part B.

  • The test-trans program saves the trace for function i in the file trace.fi (for example, trace.f0 for function 0). These trace files are invaluable debugging tools that can help you understand exactly where the hits and misses for each transpose function come from. To debug a particular function, simply run its trace through the reference simulator with the verbose option:

    linux> ./csim-ref -v -s 5 -E 1 -b 5 -t trace.f0
    S 68312c,1 miss
    L 683140,8 miss
    L 683124,4 hit
    L 683120,4 hit
    L 603124,4 miss eviction
    S 6431a0,4 miss
    ...
  • Since your transpose function is evaluated on a direct-mapped cache, conflict misses are a potential problem. Think about the potential for conflict misses in your code, especially along the diagonal. Try to come up with access patterns that reduce the number of these conflict misses.

  • Blocking is a useful technique for reducing cache misses. See http://csapp.cs.cmu.edu/public/waside/waside-blocking.pdf for more information.
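
The blocking idea mentioned in the tips can be sketched as follows. This is a generic illustration, not a tuned solution: trans_blocked is a hypothetical name, the block size of 8 is an assumption (8 ints fill one 32-byte cache block if an int is 4 bytes), and the code makes no attempt to avoid the diagonal conflicts described above.

```c
/*
 * Blocked transpose: process the matrix in 8 x 8 tiles so that the
 * cache blocks touched by one tile are reused before moving on.
 * The bounds checks let it handle sizes that are not multiples of 8.
 */
void trans_blocked(int M, int N, int A[N][M], int B[M][N])
{
    int bi, bj, i, j;

    for (bi = 0; bi < N; bi += 8) {
        for (bj = 0; bj < M; bj += 8) {
            for (i = bi; i < bi + 8 && i < N; i++) {
                for (j = bj; j < bj + 8 && j < M; j++) {
                    B[j][i] = A[i][j];
                }
            }
        }
    }
}
```

The sketch uses four local int variables, no arrays, and no recursion, so it stays within the Part B programming rules. How many misses it produces for each matrix size is best checked with test-trans rather than assumed.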

Putting it all together

We have provided you with a driver program, called ./driver.py, that performs a complete evaluation of your simulator and transpose code. This is the same program that your instructor uses to evaluate your submissions. The driver uses test-csim to evaluate your simulator, and it uses test-trans to evaluate your submitted transpose function on the three matrix sizes. It then prints a summary of your results and the points you have earned.

To run the driver, type:

linux> ./driver.py

Submission

To submit your work, commit all changes to "csim.c" and "trans.c" and push them to GitHub. Remember that we will not use any of the other files in your repository to evaluate your work (i.e., we will use a fresh set of support files), so make sure you do not rely on changes made outside of those two files!

Article information

Author: Saturnina Altenwerth DVM

Last Updated: 07/26/2023
