- study plan
601.229 (S20): Task 3: Cache simulator
Thursday, March 26Monday, March 30 until 23:00
Update 16/3: minor change to the scoring rubric to handle invalid parameters elegantly
Update 18/03: Clarified that load and save traces can access up to 4 bytes
This problem focuses on simulation and estimationcaches. We'll give you sometraces of memoryfrom real comparison programs. You are implementing a programsimulatehow different caches work on these paths. Then use programs and default prompts to figure it outbest overall cache configuration.
Your rating is determined as follows:
- Smooth handling of invalid parameters: 2%
- Accurate load counter: 8%
- Exact number of stores: 8%
- Equipment accuracy hits: 14%
- Improve loading errors: 14%
- Valid Trading Hits: 14%
- Exact trade deficiencies: 14%
- Accurate total cycles: 6%
- Best cache ratio: 10%
- Design, coding style: 10%
For numeric results
Total number of cyclesthe result should be within ±10% while other results should be accurate.
Be sure to followstyle guidelines.
Your program should run without errors or memory leaks. Memory errors such as incorrect read or write, or use of uninitialized memory will result in a maximum of 10 points deduction. A memory leak will result in a maximum of 5 points deduction.
You can use C or C++ for this command. You can usestandardlibrary of the selected language as much as you like, but that's itthis iscan use any additional (custom) library.
One benefit of choosing C++ is that you can use built-in container data structures like
vectoretc. (Note, however, that it is entirely possible to create a simple and robust implementation of this program using dynamically allocated arrays.) Whatever language you use, we strongly recommend modular, well-designed coding and developing data types and functions to deal with program complexity. Strive for simplicity.
You must have one
Create a fileCorrect
cleanremoves all object files and executables and
make a simcompiles and links your program to create an executable named
as soon as possible
Your code should compile nicely with gcc 7.x
-Wall -Wextra -pedanttranslator flags.Important: your
Create a file moralityuse these options. if yours
Create a fileworkthis isIf you compile your code with these options, you will lose all points for design and coding style.
Deel(a): Cache simulator
You design and implement acache simulatorwhich can be used to study and compare the performance of different cache configurations. Your simulator will readmemory access pathsimulate, based on standard input, what the cache would do based on some parameters in response to these memory access patterns, and finally generate some summary statistics on standard output. Let's start with the memory access trace file format:
s 0x1ffffff50 1l 0x1fffff58 1l 0x1fffff88 6l 0x1fffff90 2l 0x1fffff98 2l 0x200000e0 2l 0x200000e8 2l 0x200000f0 2l 0x200000f8 2l 0x3 0 031f10 3s 0x3004d960 0s 0x3004d968 1s 0x3004caa0 1s 0x3004d970 1s 0x3004d980 6l 0x30000008 1l 0x1fffff58 4l 0x3004d978 4l 0 x 1ff fff68 4l 0x1fffff68 2s 0x3004d980 9l 0x30000008 1
As you can see, each memory access performed by the program is logged on a separate line. There are three "boxes" separated by a white space. The first field is either
Sdepending on whether the processor is "loading" or "writing" memory. The second field is a 32-bit memory address, shown in hexadecimal; the
0xat the beginning means "next hexadecimal" and is not part of the address itself. You canignorethird field for this task.
Note that it must be assumed that each payload or storage in the trace can access no more than 4 bytes of data, and that no payload or storage can access data that spans multiple cache blocks (called "lines").
Your cache simulator is configured with the following cache design parameters given as command line arguments (see below):
- number of sets in cache (positive power-2)
- number of blocks in each set (positive power of 2)
- number of bytes in each block (positive power of 2, minimum 4)
no write allocation
lu(least used) or
Note that certain combinations of these design parameters consider directly mapped, assembly-associated, and fully associated caches:
- a cache with n sets of 1 block each is directly mapped
- a cache containing n sets of m blocks each being m-set-associated
- a cache with 1 set of n blocks is fully associative
The smallest cache you should be able to simulate has 1 pool with 1 block of 4 bytes; this cache can only remember one 4 byte memory reference and nothing else; therefore, it can only be useful if successive memory references in the trace go to exactly the same address.You should probably use this small cache for basic health checks.
A few reminders about the other three parameters: Thewrite-assignThe parameter specifies what will happen to acache skippedwhilestore:
write-assignwe cache the appropriate memory blockpastthe trade continues
no write allocationNo cache when trading spellsthis ischange cache
Please note that this parameter works with the following. TheprescribeThe parameter specifies whether the transactionincessantsaves to memoryimmediatelyor not:
prescribestore writes to both cache and memory
write backthe store writes to the cacheJustand indicates a blockcompetes; if the block is deleted later, it must be put back into memory before overwriting
Linking doesn't make sense
no write allocationS
write backbecause we couldn't actually cache the store!
The last parameter only applies to associative caches: indirectly allocated caches have no choice of which block to delete!
lu(least used) remove a block that wasn'tapproachedthe longest
fifo(first in, first out) we clear the block that was thereto the cachethe longest
Your cache simulator should assume it loads/writes from/to the cacheSomeCPU cycle; loads/saves from/to memory100CPU cyclesCanadian deerAmount to be transferred of 4 bytes. There are many cache items in the processors you usethis isfor example, they must simulate write buffers or clever ways to fill cache blocks; Getting all of the above options right is a bit tricky to implement, so we'll stop there.
We expect to be able to simulate as follows:
./csim 256 4 16 write-assign writeback lru < some trace file
This would simulate a cache of 256 sets of 4 blocks each (also a 4-way associative cache), with each block containing 16 bytes of memory; the cache allocates writes but not writes (so it rewrites instead) and discards the last used block if necessary. (As a side note, this cache has a total size of 16,384 bytes (16KB) if we ignore the space required for tags and other meta information.)
When the simulation is complete, the cache simulator is expected to print the following summary informationExactlythe format given below:
Total loads:graafTotal stores:graafFull of hits:graafCharging errors:graafShop hits:graafThe shop is missing:graafTotal number of cycles:graaf
Zgraafthe value is simply the number of occurrences. As a concrete example, here is an example of a program call on one of the trace examples,
./csim 256 4 16 write-allocation writeback fifo < gcc.trace
This call should produce the following result:
Total Loads: 318197 Total Writes: 197486 Load Hits: 314171 Load Misses: 4026 Loads Saved: 188047 Write Misses: 9439 Total Cycles: 9845283
Please note that due to slight variations in a reasonable interpretation of the simulator specifications, your
Total number of cyclesthe value may be a bit on the low side, but should be pretty close. For all other calculations, the simulator output must exactly match the above results.
Mistrongwe encourage you to use Piazza to publish routes and simulator results so that you can compare your results with those of other students.
Here are some tips you can use for testing and empirical evaluation:
swimming. pathare traces of real programs, so you should consider using them in your empirical evaluation.
Your simulation only handles hits and misses, you don't need it at any pointTruecached data; therefore, the trace files do not contain this information at all.
Don't try to implement all the options at once, but start writing a simulator that can only run directly mapped caches with write and non-write maps. Once that works, extend it step by step to make the rest of the design parameters work. Also regularly check out the sound simulator with simple handcrafted tracks from which you can still manually deduce what the behavior should be.
Please note that the correct number of cycles is only 6% of the total number of orders. Before you worry too much about counting cycles, make sure your loads and inventory are modeled correctly with the exact number of hits.
Part (b): The best stash
In part (b), you use both memory traces and a simulator to determine what cache configuration it hasbest overall performance. There are several properties to consider: hit rate, miss penalties, total cache size (including overhead), etc.
Read, describe in detail what experiments you did (and why!), what results you got (and how!), and what you think is the best cache configuration of all.
The memory traces above are from a similar programming task conducted by Steven Swanson at the University of California, San Diego. Thank you Steven!
Create a zip file containing the file
Create a file, source and header files, and
Readfile. All files must be in the top-level folder of the zip file. For example, if the zip file is named
unzip -l toewijzing3.zipcan generate the following output:
Archive: permission3.zip Length Date Time Name------------ ---------- ----- ---- 15225 2020-02-25 12:27 main .c 149 2020-02-25 12:27 Makefile 12075 2020-02-25 12:28 README ---------- ------- 27449 3 files
The exact output will almost certainly vary depending on the structure of the cache simulator.
Upload your zip file toAssessment scopeIfExercise 3.