- CSF@JHU
- Dom
- study plan
- Diagram
- Occupied
- Sources
601.229 (S20): Task 3: Cache simulator
Year:Thursday, March 26Monday, March 30 until 23:00
Update 16/3: minor change to the scoring rubric to handle invalid parameters elegantly
Update 18/03: Clarified that load and save traces can access up to 4 bytes
Recognition: This job was originally created byPeter HappyDoits CSF version.
This problem focuses on simulation and estimationcaches. We'll give you sometraces of memoryfrom real comparison programs. You are implementing a programsimulatehow different caches work on these paths. Then use programs and default prompts to figure it outbest overall cache configuration.
Evaluation criteria
Your rating is determined as follows:
- Smooth handling of invalid parameters: 2%
- Accurate load counter: 8%
- Exact number of stores: 8%
- Equipment accuracy hits: 14%
- Improve loading errors: 14%
- Valid Trading Hits: 14%
- Exact trade deficiencies: 14%
- Accurate total cycles: 6%
- Best cache ratio: 10%
- Design, coding style: 10%
For numeric resultsTotal number of cycles
the result should be within ±10% while other results should be accurate.
Be sure to followstyle guidelines.
Your program should run without errors or memory leaks. Memory errors such as incorrect read or write, or use of uninitialized memory will result in a maximum of 10 points deduction. A memory leak will result in a maximum of 5 points deduction.
Programming languages
You can use C or C++ for this command. You can usestandardlibrary of the selected language as much as you like, but that's itthis iscan use any additional (custom) library.
One benefit of choosing C++ is that you can use built-in container data structures likecapable of
,vector
etc. (Note, however, that it is entirely possible to create a simple and robust implementation of this program using dynamically allocated arrays.) Whatever language you use, we strongly recommend modular, well-designed coding and developing data types and functions to deal with program complexity. Strive for simplicity.
You must have oneCreate a file
Correct
clean
removes all object files and executables andto do
zmake a sim
compiles and links your program to create an executable namedas soon as possible
Your code should compile nicely with gcc 7.x-Wall -Wextra -pedant
translator flags.Important: yourCreate a file
moralityuse these options. if yoursCreate a file
workthis isIf you compile your code with these options, you will lose all points for design and coding style.
Deel(a): Cache simulator
You design and implement acache simulatorwhich can be used to study and compare the performance of different cache configurations. Your simulator will readmemory access pathsimulate, based on standard input, what the cache would do based on some parameters in response to these memory access patterns, and finally generate some summary statistics on standard output. Let's start with the memory access trace file format:
s 0x1ffffff50 1l 0x1fffff58 1l 0x1fffff88 6l 0x1fffff90 2l 0x1fffff98 2l 0x200000e0 2l 0x200000e8 2l 0x200000f0 2l 0x200000f8 2l 0x3 0 031f10 3s 0x3004d960 0s 0x3004d968 1s 0x3004caa0 1s 0x3004d970 1s 0x3004d980 6l 0x30000008 1l 0x1fffff58 4l 0x3004d978 4l 0 x 1ff fff68 4l 0x1fffff68 2s 0x3004d980 9l 0x30000008 1
As you can see, each memory access performed by the program is logged on a separate line. There are three "boxes" separated by a white space. The first field is eitherl
zS
depending on whether the processor is "loading" or "writing" memory. The second field is a 32-bit memory address, shown in hexadecimal; the0x
at the beginning means "next hexadecimal" and is not part of the address itself. You canignorethird field for this task.
Note that it must be assumed that each payload or storage in the trace can access no more than 4 bytes of data, and that no payload or storage can access data that spans multiple cache blocks (called "lines").
Your cache simulator is configured with the following cache design parameters given as command line arguments (see below):
- number of sets in cache (positive power-2)
- number of blocks in each set (positive power of 2)
- number of bytes in each block (positive power of 2, minimum 4)
write-assign
zno write allocation
prescribe
zwrite back
lu
(least used) orfifo
evictions
Note that certain combinations of these design parameters consider directly mapped, assembly-associated, and fully associated caches:
- a cache with n sets of 1 block each is directly mapped
- a cache containing n sets of m blocks each being m-set-associated
- a cache with 1 set of n blocks is fully associative
The smallest cache you should be able to simulate has 1 pool with 1 block of 4 bytes; this cache can only remember one 4 byte memory reference and nothing else; therefore, it can only be useful if successive memory references in the trace go to exactly the same address.You should probably use this small cache for basic health checks.
A few reminders about the other three parameters: Thewrite-assignThe parameter specifies what will happen to acache skippedwhilestore:
- Do
write-assign
we cache the appropriate memory blockpastthe trade continues - Do
no write allocation
No cache when trading spellsthis ischange cache
Please note that this parameter works with the following. TheprescribeThe parameter specifies whether the transactionincessantsaves to memoryimmediatelyor not:
- Do
prescribe
store writes to both cache and memory - Do
write back
the store writes to the cacheJustand indicates a blockcompetes; if the block is deleted later, it must be put back into memory before overwriting
Linking doesn't make senseno write allocation
Swrite back
because we couldn't actually cache the store!
The last parameter only applies to associative caches: indirectly allocated caches have no choice of which block to delete!
- Do
lu
(least used) remove a block that wasn'tapproachedthe longest - Do
fifo
(first in, first out) we clear the block that was thereto the cachethe longest
Your cache simulator should assume it loads/writes from/to the cacheSomeCPU cycle; loads/saves from/to memory100CPU cyclesCanadian deerAmount to be transferred of 4 bytes. There are many cache items in the processors you usethis isfor example, they must simulate write buffers or clever ways to fill cache blocks; Getting all of the above options right is a bit tricky to implement, so we'll stop there.
We expect to be able to simulate as follows:
./csim 256 4 16 write-assign writeback lru < some trace file
This would simulate a cache of 256 sets of 4 blocks each (also a 4-way associative cache), with each block containing 16 bytes of memory; the cache allocates writes but not writes (so it rewrites instead) and discards the last used block if necessary. (As a side note, this cache has a total size of 16,384 bytes (16KB) if we ignore the space required for tags and other meta information.)
When the simulation is complete, the cache simulator is expected to print the following summary informationExactlythe format given below:
Total loads:graafTotal stores:graafFull of hits:graafCharging errors:graafShop hits:graafThe shop is missing:graafTotal number of cycles:graaf
Zgraafthe value is simply the number of occurrences. As a concrete example, here is an example of a program call on one of the trace examples,gcc.trag
:
./csim 256 4 16 write-allocation writeback fifo < gcc.trace
This call should produce the following result:
Total Loads: 318197 Total Writes: 197486 Load Hits: 314171 Load Misses: 4026 Loads Saved: 188047 Write Misses: 9439 Total Cycles: 9845283
Please note that due to slight variations in a reasonable interpretation of the simulator specifications, yourTotal number of cycles
the value may be a bit on the low side, but should be pretty close. For all other calculations, the simulator output must exactly match the above results.
Mistrongwe encourage you to use Piazza to publish routes and simulator results so that you can compare your results with those of other students.
Sample hints
Here are some tips you can use for testing and empirical evaluation:
- assignment03_traces/gcc.trace
- assign03_traces/read01.trace
- assign03_tracks/read02.track
- assign03_tracks/read03.track
- alokacja03_traces/swimming.trace
- assign03_traces/write01.traces
- assign03_traces/write02.traces
gcc.trag
Iswimming. path
are traces of real programs, so you should consider using them in your empirical evaluation.
Tips
Your simulation only handles hits and misses, you don't need it at any pointTruecached data; therefore, the trace files do not contain this information at all.
Don't try to implement all the options at once, but start writing a simulator that can only run directly mapped caches with write and non-write maps. Once that works, extend it step by step to make the rest of the design parameters work. Also regularly check out the sound simulator with simple handcrafted tracks from which you can still manually deduce what the behavior should be.
Please note that the correct number of cycles is only 6% of the total number of orders. Before you worry too much about counting cycles, make sure your loads and inventory are modeled correctly with the exact number of hits.
Part (b): The best stash
In part (b), you use both memory traces and a simulator to determine what cache configuration it hasbest overall performance. There are several properties to consider: hit rate, miss penalties, total cache size (including overhead), etc.Read
, describe in detail what experiments you did (and why!), what results you got (and how!), and what you think is the best cache configuration of all.
Profits
The memory traces above are from a similar programming task conducted by Steven Swanson at the University of California, San Diego. Thank you Steven!
coercion
Create a zip file containing the fileCreate a file
, source and header files, andRead
file. All files must be in the top-level folder of the zip file. For example, if the zip file is nameddodijeliti3.zip
, Positionunzip -l toewijzing3.zip
can generate the following output:
Archive: permission3.zip Length Date Time Name------------ ---------- ----- ---- 15225 2020-02-25 12:27 main .c 149 2020-02-25 12:27 Makefile 12075 2020-02-25 12:28 README ---------- ------- 27449 3 files
The exact output will almost certainly vary depending on the structure of the cache simulator.
Upload your zip file toAssessment scopeIfExercise 3.