- CSF@JHU
- dom
- study plan
- Diagram
- Occupied
- Sources
601.229 (F20): Zadanie 3: Symulator pamięci podręcznej
October 22 update: reconsideration of final results
Tongue:
- Milestone 1: Friday 16 October until 23:00 (no late days)
- Milestone 2: Tuesday, October 27 until 11:00 PM (no late days)
- Last term: Friday, October 30 until 11:00 PM (maximum 2 late days; more by permission only)
Cooperation: Couples (groups of 2 people) or independently. Register your group (or just yourself) viaten linkuntil Wednesday, October 7.
Use in the late days: Up to 2 days delay without special authorization request (both partners must have late days to use). If you are going to use more than 2 days delay, please post it privately on Piazza.
Recognition: This job was originally created byPeter is pleasedDoits CSF version.
This problem focuses on simulation and estimationcaches. We'll give you sometraces of memoryfrom real comparison programs. You run the program forsimulatehow different caches work on these paths. Then use your programs and default prompts to determinebest overall cache configuration.
Lecture part 1
1 year:Friday October 16 to 11 p.m. (10 points)
This entry requires you to start working on your code and load at least one entry into itrange.
It's worth checking that the read traces are working properly. It would also be a good idea to set up some underlying data structures and functions to run the actual cache simulation, even if they are not fully implemented yet.
Assembly, part 2
Year 2:Tuesday, October 27 to 23:00 (10 points)
For this issue, you need to implement LRU cache simulation.
Lecture 3rd part
Year 3:Friday, October 30 to 23:00 (80 points)
All functions must be written and follow the full command specifications.
Evaluation criteria
Milestone results are defined as a combination of effort and code functionality. Your final grade is determined as follows:
- Grateful handling of invalid parameters: 1.5%
- Accurate number of charges: 7.5%
- Exact number of stores: 7.5%
- Charge accuracy hits: 12%
- Accurate missed loads: 12%
- Valid trade hits: 12%
- Accurate Trading Misses: 12%
- Accurate total cycles: 5.5%
- Best cache ratio: 10%
- Design, coding style and contribution: 10%
- Effort shown in milestone 1: 10%
For numerical resultsTotal Cycles
the result should be within ±10% while other results should be accurate.
Be sure to followstyle guidelines.
Your program should run without memory errors or leaks. Memory errors such as an incorrect read or write, or the use of uninitialized memory will result in a deduction of up to 10 points. A memory leak will result in a deduction of up to 5 points.
Programming languages
You can use C or C++ to execute this command. You can usestandardlibrary of your chosen language as much as you want, but it isthis iscan use any additional (custom) library.
One of the benefits of choosing C++ is the ability to use built-in container data structures such ascapable of
,vector
, etc. (Note, however, that a simple and robust implementation of this program is entirely possible using dynamically allocated arrays.) deal with the complexity of the program. Strive for simplicity.
You serve onePlik Makefile
Correct
clean
removes all object files and executables andto do
zmake a sim
compiles and links your program to create an executable calledas soon as possible
Your code should compile nicely using gcc 7.x-Wall -Wextra --pedant
translator flags.Important: yourPlik Makefile
morause these options. if yoursPlik Makefile
workthis isCompile your code with these options, you will lose all design and coding style points.
Deel(a): Cache simulator
You design and implementcache simulatorwhich can be used to study and compare the performance of different cache configurations. Your simulator will readmemory access tracesimulate from standard input what the cache would do based on certain parameters in response to these memory access patterns, and finally generate some summary statistics on standard output. Let's start with the memory access trace file format:
s 0x1fffff50 1l 0x1fffff58 1l 0x1fffff88 6l 0x1fffff90 2l 0x1fffff98 2l 0x200000e0 2l 0x200000e8 2l 0x200000f0 2l 0x200000f8 2l 0x3 0 031f10 3s 0x3004d960 0s 0x3004d968 1s 0x3004caa0 1s 0x3004d970 1s 0x3004d980 6l 0x30000008 1l 0x1ffffff58 4l 0x3004d978 4l 0 x 1ffff68 4l 0x1ffffff68 2s 0x3004d980 9l 0x30000008 1
As you can see, each memory access performed by the program is logged on a separate line. There are three "boxes" separated by white space. The first field is eitherI
zS
depending on whether the processor is "loading" from memory or "writing" to it. The second field is a 32-bit hexadecimal memory address; the0x
at the beginning means "next is hexadecimal" and is not part of the address itself. You canignorethird field for this task.
Note that it must be assumed that each payload or warehouse in the tracking cannot access more than 4 bytes of data, and that no payload or warehouse can access data that spans multiple cache blocks (called "lines").
Your cache simulator is configured with the following cache design parameters given as command line arguments (see below):
- number of sets in cache (positive power-2)
- number of bricks in each set (positive power of 2)
- number of bytes in each block (positive power of 2, minimum 4)
write-assign
zassignment without saving
prescribe
zwrite back
lru
(recently used) orfifo
evictions
Note that certain combinations of these design parameters are responsible for directly mapped, associative, and fully associative caches:
- a cache with n sets of 1 block each is directly mapped
- a cache with n sets of m blocks, each of which is an m-way associative set
- a cache with 1 set of n blocks is fully associative
The smallest cache you should be able to simulate has 1 pool with 1 block of 4 bytes; this cache can only remember one 4 byte memory reference and nothing else; therefore, it can only be useful if successive memory references in the trace lead to the exact same address.You should probably use this little cache for basic health testing.
A few reminders about the other three parameters: Thewrite-assignparameter specifies what happens to amiss the cachewhilestore:
- Do
write-assign
we cache the appropriate memory blockpastthe trade continues - Do
assignment without saving
miss the cache when trading spellsthis ismodify the cache
Note that this parameter interacts with the following. Theprescribeparameter indicates whether the transactionincessantsaves to memoryimmediatelyor not:
- Do
prescribe
store writes to both cache and memory - Do
write back
store cachesJustand indicates a blockcompetes; if the block is later deleted, it must be put back into memory before being replaced
The combination makes no senseassignment without saving
Swrite back
because we couldn't really write to the store cache!
The last parameter is only relevant for associative caches: indirectly allocated caches cannot select a block to delete!
- Do
lru
(last used) Delete a block that was not thereapproachedthe longest - Do
fifo
(first in, first out) we remove the block that wasto the cachethe longest
Your cache simulator should assume that it loads/saves from/to cachetakeSomeCPU cycle; loads/writes to/from memory100CPU cyclesCanadian deerThe 4-byte amount to transfer. There is a lot of cache in the real CPUs you dothis isfor example, they must simulate write buffers or clever ways to fill cache blocks; getting all of the above options right is already a bit demanding, so let's leave it at that.
We expect you to be able to run the simulator as follows:
./csim 256 4 16 write-assign writeback lru < trace file
This would simulate a cache with 256 sets of 4 blocks each (also a 4-way associative cache), with each block containing 16 bytes of memory; the cache allocates writes but not writes (so writes back instead) and discards the last used block as needed. (By the way, note that this cache has a total size of 16384 bytes (16KB) if we ignore the space required for tags and other meta information.)
When the simulation completes, the cache simulator should print the following summary informationExactlyformat given below:
Total loads:grafTotal number of stores:grafLoad Hits:grafCharging errors:grafShopping hits:grafThe shop is missing:grafTotal number of cycles:graf
Zgrafthe value is simply the number of occurrences. As a concrete example, here is an example of a program call on one of the trace examples,gcc.trag
:
./csim 256 4 16 write-assign writeback fifo < gcc.trace
This call should give the following output:
Total Charges: 318,197 Total Signups: 197,486 Load Views: 314,171 Failed Loads: 4,026 Maintenance Fees: 188,047 Failed Loads: 9,439 Total Cycles: 9,845,283
Please note that due to slight differences in reasonable interpretation of the simulator specifications, yourTotal Cycles
the value may vary slightly but should be fairly close. For all other calculations, the simulator output must exactly match the above results.
Mistrongwe encourage you to use Piazza to post routes and simulator results so you can compare your results with those of other students.
Sample prompts
Here are some tips you can use for testing and empirical evaluation:
- hw3_traces/gcc.trace
- hw3_traces/read01.trace
- hw3_traces/read02.trace
- hw3_traces/read03.trace
- hw3_tracks/swim.tracks
- hw3_traces/write01.trag
- hw3_traces/write02.trag
gcc.trag
Iswimming. path
are traces of actual programs, so you should consider using them in your empirical evaluation.
Tips
Your simulation is for hits and misses only, you don't need it at any timeTruecache data; therefore, the trace files do not contain this information at all.
Don't try to implement all the options at once, start writing a simulator that can only run directly mapped caches with write-to-write and non-write mapping. Once that works, develop step by step to keep the rest of the design parameters working. Also regularly check the healthy simulator with simple hand-made traces from which you can still manually infer what the behavior should be.
Keep in mind that the correct number of cycles is only worth 6% of the total job rating. Make sure loads and warehouses are modeled correctly with accurate hit and miss counts before worrying too much about counting cycles.
Part (b): Best Cache, Contribution
In part (b), you use both memory traces and a simulator to determine what cache configuration it hasbest overall performance. You have to consider several properties: success rates, penalties for missing, total cache size (including overhead), etc.Read
, detail what experiments you did (and why!), what results you got (and how!), and what you think is the best cache configuration of all.
Finally, write a short summary of how you divided the work between partners and what each person contributed. This section is not necessary if you worked alone.
Profits
The memory traces above are from a similar programming task done by Steven Swanson at the University of California, San Diego. Thank you Steve!
coercion
Create a zip file with yourPlik Makefile
, source and header files, andRead
file. All files must be in the root directory of the ZIP file. For example, if a zip file is calleddodijeliti3.zip
, bevelunzip -l toewijzing3.zip
can generate the following output:
Archive: Assign3.zip Length Date Time Name--------- ---------- ----- ---- 15225 2020-02-25 12:27 main.c 149 2020-02-25 12:27 Makefile 12075 2020-02-25 12:28 README--------- ------- 27449 3 files
The exact results will almost certainly vary depending on, for example, how you structured your cache simulator program.
Upload your ZIP file torangeIfExercise 3.