A simple Python caching demonstration
6 min read · November 16, 2022
Caching is about storing frequently requested data closer to those who request it. This increases the speed of access to information.
Practical example: you are doing research in a large library. Your subject is physics, and you need to refer to Newton's Principia Mathematica regularly. Every time you need the book, you could walk to its shelf, open it, find the right passage, take a note, and walk back to your desk. Or you could take the book with you and keep it on the desk where you work, or even photocopy the relevant chapters and carry them around. Isn't the second way wiser? The storage space here is your bag or the desk where you put your books.
Caching is an optimization technique. We store frequently used data in areas of memory to which we have faster and cheaper access.
The caching process provides speed but is limited in scope. We cannot cache data indefinitely. For example, in the library, the number of books we can put on the table is limited.
Due to this limitation, you must use common sense during the caching process. Caching strategies have been developed for this purpose. Depending on the circumstances required by the situation, either technique may be preferred. Let's look at some strategies.
- FIFO - First-In/First-Out: In FIFO logic, the first written data is discarded first when the capacity is full. Consider an array containing the letters A through G. Let this array be full. When H arrives, we discard A. We add H. This strategy can be used when access to incoming data is more important.
- LIFO – Last In/First Out: Similar to a stack data structure: last in, first out. This type can be used if access to the oldest data is more important.
- MRU - Most Recently Used: The most recently used data is discarded first. It can be useful when the chance of re-access grows as the data ages.
- LFU - Least Frequently Used: The least frequently used data is discarded first. The point is to protect data that is used heavily.
- LRU - Least Recently Used: The least recently used data is discarded first. More recently used data is more likely to be accessed again.
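The LRU strategy above can be sketched in a few lines with an `OrderedDict`. This is a minimal illustration, not part of any standard API; the class name `LRUCache` and its methods are my own choices:

```python
from collections import OrderedDict

class LRUCache:
    """A minimal LRU cache: the least recently used key is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # capacity exceeded: "b" is evicted, not "a"
```

Reading a key with `get` refreshes its position, which is exactly what distinguishes LRU from plain FIFO.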
Python implementation
Let's see how to implement LRU caching in Python.
We have a simple function called foo. I put a print statement in it to show when the function body is actually executed.
def foo(x: int, y: int) -> int:
    print(f"run foo with x: {x} y: {y}")
    return x**y

foo(3,4)
# run foo with x: 3 y: 4
# Out: 81
Now let's use the lru_cache decorator from the functools package. If you don't know what a decorator is, it is worth reading up on decorators first.
Anyway... I put the decorator on top of the foo function, then called foo many times in a row. As you can see, the function body only ran the first time; after that, the result came from the cache.
from functools import lru_cache

@lru_cache
def foo(x: int, y: int) -> int:
    print(f"run foo with x: {x} y: {y}")
    return x**y

print(foo(3,4))
print(foo(3,4))
print(foo(3,4))
print(foo(3,4))
print(foo(3,4))
print(foo(3,4))
print(foo(3,4))
"""
run foo with x:3 y:4
81
81
81
81
81
81
81
"""
Now I change the arguments. The function runs once for the new values, and then that result is cached too.
print(foo(3,4))
print(foo(3,4))
print(foo(3,5))
print(foo(3,5))
"""
run foo with x:3 y:4
81
81
run foo with x:3 y:5
243
243
"""
We can define the maximum cache size.
@lru_cache(maxsize=2)
def foo(x: int, y: int) -> int:
print(f"run foo with x: {x} y: {y}")
return x**y
Now only 2 values are cached, and the least recently used value is discarded when the capacity is full.
print(foo(3,4))
# run foo with x: 3 y: 4
# 81
print(foo(3,5))
# run foo with x: 3 y: 5
# 243
print(foo(3,4))
# 81
print(foo(3,5))
# 243
Now let's add one more value:
print(foo(2,4))
# run foo with x: 2 y: 4
# 16
The least recently used entry, foo(3,4), was discarded:
print(foo(3,5))
# 243  <- still in the cache
print(foo(3,4))
# run foo with x: 3 y: 4  <- executed again because it was evicted earlier
# 81
In a different use case, consider a very expensive operation that returns the same result every time. You can cache it.
@lru_cache(maxsize=1)
def expensive_operation() -> str:
    print("$$$$$$$$$$$$$$")
    return "I am an expensive result"

print(expensive_operation())
print(expensive_operation())
print(expensive_operation())
print(expensive_operation())
print(expensive_operation())
"""
$$$$$$$$$$$$$$
I am an expensive result
I am an expensive result
I am an expensive result
I am an expensive result
I am an expensive result
"""
The original, uncached function can still be accessed through the __wrapped__ attribute:
@lru_cache(maxsize=2)
def foo(x: int, y: int) -> int:
    print(f"run foo with x: {x} y: {y}")
    return x**y

print(foo(3,4))
i = foo.__wrapped__
print(i(3,4))
"""
run foo with x: 3 y: 4
81
run foo with x: 3 y: 4
81
"""
A classic example is the Fibonacci function:
def calculate_fibonacci(n: int) -> int:
    print("Fibonacci value calculation for", n)
    if n == 0 or n == 1:
        return 1
    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

calculate_fibonacci(5)
"""
Fibonacci value calculation for 5
Fibonacci value calculation for 4
Fibonacci value calculation for 3
Fibonacci value calculation for 2
Fibonacci value calculation for 1
Fibonacci value calculation for 0
Fibonacci value calculation for 1
Fibonacci value calculation for 2
Fibonacci value calculation for 1
Fibonacci value calculation for 0
Fibonacci value calculation for 3
Fibonacci value calculation for 2
Fibonacci value calculation for 1
Fibonacci value calculation for 0
Fibonacci value calculation for 1
"""
A recursive function... To calculate the value for n=5, the same subproblems are computed over and over due to the nature of the Fibonacci recursion. Let's cache it:
@lru_cache(maxsize=16)
def calculate_fibonacci(n: int) -> int:
    print("Fibonacci value calculation for", n)
    if n == 0 or n == 1:
        return 1
    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

calculate_fibonacci(5)
"""
Fibonacci value calculation for 5
Fibonacci value calculation for 4
Fibonacci value calculation for 3
Fibonacci value calculation for 2
Fibonacci value calculation for 1
Fibonacci value calculation for 0
"""
Let's compare the calculation time for a large number:
import time

start = time.time()
calculate_fibonacci(35)
print(time.time() - start)
"""
without caching: 31.114999055862427
with caching: 0.00023484230041503906
"""
Finally, let's write a custom function that caches the response of an API request in a JSON file.
import json
import requests

def fetch_api_data(url: str, json_path: str, update_cache: bool = False):
    """
    url: the URL of the request
    json_path: path of the JSON cache file
    update_cache: a boolean that forces the cache to be refreshed
    """
    if update_cache:
        # if we force an update, ignore the existing cache,
        # so a new JSON file will be created
        cached_data = None
    else:
        try:
            with open(json_path, 'r') as file:
                cached_data = json.load(file)
            print("Data was collected from local cache!\n")
        except (FileNotFoundError, json.JSONDecodeError) as e:
            print(f"An error occurred while reading the JSON file: {e}\n")
            cached_data = None

    # if there is no data available in the cache
    if not cached_data:
        # get the data with a request
        print("Get new data from URL\n")
        cached_data = requests.get(url).json()
        with open(json_path, "w") as file:
            print("Create a new JSON cache file\n")
            json.dump(cached_data, file)

    return cached_data

url = "https://dummyjson.com/comments"
json_path = "cachefile.json"
data = fetch_api_data(url, json_path)
print(data)
"""
An error occurred while reading the JSON file: [Errno 2] No such file or directory: 'cachefile.json'
Get new data from URL
Create a new JSON cache file
{'comments': [{'id': 1, 'body': "That's great thinking!", .........
"""
# SECOND CALL
data = fetch_api_data(url, json_path)
print(data)
"""
Data was collected from local cache!
{'comments': [{'id': 1, 'body': "That's......
"""
If a JSON cache file exists, the function reads it and returns the data. If the file does not exist, an update was requested, or a read error occurred, it fetches the data from the URL and writes it to a new JSON file.
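The article never demonstrates update_cache=True, so here is a network-free sketch of the same read-through pattern. The function fake_fetch is a hypothetical stand-in for requests.get(url).json(), and the file goes into a temporary directory; everything else mirrors the logic above:

```python
import json
import os
import tempfile

def fake_fetch():
    # hypothetical stand-in for requests.get(url).json()
    return {"comments": [{"id": 1, "body": "hello"}]}

def fetch_cached(json_path: str, update_cache: bool = False):
    cached = None
    if not update_cache:
        try:
            with open(json_path) as f:
                cached = json.load(f)  # cache hit: read from disk
        except (FileNotFoundError, json.JSONDecodeError):
            cached = None
    if not cached:
        cached = fake_fetch()          # cache miss or forced update
        with open(json_path, "w") as f:
            json.dump(cached, f)       # (re)write the cache file
    return cached

path = os.path.join(tempfile.mkdtemp(), "cache.json")
first = fetch_cached(path)                       # miss: creates the file
second = fetch_cached(path)                      # hit: read from disk
forced = fetch_cached(path, update_cache=True)   # bypasses the cache on purpose
print(first == second == forced)  # True
```

Forcing an update is useful when the remote data may have changed and a stale cache is worse than an extra request.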
As a result, caching is an important optimization method that speeds up data retrieval. Thanks to caching, we can implement faster and more efficient applications.
That's it for now. Thank you for reading.
Sources
https://realpython.com/lru-cache-python/
https://www.mathsisfun.com/numbers/fibonacci-sequence.html
https://www.fortinet.com/resources/cyberglossary/what-is-caching
https://docs.python.org/3/library/functools.html
https://www.youtube.com/watch?v=K0Q5twtYxWY