By Dan Bader
Speed up your Python programs with a powerful yet convenient caching technique called "memoization".

In this article, I'm going to introduce you to a practical way to speed up your Python code called memoization (sometimes also spelled memoisation):
Memoization is a specific type of caching that is used as a software optimization technique.
A cache stores the results of an operation for later use. For example, your web browser will most likely use a cache to load this tutorial page faster if you visit it again in the future.
So, when I talk about memoization and Python, I mean memoizing or caching a function's output based on its inputs. Memoization finds its root word in "memorandum", which means "to be remembered".
Memoization allows you to optimize a Python function by caching its output based on the parameters you supply to it. Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.
In this tutorial, you'll see how and when to use this simple yet powerful concept in Python, so you can use it to optimize your own programs, and in some cases, make them run much faster.
Why and when should you use memoization in your Python programs?
The answer is expensive code:
When I analyze code, I look at it in terms of how long it takes to run and how much memory it uses. If I'm looking at code that takes a long time to run or uses a lot of memory, I call the code expensive.
It's expensive code because it costs a lot of resources, space and time, to run. Expensive code takes resources away from other programs on your computer.
If you want to speed up the parts of your Python application that are expensive, memoization can be a great technique to use. Let's take a deeper look at memoization before we get our hands dirty and implement it ourselves!
All of the code examples I use in this tutorial are written in Python 3, but the general techniques and patterns shown here obviously apply to Python 2 as well.
The memoization algorithm explained
The basic memoization algorithm looks as follows:
- Set up a cache data structure for function results
- Every time the function is called, do one of the following:
- Return the cached result, if any exists
- Call the function to compute the missing result, and then update the cache before returning the result to the caller
Given enough cache storage, this virtually guarantees that function results for a specific set of function arguments will only be computed once.
As soon as we have a cached result, we won't have to re-run the memoized function for the same set of inputs. Instead, we can just fetch the result from the cache and return it immediately.
Let's write a memoization decorator from scratch
Next, I'm going to implement the memoization algorithm above as a Python decorator, which is a convenient way to implement generic function wrappers in Python:
A decorator is a function that takes another function as input and has a function as output.
This allows us to implement our memoization algorithm in a generic and reusable way. Sound a little confusing? No worries, we'll take this step by step and things will become clearer when you see some real code.
Here's the memoize() decorator that implements the above caching algorithm:
def memoize(func):
    cache = dict()

    def memoized_func(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return memoized_func
This decorator takes a function and returns a wrapped version of the same function that implements the caching logic (memoized_func).
I'm using a Python dictionary as a cache here. In Python, using a key to look up a value in a dictionary is quick. This makes dict a good choice as the data structure for the function result cache.
Each time the decorated function is called, we check to see if the parameters are already cached. If so, the cached result is returned. So instead of recalculating the result, we quickly fetch it from the cache.
Bam, memoization!
If the result is not in the cache, we need to update the cache so we can save some time in the future. So we first calculate the missing result, cache it, and then send it back to the caller.
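Before we benchmark the decorator properly, here's a minimal usage sketch with the memoize decorator we just defined. The slow_square function is a hypothetical stand-in for any expensive computation, not something from the original example:

import time

def slow_square(x):
    time.sleep(1)  # simulate an expensive computation
    return x * x

fast_square = memoize(slow_square)

print(fast_square(3))  # takes about a second: the result is computed and cached
print(fast_square(3))  # returns almost instantly: the result comes from the cache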
[As I said, decorators are an important concept to master for any intermediate or advanced Python developer. Check out my Python decorators tutorial for a step-by-step introduction if you'd like to know more.]
Let's test our memoization decorator on a recursive Fibonacci sequence function. First, I'll define a Python function that calculates the nth Fibonacci number:
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)
This fibonacci function will serve as an example of an "expensive" computation. Calculating the nth Fibonacci number this way has O(2^n) time complexity, meaning it takes exponential time to complete.
That makes it quite an expensive function indeed.
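If you'd like to see that exponential blowup for yourself, one quick sanity check is to count how often the function gets called. This instrumented variant (the call_count counter and counted_fibonacci name are mine, added for illustration) does just that:

call_count = 0

def counted_fibonacci(n):
    global call_count
    call_count += 1  # tally every invocation, including recursive ones
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return counted_fibonacci(n - 1) + counted_fibonacci(n - 2)

counted_fibonacci(20)
print(call_count)  # 21891 calls just for n=20, roughly doubling with each +1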
Next, I'm going to run a benchmark to get a feel for how computationally expensive this function is. Python's built-in timeit module lets me measure the execution time of arbitrary Python statements in seconds.
Here's how I'll measure the execution time of the fibonacci function I just defined, using Python's built-in timeit module:
>>> import timeit
>>> timeit.timeit('fibonacci(35)', globals=globals(), number=1)
5.1729652720096055
As you can see, on my computer it takes about five seconds to calculate the 35th number in the Fibonacci sequence. This is quite a slow and expensive operation.
⏰ Sidebar: timeit.timeit arguments
Python's built-in timeit module lets me measure the execution time of arbitrary Python statements in seconds. Here's a quick note on the arguments I'm passing to timeit.timeit in the example above:
Because I'm running this benchmark in a Python interpreter (REPL) session, I need to set up the environment for the benchmark run by setting globals to the current set of global variables retrieved with the globals() built-in.
By default, timeit() will repeat the benchmark several times to make the measured execution time more accurate. But because a single fibonacci(35) call already takes a few seconds to execute, I'm limiting the number of executions to one with the number argument. For this experiment, I'm interested in ballpark timing figures, so millisecond accuracy isn't needed.
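As a rough sketch of how these two arguments interact, compare a cheap statement averaged over many runs against our expensive one-shot call (the sum(range(10)) statement is just a stand-in example of my own):

import timeit

# For a cheap statement, averaging over many runs gives stable figures
# (timeit's number argument defaults to 1000000):
total = timeit.timeit('sum(range(10))', number=100_000)
print(total / 100_000)  # average seconds per execution

# For an expensive call like fibonacci(35), a single run is enough:
print(timeit.timeit('fibonacci(35)', globals=globals(), number=1))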
Let's see if we can speed it up by leveraging the function result caching our memoize decorator provides:
>>> memoized_fibonacci = memoize(fibonacci)
>>> timeit.timeit('memoized_fibonacci(35)', globals=globals(), number=1)
4.941958484007046
The memoized function still takes about five seconds to return on its first run. So far, so underwhelming…
We get a similar execution time because the first time I ran the memoized function, the result cache was cold: we started out with an empty cache, which means there were no pre-computed results that could speed up this function call.
Let's run our benchmark a second time:
>>> timeit.timeit('memoized_fibonacci(35)', globals=globals(), number=1)
1.9930012058466673e-06
Now we're talking!
Notice the e-06 suffix at the end of that floating point number? The second run of memoized_fibonacci took only about 2 microseconds to complete. That's 0.0000019930012058466673 seconds: quite a nice speedup indeed!
Instead of recursively calculating the 35th Fibonacci number, our memoize decorator simply fetched the cached result and returned it immediately, which is what led to the amazing speedup in the second benchmark run.
Inspecting the function results cache
To really drive home how memoization works "behind the scenes", I want to show you the contents of the function result cache that was used in the previous example:
>>> memoized_fibonacci.__closure__[0].cell_contents
{(35,): 9227465}
To inspect the cache, I reached "inside" the memoized_fibonacci function using its __closure__ attribute. The cache dict is the first local variable and is stored in cell 0. I wouldn't recommend using this technique in production code, but it makes for a nice debugging trick 🙂
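If you find yourself wanting to inspect the cache regularly, a friendlier variation is to attach the cache dict to the wrapper as an attribute. This is my own tweak to the memoize decorator from earlier, not part of the original version:

def memoize(func):
    cache = dict()

    def memoized_func(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    memoized_func.cache = cache  # expose the cache dict for easy inspection
    return memoized_func

With this version, memoized_fibonacci.cache would give you the same dictionary without any __closure__ gymnastics.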
As you can see, the cache dictionary maps the argument tuple of each memoized_fibonacci function call that happened so far to the corresponding function result (the nth Fibonacci number).
For example, (35,) is the argument tuple for the memoized_fibonacci(35) function call, and it's associated with 9227465, which is the 35th Fibonacci number:
>>> fibonacci(35)
9227465
Let's run another little experiment to demonstrate how the function result cache works. I'll call memoized_fibonacci a few more times to populate the cache, and then we'll inspect its contents again:
>>> memoized_fibonacci(1)
1
>>> memoized_fibonacci(2)
1
>>> memoized_fibonacci(3)
2
>>> memoized_fibonacci(4)
3
>>> memoized_fibonacci(5)
5
>>> memoized_fibonacci.__closure__[0].cell_contents
{(35,): 9227465, (1,): 1, (2,): 1, (3,): 2, (4,): 3, (5,): 5}
As you can see, the cache dictionary now also contains cached results for several other inputs to the memoized_fibonacci function. This allows us to retrieve these results quickly from the cache instead of slowly re-computing them from scratch.
A quick word of warning about the naive caching implementation in our memoize decorator: in this example, the cache size is unbounded, which means the cache can grow at will. This is usually not a good idea because it can lead to memory exhaustion bugs in your programs.
For any type of caching used in programs, it makes sense to limit the amount of data stored in the cache at one time. This is usually achieved by setting a hard limit on the size of the cache, or by defining an expiration policy that removes old items from the cache at some point.
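For illustration, here's one way our naive decorator could be extended with a hard size limit and least-recently-used eviction. This is a sketch of the general idea, not a production implementation (the memoize_bounded name and the default maxsize are my own choices):

from collections import OrderedDict

def memoize_bounded(func, maxsize=128):
    cache = OrderedDict()

    def memoized_func(*args):
        if args in cache:
            cache.move_to_end(args)  # mark this entry as most recently used
            return cache[args]
        result = func(*args)
        cache[args] = result
        if len(cache) > maxsize:
            cache.popitem(last=False)  # evict the least recently used entry
        return result

    return memoized_func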
Please keep in mind that the memoize function we wrote earlier is a simplified implementation for demonstration purposes. The next section of this tutorial shows how to use a "production-ready" implementation of the memoization algorithm in your Python programs.
Memoizing Python functions with functools.lru_cache
Now that you've seen how to implement a memoization function yourself, I'll show you how you can achieve the same result using Python's functools.lru_cache decorator for added convenience.
One of the things I love the most about Python is that the simplicity and beauty of its syntax goes hand in hand with the beauty and simplicity of its philosophy. Python comes "batteries included", meaning that Python is bundled with loads of commonly used libraries and modules that are only an import statement away!
I find functools.lru_cache to be a great example of this philosophy. The lru_cache decorator is Python's easy-to-use memoization implementation from the standard library. Once you recognize when to use lru_cache, you can quickly speed up your application with just a few lines of code.
Let's revisit our Fibonacci sequence example. This time, I'll show you how to add memoization using the functools.lru_cache decorator:
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)
Note the maxsize argument I'm passing to lru_cache to limit the number of items stored in the cache at the same time.
Once again, I'm using the timeit module to run a simple benchmark so I can get a sense of the performance impact of this optimization:
>>> import timeit
>>> timeit.timeit('fibonacci(35)', globals=globals(), number=1)
3.056201967410743e-05
>>> timeit.timeit('fibonacci(35)', globals=globals(), number=1)
1.554988557472825e-06
You may be wondering why we're getting the result of the first run so much faster this time. Shouldn't the cache be "cold" on the first run as well?
The difference is that, in this example, I applied the @lru_cache decorator at function definition time. This means that the recursive calls to fibonacci() are also looked up in the cache this time.
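To make the difference concrete, here's a sketch of both wiring styles, using the memoize decorator we wrote earlier:

# Wrapping after the fact: the recursive calls inside fibonacci() still
# refer to the plain, undecorated fibonacci, so they bypass the cache.
memoized_fibonacci = memoize(fibonacci)

# Rebinding the original name routes the recursion through the cache too,
# which mimics applying the decorator at definition time:
fibonacci = memoize(fibonacci)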
By decorating the fibonacci() function with the @lru_cache decorator, I basically turned it into a dynamic programming solution, where each subproblem is solved just once by storing the subproblem solutions and looking them up from the cache the next time they are needed.
This is just a side effect in this case, but I'm sure you're beginning to see the beauty and the power of using a memoization decorator, and how helpful a tool it can be for implementing other dynamic programming algorithms as well.
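To give you a taste, here's another classic dynamic programming subproblem memoized the same way. The grid_paths function is my own illustrative example, not one from the original tutorial; it counts monotone paths through a grid:

import functools

@functools.lru_cache(maxsize=None)
def grid_paths(rows, cols):
    # Each (rows, cols) subproblem is computed once and then cached
    if rows == 1 or cols == 1:
        return 1
    return grid_paths(rows - 1, cols) + grid_paths(rows, cols - 1)

print(grid_paths(18, 18))  # answers instantly thanks to memoized subproblems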
Why you should prefer functools.lru_cache
In general, Python's memoization implementation provided by functools.lru_cache is much more comprehensive than our ad hoc memoize function, as you can see in the CPython source code.
For example, it offers a handy feature that lets you retrieve caching statistics with the cache_info method:
>>> fibonacci.cache_info()
CacheInfo(hits=34, misses=36, maxsize=None, currsize=36)
Again, as you can see in the CacheInfo output, Python's lru_cache() memoized the recursive calls to fibonacci(). When we look at the cache info for the memoized function, you'll recognize why it is faster than our version on the first run: the cache was hit 34 times.
As I mentioned earlier, functools.lru_cache also lets you limit the number of cached results with the maxsize parameter. By setting maxsize=None you can force the cache to be unbounded, which I would usually recommend against.
There's also a typed boolean parameter you can set to True to tell the cache that function arguments of different types should be cached separately. For example, fibonacci(35) and fibonacci(35.0) would be treated as distinct calls with distinct results.
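Here's a small sketch of that effect, using a hypothetical square function of my own; note the two separate cache entries reported by cache_info:

>>> import functools
>>> @functools.lru_cache(maxsize=None, typed=True)
... def square(x):
...     return x * x
...
>>> square(3)
9
>>> square(3.0)
9.0
>>> square.cache_info()
CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)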
Another useful feature is the ability to reset the result cache at any time with the cache_clear method:
>>> fibonacci.cache_clear()
>>> fibonacci.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
If you want to learn more about the intricacies of using the lru_cache decorator, I recommend that you consult the Python standard library documentation.
In summary, you should never need to roll your own memoizing function. Python's built-in lru_cache() is readily available, more comprehensive, and battle-tested.
Caching caveats: what can be memoized?
Ideally, you will want to memoize functions that are deterministic.
def deterministic_adder(x, y):
    return x + y
Here, deterministic_adder() is a deterministic function because it will always return the same result for the same pair of parameters. For example, if you pass 2 and 3 into the function, it will always return 5.
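A deterministic function like this is safe to memoize. Here's a quick sketch applying lru_cache directly, rather than with decorator syntax:

>>> from functools import lru_cache
>>> cached_adder = lru_cache(maxsize=None)(deterministic_adder)
>>> cached_adder(2, 3)
5
>>> cached_adder(2, 3)  # served from the cache, and always the same answer
5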
Compare this behavior with the following nondeterministic function:
from datetime import datetime

def nondeterministic_adder(x, y):
    # Check to see if today is Monday (weekday 0)
    if datetime.now().weekday() == 0:
        return x + y + x
    return x + y
This function is nondeterministic because its output for a given input will vary depending on the day of the week: if you run this function on Monday, the cache will return stale data on any other day of the week.
In general, I find that any function that updates a record or returns information that changes over time is a poor candidate for memoization.
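If you do need to cache something like nondeterministic_adder, one common workaround is to turn the changing input into an explicit parameter, so the function becomes deterministic again. Here's a sketch of that idea (the adder_for_weekday name is my own):

from datetime import datetime
from functools import lru_cache

@lru_cache(maxsize=None)
def adder_for_weekday(x, y, weekday):
    # Deterministic: the changing value is now an explicit argument
    if weekday == 0:  # Monday
        return x + y + x
    return x + y

# Callers pass the current weekday in, so cached entries never go stale:
result = adder_for_weekday(2, 3, datetime.now().weekday())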
Or, as Phil Karlton puts it:
There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton
🙂
Memoization in Python: a quick summary
In this Python tutorial, you saw how memoization allows you to optimize a function by caching its output based on the parameters you supply to it.
Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.
You saw how to write your own memoization decorator from scratch, and why you'll probably want to use Python's built-in lru_cache() battle-tested implementation in your production code:
- Memoization is a software optimization technique that stores and returns the result of a function call based on its parameters.
- If your code meets certain criteria, memoization can be a great method to speed up your application.
- You can import a comprehensive memoization implementation, lru_cache(), from Python's standard library in the functools module.