Memoization in Python: How to Cache Function Results - dbader.org (2023)

By Dan Bader

Speed up your Python programs with a powerful, yet convenient, caching technique called "memoization".


In this article, I'm going to introduce you to a convenient way to speed up your Python code called memoization (sometimes also spelled memoisation):

Memoization is a specific type of caching that is used as a software optimization technique.

The cache stores the results of operations for later use. For example, your web browser will most likely use the cache to load this tutorial page faster if you visit it again in the future.

So when I talk about memoization and Python, I am talking about memoizing or caching a function's output based on its inputs. Memoization gets its name from "memorandum", which means "to be remembered".

Memoization allows you to optimize a Python function by caching its output based on the parameters you supply to it. Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.

In this tutorial, you'll see how and when to use this simple yet powerful concept in Python, so you can use it to optimize your own programs, and in some cases, make them run much faster.

Why and when to use memoization in Python programs?

The answer is expensive code:

When I analyze code, I look at it in terms of how long it takes to run and how much memory it uses. If I'm looking at code that takes a long time to run or uses a lot of memory, I call the code expensive.

It's expensive code because it costs a lot of resources, space and time, to run. When you run expensive code, it takes resources away from other programs on your machine.

If you want to speed up the parts of your Python application that are expensive, memoization can be a great technique to use. Let's take a deeper look at memoization before we get our hands dirty and implement it ourselves!

All of the code examples I use in this tutorial are written in Python 3, but the general techniques and patterns shown here obviously apply to Python 2 as well.

The memoization algorithm explained

The basic memoization algorithm looks as follows:

  1. Set up a cache data structure for function results
  2. Every time the function is called, do one of the following:
    • Return the cached result, if one exists; or
    • Call the function to compute the missing result, and then update the cache before returning the result to the caller

Given enough cache storage, this virtually guarantees that function results for a specific set of function arguments will only be computed once.

Once we get the result from the cache, we don't need to run the cached function again for the same set of inputs. Instead, we can just fetch the result from the cache and return it immediately.

Let's write a memoization decorator from scratch

Next, I'm going to implement the above memoization algorithm as a Python decorator, which is a convenient way to implement generic function wrappers in Python:

A decorator is a function that takes another function as an input and returns a function as its output.

This allows us to implement our memoization algorithm in a generic and reusable way. Sounds a little confusing? No worries, we'll take this step by step and it will all become clearer when you see some real code.


Here's the memoize() decorator that implements the above caching algorithm:

    def memoize(func):
        cache = dict()

        def memoized_func(*args):
            if args in cache:
                return cache[args]
            result = func(*args)
            cache[args] = result
            return result

        return memoized_func

This decorator takes a function and returns a wrapped version of the same function that implements the caching logic (memoized_func).

I'm using a Python dictionary as a cache here. In Python, using a key to look up a value in a dictionary is quick. This makes dict a good choice as the data structure for the function result cache.

Each time the decorated function is called, we check to see if the parameters are already cached. If so, the cached result is returned. So instead of recalculating the result, we quickly fetch it from the cache.

Bam, memoization!

If the result is not in the cache, we need to update the cache so we can save some time in the future. So we first calculate the missing result, cache it, and then send it back to the caller.
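Putting it together, the decorator can be applied with Python's @ syntax. Here's a self-contained sketch; the counting helper is my own addition, there to demonstrate that the wrapped function body only runs once per argument set:

```python
def memoize(func):
    cache = dict()

    def memoized_func(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return memoized_func

call_count = 0

@memoize           # equivalent to: add = memoize(add)
def add(x, y):
    global call_count
    call_count += 1          # counts how often the real function body runs
    return x + y

print(add(2, 3))   # first call: computed, prints 5
print(add(2, 3))   # second call: cached, prints 5 without re-running the body
print(call_count)  # prints 1: the body executed only once
```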

(As I said, decorators are an important concept to master for any intermediate or advanced Python developer. Check out my Python decorators tutorial for a step-by-step introduction if you'd like to know more.)

Let's test our memoization decorator on a recursive Fibonacci sequence function. First, I'll define a Python function that calculates the nth Fibonacci number:

    def fibonacci(n):
        if n == 0:
            return 0
        elif n == 1:
            return 1
        return fibonacci(n - 1) + fibonacci(n - 2)

This fibonacci function will serve as an example of an "expensive" computation. Calculating the nth Fibonacci number this way has O(2^n) time complexity: it takes exponential time to complete.

This makes it rather expensive code, indeed.

Next up, I'm going to run a benchmark to get a feel for how computationally expensive this function is. Python's built-in timeit module lets me measure the execution time of arbitrary Python statements in seconds.

Here's how I'll measure the execution time of the fibonacci function I just defined using Python's built-in timeit module:

    >>> import timeit
    >>> timeit.timeit("fibonacci(35)", globals=globals(), number=1)
    5.1729652720096055

As you can see, on my computer it takes about five seconds to calculate the 35th number in the Fibonacci sequence. This is quite a slow and expensive operation.

⏰ Sidebar: timeit.timeit arguments

Python's built-in timeit module lets me measure the execution time of arbitrary Python statements in seconds. Here's a quick note on the arguments I'm passing to timeit.timeit in the example above:

  • Because I'm running this benchmark in a Python interpreter (REPL) session, I need to set up the environment for the benchmark run by setting globals to the current set of global variables retrieved with the globals() built-in.


  • By default timeit() will repeat the benchmark several times to make the measured execution time more accurate. But because a single fibonacci(35) call already takes a few seconds to execute, I'm limiting the number of executions to one with the number argument. For this experiment I'm interested in ballpark timing figures, and millisecond accuracy isn't needed.
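The number argument is easy to misread, because timeit() reports the total time for all repetitions rather than a per-run average. A quick illustration with a cheap statement (my own example, separate from the article's benchmark):

```python
import timeit

# timeit.timeit() returns the *total* elapsed time in seconds for all
# repetitions, not an average per run, so a larger number argument
# produces a larger result for the same statement.
total_time = timeit.timeit("sum(range(100))", number=100_000)
single_run = timeit.timeit("sum(range(100))", number=1)
print(total_time > single_run)  # prints True
```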

Let's see if we can speed this up by taking advantage of the function result caching offered by our memoization decorator:

    >>> memoized_fibonacci = memoize(fibonacci)
    >>> timeit.timeit("memoized_fibonacci(35)", globals=globals(), number=1)
    4.941958484007046

The memoized function still takes about five seconds to return on its first run. So far, so underwhelming…

We get a similar execution time because the first time I ran the memoized function, the result cache was cold: we started out with an empty cache, which means there were no pre-computed results that could speed up this function call.

Let's run our benchmark a second time:

    >>> timeit.timeit("memoized_fibonacci(35)", globals=globals(), number=1)
    1.9930012058466673e-06

Now we're talking!

Notice the e-06 suffix at the end of that floating point number? The second run of memoized_fibonacci took only about 2 microseconds to complete. That's 0.0000019930012058466673 seconds; quite a nice speedup indeed!

Instead of recursively calculating the 35th Fibonacci number, our memoize decorator simply fetched the cached result and returned it immediately, and this is what led to the incredible speedup in the second benchmark.

Inspecting the function results cache

To really drive home how memoization works behind the scenes, I want to show you the contents of the function result cache used in the previous example:

    >>> memoized_fibonacci.__closure__[0].cell_contents
    {(35,): 9227465}

To inspect the cache I reached "inside" the memoized_fibonacci function using its __closure__ attribute. The cache dict is the first local variable and is stored in cell 0. I wouldn't recommend using this technique in production code, but it makes for a nice debugging trick 🙂

As you can see, the cache dictionary maps the argument tuples for each memoized_fibonacci function call that happened so far to the function's result (the nth Fibonacci number).

So, for example, (35,) is the argument tuple for the memoized_fibonacci(35) function call, and it's associated with 9227465, which is the 35th Fibonacci number:

    >>> fibonacci(35)
    9227465

Let's do another little experiment to demonstrate how the function result cache works. I'll call memoized_fibonacci a few more times to populate the cache, and then we'll inspect its contents again:

    >>> memoized_fibonacci(1)
    1
    >>> memoized_fibonacci(2)
    1
    >>> memoized_fibonacci(3)
    2
    >>> memoized_fibonacci(4)
    3
    >>> memoized_fibonacci(5)
    5
    >>> memoized_fibonacci.__closure__[0].cell_contents
    {(35,): 9227465, (1,): 1, (2,): 1, (3,): 2, (4,): 3, (5,): 5}

As you can see, the cache dictionary now also contains cached results for several other inputs to the memoized_fibonacci function. This allows us to retrieve these results quickly from the cache instead of slowly re-computing them from scratch.


A quick word of warning on the naive caching implementation in our memoize decorator: in this example the cache size is unbounded, which means the cache can grow at will. This is usually not a good idea because it can lead to memory exhaustion bugs in your programs.

For any type of caching used in programs, it makes sense to limit the amount of data stored in the cache at one time. This is usually achieved by setting a hard limit on the size of the cache, or by defining an expiration policy that removes old items from the cache at some point.
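One common expiry policy is least-recently-used (LRU) eviction. Here's a toy sketch of a size-limited variant of our decorator built on collections.OrderedDict; the names, the tiny maxsize, and the eviction details are my own choices for illustration, not how the standard library implements this:

```python
from collections import OrderedDict

def memoize_bounded(func, maxsize=2):
    """Memoize func, evicting the least recently used entry past maxsize."""
    cache = OrderedDict()

    def wrapper(*args):
        if args in cache:
            cache.move_to_end(args)      # mark the entry as most recently used
            return cache[args]
        result = func(*args)
        cache[args] = result
        if len(cache) > maxsize:
            cache.popitem(last=False)    # evict the least recently used entry
        return result

    wrapper._cache = cache               # exposed here for inspection only
    return wrapper

square = memoize_bounded(lambda n: n * n, maxsize=2)
square(1)
square(2)
square(3)                                # third entry evicts the oldest, (1,)
print(list(square._cache))               # prints [(2,), (3,)]
```

With maxsize=2, caching a third result pushes out the oldest one, so the cache can never grow beyond the limit.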

Please note that the memoize function we wrote earlier is a simplified implementation for demonstration purposes. In the next section of this tutorial you'll see how to use a "production-ready" implementation of the memoization algorithm in your Python programs.

Memoizing in Python with functools.lru_cache

Now that you've seen how to implement a memoization function yourself, I'll show you how you can achieve the same result using Python's functools.lru_cache decorator for added convenience.

One of the things I love the most about Python is that the simplicity and beauty of its syntax goes hand in hand with the beauty and simplicity of its philosophy. Python is "batteries included", which means that Python comes bundled with loads of commonly used libraries and modules that are just an import statement away!

I find functools.lru_cache to be a great example of this philosophy. The lru_cache decorator is Python's easy-to-use memoization implementation from the standard library. Once you recognize when to use lru_cache, you can quickly speed up your application with just a few lines of code.

Let's revisit our Fibonacci sequence example. This time I'll show you how to add memoization using the functools.lru_cache decorator:

    import functools

    @functools.lru_cache(maxsize=128)
    def fibonacci(n):
        if n == 0:
            return 0
        elif n == 1:
            return 1
        return fibonacci(n - 1) + fibonacci(n - 2)

Note the maxsize argument I'm passing to lru_cache to limit the number of items stored in the cache at the same time.

Once again I'm using the timeit module to run a simple benchmark so I can get a sense of the performance impact of this optimization:

    >>> import timeit
    >>> timeit.timeit("fibonacci(35)", globals=globals(), number=1)
    3.056201967410743e-05
    >>> timeit.timeit("fibonacci(35)", globals=globals(), number=1)
    1.554988557472825e-06

You may be wondering why we're getting the result of the first run so much faster this time around. Shouldn't the cache be "cold" on the first run as well?

The difference is that, in this example, I applied the @lru_cache decorator at function definition time. This means that recursive calls to fibonacci() are also looked up in the cache this time around.

By decorating the fibonacci() function with the @lru_cache decorator, I basically turned it into a dynamic programming solution, where each subproblem is solved just once by storing the subproblem solutions and looking them up from the cache the next time.

This is just a side effect in this case, but I'm sure you can begin to see the beauty and the power of using a memoization decorator, and how helpful a tool it can be for implementing other dynamic programming algorithms as well.
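To illustrate that point, the same one-line decorator turns another classic exponential recursion, counting monotone paths through a grid, into a dynamic programming solution. This example is my own, not from the original article:

```python
import functools

@functools.lru_cache(maxsize=None)
def grid_paths(rows, cols):
    """Count right/down lattice paths through a rows x cols grid."""
    if rows == 0 or cols == 0:
        return 1   # only one way to walk straight along an edge
    # Each path enters the last cell either from above or from the left.
    return grid_paths(rows - 1, cols) + grid_paths(rows, cols - 1)

print(grid_paths(18, 18))  # prints 9075135300, near-instantly thanks to the cache
```

Without the cache this recursion would make billions of redundant calls; with it, each (rows, cols) subproblem is solved exactly once.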

Why you should prefer functools.lru_cache

In general, Python's memoization implementation provided by functools.lru_cache is much more comprehensive than our ad hoc memoize function, as you can see in the CPython source code.

For example, it offers a handy feature that lets you retrieve caching statistics with the cache_info method:

    >>> fibonacci.cache_info()
    CacheInfo(hits=34, misses=36, maxsize=128, currsize=36)

Once again, as you can see in the CacheInfo output, Python's lru_cache() memoized the recursive calls to fibonacci(). When we look at the cache info for the memoized function, you'll recognize why it's faster than our version on the first run: the cache was hit 34 times.


As I hinted at earlier, functools.lru_cache also allows you to limit the number of cached results with the maxsize parameter. By setting maxsize=None you can force the cache to be unbounded, which I would usually recommend against.

There's also a typed boolean parameter you can set to True to tell the cache that function arguments of different types should be cached separately. For example, fibonacci(35) and fibonacci(35.0) would be treated as distinct calls with distinct results.
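Here's a small demonstration of what typed=True does in practice (my own snippet, using a trivial function):

```python
import functools

@functools.lru_cache(maxsize=None, typed=True)
def double(x):
    return 2 * x

double(35)     # stored under the int key 35
double(35.0)   # stored separately under the float key 35.0
print(double.cache_info().currsize)  # prints 2: one entry per argument type
```

Without typed=True the two calls would share a single cache entry, because 35 == 35.0 in Python.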

Another useful feature is the ability to reset the result cache at any time with the cache_clear method:

    >>> fibonacci.cache_clear()
    >>> fibonacci.cache_info()
    CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

If you want to learn more about the intricacies of using the lru_cache decorator, I recommend that you consult the Python standard library documentation.

In summary, you should never need to roll your own memoizing function. Python's built-in lru_cache() is readily available, more comprehensive, and battle-tested.

Memoization caveats: what can be memoized?

Ideally, you want to memorize functions that are deterministic.

    def deterministic_adder(x, y):
        return x + y

Here deterministic_adder() is a deterministic function because it will always return the same result for the same pair of parameters. For example, if you pass 2 and 3 into the function, it will always return 5.

Compare this behavior with the following nondeterministic function:

    from datetime import datetime

    def nondeterministic_adder(x, y):
        # Check to see if today is Monday (weekday 0)
        if datetime.now().weekday() == 0:
            return x + y + x
        return x + y

This function is nondeterministic because its output for a given input will vary depending on the day of the week: if you run this function on a Monday, the cache will return stale data any other day of the week.

Generally I find that any function that updates a record or returns information that changes over time is a poor candidate for memoization.
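If you do need to cache a function whose output goes stale over time, one common compromise is a time-to-live (TTL) cache that expires old entries. Here's a bare-bones sketch; the names and the 60-second window are purely my own choices for illustration:

```python
import time

def memoize_with_ttl(func, ttl_seconds=60.0):
    """Cache results, but treat entries older than ttl_seconds as stale."""
    cache = {}  # maps the args tuple to a (timestamp, result) pair

    def wrapper(*args):
        now = time.monotonic()
        if args in cache:
            stored_at, result = cache[args]
            if now - stored_at < ttl_seconds:
                return result        # entry is still fresh, reuse it
        result = func(*args)         # entry is stale or missing: recompute...
        cache[args] = (now, result)  # ...and restamp the cache entry
        return result

    return wrapper

shout = memoize_with_ttl(str.upper, ttl_seconds=60.0)
print(shout("hello"))  # prints HELLO (and caches it for up to a minute)
```

A TTL bounds how stale a cached result can get, but within the window you still serve old data, so it only fits functions where brief staleness is acceptable.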

Or, as Phil Karlton puts it:

There are only two hard things in Computer Science: cache invalidation and naming things.

– Phil Karlton

🙂

Memoization in Python: Quick Summary

In this Python tutorial you saw how memoization allows you to optimize a function by caching its output based on the parameters you supply to it.

Once you memoize a function, it will only compute its output once for each set of parameters you call it with. Every call after the first will be quickly retrieved from a cache.


You saw how to write your own memoization decorator from scratch, and why you probably want to use Python's built-in lru_cache() battle-tested implementation in your production code:

  • Memoization is a software caching technique in which the results of a function call are stored and returned based on its parameters.
  • If your code meets certain criteria, memoization can be a great method to speed up your application.
  • You can import a comprehensive memoization implementation, lru_cache(), from Python's standard library in the functools module.


