Decorator-Based Caching in Python: From In-Memory to Persistent
If you're iterating on a script that hits an external API or runs a slow computation, every re-run means waiting. A caching decorator wraps any function and short-circuits repeated calls with the stored result — no changes to call sites, no manual cache management.
How Python decorators work
A decorator is a function that takes another function and returns a modified version of it. The @ syntax is just shorthand for reassigning the function name:
def my_decorator(func):
    def wrapper():
        print("before")
        func()
        print("after")
    return wrapper

@my_decorator
def say_hello():
    print("Hello!")

# Equivalent to: say_hello = my_decorator(say_hello)
say_hello()
Output:
before
Hello!
after
The decorator replaces say_hello with wrapper, which calls the original function internally. This is the hook we'll use to intercept calls and serve cached results.
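The wrapper above takes no arguments, so it only works for zero-argument functions. A general-purpose decorator conventionally forwards *args and **kwargs and propagates the return value — a minimal sketch (the `add` function is just an illustration):

```python
def my_decorator(func):
    def wrapper(*args, **kwargs):
        print("before")
        # forward any positional and keyword arguments
        result = func(*args, **kwargs)
        print("after")
        # propagate the original return value
        return result
    return wrapper

@my_decorator
def add(a, b):
    return a + b

add(2, 3)  # prints "before" and "after", returns 5
```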
Basic in-memory cache
from functools import wraps

def cached(func):
    cache = {}
    @wraps(func)
    def wrapper(*args, **kwargs):
        key = (args, frozenset(kwargs.items()))
        if key in cache:
            return cache[key]
        result = func(*args, **kwargs)
        cache[key] = result
        return result
    return wrapper
The @wraps(func) line preserves the original function's __name__ and __doc__ — without it, every decorated function would appear as wrapper in tracebacks and help() output.
The cache key is a tuple of (args, frozenset(kwargs.items())). The frozenset makes the keyword arguments hashable so they can be part of a dict key, and it's order-independent, so f(a=1, b=2) and f(b=2, a=1) hit the same cache entry.
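To see the order-independence concretely, here's the key construction pulled out into a standalone helper (a sketch for illustration — in the decorator it's inlined in wrapper):

```python
def make_key(args, kwargs):
    # same key construction as inside wrapper
    return (args, frozenset(kwargs.items()))

k1 = make_key((), {"a": 1, "b": 2})
k2 = make_key((), {"b": 2, "a": 1})
k1 == k2  # True — keyword order doesn't affect the key
```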
import time

@cached
def slow_function(a, b):
    time.sleep(5)
    return a + b

slow_function(2, 3)  # waits 5 seconds → returns 5
slow_function(2, 3)  # returns 5 instantly
slow_function(2, 4)  # waits 5 seconds → returns 6 (different key)
The cache lives in the wrapper closure, so it persists for the lifetime of the process but disappears when the script exits.
File-based cache for persistence
For scripts you run repeatedly — data pipelines, report generators, anything with expensive upstream calls — you want the cache to survive between runs:
import pickle
import hashlib
import json
from pathlib import Path
from functools import wraps
from json import JSONEncoder

class FallbackEncoder(JSONEncoder):
    """Handles non-JSON-serializable args by falling back to the class name."""
    def default(self, o):
        return o.__class__.__name__

def file_cached(func):
    cache_dir = Path(".cache")
    @wraps(func)
    def wrapper(*args, **kwargs):
        args_str = json.dumps({"args": args, **kwargs}, cls=FallbackEncoder, sort_keys=True)
        digest = hashlib.sha1(args_str.encode()).hexdigest()[:12]
        cache_file = cache_dir / f"{func.__name__}_{digest}"
        if cache_file.exists():
            with open(cache_file, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        cache_dir.mkdir(parents=True, exist_ok=True)
        with open(cache_file, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper
A few design choices worth noting:
- SHA-1 digest as filename — serializing args to JSON and hashing them gives a stable, filesystem-safe key for any argument combination. A 12-character prefix of the hex digest is collision-resistant enough for local caching.
- pickle for storage — pickle handles arbitrary Python objects (dataframes, custom classes, numpy arrays) that JSON can't. For untrusted inputs pickle is unsafe, but for your own script's output it's fine.
- FallbackEncoder — if an argument isn't JSON-serializable, falling back to the class name means the key degrades gracefully rather than raising. You lose key uniqueness for complex objects, but it's better than crashing.
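The trade-off in that last point is easy to see: two distinct instances of the same class serialize identically, so they would share a cache key (a sketch using the FallbackEncoder from above; the Config class is just an illustration):

```python
import json
from json import JSONEncoder

class FallbackEncoder(JSONEncoder):
    def default(self, o):
        return o.__class__.__name__

class Config:
    def __init__(self, retries):
        self.retries = retries

# Both instances collapse to the string "Config" in the key —
# graceful, but lossy: they'd hit the same cache entry.
json.dumps({"args": (Config(1),)}, cls=FallbackEncoder)  # '{"args": ["Config"]}'
json.dumps({"args": (Config(9),)}, cls=FallbackEncoder)  # same string
```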
@file_cached
def fetch_report(date: str, region: str):
    # expensive API call
    return api.get_report(date=date, region=region)

fetch_report("2024-01-01", "EMEA")  # fetches and writes to .cache/
fetch_report("2024-01-01", "EMEA")  # reads from .cache/ — instant
Making the cache directory configurable
Wrapping the decorator in another function lets you pass configuration:
def cached(folder=".cache"):
    def decorator(func):
        cache_dir = Path(folder)
        @wraps(func)
        def wrapper(*args, **kwargs):
            args_str = json.dumps({"args": args, **kwargs}, cls=FallbackEncoder, sort_keys=True)
            digest = hashlib.sha1(args_str.encode()).hexdigest()[:12]
            cache_file = cache_dir / f"{func.__name__}_{digest}"
            if cache_file.exists():
                with open(cache_file, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            with open(cache_file, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
Usage with and without arguments:
@cached()  # uses default folder ".cache"
def default_cached_fn(x):
    ...

@cached(folder="data/.cache")  # custom folder
def custom_cached_fn(x):
    ...
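Note that a bare @cached (no parentheses) would pass the function itself as folder. A common pattern for supporting both spellings — shown here as a sketch, not part of the code above, with the caching body elided to a passthrough — detects which way it was called:

```python
from functools import wraps

def cached(func=None, *, folder=".cache"):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # file-cache lookup/store would go here, using Path(folder)
            return f(*args, **kwargs)
        return wrapper
    if func is not None:
        # called bare: @cached
        return decorator(func)
    # called with arguments: @cached(folder=...)
    return decorator

@cached
def f(x):
    return x + 1

@cached(folder="data/.cache")
def g(x):
    return x * 2
```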
When to use this vs functools.lru_cache
If you only need in-memory caching, Python's built-in functools.lru_cache is better than rolling your own — it's implemented in C and handles LRU eviction:
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive(n):
    ...
The file-based approach is useful when:
- Results need to survive process restarts
- Computations take minutes rather than seconds (so the pickle I/O overhead is irrelevant)
- You're iterating on downstream logic and don't want to re-fetch upstream data every run