10分钟掌握Python缓存

全文速览

python的不同缓存组件的使用场景和使用样例
cachetools的使用

项目背景

代码检查项目，需要存储每一步检查的中间结果，最终把结果汇总并写入文件中

在中间结果的存储中

可以使用context进行上下文的传递，但是整体对代码改动比较大，违背了开闭原则
也可以利用缓存存储，处理完成之后再统一读缓存并写入文件

在权衡了不同方案后，我决定采用缓存来存储中间结果。接下来，我将探讨 Python 中可用缓存组件。

python缓存分类

决定选择缓存，那么python中都有哪些类型的缓存呢?

1. 使用内存缓存（如 `functools.lru_cache`）

这是最简单的一种缓存方法，适用于小规模的数据缓存。使用 functools.lru_cache 可以对函数结果进行缓存。

from functools import lru_cache

@lru_cache(maxsize=128)

def expensive_function(param1, param2):

    # 进行一些耗时的操作

    return result

2. 使用本地文件缓存（如 `diskcache`）

如果缓存的数据较大，或者需要跨进程共享缓存，可以使用文件系统缓存库，例如 diskcache。

import diskcache as dc

cache = dc.Cache('/tmp/mycache')

@cache.memoize(expire=3600)

def expensive_function(param1, param2):

    # 进行一些耗时的操作

    return result

3. 使用分布式缓存（如 Redis）

对于需要跨多个应用实例共享缓存的数据，可以使用 Redis 这样的分布式缓存系统。

import redis

import pickle

r = redis.StrictRedis(host='localhost', port=6379, db=0)

def expensive_function(param1, param2):

    key = f"{param1}_{param2}"

    cached_result = r.get(key)

    if cached_result:

        return pickle.loads(cached_result)

    result = # 进行一些耗时的操作

    r.set(key, pickle.dumps(result), ex=3600)  # 设置缓存过期时间为1小时

    return result

总结

如果只是简单的小规模缓存，lru_cache 足够；如果需要持久化或分布式缓存，可以考虑使用 diskcache 或 Redis；如果使用了 Web 框架，使用框架自带的缓存功能会更方便。

python内存缓存分类

兼顾速度和成本以及实现的复杂度，最终决定使用内存缓存，在 Python 中，内存缓存组件有许多选择，每种都有其特定的优点和适用场景。以下是一些常见的内存缓存组件：

1. `functools.lru_cache`

lru_cache 是 Python 标准库中的一个装饰器，用于缓存函数的返回结果，基于最近最少使用（LRU）策略。

from functools import lru_cache

@lru_cache(maxsize=128)

def expensive_function(param1, param2):

    # 进行一些耗时的操作

    return result

2. `cachetools`

cachetools 是一个第三方库，提供了多种缓存策略，包括 LRU、LFU、TTL（基于时间的缓存）等。

from cachetools import LRUCache, cached

cache = LRUCache(maxsize=100)

@cached(cache)

def expensive_function(param1, param2):

    # 进行一些耗时的操作

    return result

3. `django.core.cache`

如果使用 Django 框架，Django 自带了缓存框架，支持多种缓存后端，包括内存缓存。

在 settings.py 中配置内存缓存：

CACHES = {

    'default': {

        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',

        'LOCATION': 'unique-snowflake',

    }

}

4. `Flask-Caching`

如果使用 Flask 框架，Flask-Caching 插件可以方便地实现内存缓存。

from flask import Flask

from flask_caching import Cache

app = Flask(__name__)

cache = Cache(app, config={'CACHE_TYPE': 'simple'})

@app.route('/expensive')

@cache.cached(timeout=60)

def expensive_function():

    # 进行一些耗时的操作

    return result

5. `requests_cache`

requests_cache 是一个专门用于缓存 HTTP 请求的库，支持多种缓存后端，包括内存缓存。

import requests

import requests_cache

requests_cache.install_cache('demo_cache', backend='memory', expire_after=3600)

response = requests.get('https://api.example.com/data')

6. `dogpile.cache`

dogpile.cache 是一个更高级的缓存库，提供了灵活的缓存后端和缓存失效策略。

from dogpile.cache import make_region

region = make_region().configure(

    'dogpile.cache.memory',

    expiration_time=3600

)

@region.cache_on_arguments()

def expensive_function(param1, param2):

    # 进行一些耗时的操作

    return result

7. `joblib.Memory`

joblib.Memory 常用于科学计算和数据处理领域，用于缓存函数的计算结果。

from joblib import Memory

memory = Memory(location='/tmp/joblib_cache', verbose=0)

@memory.cache

def expensive_function(param1, param2):

    # 进行一些耗时的操作

    return result

总结

根据具体需求和使用场景选择合适的内存缓存组件。对于简单的缓存需求，可以使用 functools.lru_cache 或 cachetools。对于 Web 应用，django.core.cache 和 Flask-Caching 是不错的选择。对于 HTTP 请求缓存，可以使用 requests_cache。对于科学计算，joblib.Memory 是一个好选择。

cachetools使用

我的项目是一个命令行执行的项目，综合考量最终决定选择cachetools

安装 cachetools

pip install cachetools

实现缓存工具类

from cachetools import LRUCache

from cachetools import Cache

from siada.cr.logger.logger import logger

class CacheUtils:

    """

    缓存工具类

    """

    def __init__(self, cache: Cache = None):

        self.cache = cache if cache else LRUCache(maxsize=100)

    def get_value(self, cache_key: str):

        value = self.cache.get(cache_key, None)

        if value is not None:

            logger.info(f"Cache hit for key: {cache_key}")

        else:

            logger.info(f"Cache miss for key: {cache_key}")

        return value

    def set_key_value(self, cache_key: str, value):

        self.cache[cache_key] = value

        logger.info(f"Set cache key: {cache_key} with value: {value}")

    def set_key_list(self, cache_key: str, value):

        v = self.cache.get(cache_key, None)

        if v is not None:

            v.append(value)

        else:

            self.cache[cache_key] = [value]

    def clear_cache(self):

        self.cache.clear()

# TODO 如果后续生成过程改为多线程并发，需考虑数据竞争问题

cache = CacheUtils()

敬请关注【程序员世杰】