System Design Case Studies

设计 Query Cache（Web 搜索缓存）

为最近查询结果设计 key-value cache

注意：这个文档中的链接会直接指向 System Design 主题索引的相关部分，以避免重复内容。你可以参考链接来了解关键要点、trade-offs 以及替代方案。

第一步：简述用例与约束条件

收集需求，定义 scope。提问来明确 use cases 与 constraints。讨论 assumptions。

没有面试官的情况下，我们先定义 use cases 和 constraints。

用例

仅处理以下用例

用户发起搜索请求，cache hit
用户发起搜索请求，cache miss
服务需要高 availability

约束与假设

关键假设

流量分布不均匀
- 热门 queries 应该常驻 cache
- 需要考虑 TTL / refresh
从 cache 返回需要极低 latency
机器间 latency 要低
cache memory 有限
- 需要决定保留/淘汰策略（LRU）
- 需要缓存数百万 queries
1,000 万用户
每月 100 亿次 queries

估算用量

是否需要做 back-of-the-envelope，请先和面试官确认。

cache 存储 key: query，value: results
- query - 50 bytes
- title - 20 bytes
- snippet - 200 bytes
- 总计：270 bytes
如果 100 亿 queries 都 unique 且全部缓存，需要 2.7 TB/月
- 270 bytes/query * 100 亿/月
- 但 memory 有限，需要 TTL / eviction
QPS 约 4,000

换算参考：

每月约 250 万秒
1 rps ≈ 250 万/月
40 rps ≈ 1 亿/月
400 rps ≈ 10 亿/月

第二步：概要设计

画出 high-level design，包含关键组件。

Imgur

第三步：设计核心组件

深入核心组件的细节。

用例：cache hit

热门 queries 可以放在 Memory Cache（Redis / Memcached）中，降低 read latency，同时避免压垮 Reverse Index Service 和 Document Service。从 memory 顺序读取 1 MB 约 250 微秒，SSD 约 4x，HDD 约 80x。^{<a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1}

由于 cache 容量有限，采用 LRU 淘汰旧条目。

Client → Web Server（reverse proxy）
Web Server → Query API
Query API 处理：
- 解析 query
  - 去 markup
  - 分词
  - 修正 typo
  - 统一大小写
  - 转为 boolean ops
- 查询 Memory Cache
  - hit：更新 LRU 位置并返回
  - miss：
    - 通过 Reverse Index Service 找到匹配文档
    - Document Service 返回 title/snippet
    - 写入 Memory Cache（放到 LRU head）

Cache 实现思路

cache 可用 doubly-linked list + hash table：

linked list：最新放 head，淘汰 tail
hash table：O(1) 查找 node

与面试官确认要写多少 code。

Query API 示例：

class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break text into terms, deal with typos,
        normalize capitalization, convert to use boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results

Node 示例：

class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results

LinkedList 示例：

class LinkedList(object):

    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        ...

    def append_to_front(self, node):
        ...

    def remove_from_tail(self):
        ...

Cache 示例：

class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query)
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        node = self.lookup[query]
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, results, query):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        node = self.lookup[query]
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node

什么时候更新 cache

以下变化需要更新 cache：

页面内容变化
页面删除/新增
Page rank 变化

最直接的办法是给 cache entry 设置 TTL。更多取舍见 When to update the cache。

第四步：扩展设计

找出瓶颈并扩展。

Imgur

不要直接跳到最终架构。

建议流程：

Benchmark / Load Test
Profile 找 bottlenecks
针对瓶颈做优化并评估 trade-offs
重复迭代

讨论初始设计的瓶颈以及常见扩展组件，比如 Load Balancer、CDN、Read Replicas 等，并说明 trade-offs。

为了避免重复，这里引用相关 topic：

扩展 Memory Cache

三种典型方案：

每台机器有自己的 cache：简单，但 hit rate 低
每台机器都有完整 cache：简单但浪费内存
cache sharding：复杂但效果好，常用 machine = hash(query) 并配合 consistent hashing

Additional talking points

SQL scaling patterns

NoSQL

Caching

Where to cache
What to cache
- Caching at the database query level
- Caching at the object level
When to update the cache

Asynchronism 和 microservices

Communications

External communication: REST APIs
Internal communication: RPC
Service discovery

Security

参考 Security。

Latency numbers

参考 Latency numbers every programmer should know。

Ongoing

持续 benchmark + monitoring，迭代优化
Scaling 是一个 iterative process

参考与下载

下载 Python 源码（query_cache_snippets.py）