- model abstraction layer
- model selection layer
abstraction layer:
Clipper employs an LRU eviction policy for the prediction cache, using the standard CLOCK [17] cache eviction algorithm.
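A minimal sketch of CLOCK (second-chance) eviction, which approximates LRU with one reference bit per entry. This is not Clipper's code: the class name `ClockCache`, the slot layout, and the choice to insert new entries with a cleared reference bit are all assumptions made for illustration. In a prediction cache the key would be something like a (model, input) pair.

```python
from typing import Any, Dict, Hashable, List, Optional


class ClockCache:
    """Fixed-capacity cache with CLOCK eviction (hypothetical helper, not Clipper's API)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Each slot is None or a mutable [key, value, referenced] triple.
        self.slots: List[Optional[list]] = [None] * capacity
        self.index: Dict[Hashable, int] = {}  # key -> slot position
        self.hand = 0                         # the clock hand

    def get(self, key: Hashable) -> Any:
        pos = self.index.get(key)
        if pos is None:
            return None                       # cache miss
        self.slots[pos][2] = True             # mark as recently used
        return self.slots[pos][1]

    def put(self, key: Hashable, value: Any) -> None:
        pos = self.index.get(key)
        if pos is not None:                   # update in place
            self.slots[pos][1] = value
            self.slots[pos][2] = True
            return
        # Sweep: clear reference bits until an empty or unreferenced slot is found.
        while True:
            slot = self.slots[self.hand]
            if slot is None or not slot[2]:
                break
            slot[2] = False                   # give a second chance
            self.hand = (self.hand + 1) % self.capacity
        if slot is not None:
            del self.index[slot[0]]           # evict the victim
        self.slots[self.hand] = [key, value, False]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

With capacity 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because the read set the reference bit on `a` and the hand skipped it.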
batch: queries are placed in queues
Batching increases throughput via two mechanisms.
First, batching amortizes the cost of RPC calls and internal framework overheads such as copying inputs to GPU memory.
Second, batching enables machine learning frameworks to exploit existing data-parallel optimizations by performing batch inference on many inputs simultaneously (e.g., by using the GPU or BLAS acceleration).
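A rough sketch of the first mechanism: draining a queue of pending queries in batches, so that the model is called once per batch instead of once per query. The function name `drain_in_batches`, the `predict_batch` callable, and the fixed `max_batch_size` are hypothetical. This sketch uses a fixed batch size chosen by the caller and does not attempt to tune it.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple


def drain_in_batches(
    pending: Deque[float],
    predict_batch: Callable[[Sequence[float]], List[float]],
    max_batch_size: int,
) -> Tuple[List[float], int]:
    """Empty `pending` in FIFO order and return (predictions, number of model calls)."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    results: List[float] = []
    calls = 0
    while pending:
        size = min(max_batch_size, len(pending))
        batch = [pending.popleft() for _ in range(size)]
        results.extend(predict_batch(batch))  # one call (e.g. one RPC) per batch
        calls += 1
    return results, calls


if __name__ == "__main__":
    queries = deque(float(i) for i in range(10))
    preds, n_calls = drain_in_batches(queries, lambda xs: [2 * x for x in xs], 4)
    print(n_calls)  # 3 calls instead of 10
```

Ten queries with a batch size of 4 need three calls, so any fixed per-call overhead is paid three times rather than ten.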
delay batch: done purely to increase throughput
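One way to picture delayed batching: after the first query arrives, wait a short, bounded time for more queries before dispatching, trading a little latency for larger batches. The sketch below assumes a standard `queue.Queue` as the query queue. The function name `collect_batch` and the parameters `max_batch_size` and `max_delay_s` are hypothetical, not Clipper's.

```python
import queue
import time
from typing import Any, List


def collect_batch(q: "queue.Queue[Any]", max_batch_size: int, max_delay_s: float) -> List[Any]:
    """Block for the first query, then wait up to `max_delay_s` for more to arrive."""
    batch = [q.get()]                          # wait indefinitely for the first query
    deadline = time.monotonic() + max_delay_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                              # delay budget spent: dispatch what we have
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break                              # nothing else arrived in time
    return batch
```

Setting `max_delay_s` to 0 dispatches the first query alone as soon as it arrives.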
By placing models in separate containers, we ensure that variability in performance and stability of relatively immature state-of-the-art machine learning frameworks does not interfere with the overall availability of Clipper.
selection layer:
However, most of these techniques can be expressed with a simple select, combine, and observe API.
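To make the select, combine, and observe idea concrete, here is an illustrative policy with those three methods. It is not one of the paper's policies: the class name `WeightedVotePolicy`, the multiplicative `penalty` update, and the method signatures are assumptions. `select` picks which models to query, `combine` merges their predictions into one answer with a confidence, and `observe` uses feedback to update the policy state.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


class WeightedVotePolicy:
    """Weighted majority vote over models (hypothetical example policy)."""

    def __init__(self, model_ids: Iterable[str], penalty: float = 0.5) -> None:
        if not 0.0 < penalty <= 1.0:
            raise ValueError("penalty must be in (0, 1]")
        self.weights: Dict[str, float] = {m: 1.0 for m in model_ids}
        self.penalty = penalty

    def select(self, x: Any) -> List[str]:
        # Query every model; a single-model policy would return one id here.
        return list(self.weights)

    def combine(self, x: Any, predictions: Dict[str, Hashable]) -> Tuple[Hashable, float]:
        # `predictions` must be non-empty and keyed by known model ids.
        votes: Dict[Hashable, float] = defaultdict(float)
        for model_id, y in predictions.items():
            votes[y] += self.weights[model_id]
        best = max(votes, key=votes.get)
        confidence = votes[best] / sum(votes.values())
        return best, confidence

    def observe(self, x: Any, feedback: Hashable, predictions: Dict[str, Hashable]) -> None:
        # Shrink the weight of every model that disagreed with the feedback.
        for model_id, y in predictions.items():
            if y != feedback:
                self.weights[model_id] *= self.penalty
```

After `observe` reports that the true label was the minority prediction, the models that voted wrong lose weight, so the same set of predictions can combine to a different answer next time.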