DevilKing's blog



Clipper intro

Original paper link


Clipper is split into two layers:

  • model abstraction layer
  • model selection layer

Can it support combining multiple models (ensembles)?

At the abstraction layer, how can different ML frameworks be integrated quickly? Each model is wrapped in fewer than 25 lines of code, and queries are dispatched to it over RPC.
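A hypothetical sketch of what such a model container wrapper might look like; the class and method names here are illustrative assumptions, not Clipper's actual API. The point is that only a thin, framework-agnostic `predict` entry point has to be written per model:

```python
# Illustrative sketch of a Clipper-style model container: a thin wrapper
# exposing a single batch predict() entry point that the serving core
# would invoke over RPC. Names are hypothetical, not Clipper's API.
class ModelContainer:
    def __init__(self, model):
        # `model` can be any callable from any framework.
        self.model = model

    def predict(self, inputs):
        # Dispatch a whole batch of inputs to the underlying model.
        return [self.model(x) for x in inputs]

# Usage: wrapping a trivial "model" (here, a plain function).
container = ModelContainer(lambda x: x * 2)
print(container.predict([1, 2, 3]))  # [2, 4, 6]
```

Because the wrapper only needs a constructor and a batch predict method, the per-framework integration cost stays small, which is what makes the "fewer than 25 lines" claim plausible.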

abstraction layer:

cache:

Clipper employs an LRU eviction policy for the prediction cache, using the standard CLOCK [17] cache eviction algorithm.
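A minimal sketch of CLOCK-style approximate-LRU eviction over a fixed-capacity cache; this is an illustration of the algorithm the paper cites, not Clipper's implementation. Each slot carries a reference bit; on eviction, the clock hand sweeps, clearing set bits and evicting the first slot whose bit is already clear:

```python
class ClockCache:
    """Fixed-capacity cache with CLOCK (approximate LRU) eviction.
    Illustrative sketch only; not Clipper's actual prediction cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []   # each slot: [key, value, ref_bit]
        self.index = {}   # key -> slot position
        self.hand = 0     # clock hand position

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.slots[pos][2] = 1  # mark as recently referenced
        return self.slots[pos][1]

    def put(self, key, value):
        if key in self.index:
            pos = self.index[key]
            self.slots[pos][1] = value
            self.slots[pos][2] = 1
            return
        if len(self.slots) < self.capacity:
            self.index[key] = len(self.slots)
            self.slots.append([key, value, 1])
            return
        # Sweep: clear set reference bits until an unset one is found.
        while self.slots[self.hand][2]:
            self.slots[self.hand][2] = 0
            self.hand = (self.hand + 1) % self.capacity
        del self.index[self.slots[self.hand][0]]
        self.slots[self.hand] = [key, value, 1]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

CLOCK approximates LRU with O(1) bookkeeping per access (a single bit flip) instead of maintaining an exact recency-ordered list, which matters on a hot prediction path.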

batching: places incoming queries in per-model queues

Batching increases throughput via two mechanisms.

First, batching amortizes the cost of RPC calls and internal framework overheads such as copying inputs to GPU memory.

Second, batching enables machine learning frameworks to exploit existing data-parallel optimizations by performing batch inference on many inputs simultaneously (e.g., by using the GPU or BLAS acceleration).

delayed batching: hold queries briefly so they can be computed in larger batches, maximizing throughput
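A hedged sketch of the delayed-batching idea, assuming a simple in-memory queue; the function name and parameters are hypothetical. Queries already waiting are drained into one batch, bounded by both a maximum batch size and a delay budget so no query is held too long:

```python
import time

def batch_queries(queue, max_batch_size, max_delay_ms):
    """Drain up to max_batch_size queries from `queue` into one batch,
    stopping early once the delay budget expires. Illustrative sketch
    of delayed batching, not Clipper's adaptive implementation."""
    batch = [queue.pop(0)]  # always serve at least the oldest query
    deadline = time.monotonic() + max_delay_ms / 1000.0
    while queue and len(batch) < max_batch_size and time.monotonic() < deadline:
        batch.append(queue.pop(0))
    return batch
```

The trade is explicit: a small added latency per query buys a larger batch, which the underlying framework can execute far more efficiently (GPU/BLAS data parallelism, amortized RPC overhead). Clipper's real system goes further and adapts the batch size per model under a latency SLO.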

By placing models in separate containers, we ensure that variability in performance and stability of relatively immature state-of-the-art machine learning frameworks does not interfere with the overall availability of Clipper.

selection layer:

However, most of these techniques can be expressed with a simple select, combine, and observe API.
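The select, combine, and observe pattern can be sketched as a small ensemble policy; the class below is an illustrative assumption (a multiplicative-weights-style update), not Clipper's actual selection-policy interface:

```python
class SelectionPolicy:
    """Illustrative select/combine/observe policy: query all models,
    combine by weighted average, down-weight models on bad feedback."""

    def __init__(self, models):
        # Start with uniform trust in every model.
        self.weights = {m: 1.0 for m in models}

    def select(self):
        # Ensemble policy: query every model. A bandit-style policy
        # could instead pick a single model here.
        return list(self.weights)

    def combine(self, predictions):
        # Weighted average of per-model predictions ({model: value}).
        total = sum(self.weights[m] for m in predictions)
        return sum(self.weights[m] * p for m, p in predictions.items()) / total

    def observe(self, predictions, label):
        # Multiplicatively shrink the weight of models in proportion
        # to their observed error against the true label.
        for m, p in predictions.items():
            self.weights[m] *= 0.5 ** abs(p - label)
```

This shows why the three-call API is expressive: ensembles, bandits, and contextual selection all reduce to choices of which models to `select`, how to `combine` their outputs, and how to update state in `observe`.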