DevilKing's blog



Clipper intro

Original paper link


Clipper is split into two layers:

  • model abstraction layer
  • model selection layer

Can it support combining multiple models (ensembles)?

At the abstraction layer, how can different ML frameworks be integrated quickly? Each model is wrapped in fewer than 25 lines of code, and queries are dispatched to it over RPC.
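A hypothetical sketch of what such a model container wrapper might look like; the class and method names here are illustrative assumptions, not Clipper's actual API. The point is that only a thin, framework-agnostic `predict` entry point has to be written per model:

```python
# Illustrative sketch of a Clipper-style model container: a thin wrapper
# exposing a single batch predict() entry point that the serving core
# would invoke over RPC. Names are hypothetical, not Clipper's API.
class ModelContainer:
    def __init__(self, model):
        # `model` can be any callable from any framework.
        self.model = model

    def predict(self, inputs):
        # Dispatch a whole batch of inputs to the underlying model.
        return [self.model(x) for x in inputs]

# Usage: wrapping a trivial "model" (here, a plain function).
container = ModelContainer(lambda x: x * 2)
print(container.predict([1, 2, 3]))  # [2, 4, 6]
```

Because the wrapper only needs a constructor and a batch predict method, the per-framework integration cost stays small, which is what makes the "fewer than 25 lines" claim plausible.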

abstraction layer:

cache:

Clipper employs an LRU eviction policy for the prediction cache, using the standard CLOCK [17] cache eviction algorithm.
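A minimal sketch of CLOCK-style approximate-LRU eviction over a fixed-capacity cache; this is an illustration of the algorithm the paper cites, not Clipper's implementation. Each slot carries a reference bit; on eviction, the clock hand sweeps, clearing set bits and evicting the first slot whose bit is already clear:

```python
class ClockCache:
    """Fixed-capacity cache with CLOCK (approximate LRU) eviction.
    Illustrative sketch only; not Clipper's actual prediction cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []   # each slot: [key, value, ref_bit]
        self.index = {}   # key -> slot position
        self.hand = 0     # clock hand position

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.slots[pos][2] = 1  # mark as recently referenced
        return self.slots[pos][1]

    def put(self, key, value):
        if key in self.index:
            pos = self.index[key]
            self.slots[pos][1] = value
            self.slots[pos][2] = 1
            return
        if len(self.slots) < self.capacity:
            self.index[key] = len(self.slots)
            self.slots.append([key, value, 1])
            return
        # Sweep: clear set reference bits until an unset one is found.
        while self.slots[self.hand][2]:
            self.slots[self.hand][2] = 0
            self.hand = (self.hand + 1) % self.capacity
        del self.index[self.slots[self.hand][0]]
        self.slots[self.hand] = [key, value, 1]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

CLOCK approximates LRU with O(1) bookkeeping per access (a single bit flip) instead of maintaining an exact recency-ordered list, which matters on a hot prediction path.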

batching: places incoming queries in per-model queues

Batching increases throughput via two mechanisms.

First, batching amortizes the cost of RPC calls and internal framework overheads such as copying inputs to GPU memory.

Second, batching enables machine learning frameworks to exploit existing data-parallel optimizations by performing batch inference on many inputs simultaneously (e.g., by using the GPU or BLAS acceleration).

delayed batching: hold queries briefly so they can be computed in larger batches, maximizing throughput
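A hedged sketch of the delayed-batching idea, assuming a simple in-memory queue; the function name and parameters are hypothetical. Queries already waiting are drained into one batch, bounded by both a maximum batch size and a delay budget so no query is held too long:

```python
import time

def batch_queries(queue, max_batch_size, max_delay_ms):
    """Drain up to max_batch_size queries from `queue` into one batch,
    stopping early once the delay budget expires. Illustrative sketch
    of delayed batching, not Clipper's adaptive implementation."""
    batch = [queue.pop(0)]  # always serve at least the oldest query
    deadline = time.monotonic() + max_delay_ms / 1000.0
    while queue and len(batch) < max_batch_size and time.monotonic() < deadline:
        batch.append(queue.pop(0))
    return batch
```

The trade is explicit: a small added latency per query buys a larger batch, which the underlying framework can execute far more efficiently (GPU/BLAS data parallelism, amortized RPC overhead). Clipper's real system goes further and adapts the batch size per model under a latency SLO.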

By placing models in separate containers, we ensure that variability in performance and stability of relatively immature state-of-the-art machine learning frameworks does not interfere with the overall availability of Clipper.

selection layer:

However, most of these techniques can be expressed with a simple select, combine, and observe API.
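The select, combine, and observe pattern can be sketched as a small ensemble policy; the class below is an illustrative assumption (a multiplicative-weights-style update), not Clipper's actual selection-policy interface:

```python
class SelectionPolicy:
    """Illustrative select/combine/observe policy: query all models,
    combine by weighted average, down-weight models on bad feedback."""

    def __init__(self, models):
        # Start with uniform trust in every model.
        self.weights = {m: 1.0 for m in models}

    def select(self):
        # Ensemble policy: query every model. A bandit-style policy
        # could instead pick a single model here.
        return list(self.weights)

    def combine(self, predictions):
        # Weighted average of per-model predictions ({model: value}).
        total = sum(self.weights[m] for m in predictions)
        return sum(self.weights[m] * p for m, p in predictions.items()) / total

    def observe(self, predictions, label):
        # Multiplicatively shrink the weight of models in proportion
        # to their observed error against the true label.
        for m, p in predictions.items():
            self.weights[m] *= 0.5 ** abs(p - label)
```

This shows why the three-call API is expressive: ensembles, bandits, and contextual selection all reduce to choices of which models to `select`, how to `combine` their outputs, and how to update state in `observe`.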