Dropbox migration to gRPC

Original article link

We settled on gRPC primarily because it allowed us to bring forward our existing protobufs. For our use cases, multiplexing HTTP/2 transport and bi-directional streaming were also attractive.

limits:
  dropbox_engine_ocr:
    # All RPC methods.
    default:
      max_concurrency: 32
      queue_timeout_ms: 1000

      rate_acls:
        # OCR clients are unlimited.
        ocr: -1
        # Nobody else gets to talk to us.
        authenticated: 0
        unauthenticated: 0

The API part can be looked up in the spec: limits, authenticated, and so on.
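
A minimal sketch of how the limits config above might be read, going only by its comments: a value of -1 is taken to mean "unlimited" and 0 to mean "denied". The function names, the `Recognize` method name, and the way a caller is mapped to an ACL key are all hypothetical, as is the meaning of any positive value; none of this is the actual implementation.

```python
# Hypothetical in-memory form of the YAML config above.
LIMITS = {
    "dropbox_engine_ocr": {
        "default": {
            "max_concurrency": 32,
            "queue_timeout_ms": 1000,
            "rate_acls": {"ocr": -1, "authenticated": 0, "unauthenticated": 0},
        }
    }
}

UNLIMITED = -1  # assumption, based on the "OCR clients are unlimited" comment


def rate_limit_for(limits, service, method, caller):
    """Return the configured rate value for a caller (hypothetical lookup rules).

    caller is the authenticated service name, or None if unauthenticated.
    """
    methods = limits[service]
    # Fall back to the "default" section when a method has no own entry.
    rules = methods.get(method, methods["default"])
    acls = rules["rate_acls"]
    if caller is None:
        return acls["unauthenticated"]
    # A named entry wins; any other authenticated caller uses the generic key.
    return acls.get(caller, acls["authenticated"])


def is_allowed(limits, service, method, caller):
    """A value of 0 is read as 'may not call at all'."""
    return rate_limit_for(limits, service, method, caller) != 0
```

With this reading, `is_allowed(LIMITS, "dropbox_engine_ocr", "Recognize", "ocr")` is true, while any other caller, authenticated or not, is rejected.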

deadlines -> timeouts?
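
On the question above: a timeout is a duration counted from when a call starts, while a deadline is an absolute point in time that can be handed down a chain of calls so each hop only gets the remaining budget. The sketch below illustrates that distinction; the `Deadline` class and its methods are hypothetical names, not gRPC's API.

```python
import time


class Deadline:
    """Absolute point in time by which a request must finish (illustrative only)."""

    def __init__(self, expires_at, clock=time.monotonic):
        self.expires_at = expires_at
        self.clock = clock  # injectable so the behavior can be tested

    @classmethod
    def from_timeout(cls, timeout_s, clock=time.monotonic):
        # A relative timeout becomes an absolute deadline once, at the edge.
        return cls(clock() + timeout_s, clock)

    def remaining(self):
        # Budget left for this hop and everything it calls downstream.
        return max(0.0, self.expires_at - self.clock())

    def expired(self):
        return self.remaining() == 0.0
```

A server that receives a request with 2 seconds left and spends 1.5 seconds on its own work would pass `remaining()` (about 0.5 seconds) to the next call instead of starting a fresh timeout.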

Another common problem that our legacy RPC clients have to solve is implementing custom exponential backoff and jitter on retries.
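
For reference, this is roughly what each client ends up writing by hand. The sketch uses capped exponential backoff with "full jitter" (a uniformly random delay between 0 and the current ceiling); the function names and default values are assumptions chosen for illustration, not taken from the article.

```python
import random
import time


def backoff_delays(base_s, cap_s, retries, rng=random):
    """Yield one sleep duration per retry: capped exponential backoff, full jitter."""
    for attempt in range(retries):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        yield rng.uniform(0, ceiling)


def call_with_retries(fn, max_attempts=4, base_s=0.1, cap_s=2.0,
                      sleep=time.sleep, rng=random):
    """Call fn until it succeeds or max_attempts is reached."""
    delays = backoff_delays(base_s, cap_s, max_attempts - 1, rng)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the last error
            sleep(delay)
```

A real client would retry only on errors known to be transient and would also stop once the request deadline has passed; both are left out here to keep the sketch short.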

We wanted to solve circuit-breaking in a more generic way. We started by introducing a LIFO queue between the listener and the workpool. This works somewhat like a delay queue: a size limit and a time limit together schedule and bound the queued requests.
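
A minimal sketch of a queue bounded by both size and time, as described above. The class name, the choice to reject new requests when the queue is full, and the choice to silently drop requests that waited too long are assumptions made for illustration; the real queue may behave differently.

```python
import time
from collections import deque


class BoundedLifoQueue:
    """LIFO queue with a size limit and a per-request wait limit (sketch)."""

    def __init__(self, max_size, max_wait_s, clock=time.monotonic):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.clock = clock
        self._items = deque()  # oldest on the left, newest on the right

    def _drop_expired(self):
        # The oldest entries sit on the left, so expiry is checked from there.
        now = self.clock()
        while self._items and now - self._items[0][0] > self.max_wait_s:
            self._items.popleft()

    def push(self, request):
        """Enqueue a request; return False if the size limit is hit."""
        self._drop_expired()
        if len(self._items) >= self.max_size:
            return False
        self._items.append((self.clock(), request))
        return True

    def pop(self):
        """Return the newest request that is still within its wait limit."""
        self._drop_expired()
        if not self._items:
            return None
        return self._items.pop()[1]
```

Serving the newest request first means that under overload the requests most likely to still have a waiting client get handled, while the oldest ones time out in the queue instead of consuming a worker.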

Some practices:

We switched from RSA 2048 keypairs to ECDSA P-256 to get better performance for signing operations.
Marshaling and unmarshaling can be expensive when you switch to gRPC. For our Go code, we’ve switched to gogo/protobuf which noticeably decreased CPU usage on our busiest Courier servers.

Some thoughts:

Observability is a feature. Having all the metrics and breakdowns out-of-the-box is invaluable during troubleshooting.
Standardization and uniformity are important. They lower cognitive load, and simplify operations and code maintenance.
Try to minimize the amount of boilerplate code developers need to write. Codegen is your friend here.
Make migration as easy as possible. Migration will likely take way more time than the development itself. Also, migration is only finished after cleanup is performed.
RPC framework can be a place to add infrastructure-wide reliability improvements, e.g. mandatory deadlines, overload protection, etc. Common reliability issues can be identified by aggregating incident reports on a quarterly basis.