DevilKing's blog

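By cold lamplight I study the sword: how much glory rests on its blade? The incense in the burner need not reckon with the common people; though all passes like a wisp of smoke, buried under ten thousand fathoms of cloud, the lone sun still shines on the ancient tombs.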


Dropbox migration to gRPC

Original article link

We settled on gRPC primarily because it allowed us to bring forward our existing protobufs. For our use cases, multiplexing HTTP/2 transport and bi-directional streaming were also attractive.

limits:
  dropbox_engine_ocr:
    # All RPC methods.
    default:
      max_concurrency: 32
      queue_timeout_ms: 1000

      rate_acls:
        # OCR clients are unlimited.
        ocr: -1
        # Nobody else gets to talk to us.
        authenticated: 0
        unauthenticated: 0

Per the spec, the API-level settings can be looked up: limits, authenticated, and so on.

Deadlines -> timeouts?

Another common problem that our legacy RPC clients have to solve is implementing custom exponential backoff and jitter on retries.

We wanted to solve circuit-breaking in a more generic way. We started by introducing a LIFO queue between the listener and the workpool. This works somewhat like a delay queue: a size limit and a time limit together govern scheduling and throttling.

Some practices:

  • We switched from RSA 2048 keypairs to ECDSA P-256 to get better performance for signing operations.
  • Marshaling and unmarshaling can be expensive when you switch to gRPC. For our Go code, we’ve switched to gogo/protobuf which noticeably decreased CPU usage on our busiest Courier servers.
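
To illustrate the first practice, here is a minimal sketch of signing and verifying with an ECDSA P-256 key using Go's standard library. `signAndVerify` is a hypothetical helper for illustration; it does not reproduce how Courier manages keys or certificates, and it makes no performance measurement.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// signAndVerify generates a fresh P-256 key, signs the SHA-256 digest of msg,
// and reports whether the signature verifies against the public key.
func signAndVerify(msg []byte) (bool, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, err
	}
	digest := sha256.Sum256(msg)
	sig, err := ecdsa.SignASN1(rand.Reader, key, digest[:])
	if err != nil {
		return false, err
	}
	return ecdsa.VerifyASN1(&key.PublicKey, digest[:], sig), nil
}

func main() {
	ok, err := signAndVerify([]byte("hello courier"))
	fmt.Println("verified:", ok, "err:", err)
}
```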

Some thoughts:

  1. Observability is a feature. Having all the metrics and breakdowns out-of-the-box is invaluable during troubleshooting.
  2. Standardization and uniformity are important. They lower cognitive load, and simplify operations and code maintenance.
  3. Try to minimize the amount of boilerplate code developers need to write. Codegen is your friend here.
  4. Make migration as easy as possible. Migration will likely take way more time than the development itself. Also, migration is only finished after cleanup is performed.
  5. RPC framework can be a place to add infrastructure-wide reliability improvements, e.g. mandatory deadlines, overload protection, etc. Common reliability issues can be identified by aggregating incident reports on a quarterly basis.