We settled on gRPC primarily because it allowed us to bring forward our existing protobufs. For our use cases, the multiplexed HTTP/2 transport and bi-directional streaming were also attractive.
```yaml
limits:
  dropbox_engine_ocr:
    # All RPC methods.
    default:
      max_concurrency: 32
      queue_timeout_ms: 1000
      rate_acls:
        # OCR clients are unlimited.
        ocr: -1
        # Nobody else gets to talk to us.
        authenticated: 0
        unauthenticated: 0
```
Per the convention, these settings (limits, authenticated, etc.) can also be queried through the API.
deadlines -> essentially timeouts?
Another common problem that our legacy RPC clients have to solve is implementing custom exponential backoff and jitter on retries.
We wanted to solve circuit-breaking in a more generic way. We started by introducing a LIFO queue between the listener and the workpool. It behaves like a delay queue: a size limit and a time limit jointly govern scheduling and load shedding.
Some practices:
- We switched from RSA 2048 keypairs to ECDSA P-256 to get better performance for signing operations.
- Marshaling and unmarshaling can be expensive when you switch to gRPC. For our Go code, we’ve switched to gogo/protobuf which noticeably decreased CPU usage on our busiest Courier servers.
Some takeaways:
- Observability is a feature. Having all the metrics and breakdowns out-of-the-box is invaluable during troubleshooting.
- Standardization and uniformity are important. They lower cognitive load, and simplify operations and code maintenance.
- Try to minimize the amount of boilerplate code developers need to write. Codegen is your friend here.
- Make migration as easy as possible. Migration will likely take way more time than the development itself. Also, migration is only finished after cleanup is performed.
- An RPC framework can be a place to add infrastructure-wide reliability improvements, e.g. mandatory deadlines, overload protection, etc. Common reliability issues can be identified by aggregating incident reports on a quarterly basis.