We settled on gRPC primarily because it allowed us to bring forward our existing protobufs. For our use cases, HTTP/2 transport multiplexing and bi-directional streaming were also attractive.
Per the spec, each service's API definition can be queried for sections such as limits, authenticated, and so on.
Deadlines -> timeouts?
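In gRPC, a client-side deadline is carried as a context timeout and propagated to the server, which can then stop working on requests that can no longer finish in time. A minimal sketch, with hypothetical names (callWithDeadline, and fn standing in for any generated client call):

```go
package rpcutil

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// callWithDeadline runs fn (a stand-in for any generated gRPC client call)
// with a per-RPC deadline; gRPC propagates the remaining time to the server.
func callWithDeadline(d time.Duration, fn func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	err := fn(ctx)
	if status.Code(err) == codes.DeadlineExceeded {
		// The call was abandoned once the deadline elapsed.
		log.Printf("rpc exceeded its %v deadline", d)
	}
	return err
}
```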
Another common problem that our legacy RPC clients have to solve is implementing custom exponential backoff and jitter on retries.
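A minimal sketch of the kind of retry loop those clients end up hand-rolling, using exponential backoff with full jitter; the attempt count and backoff bounds are illustrative assumptions:

```go
package retry

import (
	"context"
	"math/rand"
	"time"
)

// Do retries fn with exponential backoff plus full jitter.
// The attempt count and base/cap durations are illustrative, not real defaults.
func Do(ctx context.Context, fn func() error) error {
	const (
		maxAttempts = 5
		base        = 100 * time.Millisecond
		maxBackoff  = 5 * time.Second
	)
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		backoff := base << attempt // 100ms, 200ms, 400ms, ...
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
		jittered := time.Duration(rand.Int63n(int64(backoff))) // full jitter
		select {
		case <-time.After(jittered):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}
```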
We wanted to solve circuit-breaking in a more generic way, so we started by introducing a LIFO queue between the listener and the workpool. In effect it behaves like a delay queue: a size limit and a time limit jointly control scheduling and load shedding.
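A minimal sketch of such a queue, with hypothetical names; requests are rejected once the size limit is reached, and requests that have waited longer than the time limit are dropped instead of handed to a worker:

```go
package queue

import (
	"errors"
	"sync"
	"time"
)

// queuedRequest pairs a unit of work with its arrival time so that
// stale requests can be shed instead of processed.
type queuedRequest struct {
	enqueued time.Time
	work     func()
}

// lifoQueue is a bounded LIFO sitting between the listener and the work pool.
type lifoQueue struct {
	mu      sync.Mutex
	items   []queuedRequest
	maxSize int           // size limit
	maxWait time.Duration // time limit
}

var errOverloaded = errors.New("queue full: shedding load")

// Push enqueues work, rejecting it outright when the queue is full.
func (q *lifoQueue) Push(work func()) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) >= q.maxSize {
		return errOverloaded
	}
	q.items = append(q.items, queuedRequest{enqueued: time.Now(), work: work})
	return nil
}

// Pop returns the most recently enqueued request that is still fresh.
func (q *lifoQueue) Pop() (func(), bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) > 0 {
		top := q.items[len(q.items)-1]
		q.items = q.items[:len(q.items)-1]
		if time.Since(top.enqueued) <= q.maxWait {
			return top.work, true
		}
		// This request waited past the time limit: drop it and keep popping.
	}
	return nil, false
}
```

Under overload, LIFO ordering favors the freshest requests, which are the ones most likely to still be within their callers' deadlines.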
Some practices:
- We switched from RSA 2048 keypairs to ECDSA P-256 to get better performance for signing operations (see the signing sketch after this list).
- Marshaling and unmarshaling can be expensive when you switch to gRPC. For our Go code, we’ve switched to gogo/protobuf which noticeably decreased CPU usage on our busiest Courier servers.
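A minimal sketch of that signing path using Go's standard library; the helper and package names are mine, and using SHA-256 as the digest is an assumption rather than a detail from the original:

```go
package keys

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
)

// signPayload generates a P-256 key and signs a SHA-256 digest of payload.
// ECDSA P-256 signing is substantially cheaper than RSA-2048 signing, which
// is what made the switch worthwhile for per-request signing.
func signPayload(payload []byte) (*ecdsa.PublicKey, []byte, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	digest := sha256.Sum256(payload)
	sig, err := ecdsa.SignASN1(rand.Reader, priv, digest[:])
	if err != nil {
		return nil, nil, err
	}
	return &priv.PublicKey, sig, nil
}
```

Verification on the receiving side mirrors this with ecdsa.VerifyASN1.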
Some thoughts:
- Observability is a feature. Having all the metrics and breakdowns out-of-the-box is invaluable during troubleshooting.
- Standardization and uniformity are important. They lower cognitive load, and simplify operations and code maintenance.
- Try to minimize the amount of boilerplate code developers need to write. Codegen is your friend here.
- Make migration as easy as possible. Migration will likely take way more time than the development itself. Also, migration is only finished after cleanup is performed.
- An RPC framework can be a place to add infrastructure-wide reliability improvements, e.g. mandatory deadlines, overload protection, etc. Common reliability issues can be identified by aggregating incident reports on a quarterly basis.