Go performance optimization workflow:
- determine your performance goals
- profile to identify the areas to improve; this can be:
  - CPU
  - heap allocations
  - goroutine blocking
- benchmark to determine the speed up your solution will provide using the built-in benchmarking framework (http://golang.org/pkg/testing/); see the benchmark sketch after this list
- profile again afterwards to verify the issue is gone
- use https://godoc.org/golang.org/x/perf/benchstat or https://github.com/codahale/tinystat to verify that a set of timings is ‘sufficiently’ different for an optimization to be worth the added code complexity
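As a sketch of the benchmark-and-compare step, a minimal benchmark could look like the following; the `Sum` function and file name are illustrative, not part of the original outline.

```go
// sum_test.go
package sum

import "testing"

// Sum is a stand-in for the code being optimized.
func Sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func BenchmarkSum(b *testing.B) {
	xs := make([]int, 1024)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Sum(xs)
	}
}
```

Run it before and after a change (e.g. `go test -bench=Sum -count=10 > old.txt`, then `> new.txt`), and `benchstat old.txt new.txt` reports whether the difference is statistically significant.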
The basic rules of the game are:
- minimize CPU usage
  - do less work
  - this generally means “a faster algorithm”
  - but CPU caches and the hidden constants in O() can play tricks on you
- minimize allocations (which leads to less CPU stolen by the GC); see the preallocation sketch after this list
- make your data quick to access
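One concrete instance of “minimize allocations”: when the final size is known, preallocating a slice removes the repeated grow-and-copy that append otherwise performs. The `squares` function below is illustrative.

```go
package main

import "fmt"

// squares preallocates the result, so the loop does a single allocation
// instead of several as the slice grows.
func squares(n int) []int {
	out := make([]int, 0, n) // capacity known up front
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

func main() {
	fmt.Println(squares(5)) // [0 1 4 9 16]
}
```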
Basic
- be aware of http://accidentallyquadratic.tumblr.com/
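A classic accidentally-quadratic pattern in Go, sketched below: building a string with += copies everything written so far on every iteration, so the loop is O(n²); strings.Builder keeps the work linear.

```go
package main

import (
	"fmt"
	"strings"
)

// quadratic: every += copies the whole string built so far.
func joinNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// linear: Builder appends into a growing buffer instead of re-copying.
func joinBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"a", "b", "c"}
	fmt.Println(joinNaive(parts) == joinBuilder(parts)) // true
}
```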
Introductory Profiling
- pprof
- Writing and running (micro)benchmarks (see the sketch after this list)
  - -cpuprofile
  - -memprofile
  - -benchmem
- how to read pprof output
- macro-benchmarks, net/http/pprof
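A minimal sketch of both levels: collecting profiles from a benchmark run, and exposing live profiles from a long-running server via net/http/pprof. The server, port, and handler below are illustrative.

```go
// Micro: collect profiles while running a benchmark.
//   go test -bench=. -benchmem -cpuprofile=cpu.out -memprofile=mem.out
//   go tool pprof cpu.out        (then: top, list <func>, web)

// Macro: expose live profiles from a server via net/http/pprof.
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on DefaultServeMux
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})
	// Profiles are then available under http://localhost:8080/debug/pprof/
	// e.g. go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30
	http.ListenAndServe(":8080", nil)
}
```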
Tracer
- Techniques specific to the architecture running the code
  - introduction to CPU caches
  - building intuition around cache-lines: sizes, padding, alignment
  - false-sharing (see the padding sketch after this list)
  - OS tools to view cache-misses
  - branch prediction
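A sketch of cache-line padding against false sharing: two counters hammered by different goroutines are placed on separate cache lines so that writes from one core stop invalidating the other core’s line. The 64-byte line size and the struct layout are assumptions for typical x86-64 parts.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Without the padding fields, a and b would share a cache line and the
// cores writing to them would bounce that line back and forth.
type paddedCounters struct {
	a uint64
	_ [56]byte // pad a out to an assumed 64-byte cache line
	b uint64
	_ [56]byte
}

func main() {
	var c paddedCounters
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // one goroutine updates only a
		defer wg.Done()
		for i := 0; i < 1_000_000; i++ {
			atomic.AddUint64(&c.a, 1)
		}
	}()
	go func() { // the other updates only b
		defer wg.Done()
		for i := 0; i < 1_000_000; i++ {
			atomic.AddUint64(&c.b, 1)
		}
	}()
	wg.Wait()
}
```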
Heap Allocations
- Understanding escape analysis
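A small sketch of what escape analysis decides; `go build -gcflags=-m` prints the compiler’s decision for each allocation. The function names are illustrative.

```go
package main

import "fmt"

// stays on the stack: the pointer never leaves the function.
func sumLocal() int {
	v := new(int) // -gcflags=-m reports: new(int) does not escape
	*v = 42
	return *v
}

// escapes to the heap: the pointer outlives the call.
func leak() *int {
	v := new(int) // reported as: new(int) escapes to heap
	*v = 42
	return v
}

func main() {
	fmt.Println(sumLocal(), *leak())
}
```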
Runtime
- cost of calls via interfaces (indirect calls on the CPU level); see the sketch after this list
- runtime.convT2E / runtime.convT2I
- type assertions vs. type switches
- defer
- special-case map implementations for ints, strings
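A small sketch tying the interface and type-assertion items together; the shape/square types are illustrative.

```go
package main

import "fmt"

type shape interface{ area() float64 }

type square struct{ side float64 }

func (s square) area() float64 { return s.side * s.side }

func describe(v interface{}) string {
	// One expected type: a two-result type assertion is the cheap path.
	if s, ok := v.(shape); ok {
		return fmt.Sprintf("shape with area %.1f", s.area())
	}
	// Several candidate types: a type switch does the dispatch once.
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return "string " + x
	default:
		return "unknown"
	}
}

func main() {
	sq := square{side: 2}
	// Passing sq through the interface{} parameter boxes the value
	// (the runtime.convT* paths), which may allocate; calls made
	// through the interface are indirect and usually not inlined.
	fmt.Println(describe(sq), describe(7), describe("hi"))
}
```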
Common gotchas with the standard library
- time.After() leaks until it fires (see the sketch after this list)
- Reusing HTTP connections…
- ….
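A sketch of the time.After gotcha above: in a select that usually returns early, the timer allocated by time.After is only reclaimed once it fires, whereas an explicit Timer can be stopped as soon as it is no longer needed. The `waitOne` helper is illustrative.

```go
package main

import "time"

// waitOne waits for one value from ch or gives up after d.
func waitOne(ch <-chan int, d time.Duration) (int, bool) {
	// Leaky variant: time.After allocates a Timer that lives until it
	// fires, even if ch delivered long before:
	//
	//  select {
	//  case v := <-ch:
	//      return v, true
	//  case <-time.After(d):
	//      return 0, false
	//  }

	// Stopping an explicit Timer lets it be reclaimed right away.
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case v := <-ch:
		return v, true
	case <-t.C:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)
	ch <- 42
	waitOne(ch, time.Second)
}
```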
Alternate implementations
- Popular replacements for standard library packages:
  - encoding/json -> ffjson
  - net/http -> fasthttp
  - regexp -> ragel (or other regular expression package)
  - serialization
    - encoding/gob
    - protobuf
perf (perf2pprof)