Timbala
“A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.”
Requirements:
- Sharding
- Replication
- High availability and throughput for data ingestion
OpenTSDB
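Split into several milestones:
- Single node: can store, can query
- Multi-node: sharding and replication, plus manual rebalancing
- anti-entropy?
- Research topics: NUMA, keeping data/cache local, SSDs, and so on
In the end it comes down to a few points: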
- Coordination
- keep coordination to a minimum
- avoid coordination bottlenecks
- Indexing
- each node knows what data it holds
- Consistent view; knows where each piece of data should reside
- On-disk storage format
- Log-structured merge
- LevelDB
- RocksDB
- LMDB
- B-trees and b-tries (bitwise trie structure) for indexes
- Locality-preserving hashes
- Cluster membership
- node in cluster
- could be static; would dynamic be better?
- once a node is dead, stop using it
- Data placement (replication/sharding)
- Consistent hashing
- When a single node fails, only about 1/n of the data should be displaced/relocated (n = number of nodes); placement is keyed on the partition key
- Failure modes
hashicorp’s memberlist
Consistent hashing:
func Hash(key uint64, numBuckets int) int32
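The listing above is cut off after the signature. That signature resembles common Go ports of jump consistent hash (Lamping and Veach), so the following is a sketch of that published algorithm, on the assumption that this is what the original listing contained. The `main` function is only an illustration.

```go
package main

import "fmt"

// Hash maps key to a bucket in [0, numBuckets) using jump consistent hash.
// For numBuckets <= 0 the loop never runs and the result is -1.
func Hash(key uint64, numBuckets int) int32 {
	var b int64 = -1
	var j int64
	for j < int64(numBuckets) {
		b = j
		// Advance a 64-bit linear congruential generator seeded by the key.
		key = key*2862933555777941757 + 1
		// Jump ahead to the next bucket at which this key would move.
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

func main() {
	// The same key is looked up as the cluster grows from 1 to 5 buckets.
	for n := 1; n <= 5; n++ {
		fmt.Printf("buckets=%d key=42 -> bucket %d\n", n, Hash(42, n))
	}
}
```

The property that matters for data placement is that when the bucket count grows from n to n+1, a key either stays where it was or moves to the new bucket n; it never moves between two existing buckets.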
The tests here are quite interesting:
Unit tests
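data distribution tests: how evenly the data is spread
data displacement tests: testing relocation
data displacement failure: handling a failed relocation
jump hash gotcha: when a node joins the cluster, the jump hash on every node has to be adjusted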
Acceptance tests
Integration tests
Benchmarking
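The distribution and displacement unit tests listed above might look roughly like the following. This is a sketch under stated assumptions: `Hash` is the published jump consistent hash algorithm, while `keyFor`, `Distribution`, `Displaced`, the `series-N` key names, and the key and bucket counts are hypothetical choices for illustration. It models growth from n to n+1 buckets, which mirrors losing the last bucket; jump hash does not directly handle removing a bucket from the middle.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hash is jump consistent hash: it maps key to a bucket in [0, numBuckets).
func Hash(key uint64, numBuckets int) int32 {
	var b int64 = -1
	var j int64
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

// keyFor turns a hypothetical series name into the uint64 key Hash expects.
func keyFor(i int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "series-%d", i)
	return h.Sum64()
}

// Distribution counts how many of numKeys keys land in each bucket.
func Distribution(numKeys, numBuckets int) []int {
	counts := make([]int, numBuckets)
	for i := 0; i < numKeys; i++ {
		counts[Hash(keyFor(i), numBuckets)]++
	}
	return counts
}

// Displaced reports how many keys change bucket when the cluster grows from
// n to n+1 buckets, and how many of those moved anywhere but the new bucket.
func Displaced(numKeys, n int) (moved, misplaced int) {
	for i := 0; i < numKeys; i++ {
		k := keyFor(i)
		before, after := Hash(k, n), Hash(k, n+1)
		if before != after {
			moved++
			if int(after) != n {
				misplaced++
			}
		}
	}
	return moved, misplaced
}

func main() {
	const numKeys, n = 100000, 10
	fmt.Println("keys per bucket:", Distribution(numKeys, n))
	moved, misplaced := Displaced(numKeys, n)
	fmt.Printf("moved %d of %d keys (ideal is about %d), misplaced %d\n",
		moved, numKeys, numKeys/(n+1), misplaced)
}
```

A distribution test would assert that every bucket count stays close to numKeys/n, and a displacement test would assert that roughly 1/(n+1) of the keys move and that none of them move between two existing buckets.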