Timbala
“A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.”
Requirements:
- Sharding
- Replication
- High availability and throughput for data ingestion
OpenTSDB
Split into several milestones:
- Single node: can store and query data
- Multiple nodes: sharding and replication, plus manual rebalancing
- anti-entropy?
- Exploratory: NUMA, local data/cache, SSDs, and so on
In the end it comes down to a few points:
- Coordination
  - keep coordination to a minimum
  - avoid coordination bottlenecks
- Indexing
  - each node knows where the data is
  - Consistent view; knows where each piece of data should reside
- On-disk storage format
  - Log-structured merge
  - LevelDB
  - RocksDB
  - LMDB
  - B-trees and b-tries (bitwise trie structure) for indexes
  - Locality-preserving hashes
- Cluster membership
  - which nodes are in the cluster
  - could be static; would dynamic be better?
  - stop using a node once it is dead
- Data placement (replication/sharding)
  - Consistent hashing on a partition key
  - only about 1/n of the data should be displaced/relocated when a single node fails
- Failure modes
HashiCorp’s memberlist
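The cluster membership points above (know which nodes are in the cluster, stop using dead ones) can be sketched with a tiny in-memory view. This is only an illustration under assumptions: the `Membership` type and its methods are hypothetical names, the node list is static, and none of it is the memberlist API. A real setup would feed join and failure events from a gossip library into something like this.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Membership is a hypothetical in-memory view of the cluster.
type Membership struct {
	mu    sync.RWMutex
	alive map[string]bool // node name -> currently considered usable
}

// NewMembership starts from a static list of node names, all alive.
func NewMembership(nodes ...string) *Membership {
	m := &Membership{alive: make(map[string]bool)}
	for _, n := range nodes {
		m.alive[n] = true
	}
	return m
}

// MarkDead records that a known node should no longer be used.
func (m *Membership) MarkDead(node string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.alive[node]; ok {
		m.alive[node] = false
	}
}

// Alive returns the usable nodes in sorted order, so that every node
// holding the same membership data derives the same ordering.
func (m *Membership) Alive() []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	var out []string
	for n, ok := range m.alive {
		if ok {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	m := NewMembership("node-c", "node-a", "node-b")
	m.MarkDead("node-b")
	fmt.Println(m.Alive()) // prints [node-a node-c]
}
```

One caveat when such a list is combined with a bucket-number based hash: dropping a node from the middle of the sorted list renumbers every node after it, so more than 1/n of the keys may move.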
Consistent hashing:
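// Hash implements jump consistent hashing: it maps key to a bucket in
// [0, numBuckets). It returns -1 if numBuckets is not positive.
func Hash(key uint64, numBuckets int) int32 {
	var b int64 = -1 // last bucket the key jumped to
	var j int64      // candidate for the next jump
	for j < int64(numBuckets) {
		b = j
		// Advance a 64-bit linear congruential generator seeded by the key.
		key = key*2862933555777941757 + 1
		// Use the top 31 bits as a pseudo-random value to pick the next jump.
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}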
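As a usage sketch for the function above: Hash takes a uint64 key, so a string partition key first has to be reduced to 64 bits. The example below assumes FNV-1a from the standard library for that step and a fixed, identically ordered node list on every node; `nodeFor` and the sample keys are hypothetical, not taken from Timbala.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hash is the jump consistent hash from the notes, repeated here so the
// sketch compiles on its own.
func Hash(key uint64, numBuckets int) int32 {
	var b int64 = -1
	var j int64
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

// nodeFor is a hypothetical helper that picks the node responsible for a
// partition key. It assumes nodes is non-empty and ordered the same way
// on every node in the cluster.
func nodeFor(partitionKey string, nodes []string) string {
	h := fnv.New64a()
	h.Write([]byte(partitionKey)) // Write on a hash.Hash never returns an error
	return nodes[Hash(h.Sum64(), len(nodes))]
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	for _, k := range []string{"cpu.load{host=web1}", "mem.free{host=web2}"} {
		fmt.Printf("%s -> %s\n", k, nodeFor(k, nodes))
	}
}
```

Because the result is a bucket number rather than a node name, the mapping is only stable if all nodes agree on the order of the list.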
The tests here are quite interesting:
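- Unit tests
  - data distribution tests: how evenly data is spread across nodes
  - data displacement tests: data migration
  - data displacement failure tests: handling of failed migrations
  - jump hash gotcha: when a node joins the cluster, the jump hash on all nodes has to be adjusted
- Acceptance tests
- Integration tests
- Benchmarking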
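A minimal sketch of what the data distribution and data displacement unit tests might check, using the jump hash from earlier in these notes. The helpers `spread` and `moved`, the sequential integer keys, and the bucket counts are assumptions made for illustration; they are not the project's actual tests.

```go
package main

import "fmt"

// Hash is the jump consistent hash from the notes, repeated here so the
// sketch compiles on its own.
func Hash(key uint64, numBuckets int) int32 {
	var b int64 = -1
	var j int64
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

// spread counts how many of numKeys sequential keys land in each bucket.
func spread(numKeys, numBuckets int) []int {
	counts := make([]int, numBuckets)
	for k := 0; k < numKeys; k++ {
		counts[Hash(uint64(k), numBuckets)]++
	}
	return counts
}

// moved counts the keys whose bucket changes when the cluster grows
// from n to n+1 buckets.
func moved(numKeys, n int) int {
	m := 0
	for k := 0; k < numKeys; k++ {
		if Hash(uint64(k), n) != Hash(uint64(k), n+1) {
			m++
		}
	}
	return m
}

func main() {
	const numKeys = 100000
	fmt.Println("keys per bucket:", spread(numKeys, 5))
	fmt.Printf("moved when going 5 -> 6: %d of %d (ideal is about %d)\n",
		moved(numKeys, 5), numKeys, numKeys/6)
}
```

A distribution test would assert that the per-bucket counts stay close to numKeys/numBuckets, and a displacement test that roughly 1/(n+1) of the keys move when a bucket is added and that every moved key lands in the new bucket.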