ClickHouse in Practice

Goals:

Design approach:

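Design around the hardware: memory, CPU, and cache. Start from the low level instead of treating the problem purely as a software concern handled at the periphery.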

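Solving a problem means splitting it by scenario; different scenarios call for different solutions.
- Hash Table
- memcpy
- there is even a specialized version for small data: memcpySmallAllowReadWriteOverflow15
- new algorithms are welcome; pick whichever performs best in practice
Different data sizes get different implementations:
- quantileTiming
- uniqCombined
  - small: flat array
  - medium: hash table
  - very large: HyperLogLog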
- keep in mind low-level details when designing your system
- design based on hardware capabilities
- choose data structures and abstractions based on the needs of the task
- provide specializations for special cases
- try the new, “best” algorithms, that you read about yesterday
- choose the algorithm at runtime based on statistics
- benchmark on real datasets
- test for performance regressions in CI
- measure and observe everything
- even in production environment
- and rewrite code all the time
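
The uniqCombined tiers noted above (flat array, then hash table, then HyperLogLog) can be sketched as a counter that changes its representation as the data grows. This is only an illustrative sketch, not ClickHouse's implementation: the class name `TieredDistinctCounter`, the thresholds `array_limit` and `set_limit`, the register count `hll_bits`, and the choice of BLAKE2b as the hash are all assumptions made for the example.

```python
import hashlib
import math


class TieredDistinctCounter:
    """Hypothetical sketch of a size-adaptive distinct counter."""

    def __init__(self, array_limit=16, set_limit=1024, hll_bits=12):
        self.array_limit = array_limit  # assumed threshold: array -> hash set
        self.set_limit = set_limit      # assumed threshold: hash set -> HyperLogLog
        self.hll_bits = hll_bits        # 2**hll_bits HyperLogLog registers
        self.mode = "array"
        self.items = []                 # small tier: flat array with linear scan
        self.seen = None                # medium tier: hash set
        self.registers = None           # large tier: HyperLogLog registers

    @staticmethod
    def _hash64(value):
        # 64-bit hash of the value's repr; chosen only for determinism.
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _hll_add(self, h):
        p = self.hll_bits
        idx = h >> (64 - p)                       # top p bits select a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def add(self, value):
        h = self._hash64(value)
        if self.mode == "array":
            if h not in self.items:
                self.items.append(h)
            if len(self.items) > self.array_limit:
                self.seen, self.items, self.mode = set(self.items), None, "hash"
        elif self.mode == "hash":
            self.seen.add(h)
            if len(self.seen) > self.set_limit:
                self.registers = [0] * (1 << self.hll_bits)
                for x in self.seen:
                    self._hll_add(x)
                self.seen, self.mode = None, "hll"
        else:
            self._hll_add(h)

    def count(self):
        if self.mode == "array":
            return len(self.items)
        if self.mode == "hash":
            return len(self.seen)
        m = len(self.registers)
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))  # small-range correction
        return round(raw)
```

The first two tiers are exact and cheap for small inputs; only the last tier trades accuracy for bounded memory, which is the point of specializing by data size.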

Hardware-aware design is a major point of emphasis.

MergeTree: fragmented files are merged periodically.
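
The idea of periodically merging fragments can be sketched as a k-way merge over sorted parts. This is a simplified illustration under stated assumptions: each part is an in-memory list of rows already sorted by key, and the function names and the `max_parts` threshold are hypothetical. The real engine chooses which parts to merge by its own heuristics and works on files on disk.

```python
import heapq


def merge_parts(parts):
    """Merge several sorted parts into one sorted part (k-way merge)."""
    return list(heapq.merge(*parts))


def insert_and_maybe_merge(parts, new_rows, max_parts=3):
    """Write new rows as their own sorted part; merge when there are too many.

    max_parts is an assumed threshold used only for this sketch.
    """
    parts.append(sorted(new_rows))
    if len(parts) > max_parts:
        parts[:] = [merge_parts(parts)]
    return parts


parts = []
insert_and_maybe_merge(parts, [(3, "c"), (1, "a")], max_parts=2)
insert_and_maybe_merge(parts, [(2, "b")], max_parts=2)
insert_and_maybe_merge(parts, [(0, "z"), (4, "d")], max_parts=2)
print(parts)
```

Each insert stays cheap because it only writes a small sorted part; the cost of keeping the data in one sorted run is deferred to the merge step.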

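A mindset of separating storage from compute?

At the very beginning, it simply set out to solve the GROUP BY problem.

Algorithms come first and abstraction second. In other words, performance matters most, and generality was not an early consideration.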