DevilKing's blog

ClickHouse in Practice

Original article

PPT

Goals:

  • filter and aggregate as fast as possible
  • GROUP BY

Design approach:

  • NOT top-down

  • Design based on hardware capabilities

    • we will do GROUP BY in memory
    • will put all data in a hash table
    • if the hash table is large, it will not fit in L3 cache of CPU
    • if the values of the GROUP BY keys are not distributed locally, then we incur an L3 cache miss for every row in the table
    • an L3 cache miss has 70..100 ns latency
    • How many keys per second can we process?
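
The question in the last bullet can be answered with rough arithmetic. The sketch below assumes a single thread that stalls on exactly one L3 cache miss per row and cannot overlap misses; the function name `keys_per_second` is made up for illustration, and real hardware with prefetching or several cores will behave differently.

```python
def keys_per_second(miss_latency_ns: float) -> float:
    """Upper bound on keys per second for one thread, assuming each
    hash table lookup waits for one full cache miss."""
    return 1e9 / miss_latency_ns  # 1e9 nanoseconds in a second


# The 70..100 ns range comes from the slides quoted above.
for latency_ns in (70, 100):
    rate = keys_per_second(latency_ns)
    print(f"{latency_ns} ns per miss -> about {rate / 1e6:.1f} million keys/s")
```

Under these assumptions one thread tops out at roughly 10 to 14 million keys per second, which is why keeping the hash table cache-friendly matters so much for GROUP BY.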

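The design starts from the hardware (memory, CPU, cache) and works from the bottom up, rather than handling things at the periphery from a purely software point of view.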

  • Solving a problem depends on the scenario; different scenarios call for different solutions

    • Hash Table
    • memcpy
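    • There is even a specialized version for small data: memcpySmallAllowReadWriteOverflow15
    • New algorithms are not ruled out; the one that performs best in practice is chosen
  • Different implementations for different data sizes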

    • quantileTiming

    • uniqCombined

      • Small: flat array
      • Medium: hash table
      • Very large: HyperLogLog
    • keep in mind low-level details when designing your system
    • design based on hardware capabilities
    • choose data structures and abstractions based on the needs of the task
    • provide specializations for special cases
    • try the new, “best” algorithms, that you read about yesterday
    • choose the algorithm at runtime based on statistics
    • benchmark on real datasets
    • test for performance regressions in CI
    • measure and observe everything
    • even in production environment
    • and rewrite code all the time
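
The three-tier idea noted for uniqCombined (flat array, then hash table, then HyperLogLog) can be sketched as a counter that switches its representation at runtime as the number of distinct values grows. This is only an illustration, not ClickHouse's implementation: the class name `TieredUniq`, the thresholds, the precision and the choice of hash function are all assumptions made for this example.

```python
import hashlib
import math

SMALL_LIMIT = 16      # hypothetical: largest size kept in the flat array
MEDIUM_LIMIT = 4096   # hypothetical: largest size kept in the hash table
HLL_PRECISION = 12    # hypothetical: 2**12 HyperLogLog registers


def hash64(value) -> int:
    """Deterministic 64-bit hash of a value (blake2b chosen for this sketch)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class TieredUniq:
    """Distinct counter that upgrades array -> hash table -> HyperLogLog."""

    def __init__(self):
        self.tier = "array"
        self.array = []        # small: flat array, linear scan
        self.table = set()     # medium: hash table
        self.registers = None  # large: HyperLogLog registers

    def add(self, value):
        h = hash64(value)
        if self.tier == "array":
            if h in self.array:
                return
            self.array.append(h)
            if len(self.array) > SMALL_LIMIT:
                self.table, self.array, self.tier = set(self.array), [], "hash"
        elif self.tier == "hash":
            self.table.add(h)
            if len(self.table) > MEDIUM_LIMIT:
                self.registers = [0] * (1 << HLL_PRECISION)
                for stored in self.table:
                    self._hll_add(stored)
                self.table, self.tier = set(), "hll"
        else:
            self._hll_add(h)

    def _hll_add(self, h):
        bits = 64 - HLL_PRECISION
        index = h >> bits                    # top bits select a register
        rest = h & ((1 << bits) - 1)         # remaining bits
        rank = bits - rest.bit_length() + 1  # position of the first 1 bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> int:
        if self.tier == "array":
            return len(self.array)
        if self.tier == "hash":
            return len(self.table)
        m = len(self.registers)
        alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:    # small-range correction
            estimate = m * math.log(m / zeros)
        return round(estimate)


counter = TieredUniq()
for n in (10, 1000, 100000):
    for i in range(n):
        counter.add(i)
    print(n, counter.tier, counter.count())
```

The first two tiers are exact; only the HyperLogLog tier is approximate, trading a small error for a fixed memory footprint. Switching tiers based on the observed cardinality is one concrete form of choosing the algorithm at runtime based on statistics.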

Hardware-based design is a major point of emphasis.

MergeTree: fragmented files are merged periodically.
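
As a rough picture of what merging fragmented files means, the sketch below merges several small sorted runs into one larger sorted run in a single pass. It assumes each fragment is already sorted by key and models a fragment as a plain Python list; `merge_parts` is a hypothetical helper, not a ClickHouse API.

```python
import heapq


def merge_parts(parts):
    """Merge several individually sorted parts into one sorted part.

    heapq.merge streams through the inputs lazily, so no part
    has to be re-sorted from scratch.
    """
    return list(heapq.merge(*parts))


# Three small fragments, each already sorted by key.
parts = [[1, 4, 9], [2, 3, 10], [5, 6]]
print(merge_parts(parts))
```

Fewer, larger sorted files mean fewer places to look when reading a key range, which is the usual motivation for merging in the background.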

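A mindset of separating storage from compute?

At the very beginning it was purely about solving the GROUP BY problem.

Algorithms come first and abstraction second; in other words, performance matters most, and generality was not a consideration at the start.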