DevilKing's blog

SkipList Tips

Posted on 2018-06-13 In skipList

Skip lists are data structures that use probabilistic balancing rather than strictly enforced balancing. As a result, the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.

mysql 简介

Posted on 2018-06-13 In mysql

原文链接

MVCC(Multi Version Concurrency Control)

redo log 以及undo log部分

innodb中通过B+树作为索引的数据结构，并且主键所在的索引为ClusterIndex(聚簇索引), ClusterIndex中的叶子节点中保存了对应的数据内容。一个表只能有一个主键，所以只能有一个聚簇索引，如果表没有定义主键，则选择第一个非NULL唯一索引作为聚簇索引，如果还没有则生成一个隐藏id列作为聚簇索引。除了Cluster Index外的索引是Secondary Index(辅助索引)。辅助索引中的叶子节点保存的是聚簇索引的叶子节点的值。

主键的索引结构中,既存储了主键值,又存储了行数据,这种结构称为”聚簇索引”

B-树和B+树最重要的一个区别就是B+树只有叶节点存放数据，其余节点用来索引，而B-树是每个索引节点都会有Data域

B-树（B类树）的特定就是每层节点数目非常多，层数很少，目的就是为了就少磁盘IO次数，当查询数据的时候，最好的情况就是很快找到目标索引，然后读取数据，使用B+树就能很好的完成这个目的，但是B-树的每个节点都有data域（指针），这无疑增大了节点大小，说白了增加了磁盘IO次数（磁盘IO一次读出的数据量大小是固定的，单个数据变大，每次读出的就少，IO次数增多，一次IO多耗时啊！），原因1：B+树除了叶子节点其它节点并不存储数据，节点小，磁盘IO次数就少。原因2：B+树所有的Data域在叶子节点，一般来说都会进行一个优化，就是将所有的叶子节点用指针串起来。这样遍历叶子节点就能获得全部数据。

MyISAM中有两种索引，分别是主索引和辅助索引，在这里面的主索引使用具有唯一性的键值进行创建，而辅助索引中键值可以是相同的。MyISAM分别会存一个索引文件和数据文件。它的主索引是非聚集索引。当我们查询的时候我们找到叶子节点中保存的地址，然后通过地址我们找到所对应的信息。

InnoDB索引和MyISAM最大的区别是它只有一个数据文件，在InnoDB中，表数据文件本身就是按B+Tree组织的一个索引结构，这棵树的叶节点数据域保存了完整的数据记录。所以我们又把它的主索引叫做聚集索引。而它的辅助索引和MyISAM也会有所不同，它的辅助索引都是将主键作为数据域。所以，这样当我们查找的时候通过辅助索引要先找到主键，然后通过主索引再找到对于的主键，得到信息。

MyISAM不支持事务，表锁部分，这样，无法支持事务

原文链接

添加索引的一些经验

字段的值的分区度部分，加上索引也无法锁定特别少量的数据

order以及limit会影响查询的速度，对结果的排序部分

Redis Tips

Posted on 2018-06-12 In redis

原文链接

RDB

单文件，压缩好，

基本流程：Redis fork, 子进程把数据全量写到 rdb 文件，写完后替换老的 rdb 文件。
AOF

性能好，能在日志太大时自动rewrite，日志大，速度慢，

Log rewriting 基本流程: fork, 子进程开写新的 AOF 文件，Redis 在内存中暂存数据，写完后把新的写操作应用到 AOF 文件去，最后替换老的 aof 文件

原文链接

Redis keys are expired in two ways: a passive way, and an active way.

A key is passively expired simply when some client tries to access it, and the key is found to be timed out.

Of course this is not enough as there are expired keys that will never be accessed again. These keys should be expired anyway, so periodically Redis tests a few keys at random among keys with an expire set. All the keys that are already expired are deleted from the keyspace.

Specifically this is what Redis does 10 times per second:

Test 20 random keys from the set of keys with an associated expire.
Delete all the keys found expired.
If more than 25% of keys were expired, start again from step 1.

This is a trivial probabilistic algorithm, basically the assumption is that our sample is representative of the whole key space, and we continue to expire until the percentage of keys that are likely to be expired is under 25%

懒删除，以及定时删除，按照25%的概率进行

bgsave 对应 RDB？

hash

List(利用双端链表实现的list)

Set

Sorted Set增加score部分，进行有序排列

Redis sorted set的内部使用HashMap和跳跃表(SkipList)来保证数据的存储和有序，HashMap里放的是成员到score的映射，而跳跃表里存放的是所有的成员，排序依据是HashMap里存的score,使用跳跃表的结构可以获得比较高的查找效率，并且在实现上比较简单。

关于skipList部分

HyperLogLog

A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We’ll just call them HLL from now) has seen very few elements.

Performance analysis

Posted on 2018-06-12 In performance

原文链接

coordinated bursty tracing and vertical context injection.

Clean Code

Posted on 2018-06-11 In code

原文链接

不止是code的一些clean，更是对自己一些原则的clean以及保持

Self Improvement

Plan on working 60 hours a week: 40 hours for your employer and 20 hours for you.

The minimum things you must know are:

Design patterns: GOF and POSA books.
Design principles: SOLID and component principles.
Methods: XP, Scrum, Lean, Kanban, Waterfall, Structured Analysis, and Structured Design.
Disciplines: TDD, OOD, Structured Programming, CI/CD, and Pair Programming
Artifacts: UML, DFDs, Structure Charts, Petri Nets, State Transition Diagrams and Tables, Flow Charts, and Decision Tables.

Learn things outside of your comfort zone. Learn Prolog and Forth.

Do a simple 10-minute programming exercise in the morning, to warm up, and in the afternoon, to cool down.