DevilKing's blog



Original link

Inside the .git directory

$ ls
HEAD
branches/
config
description
hooks/
index
info/
objects/
refs/

Newer versions of Git no longer use the branches directory.

There are four important files and directories here: the HEAD and index files, and the objects and refs directories. These are the core parts of Git. The objects directory stores all of the content, the refs directory stores pointers (branches) to the commit objects for that content, the HEAD file points to the current branch, and the index file holds the staging-area information.

Data content can be retrieved with the cat-file command, the Swiss army knife for inspecting Git objects. Passing the -p flag instructs cat-file to figure out the type of the content and display it accordingly:

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content

Git stores content in a manner similar to a UNIX filesystem, but somewhat simplified. All content is stored as tree or blob objects: a tree corresponds to a UNIX directory, and a blob corresponds roughly to an inode or to file contents. A single tree object contains one or more tree entries, each holding a SHA-1 pointer to a blob or subtree, along with that object's permission mode, type, and filename.
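
To make that concrete, inspecting a commit's tree typically looks like this (the hashes and filenames below are placeholders, not output from a real repository):

$ git cat-file -p HEAD^{tree}
100644 blob 5f2f16bfff90e6620509c0cf442e7a3586dad8fb    README.md
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0    lib

Each entry shows exactly the fields described above: mode, object type, SHA-1 pointer, and filename.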

Git trees

Original link

CREATE TABLE `t_tbl_test_time_08` (
  `f_some_id` int(11) unsigned DEFAULT '0',
  `f_qiye_id` int(11) DEFAULT '0',
  `f_type` tinyint(3) DEFAULT '0' COMMENT 'contact type 1: QQ, 2: phone call, 3: email, 4: SMS, 5: follow-up note, 6: customer visit, 7: EC contact, 8: customer stage update',
  `f_contact_time` timestamp NULL DEFAULT '1970-01-01 16:00:01',
  UNIQUE KEY `some_qiye_type` (`f_some_id`,`f_qiye_id`,`f_type`),
  KEY `f_contact_time` (`f_contact_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

mysql> explain extended select f_some_id from d_ec_some1.t_tbl_test_time_08
where f_qiye_id=5077665 and f_type=9
and f_contact_time > '2017-10-17 14:23:49' and f_contact_time < '2017-10-17 14:23:53'
order by f_some_id limit 300;
+----+-------------+--------------------+-------+----------------+----------------+---------+------+-------+----------+-------------+
| id | select_type | table              | type  | possible_keys  | key            | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+--------------------+-------+----------------+----------------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | t_tbl_test_time_08 | index | f_contact_time | some_qiye_type | 12      | NULL | 16032 | 2248.49  | Using where |
+----+-------------+--------------------+-------+----------------+----------------+---------+------+-------+----------+-------------+
1 row in set

-- force a specific index
mysql> explain extended select f_some_id from d_ec_some1.t_tbl_test_time_08 use index(f_contact_time)
where f_qiye_id=5077665 and f_type=9
and f_contact_time > '2017-10-17 14:23:49' and f_contact_time < '2017-10-17 14:23:53'
order by f_some_id limit 300;
+----+-------------+--------------------+-------+----------------+----------------+---------+------+--------+----------+---------------------------------------------------------------+
| id | select_type | table              | type  | possible_keys  | key            | key_len | ref  | rows   | filtered | Extra                                                         |
+----+-------------+--------------------+-------+----------------+----------------+---------+------+--------+----------+---------------------------------------------------------------+
|  1 | SIMPLE      | t_tbl_test_time_08 | range | f_contact_time | f_contact_time | 5       | NULL | 360478 | 100      | Using index condition; Using where; Using MRR; Using filesort |
+----+-------------+--------------------+-------+----------------+----------------+---------+------+--------+----------+---------------------------------------------------------------+
1 row in set

In the first explain result, type=index means a full index scan. Note that index here does not mean "the query made good use of an index": a full index scan is no better than a full table scan, and can even be slower (because of random I/O). Whether the right index was chosen is shown by the key column:
The some_qiye_type index is defined as (f_some_id, f_qiye_id, f_type), and key_len=12 shows that all three columns were used (12 = 4 + 5 + 3). Yet this execution plan actually takes over 200 seconds to run.

In the second explain, the SQL forces use index(f_contact_time), on the grounds that the where clause pins f_contact_time to a fixed 4-second window, so the matching data volume should be small and the filter quite selective.

Note that the filtered column is not a timing figure; it is the optimizer's estimate of the percentage of examined rows that will remain after the WHERE condition is applied.

The Extra column:

  • Using index condition: the index condition pushdown (ICP) feature was used. Using where means going back to the table for row data. See the references at the end for ICP.
  • Using MRR: the Multi-Range Read optimization was used. When MySQL does a range lookup through a secondary index, the matching records are physically unordered; to reduce the random I/O of fetching row data, it first sorts the row IDs in an in-memory buffer, then pulls the data with sequential I/O. (The buffer size is controlled by read_rnd_buffer_size.)
  • Using filesort: an extra sort is needed, for the order by f_some_id.

Although rows shows about 360k rows scanned, far more than the 16,032 above, this execution plan actually runs in 0.7s, nearly 300 times faster.

Here we bring out the two great tools for SQL optimization, profiling and optimizer_trace, to track down the deciding factor.

  • profiling: pinpoints where the time goes between receiving the SQL and returning the result
  • optimizer_trace: traces the optimizer's cost-estimation process, showing clearly how it chooses among several candidate indexes (a minimal usage sketch follows this list)
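
As a sketch of how optimizer_trace is typically used (available in MySQL 5.6+; reusing the query above):

-- enable tracing for this session, run the query, then read the trace
SET optimizer_trace='enabled=on';
SELECT f_some_id FROM d_ec_some1.t_tbl_test_time_08
WHERE f_qiye_id=5077665 AND f_type=9
  AND f_contact_time > '2017-10-17 14:23:49' AND f_contact_time < '2017-10-17 14:23:53'
ORDER BY f_some_id LIMIT 300;
-- the JSON trace includes the optimizer's cost estimate for each
-- candidate index (see e.g. its considered_access_paths section)
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace='enabled=off';
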
mysql> set profiling=1;
mysql> select f_some_id from d_ec_some1.t_tbl_test_time_08 where f_qiye_id=5077665 and f_type=9
and f_contact_time > '2017-10-17 14:23:49' and f_contact_time < '2017-10-17 14:23:53' order by f_some_id limit 300;

mysql> show profile block io,cpu for query 1;
+----------------------+------------+------------+------------+--------------+---------------+
| Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out |
+----------------------+------------+------------+------------+--------------+---------------+
| starting | 0.000121 | 0 | 0 | 0 | 0 |
| checking permissions | 3.2E-5 | 0 | 0 | 0 | 0 |
| Opening tables | 3.7E-5 | 0 | 0 | 0 | 0 |
| init | 4.2E-5 | 0 | 0 | 0 | 0 |
| System lock | 2.9E-5 | 0 | 0 | 0 | 0 |
| optimizing | 3.3E-5 | 0 | 0 | 0 | 0 |
| statistics | 0.005796 | 0 | 0.000999 | 448 | 0 |
| preparing | 4.3E-5 | 0 | 0 | 0 | 0 |
| Sorting result | 2.8E-5 | 0 | 0 | 0 | 0 |
| executing | 2.7E-5 | 0 | 0 | 0 | 0 |
| Sending data | 172.824522 | 189.040262 | 2.441629 | 1504928 | 6896 |
| end | 8.3E-5 | 0 | 0 | 0 | 0 |
| query end | 3E-5 | 0 | 0 | 0 | 0 |
| closing tables | 3.3E-5 | 0 | 0 | 0 | 0 |
| freeing items | 7E-5 | 0 | 0 | 0 | 0 |
| logging slow query | 3.1E-5 | 0 | 0 | 0 | 0 |
| Opening tables | 3.4E-5 | 0 | 0 | 0 | 0 |
| System lock | 7E-5 | 0 | 0 | 0 | 8 |
| cleaning up | 9.5E-5 | 0 | 0 | 0 | 0 |
+----------------------+------------+------------+------------+--------------+---------------+
19 rows in set

mysql> show status like "Handler%";
+----------------------------+---------+
| Variable_name | Value |
+----------------------------+---------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 9430930 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 1 |
+----------------------------+---------+
18 rows in set

The profile above puts almost all of the time in Sending data (172.8s, with heavy block I/O), and Handler_read_next at about 9.43 million confirms the plan crawls through millions of index entries. So the fix is either to remove the misleading factor, or to give the optimizer a better option.

  1. Remove the misleading factor
    The index that misleads the optimizer is some_qiye_type: a unique index that happens to start with f_some_id, so the optimizer naively picks it to avoid the sort (order by f_some_id matches its leading column).
    Removing the interference means swapping the order of f_qiye_id and f_some_id. Swapping has an extra benefit: the equality condition on f_qiye_id can then drive the index lookup.
    But it has two downsides:
    • This table already serves equality lookups, join refs, and paginated queries on f_some_id; after swapping the two index columns, all of those would turn into slow queries
    • With f_qiye_id as the leading column, millions of rows may match, so this query would not improve much
  2. Provide a better index
    Adding an index on (f_qiye_id, f_contact_time) looks good; queries of this kind will then all use it

Separately, the only unique index on this table contains a nullable column, so the table has no primary key.
Add an auto-increment primary key: f_id bigint unsigned not null.
Change f_some_id to f_some_id bigint unsigned NOT NULL.
Change f_qiye_id to f_qiye_id int unsigned NOT NULL.
Change f_type to tinyint NOT NULL.
In one sentence: every indexed column is defined NOT NULL, and every f_some_id becomes bigint. The final table definition:

CREATE TABLE `t_tbl_test_time_16` (
  `f_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `f_some_id` bigint(20) unsigned NOT NULL,
  `f_qiye_id` int(11) unsigned NOT NULL DEFAULT '0',
  `f_type` tinyint(3) NOT NULL DEFAULT '0' COMMENT 'contact type 1: QQ, 2: phone call, 3: email, 4: SMS, 5: follow-up note, 6: customer visit, 7: EC contact, 8: customer stage update',
  `f_contact_time` timestamp NOT NULL DEFAULT '1970-01-01 16:00:01',
  PRIMARY KEY(f_id),
  UNIQUE KEY `some_qiye_type` (`f_some_id`,`f_qiye_id`,`f_type`) USING BTREE,
  KEY `idx_qiye_time` (`f_qiye_id`,`f_contact_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
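
A sketch of the corresponding in-place migration (one possible ALTER; it assumes any existing NULLs in these columns have been cleaned up first):

ALTER TABLE d_ec_some1.t_tbl_test_time_08
  ADD COLUMN f_id bigint unsigned NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (f_id),
  MODIFY f_some_id bigint unsigned NOT NULL,
  MODIFY f_qiye_id int unsigned NOT NULL DEFAULT '0',
  MODIFY f_type tinyint NOT NULL DEFAULT '0',
  MODIFY f_contact_time timestamp NOT NULL DEFAULT '1970-01-01 16:00:01',
  ADD KEY idx_qiye_time (f_qiye_id, f_contact_time);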

Original link

Apex and Terraform

Write Lambda Functions

const aws = require('aws-sdk');

exports.handle = (event, context, callback) => {
  doYourShit(); // placeholder for the function's real work
  callback(null, 'DONE');
};

package main

import (
    "encoding/json"
    "errors"
    "log"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "github.com/vinta/pangu"
)

var (
    // ErrTextNotProvided is thrown when text is not provided in HTTP query string
    ErrTextNotProvided = errors.New("No text was provided in HTTP query string")
)

// Handler is the AWS Lambda function handler
func Handler(request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    log.Printf("request id: %s\n", request.RequestContext.RequestID)

    text, ok := request.QueryStringParameters["t"]
    if !ok {
        errMap := map[string]string{
            "message": ErrTextNotProvided.Error(),
        }
        errMapJSON, _ := json.MarshalIndent(errMap, "", " ")

        return events.APIGatewayProxyResponse{
            Body:       string(errMapJSON),
            StatusCode: 400,
        }, nil
    }

    log.Printf("text: %s\n", text)

    textPlainHeaders := map[string]string{
        "content-type": "text/plain; charset=utf-8",
    }

    return events.APIGatewayProxyResponse{
        Body:       pangu.SpacingText(text),
        Headers:    textPlainHeaders,
        StatusCode: 200,
    }, nil
}

func main() {
    lambda.Start(Handler)
}

Original link

As we went over in the previous section, the Flask way of doing things involves using the tools you are most comfortable with. In this article, we will be using perhaps the most common (and sensible) choices of extensions and libraries (i.e. a database abstraction layer and form handling). These choices will involve:

  • SQLAlchemy (via Flask-SQLAlchemy)
  • WTForms (via Flask-WTF)

The overall code structure:

~/LargeApp
    |-- run.py
    |-- config.py
    |__ /env                 # Virtual Environment
    |__ /app                 # Our Application Module
         |-- __init__.py
         |-- /module_one
             |-- __init__.py
             |-- controllers.py
             |-- models.py
         |__ /templates
             |__ /module_one
                  |-- hello.html
         |__ /static
    |__ ..
    |__ .

cd ~/LargeApp
env/bin/pip install flask
env/bin/pip install flask-sqlalchemy
env/bin/pip install flask-wtf

The data model

# Import the database object (db) from the main application module
# We will define this inside /app/__init__.py in the next sections.
from app import db

# Define a base model for other database tables to inherit
class Base(db.Model):

    __abstract__ = True

    id = db.Column(db.Integer, primary_key=True)
    date_created = db.Column(db.DateTime, default=db.func.current_timestamp())
    date_modified = db.Column(db.DateTime, default=db.func.current_timestamp(),
                              onupdate=db.func.current_timestamp())

# Define a User model
class User(Base):

    __tablename__ = 'auth_user'

    # User Name
    name = db.Column(db.String(128), nullable=False)

    # Identification Data: email & password
    email = db.Column(db.String(128), nullable=False, unique=True)
    password = db.Column(db.String(192), nullable=False)

    # Authorisation Data: role & status
    role = db.Column(db.SmallInteger, nullable=False)
    status = db.Column(db.SmallInteger, nullable=False)

    # New instance instantiation procedure
    def __init__(self, name, email, password):
        self.name = name
        self.email = email
        self.password = password

    def __repr__(self):
        return '<User %r>' % (self.name)
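
The db object imported at the top has to be created somewhere; here is a minimal sketch of what /app/__init__.py could look like under this layout (the blueprint name module_one is an assumption following the structure above):

# app/__init__.py -- minimal sketch, names follow the layout above
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config.from_object('config')  # picks up config.py in the project root

# The db object that models.py imports
db = SQLAlchemy(app)

# Register each module's blueprint (assumed to be defined in controllers.py)
from app.module_one.controllers import module_one
app.register_blueprint(module_one)

# Create the database tables for all models
db.create_all()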

Erlang for Go developers

erlang vs golang:

Similarities:

  • the design of the concurrency parts
  • the use of the standard library

Differences:

  • deploys on a virtual machine, hot reload
  • supports distributed communication

erlang runtime

provides behaviours to solve many design problems

State is hard to pass around; it has to go through message passing.

goroutine <-> process (pids)

erlang process:

  • each process has a mailbox
  • a process is like a goroutine + channel
  • this composition creates an ‘actor’ (see the Go sketch below)
  • messages are serialized through the mailbox
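
A minimal Go sketch of that composition, a goroutine owning a mailbox channel, which is roughly what every Erlang process gives you for free (all names here are illustrative):

package main

import "fmt"

// message is what flows through the actor's mailbox.
type message struct {
    body  string
    reply chan string
}

// spawn starts an actor: a goroutine plus its own mailbox channel.
// The returned channel plays the role of an Erlang pid.
func spawn() chan<- message {
    mailbox := make(chan message, 16)
    go func() {
        for msg := range mailbox {
            // Any state would live here, touched only by this goroutine.
            msg.reply <- "got: " + msg.body
        }
    }()
    return mailbox
}

func main() {
    pid := spawn()
    reply := make(chan string)
    pid <- message{body: "hello", reply: reply}
    fmt.Println(<-reply) // prints "got: hello"
}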

distributed erlang

monitor and link for processes? Quite useful for coordinating between processes.

On supervision trees in Go

repo1


Go with Versions

Vgo?

When fetching dependencies, two failure modes show up: too old / too new.

software engineering vs programming

principles of versioning in Go:

  • Compatibility

Packages that share the same import path must stay backward compatible (see the go.mod sketch after this list).

  • Repeatability
  • Cooperation

On how dependencies adapt to one another:

version: major version, minor version, patch version

using latest version is a feature! but only for library builds
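
Semantic import versioning makes the compatibility rule concrete. In go.mod notation (the module path and dependencies below are made up):

// go.mod -- illustrative only
module github.com/you/hello

require (
    github.com/some/dep v1.5.2     // minor/patch bumps keep the same import path, so they must stay compatible
    github.com/other/dep/v2 v2.0.1 // a breaking change puts the major version into the import path itself
)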


The Lost Art of Bondage

cuDNN?

parse C headers into Go source code?

time spent manually annotating functions vs time spent generating code
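
For contrast, the manual side of that trade-off looks like this: a minimal cgo binding, where every exposed C function is written out by hand (the C function here is inlined for illustration):

package main

/*
double twice(double x) { return 2 * x; }
*/
import "C"

import "fmt"

func main() {
    // One hand-written call; binding generators aim to emit
    // wrappers like C.twice automatically from a header file.
    fmt.Println(C.twice(21.0)) // prints 42
}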

cuda library

what is the use case of algo?


Distributed Tracing in Grab

OpenTracing API

basictracer-go
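
A minimal taste of the OpenTracing Go API (GlobalTracer is a no-op by default; in real use a concrete tracer such as basictracer-go would be registered first):

package main

import (
    "fmt"

    "github.com/opentracing/opentracing-go"
)

func main() {
    // No-op unless a real tracer (e.g. basictracer-go) has been
    // installed via opentracing.SetGlobalTracer.
    tracer := opentracing.GlobalTracer()

    span := tracer.StartSpan("fetch-booking") // operation name is illustrative
    span.SetTag("service", "demo")
    span.Finish()

    fmt.Println("span finished")
}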