DevilKing's blog

Original article link

Why run this on Golang?

  • Current infrastructure is already running Kubernetes / Docker containers and Golang makes the binaries extremely small and efficient (lightweight to run)
  • Web frameworks for Go are much faster than the Python ones (higher web performance in Go)
  • The team aren’t necessarily data scientists working in Python; they work in Go (no need to switch languages?)
  • Pushing data internally using GRPC for faster communication between micro services

Binary Classification in Keras

# Use TF to save the graph model instead of Keras save model to load it in Golang
import tensorflow as tf
from keras import backend as K

sess = K.get_session()  # the live session holding the trained Keras graph
builder = tf.saved_model.builder.SavedModelBuilder("myModel")
# Tag the model, required for Go
builder.add_meta_graph_and_variables(sess, ["myTag"])
builder.save()
sess.close()

Save with the SavedModel approach.

loading and running the model in Go

package main

import (
	"fmt"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	// replace myModel and myTag with the appropriate exported names in the chestrays-keras-binary-classification.ipynb
	model, err := tf.LoadSavedModel("myModel", []string{"myTag"}, nil)
	if err != nil {
		fmt.Printf("Error loading saved model: %s\n", err.Error())
		return
	}
	defer model.Session.Close()

	tensor, _ := tf.NewTensor([1][250][250][3]float32{})

	result, err := model.Session.Run(
		map[tf.Output]*tf.Tensor{
			model.Graph.Operation("inputLayer_input").Output(0): tensor, // Replace this with your input layer name
		},
		[]tf.Output{
			model.Graph.Operation("inferenceLayer/Sigmoid").Output(0), // Replace this with your output layer name
		},
		nil,
	)
	if err != nil {
		fmt.Printf("Error running the session with input, err: %s\n", err.Error())
		return
	}

	fmt.Printf("Result value: %v \n", result[0].Value())
}

The tensor we input is in the shape [batch size][width][height][channels].
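
The zero tensor above is only a placeholder. Below is a hedged sketch of a helper that could build the real input from an image file and slot into the program above; the path argument, the assumption that the image is already 250x250, and the [0, 1] pixel scaling are all assumptions, not something from the original post.

import (
	"image"
	_ "image/jpeg" // register the JPEG decoder
	"os"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

// makeInputTensor decodes an image file (assumed to already be 250x250)
// into the [1][250][250][3] float32 layout the model expects.
func makeInputTensor(path string) (*tf.Tensor, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	img, _, err := image.Decode(f)
	if err != nil {
		return nil, err
	}

	var input [1][250][250][3]float32
	for y := 0; y < 250; y++ {
		for x := 0; x < 250; x++ {
			r, g, b, _ := img.At(x, y).RGBA() // 16-bit channel values
			input[0][y][x][0] = float32(r>>8) / 255 // assumption: model trained on [0, 1] pixels
			input[0][y][x][1] = float32(g>>8) / 255
			input[0][y][x][2] = float32(b>>8) / 255
		}
	}
	return tf.NewTensor(input)
}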

The equivalent Python code:

%%time
from keras.preprocessing import image
from keras.models import load_model
import numpy as np
model = load_model("model.h5")
img = np.zeros((1,250,250,3))
x = np.vstack([img]) # just append to this if we have more than one image.
classes = model.predict_classes(x)
print(classes)

Worth comparing the timings of the two?

Performance

Recall the model was:

  • 3x3x32 Convolutional Layer
  • 3x3x32 Convolutional Layer
  • 2x2 Max Pool Layer
  • 64 Node Fully Connected Layer with Dropout
  • 1 Sigmoid Output Layer

For Python:

  • CPU: ~2.72s to warm up and run one inference, and ~0.049s for each inference after
  • GPU: ~3.52s to warm up and run one inference, and ~0.009s for each inference after
  • Saved model size (HDF5): 242 MB

For Go:

  • CPU: ~0.255s to warm up and run one inference, and ~0.045s for each inference after
  • GPU: N/A
  • Saved model size (protobuf binaries): 236 MB

Use Go to serve up your models in prod.

It feels like this would run even better on k8s?

The value is in the predict step, once the model has been trained and is basically stable.
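
A rough sketch of what serving could look like in Go; the HTTP handler, route, and port here are my own additions, the input is still the placeholder zero tensor, and the layer names are the same placeholders as in the snippet above.

package main

import (
	"encoding/json"
	"log"
	"net/http"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	// Load the SavedModel once at startup; each request then only pays for Session.Run.
	model, err := tf.LoadSavedModel("myModel", []string{"myTag"}, nil)
	if err != nil {
		log.Fatalf("Error loading saved model: %v", err)
	}
	defer model.Session.Close()

	http.HandleFunc("/predict", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder input; a real handler would decode the request body
		// into the [1][250][250][3] tensor instead.
		tensor, err := tf.NewTensor([1][250][250][3]float32{})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		result, err := model.Session.Run(
			map[tf.Output]*tf.Tensor{
				model.Graph.Operation("inputLayer_input").Output(0): tensor,
			},
			[]tf.Output{
				model.Graph.Operation("inferenceLayer/Sigmoid").Output(0),
			},
			nil,
		)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		json.NewEncoder(w).Encode(result[0].Value())
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}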

Original article link

Focusing on code improvement through the "line of sight" you have while reading code; an interesting perspective.

Line of sight is “a straight line along which an observer has unobstructed vision”

Good code: the idea is that another programmer (including your future self) can glance down a single column and understand the expected flow of the code.

Most people focus on the cost of writing code (ever heard “how long will this take to finish?”). But the far greater cost is in maintaining code, especially in successful projects. Making functions obvious, clear, simple and easy to understand is vital to this cause.

Tips for a good line of sight:

  • Align the happy path to the left; you should quickly be able to scan down one column to see the expected execution flow
  • Don’t hide happy path logic inside a nest of indented braces (the happy path should be obvious at a glance)
  • Exit early from your function (return as soon as you can)
  • Avoid else returns; consider flipping the if statement (avoid needless else branches)
  • Put the happy return statement as the very last line
  • Extract functions and methods to keep bodies small and readable (hence the need to extract)
  • If you need big indented bodies, consider giving them their own function (split them into their own function; see the sketch after this list)
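
A minimal Go sketch of these tips; the Config type and writeConfig function are made up for illustration, not from the original article.

package example

import (
	"encoding/json"
	"errors"
	"io"
)

// Config is a stand-in type just for the example.
type Config struct {
	Addr string `json:"addr"`
}

// writeConfig keeps the happy path on the left edge and exits early on errors.
func writeConfig(w io.Writer, cfg *Config) error {
	if cfg == nil {
		return errors.New("nil config") // exit early
	}

	data, err := json.Marshal(cfg)
	if err != nil {
		return err // no else needed; just return
	}

	if _, err := w.Write(data); err != nil {
		return err
	}

	return nil // happy return as the very last line
}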

Who do you think I am...

I'm the man who's going to pierce through the heavens...

Don't overthink it; burning with passion is enough...


This week's work:

  • Tentatively finished the conversion between the old and new Dutch language versions
  • Worked on the banner part of the Android client

Learned this week:

  • How to use Kaggle
  • On dictionary encoding, the bucketing part and the conversion part; I now have a beginner's grasp of LSTM

Next week's work:

  • Exam questions
  • Get the React Native framework set up, including Bluetooth communication

Exercise: keep it up, including the gym. Running is up to 15 km now; keep that steady. On diet, eat less at dinner, and keep doing 10 to 20 minutes of Keep workouts in the evening.

On the relationship side, keep taking it step by step and stay as independent as possible. I need to control the LoL... it's like I can't think straight; play less LoL. There's still a lot of uncertainty about the future; don't stand still, don't worry about it, just do my own thing...

On the technical side, speed up on deep learning; it's going a bit slowly... For other tech, write up and organize what I've learned...

I can start updating my resume... and think about what direction I want to go in next...

Time management is still not good enough; schedule time for English listening and speaking...

Don't have too many ideas; just keep learning.

repo

However, this limited scope also means that the project is complete.

more featureful version repo

Manual implementation of transactions:

// Start a writable transaction.
tx, err := db.Begin(true)
if err != nil {
	return err
}
defer tx.Rollback()

// Use the transaction...
_, err = tx.CreateBucket([]byte("MyBucket"))
if err != nil {
	return err
}

// Commit the transaction and check for error.
if err := tx.Commit(); err != nil {
	return err
}

Using buckets.

The interesting part here is operating via Seek() (a sketch follows the list below):

  • prefix scans
  • range scans
  • foreach
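
A sketch of a prefix scan with a cursor, along the lines of Bolt's README; the bucket name "MyBucket" and the helper name are just for illustration, and the bucket is assumed to already exist.

import (
	"bytes"
	"fmt"

	"github.com/boltdb/bolt"
)

// scanPrefix prints every key/value pair in MyBucket whose key starts with prefix.
func scanPrefix(db *bolt.DB, prefix []byte) error {
	return db.View(func(tx *bolt.Tx) error {
		c := tx.Bucket([]byte("MyBucket")).Cursor()
		// Seek jumps to the first key >= prefix; keep going while the prefix still matches.
		for k, v := c.Seek(prefix); k != nil && bytes.HasPrefix(k, prefix); k, v = c.Next() {
			fmt.Printf("key=%s, value=%s\n", k, v)
		}
		return nil
	})
}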

Database backups are supported natively.
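
A sketch of how a backup could look: a read-only transaction streams a consistent snapshot of the whole database to an io.Writer via Tx.WriteTo (the helper name and the file destination are my own choices).

import (
	"os"

	"github.com/boltdb/bolt"
)

// backupTo writes a consistent snapshot of the database to a file.
func backupTo(db *bolt.DB, path string) error {
	return db.View(func(tx *bolt.Tx) error {
		f, err := os.Create(path)
		if err != nil {
			return err
		}
		defer f.Close()

		_, err = tx.WriteTo(f)
		return err
	})
}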

comparison with other databases

Postgres, MySQL, & other relational databases (SQL)

Bolt accesses all data by a byte slice key. This makes Bolt fast to read and write data by key but provides no built-in support for joining values together. Bolt runs as a library included in your application so all data access has to go through your application’s process. This brings data closer to your application but limits multi-process access to the data.

LevelDB, RocksDB

Their underlying structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes random writes by using a write-ahead log and multi-tiered, sorted files called SSTables.

Bolt uses a B+tree internally and only a single file.

If you require a high random write throughput (>10,000 w/sec) or you need to use spinning disks then LevelDB could be a good choice. If your application is read-heavy or does a lot of range scans then Bolt could be a good choice.

One other important consideration is that LevelDB does not have transactions. It supports batch writing of key/value pairs and it supports read snapshots but it will not give you the ability to do a compare-and-swap operation safely. Bolt supports fully serializable ACID transactions. (transaction support)

LMDB

Bolt was originally a port of LMDB so it is architecturally similar. Both use a B+tree, have ACID semantics with fully serializable transactions, and support lock-free MVCC using a single writer and multiple readers.

On safe actions: LMDB allows some unsafe operations (such as direct writes) for the sake of raw performance, while Bolt disallows actions that could leave the database in a corrupted state.

On the API side, LMDB requires a maximum mmap size when opening an mdb_env, whereas Bolt handles incremental mmap resizing automatically.

conclusion

  • Bolt is good for read intensive workloads.
  • Bolt uses a B+tree internally so there can be a lot of random page access
  • Bulk loading a lot of random writes into a new bucket can be slow as the page will not split until the transaction is committed. (so keep the operations within a single transaction as sequential as possible)
  • The data structures in the Bolt database are memory mapped so the data file will be endian specific
  • Because of the way pages are laid out on disk, Bolt cannot truncate data files and return free pages back to the disk. Instead, Bolt maintains a free list of unused pages within its data file. These free pages can be reused by later transactions. (Bolt's page-reuse behavior)

repo

Data preprocessing

Original article link

It uses the new Dataset API:

src_dataset = tf.data.TextLineDataset('src_data.txt')
tgt_dataset = tf.data.TextLineDataset('tgt_data.txt')

How the lookup tables are built:

from tensorflow.python.ops import lookup_ops

UNK_ID = 0  # id returned for out-of-vocabulary words

def create_vocab_tables(src_vocab_file, tgt_vocab_file, share_vocab):
    """Creates vocab tables for src_vocab_file and tgt_vocab_file."""
    src_vocab_table = lookup_ops.index_table_from_file(
        src_vocab_file, default_value=UNK_ID)
    if share_vocab:
        tgt_vocab_table = src_vocab_table
    else:
        tgt_vocab_table = lookup_ops.index_table_from_file(
            tgt_vocab_file, default_value=UNK_ID)
    return src_vocab_table, tgt_vocab_table

It uses lookup_ops from the TensorFlow library, which simplifies building the vocabulary tables.

if not output_buffer_size:
    output_buffer_size = batch_size * 1000
src_eos_id = tf.cast(src_vocab_table.lookup(tf.constant(eos)), tf.int32)
tgt_sos_id = tf.cast(tgt_vocab_table.lookup(tf.constant(sos)), tf.int32)
tgt_eos_id = tf.cast(tgt_vocab_table.lookup(tf.constant(eos)), tf.int32)

# Zip the source and target datasets together.
# Tensor change: [src_dataset] + [tgt_dataset] ---> [src_dataset, tgt_dataset]
src_tgt_dataset = tf.data.Dataset.zip((src_dataset, tgt_dataset))
# Shard the dataset; sharding can speed up distributed training.
src_tgt_dataset = src_tgt_dataset.shard(num_shards, shard_index)
if skip_count is not None:
    src_tgt_dataset = src_tgt_dataset.skip(skip_count)
# Shuffle the data to break correlations between neighboring examples.
# According to the docs this should be done as early as possible,
# before the other dataset transformations.
src_tgt_dataset = src_tgt_dataset.shuffle(
    output_buffer_size, random_seed, reshuffle_each_iteration)

# Split each line on spaces.
# This step can run in parallel; num_parallel_calls sets the parallelism,
# and prefetch keeps a buffer of elements ready to improve throughput.
# Example tensor change: ['上海 浦东', '上海 浦东'] ---> [['上海', '浦东'], ['上海', '浦东']]
src_tgt_dataset = src_tgt_dataset.map(
    lambda src, tgt: (
        tf.string_split([src]).values, tf.string_split([tgt]).values),
    num_parallel_calls=num_parallel_calls).prefetch(output_buffer_size)

# Filter zero length input sequences.
src_tgt_dataset = src_tgt_dataset.filter(
    lambda src, tgt: tf.logical_and(tf.size(src) > 0, tf.size(tgt) > 0))
# Cap the source sequences at src_max_len.
if src_max_len:
    src_tgt_dataset = src_tgt_dataset.map(
        lambda src, tgt: (src[:src_max_len], tgt),
        num_parallel_calls=num_parallel_calls).prefetch(output_buffer_size)
# Cap the target sequences at tgt_max_len.
if tgt_max_len:
    src_tgt_dataset = src_tgt_dataset.map(
        lambda src, tgt: (src, tgt[:tgt_max_len]),
        num_parallel_calls=num_parallel_calls).prefetch(output_buffer_size)
# Convert the word strings to ids. Word strings that are not in the
# vocab get the lookup table's default_value integer.
# Example tensor change: [['上海', '浦东'], ['上海', '浦东']] ---> [[1, 2], [1, 2]]
src_tgt_dataset = src_tgt_dataset.map(
    lambda src, tgt: (tf.cast(src_vocab_table.lookup(src), tf.int32),
                      tf.cast(tgt_vocab_table.lookup(tgt), tf.int32)),
    num_parallel_calls=num_parallel_calls).prefetch(output_buffer_size)

# Create a tgt_input prefixed with <sos> and a tgt_output suffixed with <eos>.
# Example tensor change: [[1, 2], [1, 2]] ---> [[1, 2], [sos_id, 1, 2], [1, 2, eos_id]]
src_tgt_dataset = src_tgt_dataset.map(
    lambda src, tgt: (src,
                      tf.concat(([tgt_sos_id], tgt), 0),
                      tf.concat((tgt, [tgt_eos_id]), 0)),
    num_parallel_calls=num_parallel_calls).prefetch(output_buffer_size)
# Add in sequence lengths.
# Example tensor change: [[1, 2], [sos_id, 1, 2], [1, 2, eos_id]] --->
#   [[1, 2], [sos_id, 1, 2], [1, 2, eos_id], [src_size], [tgt_size]]
src_tgt_dataset = src_tgt_dataset.map(
    lambda src, tgt_in, tgt_out: (
        src, tgt_in, tgt_out, tf.size(src), tf.size(tgt_in)),
    num_parallel_calls=num_parallel_calls).prefetch(output_buffer_size)

Analysis of the processing steps:

  1. The start and end markers are looked up and cast to int32 ids.

  2. On adding the sos and eos markers: why are the markers added to src and target different?

  3. What is the purpose of adding the length information?

# Pad and batch the data.
# The argument x is our dataset object.
def batching_func(x):
    # padded_batch pads every field to a common shape while also batching.
    return x.padded_batch(
        batch_size,
        # The shapes to pad each field to.
        padded_shapes=(
            tf.TensorShape([None]),  # src: variable length, so None
            tf.TensorShape([None]),  # tgt_input: variable length, so None
            tf.TensorShape([None]),  # tgt_output: variable length, so None
            tf.TensorShape([]),      # src_len: scalar, no padding needed
            tf.TensorShape([])),     # tgt_len: scalar, no padding needed
        # The values used for padding.
        padding_values=(
            src_eos_id,  # src is padded with src_eos_id
            tgt_eos_id,  # tgt_input is padded with tgt_eos_id
            tgt_eos_id,  # tgt_output is padded with tgt_eos_id
            0,           # src_len -- unused
            0))          # tgt_len -- unused

I didn't fully understand this padding step... (My reading: padded_batch pads every variable-length field of each example up to the longest example in the batch, using the padding_values above, so the whole batch ends up with one uniform shape.)

if num_buckets > 1:

    def key_func(unused_1, unused_2, unused_3, src_len, tgt_len):
        # Calculate bucket_width by maximum source sequence length.
        # Pairs with length [0, bucket_width) go to bucket 0, length
        # [bucket_width, 2 * bucket_width) go to bucket 1, etc. Pairs with length
        # over ((num_bucket-1) * bucket_width) words all go into the last bucket.
        if src_max_len:
            bucket_width = (src_max_len + num_buckets - 1) // num_buckets
        else:
            bucket_width = 10

        # Bucket sentence pairs by the length of their source sentence and target
        # sentence.
        bucket_id = tf.maximum(src_len // bucket_width, tgt_len // bucket_width)
        return tf.to_int64(tf.minimum(num_buckets, bucket_id))

    def reduce_func(unused_key, windowed_data):
        return batching_func(windowed_data)

    batched_dataset = src_tgt_dataset.apply(
        tf.contrib.data.group_by_window(
            key_func=key_func, reduce_func=reduce_func, window_size=batch_size))
else:
    batched_dataset = batching_func(src_tgt_dataset)

With the bucketing, sequences of similar length end up in the same batch, which improves computational efficiency!

Use an iterator to fetch the processed data:

batched_iter = batched_dataset.make_initializable_iterator()
(src_ids, tgt_input_ids, tgt_output_ids, src_seq_len,
tgt_seq_len) = (batched_iter.get_next())