Agenda:
- The Problem
- The Solution
- Conclusion
Problem
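The service receives a large volume of POST requests carrying JSON data and uploads that data to S3.
The initial approach was a worker tier, with options such as: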
- Sidekiq
- Resque
- DelayedJob
- Elasticbeanstalk Worker Tier
- RabbitMQ
- and so on…
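With this approach the system splits into two clusters: one handles the JSON requests and the other uploads the data to S3.
With Go, however, those two clusters can be reduced to two methods running inside a single service.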
Solution
goroutines
The approach is built on goroutines, but not in the naive way:
// Go through each payload and queue items individually to be posted to S3
for _, payload := range content.Payloads {
	go payload.UploadToS3() // <----- DON'T DO THIS
}
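Because each request has a very short lifecycle, we moved to a channel, which behaves much like an in-memory message queue.
The problem that follows is the buffer: it easily reaches its limit, and you cannot control how fast it fills.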
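To make the buffer limit concrete, here is a minimal sketch of a buffered channel with no consumer draining it. The names `Payload` and `tryEnqueue` and the capacity of 3 are illustrative assumptions. A plain `queue <- p` would simply block the request handler once the buffer is full; the non-blocking `select` is used here only so the limit can be observed without hanging.

```go
package main

import "fmt"

// Payload is a hypothetical stand-in for one decoded JSON payload.
type Payload struct {
	ID int
}

// tryEnqueue attempts a non-blocking send and reports whether the
// payload fit into the channel's buffer.
func tryEnqueue(queue chan<- Payload, p Payload) bool {
	select {
	case queue <- p:
		return true
	default:
		// Buffer is full and nobody is receiving.
		return false
	}
}

func main() {
	queue := make(chan Payload, 3) // capacity 3, nothing reads from it

	accepted, rejected := 0, 0
	for i := 0; i < 5; i++ {
		if tryEnqueue(queue, Payload{ID: i}) {
			accepted++
		} else {
			rejected++
		}
	}
	// prints "accepted: 3 rejected: 2 buffered: 3"
	fmt.Println("accepted:", accepted, "rejected:", rejected, "buffered:", len(queue))
}
```

When requests arrive faster than the uploads complete, the buffer stays full no matter how large it is, which is why a bigger buffer alone does not help.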
We have decided to utilize a common pattern when using Go channels, in order to create a 2-tier channel system, one for queuing jobs and another to control how many workers operate on the JobQueue concurrently.
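The relevant data structure is:
type Worker struct {
	WorkerPool chan chan Job // shared pool where idle workers register their JobChannel
	JobChannel chan Job      // channel on which this worker receives jobs
	quit       chan bool     // signals the worker to stop
}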
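The `chan chan Job` type can look odd at first: it is a channel whose elements are themselves channels. The sketch below shows a single round trip, assuming a hypothetical `Job` type and a helper named `handOff` that is not part of the original code. A worker announces that it is idle by placing its own job channel into the pool, and the sender takes that channel out and uses it to deliver the job.

```go
package main

import "fmt"

// Job is a hypothetical unit of work.
type Job struct {
	ID int
}

// handOff performs one round trip through a chan chan Job and returns
// the job as the worker received it.
func handOff(job Job) Job {
	workerPool := make(chan chan Job, 1) // pool of idle workers' job channels
	jobChannel := make(chan Job)         // this worker's private channel
	done := make(chan Job)

	// Worker side: register as idle, then wait for a job.
	go func() {
		workerPool <- jobChannel
		received := <-jobChannel
		done <- received
	}()

	// Dispatcher side: take an idle worker's channel and send the job on it.
	idle := <-workerPool
	idle <- job

	return <-done
}

func main() {
	// prints "worker received job 42"
	fmt.Println("worker received job", handOff(Job{ID: 42}).ID)
}
```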
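First, several workers are started for the dispatcher to use. The dispatcher tries to obtain an available worker and then hands the job to that worker. The worker receives the job through its JobChannel and performs the upload to S3.
The key code is: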
for {
	select {
	case job := <-JobQueue:
		// a job request has been received
		go func(job Job) {
			// try to obtain a worker job channel that is available.
			// this will block until a worker is idle
			jobChannel := <-d.WorkerPool
			// dispatch the job to the worker job channel
			jobChannel <- job
		}(job)
	}
}
The above is the dispatcher loop.
// Start method starts the run loop for the worker, listening for a quit channel in
// case we need to stop it
func (w Worker) Start() {
	go func() {
		for {
			// register the current worker into the worker queue.
			w.WorkerPool <- w.JobChannel
			select {
			case job := <-w.JobChannel:
				// we have received a work request.
				if err := job.Payload.UploadToS3(); err != nil {
					log.Errorf("Error uploading to S3: %s", err.Error())
				}
			case <-w.quit:
				// we have received a signal to stop
				return
			}
		}
	}()
}
The above is the worker's internal loop.
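Putting the dispatcher and worker fragments together, a self-contained sketch of the two-tier pattern might look like the following. Several parts are assumptions added for illustration and do not appear in the notes: the `Dispatcher` struct layout, its `Run` and `Stop` methods, the `processAll` helper, and the `handle` hook that replaces `UploadToS3` so the example runs without AWS.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Job is a hypothetical unit of work; in the notes it wraps a payload to upload.
type Job struct{ ID int }

// Worker mirrors the struct shown earlier, plus an assumed handle hook.
type Worker struct {
	WorkerPool chan chan Job
	JobChannel chan Job
	quit       chan bool
	handle     func(Job)
}

// Start registers the worker as idle, then waits for a job or a quit signal.
func (w Worker) Start() {
	go func() {
		for {
			w.WorkerPool <- w.JobChannel
			select {
			case job := <-w.JobChannel:
				w.handle(job)
			case <-w.quit:
				return
			}
		}
	}()
}

// Dispatcher owns the pool of idle workers and the queue of pending jobs.
type Dispatcher struct {
	WorkerPool chan chan Job
	jobQueue   chan Job
	workers    []Worker
}

// Run starts maxWorkers workers and the dispatch loop.
func (d *Dispatcher) Run(maxWorkers int, handle func(Job)) {
	d.WorkerPool = make(chan chan Job, maxWorkers)
	for i := 0; i < maxWorkers; i++ {
		w := Worker{
			WorkerPool: d.WorkerPool,
			JobChannel: make(chan Job),
			quit:       make(chan bool),
			handle:     handle,
		}
		w.Start()
		d.workers = append(d.workers, w)
	}
	go func() {
		// Ranging over the queue behaves like the single-case select loop,
		// but also exits once the queue is closed.
		for job := range d.jobQueue {
			go func(job Job) {
				jobChannel := <-d.WorkerPool // blocks until a worker is idle
				jobChannel <- job
			}(job)
		}
	}()
}

// Stop tells every worker to exit; call it only after all jobs are done.
func (d *Dispatcher) Stop() {
	for _, w := range d.workers {
		w.quit <- true
	}
}

// processAll pushes numJobs jobs through numWorkers workers and returns
// how many were handled. numWorkers must be at least 1.
func processAll(numWorkers, numJobs int) int64 {
	var processed int64
	var wg sync.WaitGroup
	wg.Add(numJobs)
	handle := func(_ Job) {
		atomic.AddInt64(&processed, 1)
		wg.Done()
	}
	d := &Dispatcher{jobQueue: make(chan Job, numJobs)}
	d.Run(numWorkers, handle)
	for i := 0; i < numJobs; i++ {
		d.jobQueue <- Job{ID: i}
	}
	close(d.jobQueue)
	wg.Wait()
	d.Stop()
	return atomic.LoadInt64(&processed)
}

func main() {
	fmt.Println("processed:", processAll(4, 100)) // prints "processed: 100"
}
```

The number of workers caps how many uploads run at once, while the job queue absorbs bursts; both limits are explicit, which is the control the plain buffered channel lacked.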
The result: the number of servers dropped from 100 to 20.
Conclusion
Simplicity always wins in my book. We could have designed a complex system with many queues, background workers, complex deployments, but instead we decided to leverage the power of Elasticbeanstalk auto-scaling and the efficiency and simple approach to concurrency that Golang provides us out of the box.
The convenience a language provides can be worth more than introducing a collection of other complex systems.
There is always the right tool for the job