Agenda:
The Problem
The Solution
Conclusion
Problem
The service has to accept a high volume of POST requests carrying JSON data and upload that data to S3.
The initial plan was the usual worker-tier route:
Sidekiq
Resque
DelayedJob
Elasticbeanstalk Worker Tier
RabbitMQ
and so on…
Going that route splits the system into two clusters: one handles the JSON requests, the other ships the data to S3.
With Go, however, the two clusters can collapse into a single service in which each becomes just a method: one that accepts the payloads and one that uploads them, roughly as sketched below.
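A minimal sketch of that single-service shape; the names Payload, PayloadCollection, and payloadHandler are assumptions standing in for the real types and handler:

```go
package main

import (
	"encoding/json"
	"net/http"
)

// Payload is a stand-in for one JSON item in the request body.
type Payload struct {
	// fields omitted
}

// PayloadCollection is a stand-in for the decoded POST body.
type PayloadCollection struct {
	Payloads []Payload `json:"data"`
}

// UploadToS3 represents the second "method": the real one would write
// the payload to an S3 bucket through an S3 client.
func (p *Payload) UploadToS3() error {
	return nil
}

// payloadHandler is the first "method": it accepts the POSTed JSON.
func payloadHandler(w http.ResponseWriter, r *http.Request) {
	var content PayloadCollection
	if err := json.NewDecoder(r.Body).Decode(&content); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// hand each payload to the uploader (strategies discussed below)
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/payload", payloadHandler)
	http.ListenAndServe(":8080", nil)
}
```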
Solution
goroutines
The chosen approach is goroutines, but beware the naive version that spawns one unbounded goroutine per payload:
```go
// Go through each payload and queue items individually to be posted to S3
for _, payload := range content.Payloads {
	go payload.UploadToS3() // unbounded: one new goroutine per payload
}
```
Because each request's lifecycle is very short, the next step was a channel, which in effect works like an in-memory message queue.
The problem that follows is the buffer: it hits its limit quickly, and you cannot control how fast demand grows against that limit, so once the buffer is full, sends block and requests pile up.
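Concretely, reusing the assumed types from the sketch above, this intermediate version looks something like the following; MAX_QUEUE and the enqueue helper are assumptions:

```go
// MAX_QUEUE is an assumed tuning constant for the buffer size.
const MAX_QUEUE = 100

// Queue is a buffered channel acting as an in-memory message queue
// between the HTTP handler and the S3 uploads.
var Queue = make(chan Payload, MAX_QUEUE)

// enqueue is what the handler's loop reduces to in this version.
func enqueue(content PayloadCollection) {
	for _, payload := range content.Payloads {
		// Once MAX_QUEUE items are waiting, this send blocks and
		// incoming requests start piling up behind it.
		Queue <- payload
	}
}
```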
We have decided to utilize a common pattern when using Go channels, in order to create a 2-tier channel system, one for queuing jobs and another to control how many workers operate on the JobQueue concurrently.
The relevant data structures:
```go
// Job represents the job to be run
type Job struct {
	Payload Payload
}

// A buffered channel that we can send work requests on.
var JobQueue chan Job

// Worker represents the worker that executes the job
type Worker struct {
	WorkerPool chan chan Job // tier 2: pool of worker job channels
	JobChannel chan Job      // tier 1: this worker's own job channel
	quit       chan bool
}

func NewWorker(workerPool chan chan Job) Worker {
	return Worker{
		WorkerPool: workerPool,
		JobChannel: make(chan Job),
		quit:       make(chan bool),
	}
}
```
The dispatcher first starts a number of workers. For each incoming job, the dispatch loop obtains an available worker and hands the job to it; the worker then receives the job on its JobChannel and performs the S3 upload.
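The Dispatcher that owns the worker pool is not shown here; a plausible sketch, with maxWorkers as an assumed field:

```go
// Dispatcher owns the pool through which idle workers advertise
// their job channels.
type Dispatcher struct {
	WorkerPool chan chan Job
	maxWorkers int // assumed field carrying the pool size
}

func NewDispatcher(maxWorkers int) *Dispatcher {
	return &Dispatcher{
		WorkerPool: make(chan chan Job, maxWorkers),
		maxWorkers: maxWorkers,
	}
}

// Run starts the workers, then the dispatch loop shown below.
func (d *Dispatcher) Run() {
	for i := 0; i < d.maxWorkers; i++ {
		worker := NewWorker(d.WorkerPool)
		worker.Start()
	}
	go d.dispatch()
}
```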
The key code:
```go
func (d *Dispatcher) dispatch() {
	for {
		select {
		case job := <-JobQueue:
			// a job request has been received
			go func(job Job) {
				// try to obtain a worker job channel that is available;
				// this will block until a worker is idle
				jobChannel := <-d.WorkerPool

				// dispatch the job to the worker job channel
				jobChannel <- job
			}(job)
		}
	}
}
```
This is the dispatcher's loop.
```go
// Start method starts the run loop for the worker, listening for a quit channel in
// case we need to stop it
func (w Worker) Start() {
	go func() {
		for {
			// register the current worker's job channel into the pool
			w.WorkerPool <- w.JobChannel

			select {
			case job := <-w.JobChannel:
				// we have received a work request: do the S3 upload
				if err := job.Payload.UploadToS3(); err != nil {
					log.Printf("Error uploading to S3: %s", err.Error())
				}

			case <-w.quit:
				// we have received a signal to stop
				return
			}
		}
	}()
}
```
This is the worker's internal loop.
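To tie the pieces together, a minimal startup sketch; the queue depth, worker count, and the enqueueJobs helper are all assumptions:

```go
// enqueueJobs is a hypothetical helper showing what the handler's loop
// becomes: wrap each payload in a Job and push it onto JobQueue.
func enqueueJobs(content PayloadCollection) {
	for _, payload := range content.Payloads {
		JobQueue <- Job{Payload: payload}
	}
}

func main() {
	JobQueue = make(chan Job, 100)  // assumed queue depth
	dispatcher := NewDispatcher(64) // assumed worker count
	dispatcher.Run()

	http.HandleFunc("/payload", payloadHandler)
	http.ListenAndServe(":8080", nil)
}
```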
The result: the server fleet dropped from 100 machines to 20.
Conclusion
Simplicity always wins in my book. We could have designed a complex system with many queues, background workers, and complex deployments, but instead we decided to leverage the power of Elasticbeanstalk auto-scaling and the efficiency and simple approach to concurrency that Golang provides us out of the box.
The convenience a language gives you out of the box can beat bolting on a stack of complex external systems.
There is always the right tool for the job.