DevilKing's blog


Redis connection pool timeout

While jinwu was running, the following error showed up repeatedly when reading values from Redis:

pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout
pkg error redis: connection pool timeout

After raising PoolSize to 500, a different error sometimes appeared, this time returned by the Redis server itself:

pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
pkg error ERR max number of clients reached
win id error ERR max number of clients reached
response id error ERR max number of clients reached

Possible causes of this kind of problem:

  • Redis is busy doing some expensive work (unlikely)
  • you use PubSub or Multi and don’t close it correctly (multi.Close() when multi is not needed any more) so connection is not returned to the pool

My guess is that concurrency was too high, so connections in the Redis pool were not being released back in time.
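The second error has a different source from the first: `ERR max number of clients reached` is returned by the Redis server when it hits its `maxclients` limit (10000 by default), whereas `connection pool timeout` is raised on the client side. As a rough sketch of the arithmetic, and keeping in mind that go-redis applies `PoolSize` per cluster node for a cluster client, the worst case a single node can see is the pool size multiplied by the number of application processes. The process count below is a hypothetical figure, not taken from jinwu's deployment.

```go
package main

import "fmt"

// worstCaseClients estimates how many connections one Redis node could see
// if every application process fills its per-node pool at the same time.
func worstCaseClients(poolSizePerNode, appProcesses int) int {
	return poolSizePerNode * appProcesses
}

func main() {
	const maxClients = 10000 // Redis default; check the server's actual setting
	const processes = 24     // hypothetical number of application processes

	for _, poolSize := range []int{10, 500, 1000} {
		peak := worstCaseClients(poolSize, processes)
		fmt.Printf("PoolSize=%d: up to %d clients per node, over limit: %v\n",
			poolSize, peak, peak > maxClients)
	}
}
```

If the estimate is anywhere near the server limit, a larger pool only moves the failure from the client to the server.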

The Redis connection options include these parameters:

type Options struct {
    ...

    // The maximum number of socket connections.
    // Default is 10 connections.
    PoolSize int
    // Specifies amount of time client waits for connection if all
    // connections are busy before returning an error.
    // Default is 1 seconds.
    PoolTimeout time.Duration
    // Specifies amount of time after which client closes idle
    // connections. Should be less than server's timeout.
    // Default is to not close idle connections.
    IdleTimeout time.Duration

    ...
}

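Note the PoolTimeout parameter. With the default 1s timeout, under high concurrency some connections may not be returned to the pool in time, so a caller waiting for a connection cannot get one and the timeout error is raised.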

So, for the current situation, I made the following change:

Cluster = redis.NewClusterClient(&redis.ClusterOptions{
    Addrs:        addresses,
    PoolSize:     1000,
    PoolTimeout:  2 * time.Minute,
    IdleTimeout:  10 * time.Minute,
    ReadTimeout:  2 * time.Minute,
    WriteTimeout: 1 * time.Minute,
    // Password: password,
})

The open question is whether setting PoolTimeout and the related timeouts somewhat larger is enough to avoid this kind of timeout bug caused by high concurrency.

Of course, this approach mainly suits endpoints that are not very latency-sensitive and do not need to respond within 1s.
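A complementary option, which this post did not try, is to cap the number of concurrent Redis calls on the application side so that demand never exceeds the pool size. The sketch below is only an illustration of that idea using a buffered channel as a semaphore; `limiter`, `newLimiter`, `do`, and `peakConcurrency` are hypothetical names, and the function passed to `do` stands in for a real Redis call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter caps how many calls may run at once (hypothetical helper).
type limiter struct{ slots chan struct{} }

func newLimiter(n int) *limiter { return &limiter{slots: make(chan struct{}, n)} }

// do runs fn while holding one slot and always releases the slot afterwards.
func (l *limiter) do(fn func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	fn()
}

// peakConcurrency pushes `calls` goroutines through a limiter of size `limit`
// and reports the highest number of calls that were in flight at once.
func peakConcurrency(limit, calls int) int64 {
	l := newLimiter(limit)
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(func() { // placeholder for a Redis call
				n := atomic.AddInt64(&inFlight, 1)
				for { // record a new maximum if we just set one
					old := atomic.LoadInt64(&peak)
					if n <= old || atomic.CompareAndSwapInt64(&peak, old, n) {
						break
					}
				}
				atomic.AddInt64(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(peakConcurrency(10, 1000) <= 10)
}
```

With the limit set at or below PoolSize, callers queue in the application instead of timing out inside the pool, at the cost of added latency under load.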

Reference for this approach:

connection pool timeout