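Redis Performance Troubleshooting Guide

Redis performance problems

Does the same command run fast at some times and slow at others?
Do even SET and DEL take a long time?
Does latency spike briefly and then return to normal?
Has an instance that ran stably for a long time suddenly become slow?

The heavier the traffic, the more visible these problems become.

Three possible causes

Is it a network problem, a Redis problem, or a problem with the underlying hardware?

Troubleshooting approach

Querying with commands

The official documentation at <https://redis.io/topics/latency-monitor> describes the latency monitor. Enable it with **CONFIG SET latency-monitor-threshold 100**. The unit is milliseconds, so 100 means 100 ms: any event that takes longer than 100 ms is recorded and is worth investigating. The right threshold depends on the machine's configuration and load. Some general recommendations:

Run the Redis server on a physical machine rather than a virtual machine where possible
Do not connect and disconnect frequently; use long-lived connections
Prefer aggregate commands (MSET/MGET) over pipelines
Prefer pipelines over sending commands one at a time (many network round trips)
For commands that do not fit a pipeline, consider a Lua script
Send PING continuously to compare the baseline latency of a healthy Redis instance with that of the target instance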
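The advice to batch commands comes down to simple arithmetic: every batch pays one network round trip, while every command still pays its own execution time. The sketch below is a rough model only; the function name and the example numbers are made up for illustration, and real latency also depends on payload size and server load.

```python
def total_latency_ms(commands: int, rtt_ms: float, exec_ms: float, batch_size: int = 1) -> float:
    """Estimate wall-clock time for `commands` commands sent in batches.

    Each batch pays one network round trip; every command still pays its
    own execution time on the server.
    """
    if commands < 0 or batch_size < 1:
        raise ValueError("commands must be >= 0 and batch_size >= 1")
    round_trips = -(-commands // batch_size)  # ceiling division
    return round_trips * rtt_ms + commands * exec_ms
```

With a 0.5 ms round trip, sending 1000 commands one at a time is dominated by the network, while batches of 100 reduce the round trips from 1000 to 10.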

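**Maximum latency measured over 60 seconds**

Run this on the Redis server machine itself. --intrinsic-latency does not send commands to Redis; it measures the latency that the operating system and hardware impose on a running process, which serves as the baseline for the instance.

$ redis-cli -h 127.0.0.1 -p 6379 --intrinsic-latency 60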
Max latency so far: 1 microseconds.
Max latency so far: 15 microseconds.
Max latency so far: 17 microseconds.
Max latency so far: 18 microseconds.
Max latency so far: 31 microseconds.
Max latency so far: 32 microseconds.
Max latency so far: 59 microseconds.
Max latency so far: 72 microseconds.

1428669267 total runs (avg latency: 0.0420 microseconds / 42.00 nanoseconds per run).
Worst run took 1429x longer than the average latency.

Analysis: the maximum latency observed is 72 microseconds.

View the minimum, maximum, and average latency of Redis over a period of time

$ redis-cli -h 127.0.0.1 -p 6379 --latency-history -i 1
min: 0, max: 1, avg: 0.13 (100 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.12 (99 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.13 (99 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.10 (99 samples) -- 1.01 seconds range
min: 0, max: 1, avg: 0.13 (98 samples) -- 1.00 seconds range
min: 0, max: 1, avg: 0.08 (99 samples) -- 1.01 seconds range
...

Redis latency is sampled in 1-second windows; the average per window ranges from 0.08 to 0.13 milliseconds.
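To turn many such lines into one figure, the per-window values can be combined. The following is a minimal sketch that assumes the line format shown in the sample output above; the function name and the regular expression are illustrative and are not part of redis-cli.

```python
import re

LINE_RE = re.compile(
    r"min: (?P<min>\d+), max: (?P<max>\d+), avg: (?P<avg>[\d.]+) \((?P<samples>\d+) samples\)"
)

def summarize_latency_history(text: str) -> dict:
    """Combine per-window lines into overall min, max and sample-weighted average (ms)."""
    rows = [m.groupdict() for m in map(LINE_RE.search, text.splitlines()) if m]
    if not rows:
        raise ValueError("no latency-history lines found")
    samples = sum(int(r["samples"]) for r in rows)
    weighted = sum(float(r["avg"]) * int(r["samples"]) for r in rows)
    return {
        "min": min(int(r["min"]) for r in rows),
        "max": max(int(r["max"]) for r in rows),
        "avg": weighted / samples,
        "samples": samples,
    }
```

The average is weighted by sample count because the windows do not all contain the same number of samples.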

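**Query the most recent slow log entries (SLOWLOG)**

The slow log shows which commands were slow and when they ran. It needs a threshold to be configured first:

# Log any command that takes longer than 5 milliseconds to execute (the unit is microseconds)
CONFIG SET slowlog-log-slower-than 5000
# Keep only the most recent 500 slow log entries
CONFIG SET slowlog-max-len 500

Query the most recent slow log entries:

127.0.0.1:6379> SLOWLOG get 5
1) 1) (integer) 32693       # slow log entry ID
   2) (integer) 1593763337  # Unix timestamp of execution
   3) (integer) 5299        # execution time (microseconds)
   4) 1) "LRANGE"           # the command and its arguments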
      2) "user_list:2000"
      3) "0"
      4) "-1"
2) 1) (integer) 32692
   2) (integer) 1593763337
   3) (integer) 5044
   4) 1) "GET"
      2) "user_info:1000"
...
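Once entries like these have been collected, grouping them by command name shows where the time goes. This is a sketch over plain tuples that mirror the four fields printed above; the tuple layout and the function name are assumptions for illustration, not a client library API.

```python
from collections import defaultdict

def slowest_commands(entries, top=3):
    """Group slow log entries by command name and rank by total time spent.

    `entries` is an iterable of (entry_id, unix_ts, duration_usec, args)
    tuples, mirroring the four fields SLOWLOG GET prints for each entry.
    """
    totals = defaultdict(lambda: {"count": 0, "total_usec": 0, "max_usec": 0})
    for _entry_id, _ts, duration_usec, args in entries:
        stats = totals[args[0].upper()]
        stats["count"] += 1
        stats["total_usec"] += duration_usec
        stats["max_usec"] = max(stats["max_usec"], duration_usec)
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["total_usec"], reverse=True)
    return [(name, dict(stats)) for name, stats in ranked[:top]]
```

Fed the two entries above, LRANGE on user_list:2000 ranks first, which points at the O(N) full-list read as the first thing to look into.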

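Analysis from the business side

Are complex commands being used?

Use SLOWLOG, Redis's log of command execution times, to find out how long commands take.

Analysis:

1. The command needs a lot of CPU computation
2. Assembling the result and sending it over the network takes a long time
3. Commands queue up: before Redis 6.0 all request handling ran on a single thread (with I/O multiplexing), and command execution is still single-threaded

Solutions:

1. Do aggregation in the client (the application) instead of in Redis
2. For O(N) commands keep N small, ideally N <= 300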
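One way to keep N small is to read a large list in slices instead of with a single LRANGE 0 -1. The helper below only computes the index pairs; it assumes the inclusive start/stop convention of LRANGE, and the default batch size of 300 is the guideline mentioned above rather than a Redis limit.

```python
def lrange_batches(length: int, batch: int = 300):
    """Yield inclusive (start, stop) index pairs that cover a list of `length` items."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    for start in range(0, length, batch):
        yield start, min(start + batch, length) - 1
```

Each pair can then be passed to a separate LRANGE call, so no single command touches more than `batch` elements.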

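Operations on big keys

Symptoms

Even SET and DEL are slow

Allocating and freeing the memory takes a long time

Typical big keys: a String larger than 10 KB, or a Hash with 20,000 fields

Avoidance

Avoid big keys (keep values under 10 KB)
Use UNLINK instead of DEL (Redis 4.0+, lazy free)
redis-cli can scan for big keys. The following command reports the distribution of big keys in an instance, broken down by type:

$ redis-cli -h 127.0.0.1 -p 6379 --bigkeys -i 0.01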

...
-------- summary -------

Sampled 829675 keys in the keyspace!
Total key length in bytes is 10059825 (avg len 12.13)

Biggest string found 'key:291880' has 10 bytes
Biggest   list found 'mylist:004' has 40 items
Biggest    set found 'myset:2386' has 38 members
Biggest   hash found 'myhash:3574' has 37 fields
Biggest   zset found 'myzset:2704' has 42 members

36313 strings with 363130 bytes (04.38% of keys, avg size 10.00)
787393 lists with 896540 items (94.90% of keys, avg size 1.14)
1994 sets with 40052 members (00.24% of keys, avg size 20.09)
1990 hashs with 39632 fields (00.24% of keys, avg size 19.92)
1985 zsets with 39750 members (00.24% of keys, avg size 20.03)

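How it works: redis-cli runs SCAN to walk every key in the instance and then, depending on each key's type, runs STRLEN, LLEN, HLEN, SCARD, or ZCARD to get the length of a String or the element count of a container type (List, Hash, Set, ZSet).

Things to keep in mind:

Scanning a production instance for big keys makes its OPS jump. To limit the impact, throttle the scan with the -i option, which sets how long (in seconds) the scan pauses after each round
For container types (List, Hash, Set, ZSet) the scan only reports the key with the most elements. A key with many elements does not necessarily use the most memory, so you still need to assess memory usage against your business data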
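For monitoring, the "Biggest ... found" lines of the summary can be extracted with a short parser. This sketch assumes the exact line format of the sample output above; other redis-cli versions may quote keys differently, so the regular expression would need adjusting. The function name is illustrative.

```python
import re

BIGGEST_RE = re.compile(
    r"^Biggest\s+(?P<type>\w+) found '(?P<key>[^']*)' has (?P<size>\d+) (?P<unit>\w+)$"
)

def parse_biggest(summary: str) -> dict:
    """Map each data type to (key, size, unit) from the 'Biggest ... found' lines."""
    result = {}
    for line in summary.splitlines():
        match = BIGGEST_RE.match(line.strip())
        if match:
            result[match["type"]] = (match["key"], int(match["size"]), match["unit"])
    return result
```

Because container types report element counts rather than bytes, the unit is kept next to each size so the two are not compared directly.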

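Solutions:

Avoid writing big keys from the application wherever possible
On Redis 4.0 or later, use UNLINK instead of DEL. UNLINK hands the freeing of the key's memory to a background thread, which reduces the impact on Redis
On Redis 6.0 or later, you can enable lazy free for DEL (lazyfree-lazy-user-del yes), so that DEL also frees memory in a background thread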

Concentrated expiration

Some background first: to understand Redis in depth it helps to look at dict and redisDb.

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB: maps every key to its value */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP) */
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
} redisDb;

dict

typedef struct dict {
    dictType *type; /* type-specific functions for handling keys and values */
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

Each dict holds two dictht hash tables. Normally only the first, ht[0], is used; ht[1] is used only while a rehash is in progress. dictht (dict.h/dictht):

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table; /* array of buckets */
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

dictEntry (dict.h/dictEntry)

typedef struct dictEntry {
    void *key;
    union {     /* the value takes different forms (pointer, integer, double); the expires dict uses only s64, to store the expiration time */
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; /* next entry in the same bucket (chaining) */
} dictEntry;

Figure: a redisDb instance (image not available)

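Slowdowns on the hour

Symptoms: the slowdown recurs at fixed intervals, the slow log shows nothing, and expired_keys jumps over a short period.

Expiration strategies

Periodic deletion: think of it as a timer task that by default runs every 100 ms, samples keys at random, and deletes the ones that have expired.
Lazy deletion: when a specific key is accessed, Redis checks whether it has expired. expireIfNeeded is called on the key and deletes it if it has. Because the cleanup of expired keys runs in the main thread and is not a client command, a burst of keys expiring together adds latency without leaving a trace in the slow log.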

int expireIfNeeded(redisDb *db, robj *key) {
    mstime_t when = getExpire(db,key);
    mstime_t now;

    if (when < 0) return 0; /* No expire for this key */

    /* Don't expire anything while loading. It will be done later. */
    if (server.loading) return 0;

    /* If we are in the context of a Lua script, we claim that time is
     * blocked to when the Lua script started. This way a key can expire
     * only the first time it is accessed and not in the middle of the
     * script execution, making propagation to slaves / AOF consistent.
     * See issue #1525 on Github for more information. */
    now = server.lua_caller ? server.lua_time_start : mstime();

    /* If we are running in the context of a slave, return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return now > when;

    /* Return when this key has not expired */
    if (now <= when) return 0;

    /* Delete the key */
    server.stat_expiredkeys++;
    propagateExpire(db,key,server.lazyfree_lazy_expire);
    notifyKeyspaceEvent(NOTIFY_EXPIRED,
        "expired",key,db->id);
    return server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                         dbSyncDelete(db,key);
}
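The two strategies can be sketched with a toy store that keeps values and expiry times in separate dictionaries, as redisDb does with dict and expires. This is a simplified model and not the Redis implementation: the class and method names are made up, time is passed in explicitly so the behaviour is deterministic, and the sample size of 20 is an illustrative value.

```python
import random

class ExpiringStore:
    """Toy key-value store with lazy and sampled periodic expiration."""

    def __init__(self):
        self.data = {}      # key -> value, like redisDb.dict
        self.expires = {}   # key -> absolute expiry time in ms, like redisDb.expires

    def set(self, key, value, expire_at_ms=None):
        self.data[key] = value
        if expire_at_ms is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = expire_at_ms

    def _expire_if_needed(self, key, now_ms):
        """Lazy path: delete `key` if its expiry time has passed."""
        when = self.expires.get(key)
        if when is None or now_ms <= when:
            return False
        del self.data[key]
        del self.expires[key]
        return True

    def get(self, key, now_ms):
        self._expire_if_needed(key, now_ms)
        return self.data.get(key)

    def periodic_expire(self, now_ms, sample_size=20, rng=random):
        """Periodic path: sample keys that have a TTL and delete the expired ones."""
        candidates = list(self.expires)
        sample = rng.sample(candidates, min(sample_size, len(candidates)))
        return sum(self._expire_if_needed(key, now_ms) for key in sample)
```

If many keys share the same expiry time, a single periodic pass finds most of its sample expired and does a lot of deletion work at once, which is the on-the-hour slowdown described above.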


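Eviction policies

The following policies apply when there is not enough memory to hold newly written data:

noeviction: no space is freed, so writes return an error
allkeys-lru: evict the least recently used key among all keys
allkeys-random: evict a random key among all keys
volatile-lru: evict the least recently used key among keys that have an expiration set
volatile-random: evict a random key among keys that have an expiration set
volatile-ttl: evict the key with the earliest expiration time among keys that have an expiration set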
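The difference between the six policies is which keys are candidates and how the victim is chosen among them. The sketch below picks the exact least-recently-used or earliest-expiring key, whereas Redis approximates this by sampling a handful of keys; the KeyInfo type and pick_victim function are hypothetical names for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

POLICIES = {
    "noeviction", "allkeys-lru", "allkeys-random",
    "volatile-lru", "volatile-random", "volatile-ttl",
}

@dataclass
class KeyInfo:
    name: str
    last_access: int                 # larger means used more recently
    expire_at: Optional[int] = None  # None means no expiration set

def pick_victim(policy: str, keys: List[KeyInfo], rng=random) -> Optional[str]:
    """Return the key name the policy would evict, or None if nothing qualifies."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "noeviction":
        return None  # the write is rejected instead
    scope, rule = policy.split("-")
    pool = keys if scope == "allkeys" else [k for k in keys if k.expire_at is not None]
    if not pool:
        return None
    if rule == "lru":
        return min(pool, key=lambda k: k.last_access).name
    if rule == "ttl":
        return min(pool, key=lambda k: k.expire_at).name
    return rng.choice(pool).name  # rule == "random"
```

Note that the volatile-* policies return nothing when no key has an expiration set, in which case they behave like noeviction.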

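It was announced that expiration in Redis 6 would no longer be based on random sampling and would instead keep keys sorted by expiration time in a radix tree. Check the release notes of the version you run before relying on this: as far as I know, the active expire cycle that shipped in 6.0 was rewritten and made tunable (active-expire-effort) but still samples keys. I plan to write a separate article on Redis data structures.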

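CPU binding

When deploying a service, a common way to improve performance is to bind the process to a CPU, which reduces the cost of context switches between CPU cores. Besides the main thread that serves client requests, the Redis server also creates child processes and background threads. Child processes handle data persistence; background threads run slower operations such as closing file descriptors, flushing the AOF to disk, and lazy free.

If you bind the Redis process to a single logical CPU core, the child process forked for persistence inherits the parent's CPU affinity. The child needs a lot of CPU to persist the data (scanning the whole dataset is CPU-intensive), so it competes with the main process for that one core. The main process is then slower to serve clients and access latency rises. That is the performance problem that CPU binding can cause.

Symptoms

The Redis process is bound to a single core
Redis is slow during RDB snapshots and AOF rewrites

CPU sockets (abbreviated s below)

On a multi-CPU machine an application can run on different processors. It may run on s1 for a while and store data there, then be scheduled onto s2. Accessing the data in s1's memory from s2 is a remote memory access, which adds latency to the application. This is the Non-Uniform Memory Access (NUMA) architecture: a process that hops between sockets pays for remote accesses to each socket's memory.

Solution: bind the network interrupt handlers and the Redis instance to the same CPU socket, so that Redis can read network data from local memory. Pay attention to how cores are numbered under NUMA so that you do not bind to the wrong ones; lscpu shows the numbering of the logical cores.

Why running across multiple cores can slow Redis down: **context switches**. When the thread is moved between cores too often:

1. The runtime state recorded on one core has to be carried over to the other core when execution moves there.
2. The L1 and L2 caches of the other core do not hold the instructions and data that the Redis instance was using, so they must be reloaded from the L3 cache or even from main memory, which takes time.

Solution:

Bind Redis to a CPU core with taskset:

# bind to core 0
taskset -c 0 ./redis-server

Our systems are almost all Linux; set the CPU frequency governor to performance (high-performance mode).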

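Redis 6.0 added built-in support for this. The following settings bind the main thread, the background threads, the background RDB process, and the AOF rewrite process to fixed logical CPU cores:

# Bind the Redis server and I/O threads to CPU cores 0,2,4,6
server_cpulist 0-7:2

# Bind the background threads to CPU cores 1,3
bio_cpulist 1,3

# Bind the background AOF rewrite process to CPU cores 8,9,10,11
aof_rewrite_cpulist 8-11

# Bind the background RDB process to CPU cores 1,10,11
bgsave_cpulist 1,10-11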
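The list syntax in these settings is compact: a comma separates items, a dash gives a range, and a colon followed by a number gives a stride. The sketch below expands such a string into core numbers, following the examples in the comments above. It is an illustration of the notation only, the function name is made up, and it is not the parser Redis uses.

```python
def parse_cpulist(spec: str) -> list:
    """Expand a list such as '0-7:2' or '1,10-11' into sorted core numbers."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        rng, _, step = part.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        stop = int(last) if last else start
        stride = int(step) if step else 1
        if stop < start or stride < 1:
            raise ValueError(f"invalid range: {part!r}")
        cores.update(range(start, stop + 1, stride))
    return sorted(cores)
```

Expanding each list before deploying makes it easy to check which cores the different lists share and to compare them with the numbering that lscpu reports.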

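Command usage

Do not use the KEYS command.
Avoid fetching all members in one query; iterate in batches with the cursor-based SCAN commands.
Put controls in place to strictly limit the size of Hash, Set, Sorted Set, and similar structures.
Do sorting, unions, intersections, and similar operations in the client to reduce the load on the Redis server.
Deleting a large value with DEL can take a long time, so prefer the asynchronous UNLINK, which frees the data in a background thread without blocking the Redis main thread.

Memory reaches maxmemory

Once an instance's memory reaches maxmemory, you may notice that every subsequent write of new data takes longer. The reason: before each new write, Redis must first evict some data so that the instance stays below maxmemory, and only then can it store the new data. The eviction policies are described above. Ways to optimize:

Avoid storing big keys, to reduce the time spent freeing memory
Switch to a random eviction policy, which is much faster than LRU (adjust to your business needs)
Split the instance, spreading the eviction load over several instances
On Redis 4.0 or later, enable lazy free so that the memory of evicted keys is released in a background thread (lazyfree-lazy-eviction yes)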

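Rehash

Symptoms

Writes of new keys are occasionally slow
Rehash combined with maxmemory triggers mass eviction:

+ maxmemory = 6 GB
+ current instance memory = 5.8 GB
+ a hash table expansion is triggered and needs to allocate 512 MB
+ memory exceeds maxmemory, which triggers mass eviction

A rehash allocates memory for a table twice the size of the current one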
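The 512 MB figure can be reproduced with a little arithmetic. The sketch assumes a 64-bit build (8 bytes per bucket pointer) and the sizing rule of the Redis 3.0 dict.c quoted in this article: the new table has the first power of two of buckets that is at least twice the number of stored entries. The function names are illustrative, and only the bucket array is counted, not the entries themselves.

```python
POINTER_BYTES = 8  # assumption: 64-bit build, one pointer per bucket

def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n, starting from the initial table size of 4."""
    size = 4
    while size < n:
        size *= 2
    return size

def expansion_bytes(used_keys: int) -> int:
    """Bytes for the new bucket array when a table holding `used_keys` entries expands."""
    return next_power_of_two(used_keys * 2) * POINTER_BYTES

def exceeds_maxmemory(used_memory: int, used_keys: int, maxmemory: int) -> bool:
    """True if allocating the new bucket array would push memory past maxmemory."""
    return used_memory + expansion_bytes(used_keys) > maxmemory
```

A table holding 2^25 (about 33.5 million) entries expands to 2^26 buckets, which is exactly 512 MB of pointers; on an instance at 5.8 GB with maxmemory at 6 GB, that single allocation crosses the limit.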

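How to control it:

Keep the number of keys below 100 million
Patch the source so that no rehash is started once maxmemory would be exceeded, or upgrade to Redis 6.0 or later, which reportedly skips the expansion in that situation as well

Rehash in detail

For performance reasons Redis splits rehashing into two mechanisms, lazy and active, which run side by side until the rehash completes:

lazy
active

The code here is from the Redis 3.0 sources: redis-3.0-annotated-unstable\src\dict.c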

/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used.
 *
 * T = O(1) */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table.
 *
 * T = O(N) */
int dictRehash(dict *d, int n) {

    // Only runs while a rehash is in progress
    if (!dictIsRehashing(d)) return 0;

    // Perform N migration steps
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        // If ht[0] is empty, the rehash is complete
        if (d->ht[0].used == 0) {
            // Free the old ht[0]
            zfree(d->ht[0].table);
            // The former ht[1] becomes the new ht[0]
            d->ht[0] = d->ht[1];
            // Reset the old ht[1]
            _dictReset(&d->ht[1]);
            // Clear the rehash flag
            d->rehashidx = -1;
            // Return 0 to tell the caller that the rehash is done
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        // Skip empty buckets until the next non-empty one
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // Head of the chain stored in this bucket
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            // Remember the next entry in the chain
            nextde = de->next;

            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // Insert the entry at the head of the new bucket
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            // Update the counters
            d->ht[0].used--;
            d->ht[1].used++;

            // Move on to the next entry
            de = nextde;
        }
        // Clear the bucket that was just migrated
        d->ht[0].table[d->rehashidx] = NULL;
        // Advance the rehash index
        d->rehashidx++;
    }

    return 1;
}

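_dictRehashStep calls dictRehash and migrates only one bucket from ht[0] to ht[1] per call. But because _dictRehashStep is called by dictGetRandomKey, dictFind, dictGenericDelete, and dictAdd, it runs on every insert, delete, lookup, and update on the dict, which speeds up the rehash.
dictRehash migrates n buckets per call. Because ht[1] was already sized when the resize was triggered, the main work is to walk ht[0], take each key, and recompute its bucket from the size of ht[1]. When everything has been moved, ht[1] becomes the new ht[0] and ht[1] is reset. rehashidx is central to this process: it records the index in ht[0] that the previous rehash step reached.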
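The lazy path can be modelled in a few lines: every lookup or update first migrates one bucket, so the cost of resizing is spread over many operations. This is a toy model for illustration, not a port of dict.c. The class name is made up, Python lists stand in for bucket chains, and the completion check scans the old table where Redis keeps a used counter.

```python
class IncrementalDict:
    """Toy hash table that migrates one bucket per operation while resizing."""

    def __init__(self, size=4):
        self.ht = [[[] for _ in range(size)], None]  # ht[0] and ht[1]
        self.used = 0
        self.rehashidx = -1  # -1 means no rehash in progress

    def _rehash_step(self):
        if self.rehashidx == -1:
            return
        old, new = self.ht
        while self.rehashidx < len(old) and not old[self.rehashidx]:
            self.rehashidx += 1          # skip empty buckets
        if self.rehashidx < len(old):
            for key, value in old[self.rehashidx]:
                new[hash(key) % len(new)].append((key, value))
            old[self.rehashidx] = []
            self.rehashidx += 1
        if not any(old):                 # everything moved: swap tables
            self.ht = [new, None]
            self.rehashidx = -1

    def _find(self, key):
        for table in self.ht:
            if table is not None:
                bucket = table[hash(key) % len(table)]
                for i, (k, _) in enumerate(bucket):
                    if k == key:
                        return bucket, i
        return None, -1

    def set(self, key, value):
        self._rehash_step()
        bucket, i = self._find(key)
        if bucket is not None:
            bucket[i] = (key, value)
            return
        if self.rehashidx == -1 and self.used >= len(self.ht[0]):
            self.ht[1] = [[] for _ in range(len(self.ht[0]) * 2)]
            self.rehashidx = 0           # load factor reached 1: start rehash
        target = self.ht[1] if self.rehashidx != -1 else self.ht[0]
        target[hash(key) % len(target)].append((key, value))
        self.used += 1

    def get(self, key):
        self._rehash_step()
        bucket, i = self._find(key)
        return bucket[i][1] if bucket is not None else None
```

While a rehash is in progress, lookups consult both tables and new keys go straight into the new table, which is why the old table can only shrink.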

Call chain of active rehashing: serverCron -> databasesCron -> incrementallyRehash -> dictRehashMilliseconds -> dictRehash. Each step is covered in turn below.

[1] serverCron

/* This is our timer interrupt, called server.hz times per second.
 * Here is where we do a number of things that need to be done asynchronously.
 * For instance:
 *
 * - Active expired keys collection (it is also performed in a lazy way on
 *   lookup).
 * - Software watchdog.
 * - Update some statistic.
 * - Incremental rehashing of the DBs hash tables.
 * - Triggering BGSAVE / AOF rewrite, and handling of terminated children.
 * - Clients timeout of different kinds.
 * - Replication reconnection.
 * - Many more...
 *
 * Everything directly called here will be called server.hz times per second,
 * so in order to throttle execution of things we want to do less frequently
 * a macro is used: run_with_period(milliseconds) { .... }
 */

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    int j;
    REDIS_NOTUSED(eventLoop);
    REDIS_NOTUSED(id);
    REDIS_NOTUSED(clientData);

    /* Software watchdog: deliver the SIGALRM that will reach the signal
     * handler if we don't return here fast enough. */
    if (server.watchdog_period) watchdogScheduleSignal(server.watchdog_period);

    /* Update the time cache. */
    updateCachedTime();

    // Track the number of commands executed by the server
    run_with_period(100) trackOperationsPerSecond();

    /* We have just REDIS_LRU_BITS bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock.
     *
     * Note that even if the counter wraps it's not a big problem,
     * everything will still work but some object will appear younger
     * to Redis. However for this to happen a given object should never be
     * touched for all the time needed to the counter to wrap, which is
     * not likely.
     *
     * (Annotation: the counter wraps after about 1.5 years, so an object
     * would have to go untouched for that long for this to happen.)
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define.
     */
    server.lruclock = getLRUClock();

    /* Record the max memory used since the server was started. */
    if (zmalloc_used_memory() > server.stat_peak_memory)
        server.stat_peak_memory = zmalloc_used_memory();

    /* Sample the RSS here since this is a relatively slow call. */
    server.resident_set_size = zmalloc_get_rss();

    /* We received a SIGTERM, shutting down here in a safe way, as it is
     * not ok doing so inside the signal handler. */
    if (server.shutdown_asap) {

        // Try to shut the server down
        if (prepareForShutdown(0) == REDIS_OK) exit(0);

        // If shutdown failed, log it and clear the shutdown flag
        redisLog(REDIS_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");
        server.shutdown_asap = 0;
    }

    /* Show some info about non-empty databases */
    run_with_period(5000) {
        for (j = 0; j < server.dbnum; j++) {
            long long size, used, vkeys;

            // Number of slots available
            size = dictSlots(server.db[j].dict);
            // Number of key-value pairs in use
            used = dictSize(server.db[j].dict);
            // Number of key-value pairs with an expiration set
            vkeys = dictSize(server.db[j].expires);

            // Log the numbers
            if (used || vkeys) {
                redisLog(REDIS_VERBOSE,"DB %d: %lld keys (%lld volatile) in %lld slots HT.",j,used,vkeys,size);
                /* dictPrintStats(server.dict); */
            }
        }
    }

    /* Show information about connected clients */
    // Only when the server is not running in SENTINEL mode
    if (!server.sentinel_mode) {
        run_with_period(5000) {
            redisLog(REDIS_VERBOSE,
                "%lu clients connected (%lu slaves), %zu bytes in use",
                listLength(server.clients)-listLength(server.slaves),
                listLength(server.slaves),
                zmalloc_used_memory());
        }
    }

    /* We need to do a few operations on clients asynchronously. */
    // Check clients, close timed-out ones, and free unused client buffers
    clientsCron();

    /* Handle background operations on Redis databases. */
    databasesCron();

    /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    // Neither BGSAVE nor BGREWRITEAOF is running and a BGREWRITEAOF is
    // pending, so run it now
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground();
    }

    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
        int statloc;
        pid_t pid;

        // Collect the exit status of the child, without blocking
        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            // BGSAVE finished
            if (pid == server.rdb_child_pid) {
                backgroundSaveDoneHandler(exitcode,bysignal);

            // BGREWRITEAOF finished
            } else if (pid == server.aof_child_pid) {
                backgroundRewriteDoneHandler(exitcode,bysignal);

            } else {
                redisLog(REDIS_WARNING,
                    "Warning, detected child with unmatched pid: %ld",
                    (long)pid);
            }
            updateDictResizePolicy();
        }
    } else {

        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now */

        // Walk all save conditions to see whether BGSAVE should run
        for (j = 0; j < server.saveparamslen; j++) {
            struct saveparam *sp = server.saveparams+j;

            /* Save if we reached the given amount of changes,
             * the given amount of seconds, and if the latest bgsave was
             * successful or if, in case of an error, at least
             * REDIS_BGSAVE_RETRY_DELAY seconds already elapsed. */
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 REDIS_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == REDIS_OK))
            {
                redisLog(REDIS_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                // Run BGSAVE
                rdbSaveBackground(server.rdb_filename);
                break;
            }
        }

        /* Trigger an AOF rewrite if needed */
        if (server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            server.aof_rewrite_perc &&
            // The AOF file is larger than the minimum size for BGREWRITEAOF
            server.aof_current_size > server.aof_rewrite_min_size)
        {
            // Size of the AOF file after the last rewrite
            long long base = server.aof_rewrite_base_size ?
                            server.aof_rewrite_base_size : 1;

            // Growth of the current AOF size relative to base, in percent
            long long growth = (server.aof_current_size*100/base) - 100;

            // Run BGREWRITEAOF if the growth reaches the configured percentage
            if (growth >= server.aof_rewrite_perc) {
                redisLog(REDIS_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                rewriteAppendOnlyFileBackground();
            }
        }
    }

    // Depending on the AOF policy, decide whether the AOF buffer
    // needs to be written to the AOF file
    /* AOF postponed flush: Try at every cron cycle if the slow fsync
     * completed. */
    if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);

    /* AOF write errors: in this case we have a buffer to flush as well and
     * clear the AOF error in case of success to make the DB writable again,
     * however to try every second is enough in case of 'hz' is set to
     * an higher frequency. */
    run_with_period(1000) {
        if (server.aof_last_write_status == REDIS_ERR)
            flushAppendOnlyFile(0);
    }

    /* Close clients that need to be closed asynchronous */
    freeClientsInAsyncFreeQueue();

    /* Clear the paused clients flag if needed. */
    clientsArePaused(); /* Don't check return value, just use the side effect. */

    /* Replication cron function -- used to reconnect to master and
     * to detect transfer failures. */
    // Reconnects to the master, sends ACKs to the master, detects failed
    // transfers, disconnects timed-out slaves, and so on
    run_with_period(1000) replicationCron();

    /* Run the Redis Cluster cron. */
    run_with_period(100) {
        if (server.cluster_enabled) clusterCron();
    }

    /* Run the Sentinel timer if we are in sentinel mode. */
    run_with_period(100) {
        if (server.sentinel_mode) sentinelTimer();
    }

    /* Cleanup expired MIGRATE cached sockets. */
    run_with_period(1000) {
        migrateCloseTimedoutSockets();
    }

    // Increment the loop counter
    server.cronloops++;

    return 1000/server.hz;
}

[2] databasesCron

// Background operations on the databases: delete expired keys, resize
// the hash tables, and perform active incremental rehashing
void databasesCron(void) {

    // Expired keys are deleted first, then the tables are resized

    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        // CYCLE_SLOW mode removes as many expired keys as it can
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL;
        unsigned int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}

[3] incrementallyRehash

/* Our hash table implementation performs rehashing incrementally while
 * we write/read from the hash table. Still if the server is idle, the hash
 * table will use two tables for a long time. So we try to use 1 millisecond
 * of CPU time at every call of this function to perform some rehashing.
 *
 * The function returns 1 if some rehashing was performed, otherwise 0
 * is returned. */
int incrementallyRehash(int dbid) {

    /* Keys dictionary */
    if (dictIsRehashing(server.db[dbid].dict)) {
        dictRehashMilliseconds(server.db[dbid].dict,1);
        return 1; /* already used our millisecond for this loop... */
    }

    /* Expires */
    if (dictIsRehashing(server.db[dbid].expires)) {
        dictRehashMilliseconds(server.db[dbid].expires,1);
        return 1; /* already used our millisecond for this loop... */
    }

    return 0;
}

[4] dictRehashMilliseconds

/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
// Rehashes the dict in batches of 100 steps until the given number of
// milliseconds has elapsed.
// T = O(N)
int dictRehashMilliseconds(dict *d, int ms) {
    // Record the start time
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        // Stop once the time budget is used up
        if (timeInMilliseconds()-start > ms) break;
    }

    return rehashes;
}

[5] dictRehash

dictRehash performs N steps of incremental rehashing. It is the same function already listed in full above, right after _dictRehashStep.

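That covers the rehash source code. Now to the analysis: why does rehash affect performance? Because a rehash moves a large amount of data.

When does Redis rehash?

Redis uses the load factor to decide whether to rehash. The load factor is the number of entries in the hash table divided by the number of buckets. Redis triggers a rehash in two cases: the load factor is >= 1 and the hash table is allowed to rehash; or the load factor is >= 5.

In the first case, suppose the load factor is exactly 1 and the key-value pairs are spread evenly over the buckets. The table then needs no chaining, because each bucket holds exactly one pair. As soon as more data is written, however, the table has to start chaining, which hurts lookup performance. Rehashing is disabled while an RDB is being generated or the AOF is being rewritten, so as not to interfere with them. If Redis is doing neither, the rehash can proceed. Otherwise new writes make the table fall back on the slower chained lookups.
In the second case, a load factor of 5 or more means that the table holds far more entries than it has buckets. Many buckets contain long chains and performance suffers badly, so the rehash starts immediately. Those are the conditions that trigger a rehash. If the load factor is below 1, or if it is above 1 but below 5 while rehashing is temporarily not allowed (for example, the instance is generating an RDB or rewriting the AOF), no rehash takes place.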
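The two trigger conditions can be written down as a small predicate. This is a sketch of the rule as described in the text above, with illustrative names; the threshold of 5 is the forced-rehash load factor mentioned there.

```python
FORCE_RESIZE_RATIO = 5  # load factor at which a rehash starts unconditionally

def should_rehash(entries: int, buckets: int, resize_allowed: bool) -> bool:
    """Apply the two trigger rules: load factor >= 5, or >= 1 when rehash is allowed."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    load_factor = entries / buckets
    if load_factor >= FORCE_RESIZE_RATIO:
        return True
    return load_factor >= 1 and resize_allowed
```

Here resize_allowed would be false while an RDB snapshot or AOF rewrite child is running.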

The timer task includes the rehash work. A timer task is one that runs at a fixed frequency, for example once every 100 ms.

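Operations

fork for persistence

Symptoms

If Redis latency rises only while a background RDB snapshot or AOF rewrite is running, investigate what can slow Redis down during that window. To create the child process, the main thread calls the operating system's fork. During fork the main process has to copy its memory page table to the child. For a large instance this copy takes a long time, and if CPU is also scarce it takes even longer, possibly seconds, which seriously hurts Redis performance.

Locating the problem

Run INFO on Redis and check latest_fork_usec, which is in microseconds:

# Duration of the last fork, in microseconds
latest_fork_usec:59477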
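A monitoring script can pull this field out of the INFO text and compare it with a limit. In the sketch below the helper names and the 100 ms default threshold are assumptions chosen for illustration; pick a threshold that matches your own latency budget.

```python
def info_field(info_text: str, field: str) -> str:
    """Return the value of `field` from INFO output made of name:value lines."""
    for line in info_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, sep, value = line.partition(":")
            if sep and name == field:
                return value
    raise KeyError(field)

def fork_is_slow(info_text: str, threshold_ms: float = 100.0) -> bool:
    """True if the last fork blocked the instance for longer than `threshold_ms`."""
    fork_ms = int(info_field(info_text, "latest_fork_usec")) / 1000
    return fork_ms > threshold_ms
```

The sample value of 59477 microseconds is roughly 59 ms, so it passes the default threshold but would fail a stricter 50 ms one.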

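This is the time during which the main process is forking and the whole instance is blocked, unable to serve client requests. Watch out if it is long. It is comparable to a stop-the-world (STW) pause in the JVM: the instance is unavailable throughout.

Persistence is not the only thing that produces an RDB. When a replica first synchronizes with its master, the master also forks a child process to generate an RDB and sends it to the replica for a full resynchronization, so this too affects Redis performance.

Solutions

Run persistence on a replica, scheduled for the night-time off-peak period. For workloads that tolerate data loss (for example, Redis used purely as a cache), you can turn off AOF and AOF rewrite
Keep the memory of each Redis instance under 10 GB; the duration of fork is proportional to the size of the instance
Reduce the chance of a full resynchronization between master and replica by increasing repl-backlog-size appropriately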

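AOF enabled

How AOF works

After executing a write command, Redis writes the command to the AOF file's in-memory buffer (the write system call)
According to the configured AOF flush policy, Redis flushes the buffered AOF data to disk (the fsync system call)

In more detail

After the main thread has updated the in-memory data, it calls write and then, depending on the configuration, calls fdatasync immediately or defers it
At startup Redis creates dedicated bio threads, one of which handles AOF persistence
With appendfsync everysec, an asynchronous (bio) job is created when the time comes
The bio thread polls the job queue and runs fdatasync synchronously for each job it takes

The flush policy is set with the appendfsync parameter, which has three options:

always:

1. **Explanation**: the main thread flushes to disk immediately after every write. This uses the most disk I/O but gives the strongest data safety.
2. **Drawback**: the command is not acknowledged until it has been written to disk, and the main thread does this work itself, which adds load to Redis and lengthens the request path.

no:

1. **Explanation**: the main thread only writes to memory and returns; the operating system decides when the data is flushed to disk. This has the smallest performance impact but the weakest data safety: how much is lost when Redis goes down depends on when the operating system last flushed.
2. **Drawback**: on a crash, the data still in memory is lost.

everysec:

1. **Explanation**: the main thread only writes to memory and returns, and a background thread flushes to disk once per second (with the fsync system call). The performance impact is fairly small, but up to 1 second of data is lost if Redis goes down.
2. **Drawback**: risk of blocking. When the background thread is flushing the AOF and disk I/O load is high, its fsync call blocks. The main thread keeps accepting writes and then needs to write the data to the file buffer (the write system call). While the background fsync is stuck, the main thread's write blocks as well, and returns only after the fsync completes.

Symptoms

Disk load is high
A child process is running an AOF rewrite, which consumes a lot of disk I/O

Solutions

Upgrade the hardware to SSD
Find the programs that are using up the disk bandwidth
Set no-appendfsync-on-rewrite yes:

1. During an AOF rewrite this behaves as appendfsync no
2. During an AOF rewrite the AOF background thread does not flush to disk
3. In effect, appendfsync is temporarily set to no for the duration of the rewrite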

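The author of Redis wrote a blog post about the effect of AOF on latency, "fsync() on a different thread: apparently a useless trick". Its conclusion is that moving the sync to a bio thread does not improve latency much. With appendfsync everysec, fdatasync runs in the background and the aof_buf passed to write is small, so write by itself rarely blocks. The trouble is that a write issued while the background fdatasync is still running has to wait for it to finish: fdatasync holds the file and write needs the same file, so write blocks the main thread. That is why, when the RAID on our Inspur servers had performance problems, most applications were unaffected but a latency-sensitive application like Redis was hit.

Can AOF be turned off? Since AOF adds latency, can it simply be disabled? Yes. For pure cache scenarios, where a miss falls through to the database or the data can be rebuilt quickly from the database, it can be turned off entirely for the best performance.

Turning off AOF does not mean that a shard's data is lost when its instance crashes. In our production environment every shard has a master and a replica kept in sync through Redis replication. When the master crashes, a failover promotes the replica to master, which preserves the data. To keep both from crashing together, master and replica are placed on different physical machines and under different switches.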

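Swap (virtual memory)

Redis once had its own virtual memory (VM) feature, which first shipped in a stable release with Redis 2.0. It was deprecated and removed in later versions, so the overview below is of historical interest; on current versions the concern is the operating system's swap.

Overview

Redis follows the key-value model, and normally both keys and values are kept in memory. That is not always the best choice, so the VM design required keys to stay in memory (for fast lookup) while rarely used values could be swapped out to disk. For example, with a dataset of 100,000 keys of which only 10% are used often, Redis with VM enabled moved the values of the less used keys to disk. When a client requested those values, they were read back from the swap file and loaded into memory.

Explanation

Operating-system swap is similar to virtual memory on Windows: when memory runs short, part of the disk is used as if it were memory. (Android is based on Linux, so it can use a swap partition too.) "Swap" means exchange: when memory is insufficient, the operating system moves data that is not currently in use from memory to the swap space on disk, freeing memory for other programs. It plays the same role as the Windows page file (pagefile.sys).

Symptoms

Requests become slow
Response latency reaches milliseconds or even seconds
The service is practically unavailable

# First find the process ID of Redis
$ ps -aux | grep redis-server

# Check how much of Redis's memory is in swap
$ cat /proc/$pid/smaps | egrep '^(Swap|Size)'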
    
Size:               1256 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:              63488 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:              65404 kB
Swap:                  0 kB
Size:            1921024 kB
Swap:                  0 kB
...

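Each Size line is the size of one memory region used by Redis, and the Swap line below it shows how much of that region has been swapped out to disk. If the two values are equal, the whole region has been swapped out.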
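Reading hundreds of Size/Swap pairs by eye is tedious, so the lines can be totalled. The sketch below assumes the filtered output format shown above (values in kB); the function name and the returned field names are illustrative.

```python
import re

PAIR_RE = re.compile(r"^(Size|Swap):\s+(\d+) kB$")

def swap_summary(smaps_text: str) -> dict:
    """Total the Size and Swap lines and report the swapped-out share."""
    total = {"Size": 0, "Swap": 0}
    for line in smaps_text.splitlines():
        match = PAIR_RE.match(line.strip())
        if match:
            total[match.group(1)] += int(match.group(2))
    ratio = total["Swap"] / total["Size"] if total["Size"] else 0.0
    return {"size_kb": total["Size"], "swap_kb": total["Swap"], "swapped_ratio": ratio}
```

A swapped_ratio of 0, as in the sample output, means that none of the Redis process's memory is in swap.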
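Solutions at this point:

    1 Add memory to the machine so that Redis has enough to use
    2 Free up memory on the machine so that Redis has enough, then release Redis's swap so that Redis goes back to using memory

Analysis

In-memory data is mapped to disk through virtual addresses
Reading data from disk is very slow

Avoidance

Reserve more memory so that swap is not used
Monitor memory and swap usage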

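Memory fragmentation

Causes

Frequent modification of Redis data can fragment its memory, and fragmentation lowers Redis's memory utilization. Two factors are involved:

Write operations
The memory allocator

Analysis

The fragmentation ratio of an instance can be read from the output of INFO. The official formula is: **mem_fragmentation_ratio = used_memory_rss / used_memory**, that is, the ratio of the memory Redis has obtained from the operating system to the total memory handed out by the allocator. In short:

The former is the RES memory of the redis process as shown by top
The latter is allocated by the Redis memory allocator (such as jemalloc) and covers Redis's own memory, buffers, data objects, and so on

A ratio below 1 means low fragmentation and a ratio above 1 means high. High fragmentation is covered by plenty of articles online, so I will not repeat them. A low ratio is almost always blamed on swap, with Redis slowed down by disk access. But is that really so?

A low fragmentation ratio is not only about swap; disabling swap is the usual recommendation for production anyway.
With a large replication backlog and a small dataset, the ratio easily falls far below 1. This is normal and needs no tuning.
In production, repl-backlog-size is usually set fairly high to keep the master from frequent full resynchronizations, which hurt performance.
As the business data grows, the fragmentation ratio gradually approaches 1.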
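The formula is easy to evaluate from the two INFO fields it names. The function below is a minimal sketch with an illustrative name; it assumes INFO output made of name:value lines.

```python
def fragmentation_ratio(info_text: str) -> float:
    """Compute used_memory_rss / used_memory from INFO memory output."""
    fields = {}
    for line in info_text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep:
            fields[name] = value
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])
```

Computing the ratio yourself is mostly useful for alerting; the same value is also reported by Redis as mem_fragmentation_ratio.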

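Solutions

Leave defragmentation disabled, or
Enable it with sensibly configured thresholds

Automatic defragmentation is off by default. Check it with:

127.0.0.1:6379> config get activedefrag
1) "activedefrag"
2) "no"

Turn on automatic defragmentation:

127.0.0.1:6379> config set activedefrag yes
OK

Purge manually:

127.0.0.1:6379> memory purge
OK

Defragmentation runs in the main thread, so it can add latency while it is working.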