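1. Memory fragmentation

A process running in user space cannot execute kernel code or call kernel functions directly to allocate memory; it has to ask the kernel through the brk()/sbrk() system call interface. A system call, however, switches the CPU from user mode to kernel mode, which adds significant overhead for workloads that allocate and free memory frequently.

To keep the number of brk()/sbrk() calls as low as possible, the memory management functions malloc()/free() are optimized in their implementation.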

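[Figure: free memory list]

In general, free() does not lower the program break when it releases memory. Instead, it adds the released block to a free memory list so that later malloc() calls can reuse it.

In other words, when malloc() allocates memory it first searches the free list for a block at least as large as the request. If it finds one, it returns it to the caller directly; if the block is larger than needed, it may split it, returning a piece of the requested size to the caller and keeping the remainder in the free list.

[Figure: malloc allocation mechanism]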
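The search-and-split behaviour described above can be sketched in a few lines of C. This is only a toy first-fit model, not how glibc or jemalloc actually manage memory; the `free_block` type and the `toy_alloc()` function are hypothetical names invented for illustration.

```c
#include <stddef.h>

/* Hypothetical free-list node: one contiguous free region of the heap. */
typedef struct free_block {
    size_t offset;            /* start of the region inside the heap */
    size_t size;              /* length of the region in bytes */
    struct free_block *next;  /* next free region, or NULL */
} free_block;

/* First-fit lookup: return the offset of a region of at least `want` bytes,
 * or (size_t)-1 if no single region is large enough. A larger region is
 * split: the front part is handed out and the remainder stays on the list. */
size_t toy_alloc(free_block **head, size_t want) {
    for (free_block **link = head; *link != NULL; link = &(*link)->next) {
        free_block *b = *link;
        if (b->size < want) continue;   /* too small, keep searching */
        size_t offset = b->offset;
        if (b->size == want) {
            *link = b->next;            /* exact fit: unlink the whole block */
        } else {
            b->offset += want;          /* split: keep the tail on the list */
            b->size -= want;
        }
        return offset;
    }
    return (size_t)-1;
}
```

Each successful split leaves a smaller remainder behind, which is exactly how a free list drifts toward many small blocks over time.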

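Redis does not implement low-level memory management itself; it relies on the malloc()/free() family of functions provided by an allocator such as jemalloc or tcmalloc. When a key is deleted or expired keys are purged, Redis calls free() to release the memory. In practice this memory may not be returned to the operating system promptly and stays with the allocator.

After running for a while, Redis may hold a large amount of memory that has been allocated but is not in use; this space is called memory fragmentation. You can inspect Redis's fragmentation with the INFO MEMORY command: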

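[root@localhost redis-6.2.6]# redis-cli info memory
# Memory
// bytes Redis has allocated through its allocator
used_memory:934384
// bytes the operating system has actually given to the process (resident set size)
used_memory_rss:2830336
// fragmentation ratio
mem_fragmentation_ratio:3.20
// fragmentation size (bytes)
mem_fragmentation_bytes:1946360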
...
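As a worked check of these numbers, the sketch below assumes the ratio is RSS divided by allocated bytes and the fragmentation size is their difference. The `frag_stats` type and `compute_frag()` are hypothetical helpers, not Redis functions. Note that the sample output is consistent with an allocated figure of 2830336 - 1946360 = 883976 bytes, which gives 3.20; dividing by the displayed used_memory (934384) would give about 3.03, presumably because the fragmentation fields come from values sampled at a slightly different moment.

```c
/* Fragmentation figures, assuming
 * ratio = RSS / allocated and bytes = RSS - allocated. */
typedef struct {
    double ratio;      /* corresponds to mem_fragmentation_ratio */
    long long bytes;   /* corresponds to mem_fragmentation_bytes */
} frag_stats;

frag_stats compute_frag(long long rss, long long allocated) {
    frag_stats s;
    s.ratio = (double)rss / (double)allocated;
    s.bytes = rss - allocated;
    return s;
}
```

Calling `compute_frag(2830336, 883976)` reproduces the 1946360 bytes shown above and a ratio that rounds to 3.20.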

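As an in-memory database, Redis allocates and frees memory frequently, so holding a moderate amount of free memory effectively reduces system overhead and speeds up allocation.

However, given how malloc() allocates memory, the blocks kept in the free list get smaller and smaller as malloc() repeatedly searches and splits them. Eventually the free list holds a large number of small blocks, none of which is big enough on its own to satisfy a malloc() request.

For example, the heap below has 40k of free memory in total, yet it cannot satisfy an allocation request for a 20k piece of data:

[Figure: insufficient space]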
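The same situation can be expressed as a small check in C: what matters for a single allocation is the largest free block, not the total. The block sizes used in the usage note are made up for illustration, and `can_satisfy()` is a hypothetical helper.

```c
#include <stddef.h>

/* Returns 1 if some single free block can hold `want` bytes, else 0.
 * The total of all free blocks is written to *total. */
int can_satisfy(const size_t *blocks, size_t n, size_t want, size_t *total) {
    int ok = 0;
    *total = 0;
    for (size_t i = 0; i < n; i++) {
        *total += blocks[i];
        if (blocks[i] >= want) ok = 1;
    }
    return ok;
}
```

With free blocks of 8k, 12k, 10k and 10k, the total is 40k, but a 20k request still fails because no single block is large enough.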

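When physical memory is tight, heavy fragmentation can push Redis into swapping or even out-of-memory (OOM) conditions, hurting the performance and stability of the Redis service.

Note: for more on memory allocation, see "Redis6源码系列（一）- 内存管理zmalloc" (Redis 6 source code series, part 1: memory management with zmalloc).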

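2. Memory compaction

Fragmentation is not only a problem for user processes; it also affects the operating system kernel.

Modern operating systems often use huge pages to improve processor performance. Huge pages, however, require the system to find physically contiguous memory regions that are not only large enough but also correctly aligned. With heavy fragmentation, the system may well fail to find a contiguous region that meets these requirements.

Kernel developers have tried various approaches to the fragmentation problem, one of which is memory compaction.

[Figure: memory compaction 1]

Suppose a memory zone looks like the figure above: white cells are free pages and colored cells are pages that have been allocated and are in use.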

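Put simply, memory compaction consists of two steps:

Identify pages

Movable pages list

Starting from the bottom of the zone, identify the allocated pages and collect them into a list of allocated pages, called the movable pages list (Movable pages).

Free pages list

At the same time, starting from the top of the zone, identify the unallocated free pages and collect them into the free pages list (Free pages).

[Figure: memory compaction 2]

Page migration

The two scans, each building its page list, meet somewhere near the middle of the zone. At that point the allocated pages are moved into the free space at the top of the zone.

[Figure: memory compaction 3]

Once the allocated pages have been moved, the result is a much tidier memory zone. This is of course a simplified description; the real implementation of memory compaction is quite complex, and "details" such as identifying movable pages, moving them, and deciding when to trigger compaction are all hard to get right.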
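The two-scanner idea can be modelled on a plain array, as in the sketch below. It is a toy under strong assumptions (every used page is movable, and scanning and migration are interleaved instead of building lists first); `compact()` is a hypothetical function and has nothing to do with the kernel's real code.

```c
#include <stddef.h>

/* Toy compaction over an array of page frames: 0 = free, nonzero = page id.
 * The migration scanner walks up from the bottom looking for used pages,
 * the free scanner walks down from the top looking for free pages, and
 * each used page found is moved into a free slot near the top.
 * Returns the number of pages moved. */
size_t compact(int *pages, size_t n) {
    size_t moved = 0;
    if (n == 0) return 0;
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        if (pages[lo] == 0) { lo++; continue; }   /* nothing to migrate here */
        if (pages[hi] != 0) { hi--; continue; }   /* slot already in use */
        pages[hi] = pages[lo];                    /* migrate the page */
        pages[lo] = 0;
        moved++;
        lo++;
        hi--;
    }
    return moved;
}
```

Once the scanners meet, all free frames sit together at the bottom of the array and all used frames at the top, which is the "tidier" layout the figures illustrate.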

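3. Defragmentation in Redis

Besides the info command, the memory command can also be used to inspect Redis memory usage.

Memory statistics

The memory stats command shows the memory statistics of the Redis server: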

[root@localhost redis-6.2.6]# ./src/redis-cli memory stats
// peak memory used by Redis
 1) "peak.allocated"
 2) (integer) 931888
 // total bytes Redis has allocated through its allocator
 3) "total.allocated"
 4) (integer) 872024
 ...

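Almost every item returned by memory stats has a corresponding field in the output of info memory.

Allocator state

When jemalloc is the allocator, you can view its report on the state of memory allocation: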

[root@localhost redis-6.2.6]# ./src/redis-cli memory malloc-stats
___ Begin jemalloc statistics ___
Version: "5.1.0-0-g0"
Build-time option settings
  config.cache_oblivious: true
  ...
Arenas: 16
Quantum size: 8
Page size: 4096
Maximum thread-cached size class: 32768
...  
--- End jemalloc statistics ---

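Memory cleanup: purge

memory purge is likewise specific to the jemalloc allocator and is not supported with other allocators.

When a process terminates, all the memory it occupies is returned to the operating system, so many programs simply rely on this "automatic release".

Redis, however, is a database server process, and stopping it is a high-impact operation; in a normal production environment the service should not, and is not allowed to, be restarted frequently. A way to clean up fragmentation without downtime is therefore needed, and that is the memory purge command:

[root@localhost redis-6.2.6]# ./src/redis-cli memory purge
OK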

Automatic defragmentation: defrag

Redis provides Active Defragmentation, which lets a server instance defragment its memory proactively, without downtime and without manual intervention. It is enabled with config set activedefrag yes:

[root@localhost redis-6.2.6]# ./src/redis-cli config get activedefrag
1) "activedefrag"
2) "no"
[root@localhost redis-6.2.6]# ./src/redis-cli config set activedefrag yes
OK

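Active defragmentation was first introduced in Redis 4.0, where it was only an experimental feature. Redis has since reached the 6.x series, and the experimental warning was removed from the configuration file long ago.

Here is how Redis describes Active Defragmentation:

########################### ACTIVE DEFRAGMENTATION #######################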
#
# What is active defragmentation?
# -------------------------------
#
# Active (online) defragmentation allows a Redis server to compact the
# spaces left between small allocations and deallocations of data in memory,
# thus allowing to reclaim back memory.
#
# Fragmentation is a natural process that happens with every allocator (but
# less so with Jemalloc, fortunately) and certain workloads. Normally a server
# restart is needed in order to lower the fragmentation, or at least to flush
# away all the data and create it again. However thanks to this feature
# implemented by Oran Agra for Redis 4.0 this process can happen at runtime
# in a "hot" way, while the server is running.
#
# Basically when the fragmentation is over a certain level (see the
# configuration options below) Redis will start to create new copies of the
# values in contiguous memory regions by exploiting certain specific Jemalloc
# features (in order to understand if an allocation is causing fragmentation
# and to allocate it in a better place), and at the same time, will release the
# old copies of the data. This process, repeated incrementally for all the keys
# will cause the fragmentation to drop back to normal values.

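Put simply, once fragmentation reaches a certain threshold, Redis uses certain Jemalloc-specific features to defragment the space. In other words, Active Defragmentation only works when Jemalloc is the underlying allocator.

The configuration file states this as well:

# Important things to understand: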
# 1. This feature is disabled by default, and only works if you compiled Redis
#    to use the copy of Jemalloc we ship with the source code of Redis.
#    This is the default with Linux builds.

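Enabling defrag

Active defragmentation (defrag) is disabled by default and can be enabled with CONFIG SET activedefrag yes.

The related settings are listed below; once you clearly understand what each one means, you can tune them to your needs:

# Enabled active defragmentation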
activedefrag no

# Minimum amount of fragmentation waste to start active defrag
active-defrag-ignore-bytes 100mb

# Minimum percentage of fragmentation to start active defrag
active-defrag-threshold-lower 10

# Maximum percentage of fragmentation at which we use maximum effort
active-defrag-threshold-upper 100

# Minimal effort for defrag in CPU percentage, to be used when the lower
# threshold is reached
active-defrag-cycle-min 1

# Maximal effort for defrag in CPU percentage, to be used when the upper
# threshold is reached
active-defrag-cycle-max 25

# Maximum number of set/hash/zset/list fields that will be processed from
# the main dictionary scan
active-defrag-max-scan-fields 1000

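By purpose, these settings fall into three groups: the feature switch, the defrag effort, and the resource usage:

Feature switch

activedefrag: master switch for active defragmentation; disabled (no) by default
active-defrag-ignore-bytes: tolerated amount of fragmentation in bytes; defragmentation is allowed once fragmentation reaches this amount. By default up to 100mb of fragmentation is tolerated
active-defrag-threshold-lower: tolerated fragmentation percentage; defragmentation is allowed once the fragmentation percentage reaches this threshold. By default 10% fragmentation is tolerated

Active defragmentation only starts when all three conditions above are met.

Defrag effort

active-defrag-threshold-upper: when the fragmentation percentage reaches this threshold (default 100%), Redis defragments with maximum effort
active-defrag-max-scan-fields: maximum number of set/hash/zset/list fields processed from the main dictionary scan (default 1000); larger collections are not processed inline during that scan

Resource usage

active-defrag-cycle-min: minimum share of CPU time spent on defragmentation (default 1%), used once the lower threshold is reached, so that defragmentation can make progress
active-defrag-cycle-max: maximum share of CPU time spent on defragmentation (default 25%), used once the upper threshold is reached; capping it keeps heavy memory copying during defragmentation from blocking Redis and delaying other requests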
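How these settings might combine can be sketched as follows. This is a minimal model, not Redis's actual code: it assumes the effort grows linearly from cycle-min at the lower threshold to cycle-max at the upper threshold, and that the fragmentation percentage is measured as the overhead above the allocated size (a ratio of 1.5 would be 50%). The `defrag_conf` struct and `defrag_cpu_pct()` are hypothetical names.

```c
/* Hypothetical settings holder mirroring the redis.conf options above. */
typedef struct {
    int enabled;              /* activedefrag */
    long long ignore_bytes;   /* active-defrag-ignore-bytes */
    int threshold_lower;      /* active-defrag-threshold-lower (percent) */
    int threshold_upper;      /* active-defrag-threshold-upper (percent) */
    int cycle_min;            /* active-defrag-cycle-min (CPU percent) */
    int cycle_max;            /* active-defrag-cycle-max (CPU percent) */
} defrag_conf;

/* Returns the CPU percentage to spend on defrag, or 0 when it should not run.
 * Assumption: effort is interpolated linearly between the two thresholds. */
int defrag_cpu_pct(const defrag_conf *c, double frag_pct, long long frag_bytes) {
    if (!c->enabled) return 0;
    if (frag_bytes < c->ignore_bytes) return 0;        /* too little waste */
    if (frag_pct < c->threshold_lower) return 0;       /* below lower bound */
    if (frag_pct >= c->threshold_upper) return c->cycle_max;
    double span = c->threshold_upper - c->threshold_lower;
    double pct = c->cycle_min +
                 (frag_pct - c->threshold_lower) * (c->cycle_max - c->cycle_min) / span;
    return (int)pct;
}
```

With the defaults shown earlier, 55% fragmentation and 200mb of waste would map to a CPU budget of 13% under this model, halfway between the two limits.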

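In practice, the recommendation is to enable it when the Redis service shows significant fragmentation (a fragmentation ratio above 1.5) and to leave it disabled otherwise.

4. How defrag is implemented

Active Defragmentation is a rather interesting feature, so let's look at how it is implemented.

HAVE_DEFRAG

While analyzing zmalloc, Redis's memory allocation module, we find that the header declares two functions depending on the macro HAVE_DEFRAG:

// 1. Define the macro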
#if defined(USE_JEMALLOC) && defined(JEMALLOC_FRAG_HINT)
#define HAVE_DEFRAG
#endif

// 2. If HAVE_DEFRAG is defined, compile the following functions
#ifdef HAVE_DEFRAG
// free memory
void zfree_no_tcache(void *ptr);
// allocate memory
void *zmalloc_no_tcache(size_t size);
#endif

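These two functions allocate and free memory respectively, and their implementations differ from the regular zmalloc()/zfree(). Take zmalloc_no_tcache() as an example: it allocates memory by calling je_mallocx() with the MALLOCX_TCACHE_NONE flag, which bypasses the thread cache and allocates the block directly. This is the function used during active defragmentation.

#elif defined(USE_JEMALLOC)
...
// map mallocx() to je_mallocx()
#define mallocx(size,flags) je_mallocx(size,flags)
// map dallocx() to je_dallocx()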
#define dallocx(ptr,flags) je_dallocx(ptr,flags)
#endif

// macros that update the used-memory counter
#define update_zmalloc_stat_alloc(__n) atomicIncr(used_memory,(__n))
#define update_zmalloc_stat_free(__n) atomicDecr(used_memory,(__n))

// used-memory counter
static redisAtomic size_t used_memory = 0;

/* Allocation and free functions that bypass the thread cache
 * and go straight to the allocator arena bins.
 * Currently implemented only for jemalloc. Used for online defragmentation. */
// compiled only if HAVE_DEFRAG is defined
#ifdef HAVE_DEFRAG
void *zmalloc_no_tcache(size_t size) {
    ASSERT_NO_SIZE_OVERFLOW(size);
    // allocate memory, bypassing the thread cache
    void *ptr = mallocx(size+PREFIX_SIZE, MALLOCX_TCACHE_NONE);
    // call the OOM handler if the allocation failed
    if (!ptr) zmalloc_oom_handler(size);
    // update memory usage statistics
    update_zmalloc_stat_alloc(zmalloc_size(ptr));
    return ptr;
}

void zfree_no_tcache(void *ptr) {
    if (ptr == NULL) return;
    // update memory usage statistics
    update_zmalloc_stat_free(zmalloc_size(ptr));
    // free memory, bypassing the thread cache
    dallocx(ptr, MALLOCX_TCACHE_NONE);
}
#endif

The definitions of zmalloc_no_tcache() and zfree_no_tcache() depend on the macro HAVE_DEFRAG. As the macros mapping mallocx() to je_mallocx() in the source above suggest, HAVE_DEFRAG is only defined when Jemalloc is the underlying allocator (that is, when USE_JEMALLOC is defined).

/* We can enable the Redis defrag capabilities only if we are using Jemalloc
 * and the version used is our special version modified for Redis having
 * the ability to return per-allocation fragmentation hints. */
#if defined(USE_JEMALLOC) && defined(JEMALLOC_FRAG_HINT)
#define HAVE_DEFRAG
#endif

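Note the defined(JEMALLOC_FRAG_HINT) part, which checks whether JEMALLOC_FRAG_HINT is defined.

JEMALLOC_FRAG_HINT is defined in jemalloc_macros.h.in, a file of the bundled Jemalloc dependency, and marks this Jemalloc build as supporting defragmentation. Standard Jemalloc does not contain this macro; Redis uses a modified version of Jemalloc.

/* This version of Jemalloc, modified for Redis, has the je_get_defrag_hint() function. */
#define JEMALLOC_FRAG_HINT

The je_get_defrag_hint() mentioned in the comment can be found in Redis 4 (which uses jemalloc 4), where it is a function provided by jemalloc.c. In later versions the defragmentation implementation changed considerably, and the function no longer appears there in the same form.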

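Initialization

We have the tools, but how they get used is another matter. The Redis source contains a file named defrag.c, and its name suggests that Active Memory Defragmentation is implemented there.

Whether defrag runs is decided jointly by activedefrag, active-defrag-ignore-bytes and active-defrag-threshold-lower, so after the server starts and reads redis.conf there should be some logic that checks whether to enable it.

The entry point of Redis is the main() function in server.c. After loading and parsing the configuration file it calls initServer() to initialize the server, and that initialization includes creating a time event (aeTimeEvent).

The time event created during initialization covers most of the work that has to be done asynchronously, including active defragmentation:

int main(int argc, char **argv) {
    // load and parse the configuration, etc.
    ...
    // initialize the server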
    initServer();
    // other work
    ...
}

// initialize the server
void initServer(void) {
    ...
    // create the timer for incremental background work such as client timeouts and key expiry
    if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
        serverPanic("Can't create event loop timers.");
        exit(1);
    }
    ...
}

/* This is our timer interrupt, called server.hz times per second.
 * Here is where we do a number of things that need to be done asynchronously.
 * For instance:
 *
 * - Active expired keys collection (it is also performed in a lazy way on
 *   lookup).
 * - Software watchdog.
 * - Update some statistic.
 * - Incremental rehashing of the DBs hash tables.
 * - Triggering BGSAVE / AOF rewrite, and handling of terminated children.
 * - Clients timeout of different kinds.
 * - Replication reconnection.
 * - Many more...
 *
 * Everything directly called here will be called server.hz times per second,
 * so in order to throttle execution of things we want to do less frequently
 * a macro is used: run_with_period(milliseconds) { .... }
 */
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ...
    /* Handle background operations on Redis databases. */
    databasesCron();
    ...
}

 // incremental background operations, e.g. key expiry and rehashing
void databasesCron(void) {
    // handle key expiry
    if (server.active_expire_enabled) {
        if (iAmMaster()) {
            activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
        } else {
            expireSlaveKeys();
        }
    }

    /* Defrag keys gradually. */
    // incremental defragmentation
    activeDefragCycle();
    ...
}

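Server initialization involves a lot of code. After stripping out the unrelated parts and simplifying the function calls, the call chain is: main() -> initServer() -> aeCreateTimeEvent() -> serverCron() -> databasesCron() -> activeDefragCycle().

We will not dig into Redis's ae event model here. It is enough to know that aeCreateTimeEvent() registers serverCron() as a timer: the second argument (1) is the delay in milliseconds before the first run, and after that, as the comment above says, serverCron() is called server.hz times per second.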
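The "server.hz times per second" behaviour can be illustrated with a small simulation of a self-rescheduling timer, where the callback returns the number of milliseconds until its next run. This is not the ae API; `timer_proc`, `cron_state`, `toy_cron()` and `simulate()` are hypothetical names, and the model assumes the callback returns 1000/hz.

```c
/* Toy model of a self-rescheduling timer: the callback returns the delay in
 * milliseconds until its next run. */
typedef int (*timer_proc)(void *data);

typedef struct {
    int hz;     /* desired runs per second, must be > 0 */
    int runs;   /* how many times the callback has fired */
} cron_state;

int toy_cron(void *data) {
    cron_state *s = data;
    s->runs++;
    return 1000 / s->hz;   /* ask to be called again in 1000/hz ms */
}

/* Fire `proc` first after `first_delay_ms`, then keep rescheduling it with
 * the delay it returns, until `duration_ms` of simulated time has elapsed. */
void simulate(timer_proc proc, void *data, int first_delay_ms, int duration_ms) {
    int when = first_delay_ms;
    while (when <= duration_ms) {
        int delay = proc(data);
        if (delay <= 0) break;   /* guard against a non-advancing timer */
        when += delay;
    }
}
```

With hz set to 10 and an initial delay of 1 ms, the callback fires 10 times within one simulated second, so the defrag cycle gets a chance to run on each of those ticks.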

defrag.c

From the initialization logic we know that defragmentation is implemented in activeDefragCycle(). Looking at defrag.c, its implementation is built mainly around three functions: activeDefragCycle(), activeDefragAlloc() and activeDefragStringOb().

#include "server.h"
#include <time.h>
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_DEFRAG

// active defragmentation logic
......

#else /* HAVE_DEFRAG */

// empty implementations that do nothing
void activeDefragCycle(void) {
    /* Not implemented yet. */
}

void *activeDefragAlloc(void *ptr) {
    UNUSED(ptr);
    return NULL;
}

robj *activeDefragStringOb(robj *ob, long *defragged) {
    UNUSED(ob);
    UNUSED(defragged);
    return NULL;
}

#endif
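The stubs above return NULL, which reads naturally as "nothing was moved". A minimal sketch of that convention, using plain malloc()/free() instead of the jemalloc calls, might look like this. The `should_move_fn` hint, `toy_defrag_alloc()` and the two sample hints are hypothetical and only stand in for whatever the real allocator hint provides.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the allocator's per-allocation hint: the
 * callback decides whether a block is worth relocating. */
typedef int (*should_move_fn)(const void *ptr);

/* Sample hints for illustration only. */
int always_move(const void *ptr) { (void)ptr; return 1; }
int never_move(const void *ptr)  { (void)ptr; return 0; }

/* If the hint says the block is worth moving, copy it to a fresh allocation,
 * free the old one and return the new pointer. Otherwise (or if allocation
 * fails) return NULL and leave the old block untouched. */
void *toy_defrag_alloc(void *ptr, size_t size, should_move_fn should_move) {
    if (!should_move(ptr)) return NULL;
    void *newptr = malloc(size);
    if (newptr == NULL) return NULL;
    memcpy(newptr, ptr, size);
    free(ptr);
    return newptr;
}
```

A caller would then update its pointer only when the result is non-NULL, for example `if ((newptr = toy_defrag_alloc(p, len, hint))) p = newptr;`, which is why a stub that always returns NULL is a safe no-op.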

To be continued…

This article is reposted from: 掘金 (Juejin)
