How does jemalloc in Firefox work, and what are its benefits?


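Firefox 3 ships with a new allocator: jemalloc.

I have heard in several places that this new allocator is better. However, the top Google results give no further information, and I am interested in how exactly it works.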

One interesting source is the C source itself.

At the very top, a short summary roughly describes how it works:

// This allocator implementation is designed to provide scalable performance
// for multi-threaded programs on multi-processor systems.  The following
// features are included for this purpose:
//
//   + Multiple arenas are used if there are multiple CPUs, which reduces lock
//     contention and cache sloshing.
//
//   + Cache line sharing between arenas is avoided for internal data
//     structures.
//
//   + Memory is managed in chunks and runs (chunks can be split into runs),
//     rather than as individual pages.  This provides a constant-time
//     mechanism for associating allocations with particular arenas.
//
// Allocation requests are rounded up to the nearest size class, and no record
// of the original request size is maintained.  Allocations are broken into
// categories according to size class.  Assuming runtime defaults, 4 kB pages
// and a 16 byte quantum on a 32-bit system, the size classes in each category
// are as follows:
//
//   |=====================================|
//   | Category | Subcategory    |    Size |
//   |=====================================|
//   | Small    | Tiny           |       4 |
//   |          |                |       8 |
//   |          |----------------+---------|
//   |          | Quantum-spaced |      16 |
//   |          |                |      32 |
//   |          |                |      48 |
//   |          |                |     ... |
//   |          |                |     480 |
//   |          |                |     496 |
//   |          |                |     512 |
//   |          |----------------+---------|
//   |          | Sub-page       |    1 kB |
//   |          |                |    2 kB |
//   |=====================================|
//   | Large                     |    4 kB |
//   |                           |    8 kB |
//   |                           |   12 kB |
//   |                           |     ... |
//   |                           | 1012 kB |
//   |                           | 1016 kB |
//   |                           | 1020 kB |
//   |=====================================|
//   | Huge                      |    1 MB |
//   |                           |    2 MB |
//   |                           |    3 MB |
//   |                           |     ... |
//   |=====================================|
//
// NOTE: Due to Mozilla bug 691003, we cannot reserve less than one word for an
// allocation on Linux or Mac.  So on 32-bit *nix, the smallest bucket size is
// 4 bytes, and on 64-bit, the smallest bucket size is 8 bytes.
//
// A different mechanism is used for each category:
//
//   Small : Each size class is segregated into its own set of runs.  Each run
//           maintains a bitmap of which regions are free/allocated.
//
//   Large : Each allocation is backed by a dedicated run.  Metadata are stored
//           in the associated arena chunk header maps.
//
//   Huge : Each allocation is backed by a dedicated contiguous set of chunks.
//          Metadata are stored in a separate red-black tree.
//
// *****************************************************************************

However, a more in-depth analysis of the algorithm is missing.

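jemalloc first appeared in FreeBSD, the brainchild of one "Jason Evans", hence the "je". I would ridicule him for being egotistical had I not once written an operating system called paxos.

For full details, see the white paper, which describes in detail how the algorithms work.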

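The main benefit is scalability on multi-processor and multi-threaded systems, achieved in part by using multiple arenas (the chunks of raw memory from which allocations are made).

In the single-threaded case there is no real benefit to multiple arenas, so a single arena is used.

In the multi-threaded case, however, many arenas are created (four times as many arenas as there are processors), and threads are assigned to these arenas in round-robin fashion.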

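This means lock contention can be reduced: although multiple threads may call malloc or free concurrently, they only contend if they share the same arena. Two threads with different arenas do not affect each other.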

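In addition, jemalloc tries to optimise for cache locality, since fetching data from RAM is much slower than using data already in the CPU caches (conceptually no different from the gap between fast fetches from RAM and slow fetches from disk). To that end, it first tries to minimise overall memory usage, since that makes it more likely that the application's entire working set fits in cache.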

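Where that cannot be achieved, it tries to ensure that allocations are contiguous, since memory allocated together tends to be used together.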

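Judging from the white paper, these strategies appear to give performance similar to the current best algorithms for single-threaded use, while improving performance for multi-threaded use.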

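As for the benefits jemalloc brought to Mozilla, the first Google result for mozilla+jemalloc says:

[...] concluded that jemalloc gave us the smallest amount of fragmentation after running for a long period of time. [...] Our automated tests on Windows Vista showed a 22% drop in memory usage when we turned jemalloc on.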
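
Aerospike implemented jemalloc in a private branch back in 2013. In 2014 it was incorporated into Aerospike 3.3. Psi Mankoski recently wrote about Aerospike's implementation, and about when and how to use jemalloc effectively.

jemalloc really helped Aerospike take advantage of modern multi-threaded, multi-CPU, multi-core computer architectures. jemalloc also has some very important debugging capabilities built in for managing arenas. For example, with debugging enabled, Psi was able to tell what was a true memory leak as opposed to the result of memory fragmentation. Psi also discusses how thread caching and per-thread allocation improved overall performance (speed).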