Enumerate factors of a number directly in ascending order without sorting? (algorithm, primes, enumeration, factors)


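Is there an efficient algorithm to enumerate the factors of a number n in ascending order, without sorting? By "efficient" I mean:

1. The algorithm avoids a brute-force search for divisors by starting from the prime-power factorization of n.

2. The runtime complexity of the algorithm is O(d log₂ d) or better, where d is the divisor count of n.

3. The space complexity of the algorithm is O(d).

4. The algorithm avoids a sort operation. That is, the factors are produced in order, rather than produced out of order and sorted afterward. Although enumerating with a simple recursive approach and then sorting is O(d log₂ d), the memory accesses involved in sorting are very costly.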

A trivial example is n = 360 = 2³ × 3² × 5, which has d = 24 factors: {1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18, 20, 24, 30, 36, 40, 45, 60, 72, 90, 120, 180, 360}.
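For reference, the sort-based baseline that the question wants to avoid can be sketched roughly as follows. This is only a minimal illustration, assuming 64-bit values and a factorization supplied by the caller; the names `enumerate_and_sort`, `primes`, and `exps` are hypothetical and do not come from the original post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values (ascending). */
static int cmp_u64(const void *a, const void *b)
{
    const uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Baseline: build all divisors from the prime-power factorization in
 * arbitrary order, then sort them.  Returns a malloc'ed array holding
 * *count divisors (caller frees), or NULL on allocation failure. */
uint64_t *enumerate_and_sort(const uint64_t *primes, const unsigned *exps,
                             size_t k, size_t *count)
{
    size_t d = 1;
    for (size_t i = 0; i < k; i++)
        d *= (size_t)exps[i] + 1;          /* d = product of (e_i + 1) */

    uint64_t *f = malloc(d * sizeof *f);
    if (!f) return NULL;

    size_t len = 1;
    f[0] = 1;
    for (size_t i = 0; i < k; i++) {
        const size_t base = len;           /* divisors built before this prime */
        uint64_t pe = 1;
        for (unsigned e = 1; e <= exps[i]; e++) {
            pe *= primes[i];               /* pe = p^e */
            for (size_t j = 0; j < base; j++)
                f[len++] = f[j] * pe;      /* append p^e times each old divisor */
        }
    }
    assert(len == d);

    qsort(f, d, sizeof *f, cmp_u64);       /* the step the question wants to avoid */
    *count = d;
    return f;
}
```

Called with primes {2, 3, 5} and exponents {3, 2, 1}, this produces the 24 factors of 360 listed above; the generation loop is O(d), so the sort dominates the cost.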

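A more serious example is n = 278282512406132373381723386382308832000 = 2⁸ × 3⁴ × 5³ × 7² × 11² × 13² × 17 × 19 × 23 × 29 × 31 × 37 × 41 × 43 × 47 × 53 × 59 × 61 × 67 × 71 × 73 × 79, which has d = 318504960 factors (obviously too many to list here!). Incidentally, this number has the greatest number of factors of any number up to 2^128.

I could swear that I saw a description of such an algorithm a few weeks ago, with sample code, but now I cannot seem to find it anywhere. It used some magic trick of maintaining a list of ancestor indexes in the output list for each prime factor. (Update: I was confusing factor generation with Hamming numbers, which are generated in a similar way.)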

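Update

I ended up using a solution that runs in O(d) time, has extremely low memory overhead, creates the O(d) output in place, and is significantly faster than any other method I know of. I have posted this solution as an answer, with C source code. It is a heavily optimized, simplified version of a beautiful algorithm presented in Haskell by another contributor, Will Ness. I selected Will's answer as the accepted answer because it provided a very elegant solution that met all the requirements as originally stated.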
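To give a feel for how ordered output is possible without a heap or a sort, here is a simplified merge-based sketch. It is not the optimized in-place code referred to above: it assumes 64-bit values, a caller-supplied list of distinct primes, and two scratch buffers, and the name `divisors_by_merging` is hypothetical. For each prime p, the sorted list built so far is merged with p times the output list itself, skipping entries whose multiple by p no longer divides n.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a malloc'ed ascending array of all divisors of n (caller frees),
 * given the distinct primes dividing n.  No sorting and no heap: for each
 * prime, the current list is merged with p times the list being written. */
uint64_t *divisors_by_merging(uint64_t n, const uint64_t *primes, size_t k,
                              size_t *count)
{
    assert(n > 0);
    size_t d = 1;                           /* divisor count: product of (e_i + 1) */
    for (size_t i = 0; i < k; i++) {
        size_t e = 0;
        for (uint64_t m = n; m % primes[i] == 0; m /= primes[i]) e++;
        d *= e + 1;
    }

    uint64_t *cur = malloc(d * sizeof *cur);
    uint64_t *next = malloc(d * sizeof *next);
    if (!cur || !next) { free(cur); free(next); return NULL; }

    size_t len = 1;
    cur[0] = 1;
    for (size_t i = 0; i < k; i++) {
        const uint64_t p = primes[i];
        size_t a = 0;   /* next unread entry of cur[] (divisors without p) */
        size_t r = 0;   /* next entry of next[] still to be multiplied by p */
        size_t w = 0;   /* write position in next[] */
        for (;;) {
            /* Skip outputs whose multiple by p would not divide n.  Testing
             * (n / x) % p avoids overflowing x * p. */
            while (r < w && (n / next[r]) % p != 0) r++;
            const int have_a = a < len, have_r = r < w;
            if (!have_a && !have_r) break;
            if (have_a && (!have_r || cur[a] < next[r] * p))
                next[w++] = cur[a++];
            else
                next[w++] = next[r++] * p;
        }
        uint64_t *t = cur; cur = next; next = t;   /* output becomes new input */
        len = w;
    }
    free(next);
    *count = len;
    return cur;
}
```

Each output element costs O(1) amortized work, and the list at least doubles with every prime that divides n, so the total work in this sketch is O(d) under those assumptions.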

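In brief: repeatedly pull the next-smallest factor from a heap, then push back every multiple of that factor that is still a factor of n. Use a trick to avoid duplicates, so that the heap size never exceeds d. The time complexity is O(kd log d), where k is the number of distinct prime factors.

The key property we exploit is that if x and y are both factors of n and y = x*p for some factor p >= 2, that is, if the prime factors of x form a proper sub-multiset of the prime factors of y, then x < y. A factor can therefore always be output before any of its multiples has been pushed onto the heap.

First attempt: duplicates slow things down

It helps to first describe an algorithm that produces the right answer but also generates many duplicates: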

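1. Set prev = NULL.

2. Insert 1 into a heap H.

3. Extract the top of the heap, t, from H. If the heap is empty, stop.

4. If t == prev, go to 3. [Edit: fixed]

5. Output t.

6. Set prev = t.

7. For each distinct prime factor p of n:

   If n % (t*p) == 0 (i.e., if t*p is still a factor of n), push t*p onto H.

8. Go to 3.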
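The steps above can be sketched in C roughly as follows. This is an illustrative sketch, not code from the original answer: the `MinHeap` type and the function names are hypothetical, values are assumed to fit in 64 bits, and `prev` is initialized to 0 (never a factor) in place of NULL. The divisibility test is written as `(n / t) % p == 0`, which is equivalent to `n % (t*p) == 0` when t divides n but cannot overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Growable binary min-heap of uint64_t values. */
typedef struct { uint64_t *a; size_t len, cap; } MinHeap;

static void heap_push(MinHeap *h, uint64_t v)
{
    if (h->len == h->cap) {                      /* grow when full */
        h->cap = h->cap ? 2 * h->cap : 16;
        h->a = realloc(h->a, h->cap * sizeof *h->a);
        assert(h->a != NULL);
    }
    size_t i = h->len++;
    while (i > 0 && h->a[(i - 1) / 2] > v) {     /* sift up */
        h->a[i] = h->a[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->a[i] = v;
}

static uint64_t heap_pop(MinHeap *h)
{
    assert(h->len > 0);
    const uint64_t top = h->a[0];
    const uint64_t v = h->a[--h->len];           /* last element refills the root */
    size_t i = 0;
    for (;;) {                                   /* sift down */
        size_t c = 2 * i + 1;
        if (c >= h->len) break;
        if (c + 1 < h->len && h->a[c + 1] < h->a[c]) c++;
        if (h->a[c] >= v) break;
        h->a[i] = h->a[c];
        i = c;
    }
    h->a[i] = v;
    return top;
}

/* First attempt: factors come out in ascending order, and duplicates are
 * discarded by comparing each popped value with the previous output.
 * Writes at most cap factors to out[] and returns how many were written. */
size_t factors_with_duplicates(uint64_t n, const uint64_t *primes, size_t k,
                               uint64_t *out, size_t cap)
{
    MinHeap h = {0};
    size_t count = 0;
    uint64_t prev = 0;                   /* 0 is never a factor: stands in for NULL */
    heap_push(&h, 1);
    while (h.len > 0) {
        const uint64_t t = heap_pop(&h);
        if (t == prev) continue;         /* duplicate of the value just output */
        assert(count < cap);
        out[count++] = t;
        prev = t;
        for (size_t i = 0; i < k; i++)
            if ((n / t) % primes[i] == 0)        /* t*p is still a factor of n */
                heap_push(&h, t * primes[i]);
    }
    free(h.a);
    return count;
}
```

Because the same factor can be pushed several times, the heap in this version is allowed to grow beyond d, which is why it uses a growable array.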

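The only problem with the algorithm above is that it can generate the same factor many times. For example, if n = 30, the factor 15 is generated as a child of the factor 5 (by multiplying by the prime factor 3) and also as a child of the factor 3 (by multiplying by 5). One way around this is to notice that any duplicates must come out in a contiguous block when they reach the top of the heap, so you can simply check whether the top of the heap equals the value just extracted, and keep extracting and discarding it if so. But a better approach is possible: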

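Killing duplicates at the source

How many ways can a factor x be generated? First consider the case in which x contains no prime factor with multiplicity > 1. If x contains m distinct prime factors, then there are m "parent" factors that will generate it as a "child" in the previous algorithm: each parent consists of some subset of m-1 of the m prime factors, and the remaining prime factor is the one that gets multiplied in to form the child. (If x has a prime factor with multiplicity > 1, there are still m parents, one for each distinct prime.) If we had a way to designate exactly one of these parents as the "chosen one" that actually generates x as a child, and if this rule yielded a test that could be applied to each parent when it is popped, then we could avoid creating any duplicates in the first place.

We can use the following rule: for any given x, choose the potential parent y that is missing the largest of x's m prime factors. This leads to a simple test: a parent y generates a child x if and only if x = y*p for some prime p greater than or equal to every prime factor already in y. This is easy to check: loop through the prime factors in decreasing order, generating a child for each, and stop after the first prime factor that already divides y. In the earlier example, the parent 3 generates 15, but the parent 5 does not (because 3 < 5), so 15 is indeed generated only once. For n = 30, the complete tree looks like this:

            1
          / | \
         2  3  5
        / \  \
       6  10  15
       |
      30

Note that each factor is generated exactly once.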

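The new duplicate-free algorithm is as follows:

1. Insert 1 into a heap H.

2. Extract the top of the heap, t, from H. If the heap is empty, stop.

3. Output t.

4. For each distinct prime factor p of n, in decreasing order:

   If n % (t*p) == 0 (i.e., if t*p is still a factor of n), push t*p onto H.

   If t % p == 0 (i.e., if t already contains p as a factor), stop the loop.

5. Go to 2.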
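The duplicate-free procedure can be sketched in C as shown here. This is a sketch under stated assumptions rather than the answer's own code: values are assumed to fit in 64 bits, `primes` is assumed to hold the distinct prime factors of n in increasing order, `d` is the divisor count supplied by the caller, and all function names are hypothetical. Since every factor is pushed exactly once, a heap with fixed capacity d is enough.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Push v onto a binary min-heap stored in heap[0 .. *len-1]. */
static void heap_push(uint64_t *heap, size_t *len, uint64_t v)
{
    size_t i = (*len)++;
    while (i > 0 && heap[(i - 1) / 2] > v) {     /* sift up */
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = v;
}

/* Remove and return the smallest value of a non-empty min-heap. */
static uint64_t heap_pop(uint64_t *heap, size_t *len)
{
    const uint64_t top = heap[0];
    const uint64_t v = heap[--(*len)];           /* last element refills the root */
    size_t i = 0;
    for (;;) {                                   /* sift down */
        size_t c = 2 * i + 1;
        if (c >= *len) break;
        if (c + 1 < *len && heap[c + 1] < heap[c]) c++;
        if (heap[c] >= v) break;
        heap[i] = heap[c];
        i = c;
    }
    heap[i] = v;
    return top;
}

/* Writes the factors of n to out[0 .. d-1] in ascending order and returns
 * the number written (0 if the heap could not be allocated). */
size_t factors_in_order(uint64_t n, const uint64_t *primes, size_t k,
                        uint64_t *out, size_t d)
{
    uint64_t *heap = malloc(d * sizeof *heap);
    if (!heap) return 0;
    size_t len = 0, count = 0;
    heap_push(heap, &len, 1);
    while (len > 0) {
        const uint64_t t = heap_pop(heap, &len);
        assert(count < d);
        out[count++] = t;
        for (size_t i = k; i-- > 0; ) {          /* primes in decreasing order */
            const uint64_t p = primes[i];
            if ((n / t) % p == 0) {              /* t*p divides n, without overflow */
                assert(len < d);                 /* heap never exceeds d entries */
                heap_push(heap, &len, t * p);
            }
            if (t % p == 0) break;               /* p is the largest prime in t: stop */
        }
    }
    free(heap);
    return count;
}
```

For n = 30 with primes {2, 3, 5}, the pushes follow the rule that each factor is generated only by the parent missing its largest prime, so 15 is pushed by 3 and never by 5.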

 /*==============================================================================

 DESCRIPTION

    This is a small proof-of-concept program to test the idea of "sorting"
    factors using a form of bucket sort.  The method is essentially a 2D version
    of ProxMapSort that has been tuned for vast, nonlinear distributions using two
    keys (major, minor) rather than one.  The major key is simply the floor of
    the base-2 logarithm of the value, and the minor key is derived from the most
    significant bits of the value.


 INPUT

    Input is given on the command line, either as a single argument giving the
    number to be factored or an even number of arguments giving the 2-tuples that
    comprise the prime-power factorization of the desired number.  For example,
    the number

       75600 = 2^4 x 3^3 x 5^2 x 7

    can be given by the following list of arguments:

       2 4 3 3 5 2 7 1

    Note:  If a single number is given, it will require factoring to produce its
    prime-power factorization.  Since this is just a small test program, a very
    crude factoring method is used that is extremely fast for small prime factors
    but extremely slow for large prime factors.  This is actually fine, because
    the largest factor lists occur with small prime factors anyway, and it is the
    production of large factor lists at which this program aims to be proficient.
    It is simply not interesting to be fast at producing the factor list of a
    number like 17293823921105882610 = 2 x 3 x 5 x 576460797370196087, because
    it has only 32 factors.  Numbers with tens or hundreds of thousands of
    factors are much more interesting.


 OUTPUT

    Results are written to standard output.  A list of factors in ascending order
    is produced, followed by a summary line giving the average runtime (in
    milliseconds) required to generate the list (not including time to print it).


 STATISTICS

    Bucket size statistics for the 47616 canonical representatives of the prime
    signature equivalence classes of 64-bit numbers:

    ==============================================================
    Bucket size     Total count of factored       Total count of
         b          numbers needing size b      buckets of size b
    --------------------------------------------------------------
         1               47616 (100.0%)         514306458  (76.2%)
         2               47427  (99.6%)         142959971  (21.2%)
         3               43956  (92.3%)          16679329   (2.5%)
         4               27998  (58.8%)            995458   (0.1%)
         5                6536  (13.7%)             33427  (<0.1%)
         6                 400   (0.8%)               729  (<0.1%)
         7                  12  (<0.1%)                18  (<0.1%)
    --------------------------------------------------------------
         ~               47616 (100.0%)         674974643 (100.0%)
    --------------------------------------------------------------

    Thus, no 64-bit number (of the input set) ever requires a bucket larger than
    size 7, and the larger the bucket size the less frequent it is.  This is
    highly desirable.  Note that although most numbers need at least 1 bucket of
    size 4, the vast majority of buckets (99.9%) are of size 1, 2, or 3, meaning
    that insertions are extremely efficient.  Therefore, the use of insertion
    sort for the buckets is clearly the right choice and is arguably optimal for
    performance.


 AUTHOR

    Todd Lehman
    2015/05/08

 */

 #include <inttypes.h>
 #include <limits.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdarg.h>
 #include <string.h>
 #include <time.h>
 #include <math.h>
 #include <assert.h>

 typedef  unsigned int  uint;
 typedef  uint8_t       uint8;
 typedef  uint16_t      uint16;
 typedef  uint32_t      uint32;
 typedef  uint64_t      uint64;

 #define  ARRAY_CAPACITY(x)  (sizeof(x) / sizeof((x)[0]))

 //-----------------------------------------------------------------------------
 // This structure is sufficient to represent the prime-power factorization of
 // all 64-bit values.  The field names ω and Ω are derived from the standard
 // number theory functions ω(n) and Ω(n), which count the number of unique and
 // non-unique prime factors of n, respectively.  The field name d is derived
 // from the standard number theory function d(n), which counts the number of
 // divisors of n, including 1 and n.
 //
 // The maximum possible value here of ω is 15, which occurs for example at
 // n = 7378677391061896920 = 2^3 x 3^2 x 5 x 7 x 11 x 13 x 17 x 19 x 23 x 29 x
 // 31 x 37 x 41 x 43 x 47, which has 15 unique prime factors.
 //
 // The maximum possible value of Ω here is 63, which occurs for example at
 // n = 2^63 and n = 2^62 x 3, both of which have 63 non-unique prime factors.
 //
 // The maximum possible value of d here is 184320, which occurs at
 // n = 18401055938125660800 = 2^7 x 3^4 x 5^2 x 7^2 x 11 x 13 x 17 x 19 x 23 x
 // 29 x 31 x 37 x 41.
 //
 // Maximum possible exponents when exponents are sorted in decreasing order:
 //
 //    Index   Maximum   Bits   Example of n
 //    -----   -------   ----   --------------------------------------------
 //        0        63      6   (2)^63
 //        1        24      5   (2*3)^24
 //        2        13      4   (2*3*5)^13
 //        3         8      4   (2*3*5*7)^8
 //        4         5      3   (2*3*5*7*11)^5
 //        5         4      3   (2*3*5*7*11*13)^4
 //        6         3      2   (2*3*5*7*11*13*17)^3
 //        7         2      2   (2*3*5*7*11*13*17*19)^2
 //        8         2      2   (2*3*5*7*11*13*17*19*23)^2
 //        9         1      1   (2*3*5*7*11*13*17*19*23*29)^1
 //       10         1      1   (2*3*5*7*11*13*17*19*23*29*31)^1
 //       11         1      1   (2*3*5*7*11*13*17*19*23*29*31*37)^1
 //       12         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41)^1
 //       13         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41*43)^1
 //       14         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41*43*47)^1
 //    -----   -------   ----   --------------------------------------------
 //       15        63     37
 //
 #pragma pack(push, 8)
 typedef struct
 {
   uint8   e[16];  // Exponents.
   uint64  p[16];  // Primes in increasing order.
   uint8   ω;      // Count of prime factors without multiplicity.
   uint8   Ω;      // Count of prime factors with multiplicity.
   uint32  d;      // Count of factors of n, including 1 and n.
   uint64  n;      // Value of n on which all other fields of this struct depend.
 }
 PrimePowerFactorization;  // 160 bytes with 8-byte packing
 #pragma pack(pop)

 #define  MAX_ω  15
 #define  MAX_Ω  63

 //-----------------------------------------------------------------------------
 // Fatal error:  print error message and abort.

 void fatal_error(const char *format, ...)
 {
   va_list args;
   va_start(args, format);
   vfprintf(stderr, format, args);
   exit(1);
 }

 //-----------------------------------------------------------------------------
 // Compute 64-bit 2-adic integer inverse.

 uint64 uint64_inv(const uint64 x)
 {
   assert(x != 0);

   uint64 y = 1;
   for (uint i = 0; i < 6; i++)  // 6 = log2(log2(2**64)) = log2(64)
     y = y * (2 - (x * y));

   return y;
 }

 //------------------------------------------------------------------------------
 // Compute 2 to arbitrary power.  This is just a portable and abstract way to
 // write a left-shift operation.  Note that the use of the UINT64_C macro here
 // is actually required, because the result of 1U<<x is not guaranteed to be a
 // 64-bit result; on a 32-bit compiler, 1U<<32 is 0 or is undefined.

 static inline
 uint64 uint64_pow2(const uint x)
 {
   return UINT64_C(1) << x;
 }

 //------------------------------------------------------------------------------
 // Deduce native word size (int, long, or long long) for 64-bit integers.
 // This is needed for abstracting certain compiler-specific intrinsic functions.

 #if UINT_MAX == 0xFFFFFFFFFFFFFFFFU
   #define UINT64_IS_U
 #elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
   #define UINT64_IS_UL
 #elif ULLONG_MAX == 0xFFFFFFFFFFFFFFFFULL
   #define UINT64_IS_ULL
 #else
   //error "Unable to deduce native word size of 64-bit integers."
 #endif

 //------------------------------------------------------------------------------
 // Define abstracted intrinsic function for counting leading zeros.  Note that
 // the value is well-defined for nonzero input but is compiler-specific for
 // input of zero.

 #ifndef __has_builtin
   #define __has_builtin(x) 0  // Older compilers lack __has_builtin.
 #endif

 #if   defined(UINT64_IS_U) && __has_builtin(__builtin_clz)
   #define UINT64_CLZ(x) __builtin_clz(x)
 #elif defined(UINT64_IS_UL) && __has_builtin(__builtin_clzl)
   #define UINT64_CLZ(x) __builtin_clzl(x)
 #elif defined(UINT64_IS_ULL) && __has_builtin(__builtin_clzll)
   #define UINT64_CLZ(x) __builtin_clzll(x)
 #else
   #undef UINT64_CLZ
 #endif

 //------------------------------------------------------------------------------
 // Compute floor of base-2 logarithm y = log_2(x), where x > 0.  Uses fast
 // intrinsic function if available; otherwise resorts to hand-rolled method.

 static inline
 uint uint64_log2(uint64 x)
 {
   assert(x > 0);

   #if defined(UINT64_CLZ)
     return 63 - UINT64_CLZ(x);
   #else
     #define S(k) if ((x >> k) != 0) { y += k; x >>= k; }
     uint y = 0; S(32); S(16); S(8); S(4); S(2); S(1); return y;
     #undef S
   #endif
 }

 //------------------------------------------------------------------------------
 // Compute major key, given a nonzero number.  The major key is simply the
 // floor of the base-2 logarithm of the number.

 static inline
 uint major_key(const uint64 n)
 {
   assert(n > 0);
   uint k1 = uint64_log2(n);
   return k1;
 }

 //------------------------------------------------------------------------------
 // Compute minor key, given a nonzero number, its major key, k1, and the
 // bit-size b of major bucket k1.  The minor key, k2, is computed by first
 // removing the most significant 1-bit from the number, because it adds no
 // information, and then extracting the desired number of most significant bits
 // from the remainder.  For example, given the number n=1463 and a major bucket
 // size of b=6 bits, the keys are computed as follows:
 //
 //    Step 0:  Given number              n = 0b10110110111 = 1463
 //
 //    Step 1:  Compute major key:        k1 = floor(log_2(n)) = 10
 //
 //    Step 2:  Remove high-order 1-bit:  n' = 0b0110110111 = 439
 //
 //    Step 3:  Compute minor key:        k2 = n' >> (k1 - b)
 //                                          = 0b0110110111 >> (10 - 6)
 //                                          = 0b0110110111 >> 4
 //                                          = 0b011011
 //                                          = 27

 static inline
 uint minor_key(const uint64 n, const uint k1, const uint b)
 {
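   assert(n > 0); assert(b > 0);
   // In this program only n == 1 gives b > k1 (k1 == 0, b == 1).  Shifting by
   // the wrapped unsigned difference (k1 - b) would be undefined behavior, so
   // skip the shift in that case.
   if (b > k1)
     return (uint)(n ^ uint64_pow2(k1));
   const uint k2 = (uint)((n ^ uint64_pow2(k1)) >> (k1 - b));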
   return k2;
 }

 //------------------------------------------------------------------------------
 // Raw unsorted factor.

 #pragma pack(push, 4)

 typedef struct
 {
   uint64  n;   // Value of factor.
   uint32  k1;  // Major key.
   uint32  k2;  // Minor key.
 }
 UnsortedFactor;

 #pragma pack(pop)

 //------------------------------------------------------------------------------
 // Compute sorted list of factors, given a prime-power factorization.

 static uint64 memory_usage;

 uint64 *compute_factors(const PrimePowerFactorization ppf)
 {
   memory_usage = 0;

   if (ppf.n == 0)
     return NULL;

   uint64 *sorted_factors = calloc(ppf.d, sizeof(*sorted_factors));
   if (!sorted_factors)
     fatal_error("Failed to allocate array of %"PRIu32" factors.", ppf.d);
   memory_usage += ppf.d * sizeof(*sorted_factors);

   UnsortedFactor *unsorted_factors = malloc(ppf.d * sizeof(*unsorted_factors));
   if (!unsorted_factors)
     fatal_error("Failed to allocate array of %"PRIu32" factors.", ppf.d);
   memory_usage += ppf.d * sizeof(*unsorted_factors);


   // These arrays are indexed by the major key of a number.
   uint32 major_counts[64];   // Counts of factors in major buckets.
   uint32 major_spans[64];    // Counts rounded up to power of 2.
   uint32 major_bits[64];     // Base-2 logarithm of bucket size.
   uint32 major_indexes[64];  // Indexes into minor array.
   memset(major_counts,  0, sizeof(major_counts));
   memset(major_spans,   0, sizeof(major_spans));
   memset(major_bits,    0, sizeof(major_bits));
   memset(major_indexes, 0, sizeof(major_indexes));


   // --- Step 1:  Produce unsorted list of factors from prime-power
   //     factorization.  At the same time, count groups of factors by their
   //     major keys.
   {
     // This array is for counting in the multi-radix number system dictated by
     // the exponents of the prime-power factorization.  An invariant is that
     // e[i] <= ppf.e[i] for all i (0 <= i < ppf.ω).
     uint8 e[MAX_ω];
     for (uint i = 0; i < ppf.ω; i++)
       e[i] = 0;

     // Initialize inverse-prime-powers.  This array allows for division by
     // p[i]**e[i] extremely quickly in the main loop below.  Note that 2-adic
     // inverses are not defined for even numbers (of which 2 is the only prime),
     // so powers of 2 must be handled specially.
     uint64 pe_inv[MAX_ω];
     for (uint i = 0; i < ppf.ω; i++)
     {
       uint64 pe = 1; for (uint j = 1; j <= ppf.e[i]; j++) pe *= ppf.p[i];
       pe_inv[i] = uint64_inv(pe);
     }

     uint64 n = 1;  // Current factor accumulator.
     for (uint k = 0; k < ppf.d; k++)   // k indexes into unsorted_factors[].
     {
       //printf("unsorted_factors[%u] = %"PRIu64"   j = %u\n", k, n, j);
       assert(ppf.n % n == 0);
       unsorted_factors[k].n = n;

       uint k1 = major_key(n);
       assert(k1 < ARRAY_CAPACITY(major_counts));
       unsorted_factors[k].k1 = k1;
       major_counts[k1] += 1;

       // Increment the remainder of the multi-radix number e[].
       for (uint i = 0; i < ppf.ω; i++)
       {
         if (e[i] == ppf.e[i])  // Carrying is occurring.
         {
           if (ppf.p[i] == 2)
             n >>= ppf.e[i];  // Divide n by 2 ** ppf.e[i].
           else
             n *= pe_inv[i];  // Divide n by ppf.p[i] ** ppf.e[i].

           e[i] = 0;
         }
         else  // Carrying is not occurring.
         {
           n *= ppf.p[i];
           e[i] += 1;
           break;
         }
       }
     }
     assert(n == 1);  // n always cycles back to 1, not to ppf.n.

     assert(unsorted_factors[ppf.d-1].n == ppf.n);
   }


   // --- Step 2:  Define the major bits array, the major spans array, the major
   //     index array, and count the total spans.

   uint32 total_spans = 0;
   {
     uint32 k = 0;
     for (uint k1 = 0; k1 < ARRAY_CAPACITY(major_counts); k1++)
     {
       uint32 count = major_counts[k1];
       uint32 bits = (count <= 1)? count : uint64_log2(count - 1) + 1;
       major_bits[k1] = bits;
       major_spans[k1] = (count > 0)? (UINT32_C(1) << bits) : 0;
       major_indexes[k1] = k;
       k += major_spans[k1];
     }
     total_spans = k;
   }


   // --- Step 3:  Allocate and populate the minor counts array.  Note that it
   //     must be initialized to zero.

   uint32 *minor_counts = calloc(total_spans, sizeof(*minor_counts));
   if (!minor_counts)
     fatal_error("Failed to allocate array of %"PRIu32" counts.", total_spans);
   memory_usage += total_spans * sizeof(*minor_counts);

   for (uint k = 0; k < ppf.d; k++)
   {
     const uint64 n = unsorted_factors[k].n;
     const uint k1 = unsorted_factors[k].k1;
     const uint k2 = minor_key(n, k1, major_bits[k1]);
     assert(k2 < major_spans[k1]);
     unsorted_factors[k].k2 = k2;
     minor_counts[major_indexes[k1] + k2] += 1;
   }


   // --- Step 4:  Define the minor indexes array.
   //
   // NOTE:  Instead of allocating a separate array, the earlier-allocated array
   // of minor counts is simply repurposed here using an alias.

   uint32 *minor_indexes = minor_counts;  // Alias the array for repurposing.

   {
     uint32 k = 0;
     for (uint i = 0; i < total_spans; i++)
     {
       uint32 count = minor_counts[i];  // This array is the same array...
       minor_indexes[i] = k;            // ...as this array.
       k += count;
     }
   }


   // --- Step 5:  Populate the sorted factors array.  Note that the array must
   //              be initialized to zero earlier because values of zero are used
   //              as sentinels in the bucket lists.

   for (uint32 i = 0; i < ppf.d; i++)
   {
     uint64 n = unsorted_factors[i].n;
     const uint k1 = unsorted_factors[i].k1;
     const uint k2 = unsorted_factors[i].k2;

     // Insert factor into bucket using insertion sort (which happens to be
     // extremely fast because we know the bucket sizes are always very small).
     uint32 k;
     for (k = minor_indexes[major_indexes[k1] + k2];
          sorted_factors[k] != 0;
          k++)
     {
       assert(k < ppf.d);
       if (sorted_factors[k] > n)
         { uint64 t = sorted_factors[k]; sorted_factors[k] = n; n = t; }
     }
     sorted_factors[k] = n;
   }


   // --- Step 6:  Validate array of sorted factors.
   {
     for (uint32 k = 1; k < ppf.d; k++)
     {
       if (sorted_factors[k] == 0)
         fatal_error("Produced a factor of 0 at index %"PRIu32".", k);

       if (ppf.n % sorted_factors[k] != 0)
         fatal_error("Produced non-factor %"PRIu64" at index %"PRIu32".",
                     sorted_factors[k], k);

       if (sorted_factors[k-1] == sorted_factors[k])
         fatal_error("Duplicate factor %"PRIu64" at index %"PRIu32".",
                     sorted_factors[k], k);

       if (sorted_factors[k-1] > sorted_factors[k])
         fatal_error("Out-of-order factors %"PRIu64" and %"PRIu64" "
                     "at indexes %"PRIu32" and %"PRIu32".",
                     sorted_factors[k-1], sorted_factors[k], k-1, k);
     }
   }


   free(minor_counts);
   free(unsorted_factors);

   return sorted_factors;
 }

 //------------------------------------------------------------------------------
 // Compute prime-power factorization of a 64-bit value.  Note that this function
 // is designed to be fast *only* for numbers with very simple factorizations,
 // e.g., those that produce large factor lists.  Do not attempt to factor
 // large semiprimes with this function.  (The author does know how to factor
 // large numbers efficiently; however, efficient factorization is beyond the
 // scope of this small test program.)

 PrimePowerFactorization compute_ppf(const uint64 n)
 {
   PrimePowerFactorization ppf;

   if (n == 0)
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
   }
   else if (n == 1)
   {
     ppf = (PrimePowerFactorization){ .p = { 1 }, .e = { 1 },
                                      .ω = 1, .Ω = 1, .d = 1, .n = 1 };
   }
   else
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = n };

     uint64 m = n;
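     // Compare p against m / p rather than p * p against m, so that the test
     // cannot overflow when m is close to 2^64.
     for (uint64 p = 2; p <= m / p; p += 1 + (p > 2))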
     {
       if (m % p == 0)
       {
         assert(ppf.ω <= MAX_ω);
         ppf.p[ppf.ω] = p;
         ppf.e[ppf.ω] = 0;
         while (m % p == 0)
           { m /= p; ppf.e[ppf.ω] += 1; }
         ppf.d *= (1 + ppf.e[ppf.ω]);
         ppf.Ω += ppf.e[ppf.ω];
         ppf.ω += 1;
       }
     }
     if (m > 1)
     {
       assert(ppf.ω <= MAX_ω);
       ppf.p[ppf.ω] = m;
       ppf.e[ppf.ω] = 1;
       ppf.d *= 2;
       ppf.Ω += 1;
       ppf.ω += 1;
     }
   }

   return ppf;
 }

 //------------------------------------------------------------------------------
 // Parse prime-power factorization from a list of ASCII-encoded base-10 strings.
 // The values are assumed to be 2-tuples (p,e) of prime p and exponent e.
 // Primes must not exceed 2^64 - 1.  Exponents must not exceed 2^8 - 1.  The
 // constructed value must not exceed 2^64 - 1.

 PrimePowerFactorization parse_ppf(const uint pairs, const char *const values[])
 {
   assert(pairs <= MAX_ω);

   PrimePowerFactorization ppf;

   if (pairs == 0)
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
   }
   else
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = 1 };

     for (uint i = 0; i < pairs; i++)
     {
       ppf.p[i] = (uint64)strtoumax(values[(i*2)+0], NULL, 10);
       ppf.e[i] =  (uint8)strtoumax(values[(i*2)+1], NULL, 10);

       // Validate prime value.
       if (ppf.p[i] < 2)  // (Ideally this would actually do a primality test.)
         fatal_error("Factor %"PRIu64" is invalid.", ppf.p[i]);

       // Accumulate count of unique prime factors.
       if (ppf.ω > UINT8_MAX - 1)
         fatal_error("Small-omega overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.ω += 1;

       // Accumulate count of total prime factors.
       if (ppf.Ω > UINT8_MAX - ppf.e[i])
         fatal_error("Big-omega overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.Ω += ppf.e[i];

       // Accumulate total divisor count.
       if (ppf.d > UINT32_MAX / (1 + ppf.e[i]))
         fatal_error("Divisor count overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.d *= (1 + ppf.e[i]);

       // Accumulate value.
       for (uint8 k = 1; k <= ppf.e[i]; k++)
       {
         if (ppf.n > UINT64_MAX / ppf.p[i])
           fatal_error("Value overflow at factor %"PRIu64".", ppf.p[i]);
         ppf.n *= ppf.p[i];
       }
     }
   }

   return ppf;
 }

 //------------------------------------------------------------------------------
 // Main control.  Parse command line and produce list of factors.

 int main(const int argc, const char *const argv[])
 {
   PrimePowerFactorization ppf;

   uint values = (uint)argc - 1;  // argc is always guaranteed to be at least 1.

   if (values == 1)
   {
     ppf = compute_ppf((uint64)strtoumax(argv[1], NULL, 10));
   }
   else
   {
     if (values % 2 != 0)
       fatal_error("Odd number of arguments (%u) given.", values);
     uint pairs = values / 2;
     ppf = parse_ppf(pairs, &argv[1]);
   }

   // Run for (as close as possible to) a fixed amount of time, tallying the
   // elapsed CPU time.
   uint64 iterations = 0;
   double cpu_time = 0.0;
   const double cpu_time_limit = 0.05;
   while (cpu_time < cpu_time_limit)
   {
     clock_t clock_start = clock();
     uint64 *factors = compute_factors(ppf);
     clock_t clock_end = clock();
     cpu_time += (double)(clock_end - clock_start) / (double)CLOCKS_PER_SEC;

     if (++iterations == 1)
     {
       for (uint32 i = 0; i < ppf.d; i++)
         printf("%"PRIu64"\n", factors[i]);
     }

     if (factors) free(factors);
   }

   // Print the average amount of CPU time required for each iteration.
   uint mem_scale = (memory_usage >= 1e9)? 9:
                    (memory_usage >= 1e6)? 6:
                    (memory_usage >= 1e3)? 3:
                                           0;
   char *mem_units = (mem_scale == 9)? "GB":
                     (mem_scale == 6)? "MB":
                     (mem_scale == 3)? "KB":
                                        "B";

   printf("%"PRIu64"  %"PRIu32" factors  %.6f ms  %.3f ns/factor  %.3f %s\n",
          ppf.n,
          ppf.d,
          cpu_time/iterations * 1e3,
          cpu_time/iterations * 1e9 / (double)(ppf.d? ppf.d : 1),
          (double)memory_usage / pow(10, mem_scale),
          mem_units);

   return 0;
 }
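The program above divides by odd prime powers with a single multiplication (`n *= pe_inv[i]`), using the inverse computed by `uint64_inv`. The trick can be shown in isolation as in the sketch below; the function names `inverse_mod_2_64` and `exact_div_odd` are hypothetical, and the technique only applies when the divisor is odd and divides the dividend exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of an odd x modulo 2^64, by Newton iteration. */
uint64_t inverse_mod_2_64(uint64_t x)
{
    assert(x & 1);                 /* even numbers have no inverse mod 2^64 */
    uint64_t y = 1;                /* x * 1 == 1 (mod 2): correct to 1 bit */
    for (int i = 0; i < 6; i++)    /* each step doubles the correct bits: 1 -> 64 */
        y *= 2 - x * y;
    return y;
}

/* Exact division n / q for an odd q that divides n, using one multiplication
 * modulo 2^64 in place of a hardware divide. */
uint64_t exact_div_odd(uint64_t n, uint64_t q)
{
    assert(n % q == 0);            /* only valid when the division is exact */
    return n * inverse_mod_2_64(q);
}
```

Each Newton step works because if x*y = 1 + e (mod 2^64), then x*y*(2 - x*y) = (1 + e)(1 - e) = 1 - e², which squares the error term. Powers of 2 are the case this cannot handle, which is why the program uses a right shift for them.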

/*==============================================================================

DESCRIPTION

   This is a small proof-of-concept program to test the idea of generating the
   factors of a number in ascending order using an ultra-efficient sortless
   method.


INPUT

   Input is given on the command line, either as a single argument giving the
   number to be factored or an even number of arguments giving the 2-tuples that
   comprise the prime-power factorization of the desired number.  For example,
   the number

      75600 = 2^4 x 3^3 x 5^2 x 7

   can be given by the following list of arguments:

      2 4 3 3 5 2 7 1

   Note:  If a single number is given, it will require factoring to produce its
   prime-power factorization.  Since this is just a small test program, a very
   crude factoring method is used that is extremely fast for small prime factors
   but extremely slow for large prime factors.  This is actually fine, because
   the largest factor lists occur with small prime factors anyway, and it is the
   production of large factor lists at which this program aims to be proficient.
   It is simply not interesting to be fast at producing the factor list of a
   number like 17293823921105882610 = 2 x 3 x 5 x 576460797370196087, because
   it has only 32 factors.  Numbers with tens or hundreds of thousands of
   factors are much more interesting.


OUTPUT

   Results are written to standard output.  A list of factors in ascending order
   is produced, followed by runtime required to generate the list (not including
   time to print it).


AUTHOR

   Todd Lehman
   2015/05/10

*/

//-----------------------------------------------------------------------------
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#include <math.h>
#include <assert.h>

//-----------------------------------------------------------------------------
typedef  unsigned int  uint;
typedef  uint8_t       uint8;
typedef  uint16_t      uint16;
typedef  uint32_t      uint32;
typedef  uint64_t      uint64;
typedef  __uint128_t   uint128;

#define  UINT128_MAX  (uint128)(-1)

#define  UINT128_MAX_STRLEN  39

//-----------------------------------------------------------------------------
#define  ARRAY_CAPACITY(x)  (sizeof(x) / sizeof((x)[0]))

//-----------------------------------------------------------------------------
// This structure encodes a single prime-power pair for the prime-power
// factorization of numbers, for example 3 to the 4th power.

#pragma pack(push, 8)
typedef struct
{
  uint128  p;  // Prime.
  uint8    e;  // Power (exponent).
}
PrimePower;   // 24 bytes using 8-byte packing
#pragma pack(pop)

//-----------------------------------------------------------------------------
// Prime-power factorization structure.
//
// This structure is sufficient to represent the prime-power factorization of
// all 128-bit values.  The field names ω and Ω are derived from the standard
// number theory functions ω(n) and Ω(n), which count the number of unique and
// non-unique prime factors of n, respectively.  The field name d is derived
// from the standard number theory function d(n), which counts the number of
// divisors of n, including 1 and n.
//
// The maximum possible value here of ω is 26, which occurs at
// n = 232862364358497360900063316880507363070 = 2 x 3 x 5 x 7 x 11 x 13 x 17 x
// 19 x 23 x 29 x 31 x 37 x 41 x 43 x 47 x 53 x 59 x 61 x 67 x 71 x 73 x 79 x
// 83 x 89 x 97 x 101, which has 26 unique prime factors.
//
// The maximum possible value of Ω here is 127, which occurs at n = 2^127 and
// n = 2^126 x 3, both of which have 127 non-unique prime factors.
//
// The maximum possible value of d here is 318504960, which occurs at
// n = 333939014887358848058068063658770598400 = 2^9 x 3^5 x 5^2 x 7^2 x 11^2 x
// 13^2 x 17 x 19 x 23 x 29 x 31 x 37 x 41 x 43 x 47 x 53 x 59 x 61 x 67 x 71 x
// 73 x 79.
//
#pragma pack(push, 8)
typedef struct
{
  PrimePower  f[32];  // Primes and their exponents.
  uint8       ω;      // Count of prime factors without multiplicity.
  uint8       Ω;      // Count of prime factors with multiplicity.
  uint32      d;      // Count of factors of n, including 1 and n.
  uint128     n;      // Value of n on which all other fields depend.
}
PrimePowerFactorization;  // 792 bytes using 8-byte packing
#pragma pack(pop)

#define  MAX_ω  26
#define  MAX_Ω  127

//-----------------------------------------------------------------------------
// Fatal error:  print error message and abort.

void fatal_error(const char *format, ...)
{
  va_list args;
  va_start(args, format);
  vfprintf(stderr, format, args);
  exit(1);
}

//------------------------------------------------------------------------------
uint128 uint128_from_string(const char *const str)
{
  assert(str != NULL);

  uint128 n = 0;
  for (int i = 0; isdigit((unsigned char)str[i]); i++)
    n = (n * 10) + (uint)(str[i] - '0');

  return n;
}

//------------------------------------------------------------------------------
void uint128_to_string(const uint128 n,
                       char *const strbuf, const uint strbuflen)
{
  assert(strbuf != NULL);
  assert(strbuflen >= UINT128_MAX_STRLEN + 1);

  // Extract digits into string buffer in reverse order.
  uint128 a = n;
  char *s = strbuf;
  do { *(s++) = '0' + (uint)(a % 10); a /= 10; } while (a != 0);
  *s = '\0';

  // Reverse the order of the digits.
  uint l = strlen(strbuf);
  for (uint i = 0; i < l/2; i++)
    { char t = strbuf[i]; strbuf[i] = strbuf[l-1-i]; strbuf[l-1-i] = t; }

  // Verify result.
  assert(uint128_from_string(strbuf) == n);
}

//------------------------------------------------------------------------------
char *uint128_to_static_string(const uint128 n, const uint i)
{
  static char str[2][UINT128_MAX_STRLEN + 1];
  assert(i < ARRAY_CAPACITY(str));
  uint128_to_string(n, str[i], ARRAY_CAPACITY(str[i]));
  return str[i];
}

//------------------------------------------------------------------------------
// Compute sorted list of factors, given a prime-power factorization.

uint128 *compute_factors(const PrimePowerFactorization ppf)
{
  const uint128 n =       ppf.n;
  const uint    d = (uint)ppf.d;
  const uint    ω = (uint)ppf.ω;

  if (n == 0)
    return NULL;

  uint128 *factors = malloc((d + 1) * sizeof(*factors));
  if (!factors)
    fatal_error("Failed to allocate array of %u factors.", d);
  uint128 *const factors_end = &factors[d];


  // --- Seed the factors[] array.

  factors_end[0] = 0;   // Dummy value to simplify looping in bottleneck code.
  factors_end[-1] = 1;  // Seed value.

  if (n == 1)
    return factors;


  // --- Iterate over all prime factors.

  uint range = 1;
  for (uint i = 0; i < ω; i++)
  {
    const uint128 p = ppf.f[i].p;
    const uint    e = ppf.f[i].e;

    // --- Initialize phantom input lists and output list.
    assert(e < 128);
    assert(range < d);
    uint128 *restrict in[128];
    uint128 pe[128], f[128];
    for (uint j = 0; j <= e; j++)
    {
      in[j] = &factors[d - range];
      pe[j] = (j == 0)? 1 : pe[j-1] * p;
      f[j] = pe[j];
    }
    uint active_list_count = 1 + e;
    range *= 1 + e;
    uint128 *restrict out = &factors[d - range];

    // --- Merge phantom input lists to output list, until all input lists are
    //     exhausted.
    while (active_list_count > 0)
    {
      if (active_list_count == 1)
      {
        assert(out == in[0]);
        while (out != factors_end)
          *(out++) *= pe[0];
        in[0] = out;
      }
      else if (active_list_count == 2)
      {
        // This section of the code is the bottleneck of the entire factor-
        // producing algorithm.  Other portions need to be fast, but this
        // *really* needs to be fast; therefore, it has been highly optimized.
        // In fact, it is by far most frequently the case here that pe[0] is 1,
        // so further optimization is warranted in this case.
        uint128 f0 = f[0], f1 = f[1];
        uint128 *in0 = in[0], *in1 = in[1];
        const uint128 pe0 = pe[0], pe1 = pe[1];

        if (pe[0] == 1)
        {
          while (true)
          {
            if (f0 < f1)
              { *(out++) = f0; f0 = *(++in0);
                if (in0 == factors_end) break; }
            else
              { *(out++) = f1; f1 = *(++in1) * pe1; }
          }
        }
        else
        {
          while (true)
          {
            if (f0 < f1)
              { *(out++) = f0; f0 = *(++in0) * pe0;
                if (in0 == factors_end) break; }
            else
              { *(out++) = f1; f1 = *(++in1) * pe1; }
          }
        }

        f[0] = f0; f[1] = f1;
        in[0] = in0; in[1] = in1;
      }
      else if (active_list_count == 3)
      {
        uint128 f0 = f[0], f1 = f[1], f2 = f[2];
        uint128 *in0 = in[0], *in1 = in[1], *in2 = in[2];
        const uint128 pe0 = pe[0], pe1 = pe[1], pe2 = pe[2];

        while (true)
        {
          if (f0 < f1)
          {
            if (f0 < f2)
              { *(out++) = f0; f0 = *(++in0) * pe0;
                if (in0 == factors_end) break; }
            else
              { *(out++) = f2; f2 = *(++in2) * pe2; }
          }
          else
          {
            if (f1 < f2)
              { *(out++) = f1; f1 = *(++in1) * pe1; }
            else
              { *(out++) = f2; f2 = *(++in2) * pe2; }
          }
        }

        f[0] = f0; f[1] = f1; f[2] = f2;
        in[0] = in0; in[1] = in1; in[2] = in2;
      }
      else  // active_list_count > 3:  general k-way merge by linear scan.
      {
        while (true)
        {
          // Choose the list head with the smallest value.
          uint k_min = 0;
          uint128 f_min = f[0];
          for (uint k = 1; k < active_list_count; k++)
            if (f[k] < f_min)
              { f_min = f[k]; k_min = k; }

          // Write the output factor, advance the input pointer, and
          // produce a new factor in the array f[] of list heads.
          *(out++) = f_min;
          f[k_min] = *(++in[k_min]) * pe[k_min];
          if (in[k_min] == factors_end)
            { assert(k_min == 0); break; }
        }
      }

      // --- Remove the newly emptied phantom input list.  Note that this is
      //     guaranteed *always* to be the first remaining non-empty list.
      assert(in[0] == factors_end);
      for (uint j = 1; j < active_list_count; j++)
      {
        in[j-1] = in[j];
        pe[j-1] = pe[j];
         f[j-1] =  f[j];
      }
      active_list_count -= 1;
    }

    assert(out == factors_end);
  }


  // --- Validate array of sorted factors.
  #ifndef NDEBUG
  {
    for (uint k = 0; k < d; k++)
    {
      if (factors[k] == 0)
        fatal_error("Produced a factor of 0 at index %u.", k);

      if (n % factors[k] != 0)
        fatal_error("Produced non-factor %s at index %u.",
                    uint128_to_static_string(factors[k], 0), k);

      if ((k > 0) && (factors[k-1] == factors[k]))
        fatal_error("Duplicate factor %s at index %u.",
                    uint128_to_static_string(factors[k], 0), k);

      if ((k > 0) && (factors[k-1] > factors[k]))
        fatal_error("Out-of-order factors %s and %s at indexes %u and %u.",
                    uint128_to_static_string(factors[k-1], 0),
                    uint128_to_static_string(factors[k], 1),
                    k-1, k);
    }
  }
  #endif


  return factors;
}

//------------------------------------------------------------------------------
// Print prime-power factorization of a number.

void print_ppf(const PrimePowerFactorization ppf)
{
  printf("%s = ", uint128_to_static_string(ppf.n, 0));
  if (ppf.n == 0)
  {
    printf("0");
  }
  else
  {
    for (uint i = 0; i < ppf.ω; i++)
    {
      if (i > 0)
        printf(" x ");

      printf("%s", uint128_to_static_string(ppf.f[i].p, 0));

      if (ppf.f[i].e > 1)
        printf("^%"PRIu8"", ppf.f[i].e);
    }
  }
  printf("\n");
}

//------------------------------------------------------------------------------
int compare_powers_ascending(const void *const pf1,
                             const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.e < f2.e)?  -1:
          (f1.e > f2.e)?  +1:
                           0;  // Not an error; duplicate exponents are common.
}

//------------------------------------------------------------------------------
int compare_powers_descending(const void *const pf1,
                              const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.e < f2.e)?  +1:
          (f1.e > f2.e)?  -1:
                           0;  // Not an error; duplicate exponents are common.
}

//------------------------------------------------------------------------------
int compare_primes_ascending(const void *const pf1,
                             const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.p < f2.p)?  -1:
          (f1.p > f2.p)?  +1:
                           0;  // Error; duplicate primes must never occur.
}

//------------------------------------------------------------------------------
int compare_primes_descending(const void *const pf1,
                              const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.p < f2.p)?  +1:
          (f1.p > f2.p)?  -1:
                           0;  // Error; duplicate primes must never occur.
}

//------------------------------------------------------------------------------
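// Sort prime-power factorization.  The two passes rely on a stable sort, so
// this uses mergesort(), a BSD/macOS libc extension declared in <stdlib.h>;
// qsort() gives no stability guarantee and is not a drop-in replacement.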

void sort_ppf(PrimePowerFactorization *const ppf,
              const bool primes_major,      // Best false
              const bool primes_ascending,  // Best false
              const bool powers_ascending)  // Best false
{
  int (*compare_primes)(const void *, const void *) =
    primes_ascending? compare_primes_ascending : compare_primes_descending;

  int (*compare_powers)(const void *, const void *) =
    powers_ascending? compare_powers_ascending : compare_powers_descending;

  if (primes_major)
  {
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_powers);
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_primes);
  }
  else
  {
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_primes);
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_powers);
  }
}

//------------------------------------------------------------------------------
// Compute prime-power factorization of a 128-bit value.  Note that this
// function is designed to be fast *only* for numbers with very simple
// factorizations, e.g., those that produce large factor lists.  Do not attempt
// to factor large semiprimes with this function.  (The author does know how to
// factor large numbers efficiently; however, efficient factorization is beyond
// the scope of this small test program.)

PrimePowerFactorization compute_ppf(const uint128 n)
{
  PrimePowerFactorization ppf;

  if (n == 0)
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
  }
  else if (n == 1)
  {
    ppf = (PrimePowerFactorization){ .f[0] = { .p = 1, .e = 1 },
                                     .ω = 1, .Ω = 1, .d = 1, .n = 1 };
  }
  else
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = n };

    uint128 m = n;
    for (uint128 p = 2; p * p <= m; p += 1 + (p > 2))
    {
      if (m % p == 0)
      {
        assert(ppf.ω <= MAX_ω);
        ppf.f[ppf.ω].p = p;
        ppf.f[ppf.ω].e = 0;
        while (m % p == 0)
          { m /= p; ppf.f[ppf.ω].e += 1; }
        ppf.d *= (1 + ppf.f[ppf.ω].e);
        ppf.Ω += ppf.f[ppf.ω].e;
        ppf.ω += 1;
      }
    }
    if (m > 1)
    {
      assert(ppf.ω <= MAX_ω);
      ppf.f[ppf.ω].p = m;
      ppf.f[ppf.ω].e = 1;
      ppf.d *= 2;
      ppf.Ω += 1;
      ppf.ω += 1;
    }
  }

  return ppf;
}

//------------------------------------------------------------------------------
// Parse prime-power factorization from a list of ASCII-encoded base-10 strings.
// The values are assumed to be 2-tuples (p,e) of prime p and exponent e.
// Primes must not exceed 2^128 - 1 and must not be repeated.  Exponents must
// not exceed 2^8 - 1, but can of course be repeated.  The constructed value
// must not exceed 2^128 - 1.

PrimePowerFactorization parse_ppf(const uint pairs, const char *const values[])
{
  assert(pairs <= MAX_ω);

  PrimePowerFactorization ppf;

  if (pairs == 0)
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
  }
  else
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = 1 };

    for (uint i = 0; i < pairs; i++)
    {
      ppf.f[i].p = uint128_from_string(values[(i*2)+0]);
      ppf.f[i].e = (uint8)strtoumax(values[(i*2)+1], NULL, 10);

      // Validate prime value.
      if (ppf.f[i].p < 2)  // (Ideally this would actually do a primality test.)
        fatal_error("Factor %s is invalid.",
                    uint128_to_static_string(ppf.f[i].p, 0));

      // Accumulate count of unique prime factors.
      if (ppf.ω > UINT8_MAX - 1)
        fatal_error("Small-omega overflow at factor %s^%"PRIu8".",
                    uint128_to_static_string(ppf.f[i].p, 0), ppf.f[i].e);
      ppf.ω += 1;

      // Accumulate count of total prime factors.
      if (ppf.Ω > UINT8_MAX - ppf.f[i].e)
        fatal_error("Big-omega overflow at factor %s^%"PRIu8".",
                    uint128_to_static_string(ppf.f[i].p, 0), ppf.f[i].e);
      ppf.Ω += ppf.f[i].e;

      // Accumulate total divisor count.
      if (ppf.d > UINT32_MAX / (1 + ppf.f[i].e))
        fatal_error("Divisor count overflow at factor %s^%"PRIu8".",
                    uint128_to_static_string(ppf.f[i].p, 0), ppf.f[i].e);
      ppf.d *= (1 + ppf.f[i].e);

      // Accumulate value.
      for (uint8 k = 1; k <= ppf.f[i].e; k++)
      {
        if (ppf.n > UINT128_MAX / ppf.f[i].p)
          fatal_error("Value overflow at factor %s.",
                      uint128_to_static_string(ppf.f[i].p, 0));
        ppf.n *= ppf.f[i].p;
      }
    }
  }

  return ppf;
}

//------------------------------------------------------------------------------
// Main control.  Parse command line and produce list of factors.

int main(const int argc, const char *const argv[])
{
  bool primes_major     = false;
  bool primes_ascending = false;
  bool powers_ascending = false;

  PrimePowerFactorization ppf;


  // --- Parse prime-power sort specifier (if present).

  uint value_base = 1;
  uint value_count = (uint)argc - 1;
  if ((argc > 1) && (argv[1][0] == '-'))
  {
    static const struct
    {
      char *str; bool primes_major, primes_ascending, powers_ascending;
    }
    sort_options[] =
    {
                        // Sorting criteria:
                        // ----------------------------------------
      { "ep", 0,0,0 },  // Exponents descending, primes descending
      { "Ep", 0,0,1 },  // Exponents ascending, primes descending
      { "eP", 0,1,0 },  // Exponents descending, primes ascending
      { "EP", 0,1,1 },  // Exponents ascending, primes ascending
      { "p",  1,0,0 },  // Primes descending (exponents irrelevant)
      { "P",  1,1,0 },  // Primes ascending (exponents irrelevant)
    };

    bool valid = false;
    for (uint i = 0; i < ARRAY_CAPACITY(sort_options); i++)
    {
      if (strcmp(&argv[1][1], sort_options[i].str) == 0)
      {
        primes_major     = sort_options[i].primes_major;
        primes_ascending = sort_options[i].primes_ascending;
        powers_ascending = sort_options[i].powers_ascending;
        valid = true;
        break;
      }
    }

    if (!valid)
      fatal_error("Bad sort specifier: \"%s\"", argv[1]);

    value_base += 1;
    value_count -= 1;
  }


  // --- Prime factorization from either a number or a raw prime factorization.

  if (value_count == 1)
  {
    uint128 n = uint128_from_string(argv[value_base]);
    ppf = compute_ppf(n);
  }
  else
  {
    if (value_count % 2 != 0)
      fatal_error("Odd number of arguments (%u) given.", value_count);
    uint pairs = value_count / 2;
    ppf = parse_ppf(pairs, &argv[value_base]);
  }


  // --- Sort prime factorization by either the default or the user-overridden
  //     configuration.

  sort_ppf(&ppf, primes_major, primes_ascending, powers_ascending);
  print_ppf(ppf);


  // --- Run for (as close as possible to) a fixed amount of time, tallying the
  //     elapsed CPU time.

  uint128 iterations = 0;
  double cpu_time = 0.0;
  const double cpu_time_limit = 0.10;
  uint128 memory_usage = 0;
  while (cpu_time < cpu_time_limit)
  {
    clock_t clock_start = clock();
    uint128 *factors = compute_factors(ppf);
    clock_t clock_end = clock();
    cpu_time += (double)(clock_end - clock_start) / (double)CLOCKS_PER_SEC;
    memory_usage = sizeof(*factors) * ppf.d;

    if (++iterations == 0)  // Debug aid: compare against 1 to print factors.
    {
      for (uint32 i = 0; i < ppf.d; i++)
        printf("%s\n", uint128_to_static_string(factors[i], 0));
    }

    if (factors) free(factors);
  }


  // --- Print the average amount of CPU time required for each iteration.

  uint memory_scale = (memory_usage >= 1e9)? 9:
                      (memory_usage >= 1e6)? 6:
                      (memory_usage >= 1e3)? 3:
                                             0;
  char *memory_units = (memory_scale == 9)? "GB":
                       (memory_scale == 6)? "MB":
                       (memory_scale == 3)? "KB":
                                            "B";

  printf("%s  %"PRIu32" factors  %.6f ms  %.3f ns/factor  %.3f %s\n",
         uint128_to_static_string(ppf.n, 0),
         ppf.d,
         cpu_time/iterations * 1e3,
         cpu_time/iterations * 1e9 / (double)(ppf.d? ppf.d : 1),
         (double)memory_usage / pow(10, memory_scale),
         memory_units);

  return 0;
}
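
The core idea of compute_factors() above can be hard to see through the in-place pointer arithmetic and the hand-unrolled 2- and 3-list merges. The following is a minimal sketch of the same merging idea, not the author's optimized code: for each prime power p^e, the current sorted list is treated as e+1 virtual lists (the list scaled by p^0, p^1, ..., p^e), and those are merged into a new sorted list. The name `sorted_divisors`, the use of `uint64_t` instead of the 128-bit type, and the two scratch buffers are assumptions made to keep the example short and portable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper (not part of the program above).  Returns a malloc'd
// ascending array of all divisors of the product of primes[i]^exps[i], and
// stores the divisor count in *d_out.  Assumes the primes are distinct and
// that the product fits in 64 bits.  Returns NULL if allocation fails.
uint64_t *sorted_divisors(const uint64_t *primes, const unsigned *exps,
                          size_t count, size_t *d_out)
{
  assert(d_out != NULL);

  // The divisor count is the product of (e + 1) over all prime powers.
  size_t d = 1;
  for (size_t i = 0; i < count; i++)
    d *= (size_t)exps[i] + 1;

  uint64_t *cur  = malloc(d * sizeof(*cur));
  uint64_t *next = malloc(d * sizeof(*next));
  if (!cur || !next)
    { free(cur); free(next); return NULL; }

  cur[0] = 1;  // Seed value: the only divisor of 1.
  size_t len = 1;

  for (size_t i = 0; i < count; i++)
  {
    const unsigned e = exps[i];
    assert(e < 64);

    // Virtual list j is cur[] scaled by p^j; each one is already ascending.
    uint64_t scale[64];
    size_t   head[64];
    for (unsigned j = 0; j <= e; j++)
    {
      scale[j] = (j == 0)? 1 : scale[j-1] * primes[i];
      head[j] = 0;
    }

    // Merge the e + 1 virtual lists by repeatedly taking the smallest head.
    const size_t out_len = len * (e + 1);
    for (size_t k = 0; k < out_len; k++)
    {
      unsigned j_min = e + 1;  // Sentinel: no list chosen yet.
      uint64_t v_min = 0;
      for (unsigned j = 0; j <= e; j++)
      {
        if (head[j] == len)
          continue;  // This virtual list is exhausted.
        const uint64_t v = cur[head[j]] * scale[j];
        if (j_min > e || v < v_min)
          { j_min = j; v_min = v; }
      }
      assert(j_min <= e);
      next[k] = v_min;
      head[j_min] += 1;
    }

    // The merged list becomes the input for the next prime power.
    uint64_t *t = cur; cur = next; next = t;
    len = out_len;
  }

  free(next);
  *d_out = len;
  return cur;
}
```

Called with primes {2, 3, 5} and exponents {3, 2, 1} (that is, n = 360), this yields the 24 divisors from the question in ascending order. Unlike the program above, the sketch scans all list heads for every output value and uses a second buffer, so it trades speed and memory for readability.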