How do I do a secondary sort on file names containing numbers in Hadoop Streaming?


I'm trying to sort file names such as

    cat1.pdf, cat2.pdf, ... cat10.pdf ...

I'm currently sorting with the following parameters:

    -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator 
    -D stream.num.map.output.key.fields=2 
    -D mapreduce.partition.keypartitioner.options="-k1,1" 
    -D mapreduce.partition.keycomparator.options="-k1,1 -k2,2 -V" 
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

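The key-value pairs are tab-separated, with a string as the key and the file name as the value. The problem is that my current setup does the secondary sort on the file names lexicographically, so I get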

    cat1.pdf, cat10.pdf, cat2.pdf, cat3.pdf, cat30.pdf ...

How can I get the files sorted like this instead:

    cat1.pdf, cat2.pdf, cat3.pdf ... cat10.pdf,cat11.pdf...
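To make the difference between the two orderings concrete, here is a small local sketch in plain Python (not Hadoop). `natural_key` is a hypothetical helper introduced only for illustration: it splits a name into text and digit chunks so that the digit chunks compare as integers rather than as characters.

```python
import re

def natural_key(name):
    """Split a name into alternating text and integer chunks.

    re.split with a capturing group puts the digit runs at the odd indices,
    so those are converted to int and compared numerically.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["cat1.pdf", "cat10.pdf", "cat2.pdf", "cat3.pdf", "cat30.pdf"]

# Plain string comparison: "cat10.pdf" sorts before "cat2.pdf".
print(sorted(names))
# ['cat1.pdf', 'cat10.pdf', 'cat2.pdf', 'cat3.pdf', 'cat30.pdf']

# Numeric comparison of the digit chunks gives the desired order.
print(sorted(names, key=natural_key))
# ['cat1.pdf', 'cat2.pdf', 'cat3.pdf', 'cat10.pdf', 'cat30.pdf']
```

The first result is the order the job currently produces; the second is the order being asked for.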

I'm using Hadoop Streaming 2.7.1.

Try this.

That doesn't work with Hadoop Streaming, though.
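One possible workaround, offered as a sketch rather than a confirmed fix: as far as I know, `KeyFieldBasedComparator` documents only the `n` (numeric) and `r` (reverse) modifiers, and `-V` is a GNU sort option it may simply ignore or reject. If so, the mapper can split the number out of the file name into its own key field and sort that field numerically. Under that assumption the job would use `stream.num.map.output.key.fields=3` and `mapreduce.partition.keycomparator.options="-k1,1 -k2,2 -k3,3n"`, keeping the partitioner on `-k1,1`. The names `to_sortable` and `run_mapper`, and the rule that names without digits get number 0, are hypothetical choices made for this example.

```python
import io
import re
import sys

# Assumption: the number to sort on is the first run of digits in the name.
NAME_RE = re.compile(r"^(\D*)(\d+)")

def to_sortable(line):
    """Turn 'key<TAB>filename' into 'key<TAB>prefix<TAB>number<TAB>filename'."""
    key, filename = line.rstrip("\n").split("\t", 1)
    match = NAME_RE.match(filename)
    if match:
        prefix, number = match.group(1), int(match.group(2))
    else:
        # Assumption: names without digits are given number 0.
        prefix, number = filename, 0
    return "\t".join([key, prefix, str(number), filename])

def run_mapper(stream, out=sys.stdout):
    """Rewrite every tab-separated input line; skip lines without a tab."""
    for line in stream:
        if "\t" in line:
            out.write(to_sortable(line) + "\n")

# Local demonstration; in a streaming job, call run_mapper(sys.stdin) instead.
sample = io.StringIO("animals\tcat10.pdf\nanimals\tcat2.pdf\n")
run_mapper(sample)
```

With the first three fields treated as the key, the reducer receives the original file name as the last field and can drop the helper fields again before writing its output.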