How do I aggregate this dataset into transaction counts per month? (Hadoop MapReduce, Python aggregate function)

I have a dataset of transactions. I am trying to aggregate it with MapReduce to get the number of transactions, grouped by month.

The dataset is in the following format: each line has 7 comma-separated fields. Here is the header followed by a few sample rows:

block_number,from_address,to_address,value,gas,gas_price,block_timestamp

4391310,0x40349c34b15f6df84bad1b8ae79bd43c800acfda,0xb64ef51c888972c908cfacf59b47c1afbc0ab8ac,0,36688,1,1508442025
4391310,0xeb1c0a44167ed59385f3158c92ba5aa3d32d27c1,0xb64ef51c888972c908cfacf59b47c1afbc0ab8ac,0,36752,1,1508442025
4391310,0x26d87be2b72eb5942c471d8f9c14029cda55db79,0xb64ef51c888972c908cfacf59b47c1afbc0ab8ac,0,36752,1,1508442025
2412045,0x515967c3f02451356461c24d70fa39325257f018,0xbfc39b6f805a9e40e77291aff27aee3c96915bdd,1016457550000000000,40000,21897464574,1476066980
2412045,0x0f2df1dce827c075ef303ab8bbcceef8aee6dc52,0xbfc39b6f805a9e40e77291aff27aee3c96915bdd,999517090000000000,40000,21897464574,1476066980

So I would like the output to look like this:

block_timestamp, count

"10-2017" 3
"10-2016" 2
Below is the MapReduce code I wrote, but when I run it on the dataset (using Hadoop), I get nothing in the output. I would like to know what I am doing wrong.

from mrjob.job import MRJob
import time

class cw_partA (MRJob):
    def mapper(self,_, line):
        try:
            fields = line.split(',')    #splits fields  by the commas
            if len(fields) == 7 :       #there are 7 fields in each line
                time = int(fields[6]) #time   #convert timestamp
                month = time.strftime("%m-%Y",time.gmtime(time)) #returns year and month
                yield (month, 1)

        except:
            pass
            #do nothing

    def combiner(self, month, counts):
        yield (month, sum(counts))

    def reducer(self, day, counts):
        yield (month, sum(counts))

if __name__ == '__main__':
    cw_partA.run()

Any help would be greatly appreciated.

Do you need to use MapReduce? It is largely an outdated technology these days. Consider Spark.
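
Regarding the empty output: two things in the posted code look like the cause. In the mapper, the local variable `time` shadows the imported `time` module, so `time.strftime(...)` is called on an integer and raises `AttributeError`; the bare `except: pass` then swallows that error for every line, so nothing is ever yielded. Separately, the reducer's parameter is named `day` but its body yields `month`, which would raise `NameError` once the mapper emits anything. Below is a minimal sketch of the corrected logic as plain functions, so it can be checked locally without Hadoop or mrjob. The names `month_key` and `count_by_month` are made up for this illustration and are not part of mrjob.

```python
import time
from collections import Counter


def month_key(line):
    """Return 'MM-YYYY' for a CSV line, or None for the header or a malformed line."""
    fields = line.split(',')
    if len(fields) != 7:  # every valid record has 7 fields
        return None
    try:
        # Named `timestamp` (not `time`) so the `time` module is not shadowed.
        timestamp = int(fields[6])
    except ValueError:
        return None  # e.g. the header row, where the last field is not a number
    return time.strftime("%m-%Y", time.gmtime(timestamp))


def count_by_month(lines):
    """Local stand-in for the map + reduce steps: count records per month key."""
    counts = Counter()
    for line in lines:
        key = month_key(line.strip())
        if key is not None:
            counts[key] += 1
    return dict(counts)
```

In the mrjob class, the same idea would mean having the mapper yield `(month_key(line), 1)` whenever the key is not `None`, and declaring the reducer as `def reducer(self, month, counts)` so the name it yields matches its parameter. Catching only `ValueError` instead of using a bare `except` keeps unexpected errors visible rather than silently dropping every record.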