Join two RDDs in PySpark


I have two RDDs, as shown below:

r1 = [(u'5971', u'COLOR > RED', 599),(u'5131', u'MEN > BOW TIES > ALL IN COLLECTION', 599)]  # id, category, price

r2 = [(u'5131', 1), (u'5971', 1), (u'8347', 1)] # id, quantity
I would like the result to look like this:

r3 = [(u'5131', ('MEN > BOW TIES > ALL IN COLLECTION',599)), (u'5971', ('COLOR > RED',599))]
I tried the following:

r3 = r1.join(r2)
But the price is missing from the resulting r3 RDD.

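# r1 and r2 are RDDs holding the data shown below (for example, built with sc.parallelize).
r1 = [(u'5971', u'COLOR > RED', 599),(u'5131', u'MEN > BOW TIES > ALL IN COLLECTION', 599)]  # id, category, price

r2 = [(u'5131', 1), (u'5971', 1), (u'8347', 1)] # id, quantity

# join works on (key, value) pairs, so reshape r1 into (id, (category, price)) first.
# Indexing is used instead of tuple-unpacking lambdas, which Python 3 does not allow.
r1_modified = r1.map(lambda t: (t[0], (t[1], t[2])))

# The join yields (id, ((category, price), quantity)); keep only (category, price).
r3 = r1_modified.join(r2).map(lambda kv: (kv[0], kv[1][0]))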
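
To see why the reshape matters, here is a minimal sketch that emulates the same three steps on plain Python lists, with no Spark involved. The helper `inner_join` is hypothetical and only approximates how a pair-RDD join matches keys; it is not PySpark's implementation. The result is sorted only to make the output order predictable.

```python
# Plain-Python emulation of the reshape-then-join steps (no Spark required).
r1 = [(u'5971', u'COLOR > RED', 599),
      (u'5131', u'MEN > BOW TIES > ALL IN COLLECTION', 599)]  # id, category, price
r2 = [(u'5131', 1), (u'5971', 1), (u'8347', 1)]  # id, quantity


def inner_join(left, right):
    """Hypothetical helper: inner join of two lists of (key, value) pairs."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    # Emit one (key, (left_value, right_value)) per matching pair of values.
    return [(key, (left_value, right_value))
            for key, left_value in left
            for right_value in right_by_key.get(key, [])]


# Step 1: reshape (id, category, price) into (id, (category, price)).
r1_pairs = [(item_id, (category, price)) for item_id, category, price in r1]

# Step 2: join on id; each element is (id, ((category, price), quantity)).
joined = inner_join(r1_pairs, r2)

# Step 3: keep only the left-hand value, dropping the quantity.
r3 = sorted((item_id, left_value) for item_id, (left_value, _quantity) in joined)
print(r3)
```

The id `8347` appears only in r2, so an inner join drops it, which matches the desired r3 in the question.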