Apache Spark: join the elements of lists inside an RDD
My RDD looks like this:
>>> rdd.collect()
[([u'steve'], [u'new', u'york'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
How can I get a new RDD like this:
[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
I tried mapping it to a new RDD with a join, but that didn't work.

I was able to solve this:
>>> rdd2 = rdd.map(lambda l: [[''.join(x)] for x in l])  # concatenate the words in each inner list
>>> rdd2.map(tuple).collect()
[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
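Since the lambda only manipulates ordinary Python tuples and lists, the transformation can be checked without a Spark context. A minimal plain-Python sketch of the same logic (the `join_fields` helper is hypothetical, standing in for the function passed to `rdd.map`; the sample data is taken from the question):

```python
# Sample records shaped like the RDD contents in the question.
data = [
    (['steve'], ['new', 'york'], ['baseball']),
    (['smith'], ['virginia'], ['football']),
]

def join_fields(record):
    # Concatenate the strings in each inner list, keeping the
    # single-element-list shape shown in the desired output.
    return tuple([''.join(words)] for words in record)

# Plain-Python equivalent of rdd.map(join_fields).collect().
result = [join_fields(r) for r in data]
print(result)
# → [(['steve'], ['newyork'], ['baseball']), (['smith'], ['virginia'], ['football'])]
```

On a real cluster this would be `rdd.map(join_fields)`; the list comprehension here just stands in for the distributed map.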