Apache Spark: joining the elements of the lists in an RDD

My RDD looks like this:

>>> rdd.collect()
[([u'steve'], [u'new', u'york'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
How can I get a new RDD like this?

[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
I tried mapping it to a new RDD using join, but it did not work.

I was able to solve this myself:

>>> # Join each inner list into one string, then wrap it in a list again
>>> rdd2 = rdd.map(lambda l: [[''.join(x)] for x in l])
>>> rdd2.map(tuple).collect()
[([u'steve'], [u'newyork'], [u'baseball']), ([u'smith'], [u'virginia'], [u'football'])]
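
The per-record transformation can be checked without a Spark cluster. The sketch below is a plain-Python illustration, not the original poster's code: the helper name `join_fields` and the sample `rows` list are assumptions made for the example. It joins the strings inside each field and keeps every field as a one-element list, which is the shape asked for in the question.

```python
def join_fields(record):
    """Concatenate the strings in each field of a record.

    Each field is a list of strings; the result keeps every field as a
    one-element list so the record shape matches the desired output.
    (Hypothetical helper, named here for illustration only.)
    """
    return tuple(["".join(field)] for field in record)


# Sample data mirroring the RDD contents shown in the question.
rows = [
    (["steve"], ["new", "york"], ["baseball"]),
    (["smith"], ["virginia"], ["football"]),
]

# Plain-Python stand-in for applying the function to every record.
joined = [join_fields(row) for row in rows]
print(joined)
```

With PySpark, the same function could presumably be passed straight to the RDD, as in `rdd.map(join_fields).collect()`, since `map` applies a function to each record. Note that `''.join(x)` on its own returns a bare string, so the extra list brackets are what keep each field wrapped in a list.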