PySpark: RDD (with lists of tokens) to RDD (with one token per row)

I have a list of token lists, for example:

mylist = [['hello'],
          ['cat'],
          ['dog'],
          ['hey'],
          ['dog'],
          ['I', 'need', 'coffee'],
          ['dance'],
          ['dream', 'job']]

myRDD = sc.parallelize(mylist)
I am struggling to find a way to produce an RDD in which each row is a single token. My expected output is:

[['hello'],
['cat'],
['dog'],
['hey'],
['dog'],
['I'], 
['need'], 
['coffee'],
['dance'],
['dream'], 
['job']]

What is the correct syntax for this? Thank you.

Just use flatMap:

myRDD.flatMap(lambda xs: ([x] for x in xs))
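
To see why this produces the expected output, here is a minimal plain-Python sketch of what flatMap does: apply the function to each row, then flatten the results by one level. The helper flat_map is hypothetical (it is not part of PySpark) and only imitates the behaviour locally so the example runs without a Spark cluster; with a real SparkContext you would call myRDD.flatMap(...).collect() instead.

```python
from itertools import chain


def flat_map(func, rows):
    """Hypothetical local stand-in for RDD.flatMap (not a PySpark API):
    apply func to every row, then flatten the results by one level."""
    return list(chain.from_iterable(func(row) for row in rows))


mylist = [['hello'], ['cat'], ['dog'], ['hey'], ['dog'],
          ['I', 'need', 'coffee'], ['dance'], ['dream', 'job']]

# Same lambda as in the answer: each token is wrapped in its own list,
# and flattening removes only the outer per-row list.
result = flat_map(lambda xs: ([x] for x in xs), mylist)
print(result)
```

Each input row yields one single-element list per token, so the three-token row becomes three rows and the two-token row becomes two.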

Assuming you meant x rather than [x]? Or, more simply: myRDD.flatMap(lambda xs: xs)

[x] is actually correct: the question asks for an RDD of single-element lists.
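
The difference between the two variants discussed above can be illustrated without Spark. This is only a sketch using list comprehensions as local equivalents of the two flatMap calls; the variable names flat_tokens and singleton_rows are made up for illustration.

```python
rows = [['I', 'need', 'coffee'], ['dream', 'job']]

# Local equivalent of myRDD.flatMap(lambda xs: xs):
# every row of the result is a bare string.
flat_tokens = [x for xs in rows for x in xs]

# Local equivalent of myRDD.flatMap(lambda xs: ([x] for x in xs)):
# every row of the result is a single-element list, as the question asks.
singleton_rows = [[x] for xs in rows for x in xs]

print(flat_tokens)
print(singleton_rows)
```

Which variant is appropriate depends on whether downstream code expects rows to be strings or lists; the question's expected output uses single-element lists.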