pyspark: writing to a file after aggregating with reduceByKey

Tags: pyspark, apache-spark-sql

My code looks like this:

from pyspark import SparkContext

sc = SparkContext("local", "App Name")
eventRDD = sc.textFile("file:///home/cloudera/Desktop/python/event16.csv")
# Keep only lines containing "Topic" and split them on '|'
outRDDExt = eventRDD.filter(lambda s: "Topic" in s).map(lambda s: s.split('|'))
# Key by (field 1, field 2 with its last 19 characters dropped)
outRDDExt2 = outRDDExt.keyBy(lambda x: (x[1],x[2][:-19]))
# Count occurrences per key
outRDDExt3 = outRDDExt2.mapValues(lambda x: 1)
outRDDExt4 = outRDDExt3.reduceByKey(lambda x,y: x + y)
outRDDExt4.saveAsTextFile("file:///home/cloudera/Desktop/python/outDir1")

The current output file looks like this:

((u'Topic',u'2017/05/08'),15)

The file I want is:

u'Topic',u'2017/05/08',15


How can I get the output above (that is, strip the tuple wrapping and so on from the current output)?

You can unpack the tuple manually and join all of its elements into a single string:

outRDDExt4\
.map(lambda row: ",".join([row[0][0], row[0][1], str(row[1])]))\
.saveAsTextFile("file:///home/cloudera/Desktop/python/outDir1")
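
saveAsTextFile writes the string representation of each RDD element, which is why the nested tuple shows up in the file. The flattening step can be tried without a cluster; below is a minimal sketch in plain Python, where flatten_row and the sample rows are hypothetical names and data made up for illustration, shaped like the reduceByKey result in the question.

```python
def flatten_row(row):
    """Turn ((topic, date), count) into the string 'topic,date,count'."""
    (topic, date), count = row
    return ",".join([topic, date, str(count)])


# Hypothetical sample rows with the same shape as the reduceByKey output.
sample = [(("Topic", "2017/05/08"), 15), (("Topic", "2017/05/09"), 3)]

lines = [flatten_row(r) for r in sample]
print("\n".join(lines))
```

The same function could be passed to the RDD as outRDDExt4.map(flatten_row) before saving. Because the join works on the string values rather than on their repr, the lines come out without the u'...' wrapper. Also note that saveAsTextFile normally refuses to write into a directory that already exists, so outDir1 from the earlier run would likely have to be removed or a different path used.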