How to map a single column value of a row inside a for loop in PySpark


apache-spark pyspark


from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
test = hive_context.table("dbname.tablename")
iterate = test.map(lambda p: (p.survey_date, p.pro_catg, p.metric_id))
for ite in iterate.collect():
    v = ite.map(lambda p: p.metric_id)
    print(v)

The code above gives an error in the for loop. How can I print a single column without changing the map above? I want to go on to write code like this:

for ite in iterate.collect():
    for ite11 in secondtable.collect():
        if ite.metric_id.find(ite11.column1) != -1:
            result.append((ite, ite11))

The reason for the error at runtime:

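The result of collect() is not an RDD; it is a Python list (or something similar), so each ite in the loop is a plain tuple.

map can be executed on an RDD, but not on a Python list or on the tuples it contains.

Using collect() in Spark is generally not recommended, since it pulls every row onto the driver.

The following should perform a similar operation without error:

for ite in iterate.collect():
    v = ite[2]  # metric_id is the third element of each mapped tuple
    print(v)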
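To see why the loop fails, it can help to reproduce the situation without a cluster. The sketch below assumes that iterate.collect() returns a list of plain 3-tuples; `collected` is a hypothetical stand-in for that list, and the sample values are made up for illustration.

```python
# Hypothetical stand-in for iterate.collect(): a plain Python list of tuples.
collected = [
    ("2016-01-01", "catA", "m1"),
    ("2016-01-02", "catB", "m2"),
]

# A tuple has no .map method, which is why ite.map(...) raises AttributeError.
has_map = hasattr(collected[0], "map")

# Positional indexing works on each collected tuple; index 2 is metric_id.
metric_ids = [ite[2] for ite in collected]

print(has_map)
print(metric_ids)
```

Once the data has been collected, ordinary Python tools such as indexing and list comprehensions apply, and RDD methods no longer do.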


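In the end I found another solution for mapping the single column value that I need in the for loop:

iterate = test.map(lambda p: (p.survey_date, p.pro_catg, p.metric_id))
# Index into the tuple; tuple-unpacking lambdas are Python 2 only.
v = iterate.map(lambda row: row[2])
print(v.collect())

It works fine. In place of find, we can use in, as:

result = []
for ite in iterate.collect():
    for itp in prod.collect():
        if itp[0] in ite[1]:
            result.append((ite, itp))
print(result)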
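The nested loops above can also be wrapped in a small helper that runs on data that has already been collected. This is only a minimal sketch, not the code from the question: `SurveyRow`, `find_matches`, and the sample values are hypothetical names introduced for illustration, and it assumes both inputs are small enough to fit in driver memory. Using a namedtuple keeps access by field name (such as row.pro_catg) available after the values have left the RDD.

```python
from collections import namedtuple

# Hypothetical record type mirroring the three mapped columns.
SurveyRow = namedtuple("SurveyRow", ["survey_date", "pro_catg", "metric_id"])


def find_matches(rows, products):
    """Pair each row with every product whose first field is a substring of row.pro_catg."""
    result = []
    for row in rows:
        for prod in products:
            if prod[0] in row.pro_catg:
                result.append((row, prod))
    return result


# Made-up sample data standing in for the two collected tables.
rows = [
    SurveyRow("2016-01-01", "shoes-running", "m1"),
    SurveyRow("2016-01-02", "hats", "m2"),
]
products = [("shoes",), ("hats",), ("gloves",)]

matches = find_matches(rows, products)
print([(row.metric_id, prod[0]) for row, prod in matches])
```

The substring test with in returns a boolean, which avoids the pitfall that str.find returns -1 (a truthy value) when nothing is found.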
