PySpark show_profiles() prints nothing with DataFrame API operations


python apache-spark pyspark


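PySpark uses cProfile and works as documented for the RDD API, but there seems to be no way to get the profiler to print results after running a series of DataFrame API operations.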

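from pyspark import SparkConf, SparkContext, SQLContext
# profiling is off by default; spark.python.profile must be true
conf = SparkConf().set("spark.python.profile", "true")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)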
rdd = sc.parallelize([('a', 0), ('b', 1)])
df = sqlContext.createDataFrame(rdd)
rdd.count()         # this ACTUALLY gets profiled :)
sc.show_profiles()  # here is where the profiling prints out
sc.show_profiles()  # here prints nothing (no new profiling to show)

rdd.count()         # this ACTUALLY gets profiled :)
sc.show_profiles()  # here is where the profiling prints out

df.count()          # why does this NOT get profiled?!?
sc.show_profiles()  # prints nothing?!

# and again it works when converting to RDD but not 

df.rdd.count()      # this ACTUALLY gets profiled :)
sc.show_profiles()  # here is where the profiling prints out

df.count()          # why does this NOT get profiled?!?
sc.show_profiles()  # prints nothing?!

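This is expected behavior.

Unlike the RDD API, which runs native Python logic, the DataFrame / SQL API is JVM-native. Unless you invoke a Python udf* (including pandas_udf), no Python code is executed on the worker machines. All that happens on the Python side is a series of simple API calls through the Py4J gateway.

Therefore no profiling information exists.

* Note that udfs seem to be excluded from profiling as well.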

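I tried df.groupby('_1').count().collect(), which clearly involves both a transformation and an action, yet still nothing was printed.

Sadly, I can confirm that Python udfs are not profiled.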