Python: get DataFrame columns and their values as variables in PySpark


I am using PySpark to fetch data from a MySQL table, as follows:

df = (sqlContext.read.format("jdbc")
    .option("url", "{}:{}/{}".format(domain, port, mysqldb))
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", """(
        select ifnull(max(id),0) as maxval, ifnull(min(id),0) as minval,
               ifnull(min(test_time),'1900-01-01 00:00:00') as mintime,
               ifnull(max(test_time),'1900-01-01 00:00:00') as maxtime
        FROM `{}`) as `{}`""".format(table, table))
    .option("user", mysql_user)
    .option("password", password).load())

max_val = df.select('maxval').collect()[0].asDict()['maxval']
min_val = df.select('minval').collect()[0].asDict()['minval']
max_time = df.select('maxtime').collect()[0].asDict()['maxtime']
min_time = df.select('mintime').collect()[0].asDict()['mintime']
The result of df.show() is as follows:

+------+------+-------------------+-------------------+
|maxval|minval|            mintime|            maxtime|
+------+------+-------------------+-------------------+
|  1721|     1|2017-03-09 22:15:49|2017-12-14 05:17:04|
+------+------+-------------------+-------------------+
Now I want to get each column and its value separately.

I want to end up with:

max_val = 1721
min_val = 1
min_time = 2017-03-09 22:15:49
max_time = 2017-12-14 05:17:04
I did the following:

df = (sqlContext.read.format("jdbc")
    .option("url", "{}:{}/{}".format(domain, port, mysqldb))
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", """(
        select ifnull(max(id),0) as maxval, ifnull(min(id),0) as minval,
               ifnull(min(test_time),'1900-01-01 00:00:00') as mintime,
               ifnull(max(test_time),'1900-01-01 00:00:00') as maxtime
        FROM `{}`) as `{}`""".format(table, table))
    .option("user", mysql_user)
    .option("password", password).load())

max_val = df.select('maxval').collect()[0].asDict()['maxval']
min_val = df.select('minval').collect()[0].asDict()['minval']
max_time = df.select('maxtime').collect()[0].asDict()['maxtime']
min_time = df.select('mintime').collect()[0].asDict()['mintime']

Is there a better way to do this in PySpark?

Currently you are calling collect four times, which is very costly. You can use a bit of plain Python to do the job instead. Here is one approach you can try:

df = (sqlContext.read.format("jdbc")
    .option("url", "{}:{}/{}".format(domain,port,mysqldb))
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", """(
        select ifnull(max(id),0) as maxval, ifnull(min(id),0) as minval, 
               ifnull(min(test_time),'1900-01-01 00:00:00') as mintime, 
               ifnull(max(test_time), '1900-01-01 00:00:00') as maxtime 
         FROM `{}`) as `{}`""".format(table, table))
    .option("user", "{}".format(mysql_user))
    .option("password", "{}".format(password)).load())


# Collect the single row once, then expose each column as a variable
for key, value in df.first().asDict().items():
    globals()[key] = value

print(minval)
print(maxval)
print(mintime)
print(maxtime)

This way you can turn the columns into variables. Let me know if you need further help.
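If you would rather not write into globals() (which can silently shadow existing names), a minimal sketch of the same single-fetch idea with explicit assignments; the dict below is a hypothetical stand-in for what df.first().asDict() would return for the query above:

```python
# Hypothetical stand-in for df.first().asDict() on the aggregate query
row = {"maxval": 1721, "minval": 1,
       "mintime": "2017-03-09 22:15:49", "maxtime": "2017-12-14 05:17:04"}

# One round trip to the driver, then explicit, readable assignments
max_val = row["maxval"]
min_val = row["minval"]
min_time = row["mintime"]
max_time = row["maxtime"]

print(max_val, min_val)  # 1721 1
```

This keeps the variable names under your control rather than depending on the column aliases in the SQL.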


I get the following error:
Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: too many values to unpack
@Question_bank just a small mistake, I forgot to add .items() after .asDict(). I have modified the answer, you can check it now.
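For reference, the ValueError in the comments comes from iterating the dict directly: without .items(), a for loop yields only the keys, and Python then fails to unpack each key string into the key, value pair. A small sketch:

```python
row = {"maxval": 1721, "minval": 1}

try:
    # Missing .items(): this iterates over keys ("maxval", ...), and
    # unpacking a 6-character string into two names raises ValueError
    for key, value in row:
        pass
except ValueError as err:
    message = str(err)

print(message)  # too many values to unpack (expected 2)
```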