Python: get DataFrame columns and their values as variables in PySpark
I am fetching data from a MySQL table using pyspark as shown below:
df = (sqlContext.read.format("jdbc")
      .option("url", "{}:{}/{}".format(domain, port, mysqldb))
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", """(
          select ifnull(max(id),0) as maxval, ifnull(min(id),0) as minval,
                 ifnull(min(test_time),'1900-01-01 00:00:00') as mintime,
                 ifnull(max(test_time),'1900-01-01 00:00:00') as maxtime
          FROM `{}`) as `{}`""".format(table, table))
      .option("user", "{}".format(mysql_user))
      .option("password", "{}".format(password)).load())
max_val = df.select('maxval').collect()[0].asDict()['maxval']
min_val = df.select('minval').collect()[0].asDict()['minval']
max_time = df.select('maxtime').collect()[0].asDict()['maxtime']
min_time = df.select('mintime').collect()[0].asDict()['mintime']
The result of df.show() is as follows:
+------+------+-------------------+-------------------+
|maxval|minval| mintime| maxtime|
+------+------+-------------------+-------------------+
| 1721| 1|2017-03-09 22:15:49|2017-12-14 05:17:04|
+------+------+-------------------+-------------------+
Now I want to get each column and its value separately. I want:

max_val = 1721
min_val = 1
min_time = 2017-03-09 22:15:49
max_time = 2017-12-14 05:17:04
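Since all four aggregates live in a single row, one retrieval is enough. The sketch below simulates the dict that `df.first().asDict()` would return (values taken from the `df.show()` output above) and unpacks it in one statement with `operator.itemgetter`:

```python
from operator import itemgetter

# Stand-in for df.first().asDict(); in the real job this dict comes from
# the single-row aggregate DataFrame.
row = {"maxval": 1721, "minval": 1,
       "mintime": "2017-03-09 22:15:49",
       "maxtime": "2017-12-14 05:17:04"}

# itemgetter fetches several keys at once, in a fixed, explicit order
max_val, min_val, min_time, max_time = itemgetter(
    "maxval", "minval", "mintime", "maxtime")(row)

print(max_val, min_val, min_time, max_time)
```

This keeps explicit variable names while still touching the row only once.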
Is there a better way to do this in PySpark?

Currently you are calling collect four times, which is very costly. You can use a little Python to get this done in a single pass. Here is one approach you can try:
df = (sqlContext.read.format("jdbc")
      .option("url", "{}:{}/{}".format(domain, port, mysqldb))
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", """(
          select ifnull(max(id),0) as maxval, ifnull(min(id),0) as minval,
                 ifnull(min(test_time),'1900-01-01 00:00:00') as mintime,
                 ifnull(max(test_time),'1900-01-01 00:00:00') as maxtime
          FROM `{}`) as `{}`""".format(table, table))
      .option("user", "{}".format(mysql_user))
      .option("password", "{}".format(password)).load())

# first() fetches the single aggregate row in one job; asDict() turns the
# Row into a plain dict, and each key becomes a variable.
for key, value in df.first().asDict().items():
    globals()[key] = value

print(minval)
print(maxval)
print(mintime)
print(maxtime)
This way you can convert the columns into variables. Let me know if you need further help.
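One caveat on the `globals()` trick: every key in the row silently becomes a module-level variable, so a column named e.g. `df` would overwrite the DataFrame itself. Keeping the dict gives the same single-pass benefit without touching the namespace. A small sketch, again standing in for `df.first().asDict()` with a literal dict:

```python
# Stand-in for df.first().asDict()
row = {"maxval": 1721, "minval": 1}

# globals() injection: each key becomes a module-level variable
for key, value in row.items():
    globals()[key] = value

print(maxval, minval)   # the columns are now plain variables
print(row["maxval"])    # dict access needs no namespace tricks
```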
I get the following error:

Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: too many values to unpack

@Question_bank Just a small mistake, I forgot to add .items() after .asDict(). I have modified the answer, you can check it now.
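For reference, that ValueError comes from iterating the dict itself instead of `.items()`: iterating a dict yields only its keys, and unpacking a multi-character key string into `(key, value)` fails. A minimal reproduction with a plain dict:

```python
row = {"maxval": 1721, "minval": 1}  # stand-in for df.first().asDict()

try:
    # Iterating a dict yields keys; unpacking the 6-character string
    # "maxval" into (key, value) raises ValueError.
    for key, value in row:
        pass
except ValueError as exc:
    print("without .items():", exc)

# With .items() each element is already a (key, value) pair:
for key, value in row.items():
    print(key, "=", value)
```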