PySpark/Python: using MIN/MAX without losing columns

Tags: python, apache-spark, pyspark, apache-spark-sql

I have a DataFrame like this:

----------------------------------------------
| User_ID |      Timestamp      | Article_ID |
----------------------------------------------
| 121212  | 2018-01-15 10:00:00 |      1     |
| 121212  | 2018-01-15 10:05:00 |      11    |
| 121212  | 2018-01-15 10:10:00 |      12    |
| 989898  | 2018-01-15 17:30:00 |      100   |
| 989898  | 2018-01-15 17:40:00 |      200   |
| 989898  | 2018-01-15 17:50:00 |      1     |
| 989898  | 2018-01-15 17:55:00 |      11    |
|...      |                     |            |
----------------------------------------------
Now I want, for each User_ID, the row with the minimum Timestamp. The result should be:

----------------------------------------------
| User_ID |      Timestamp      | Article_ID |
----------------------------------------------
| 121212  | 2018-01-15 10:00:00 |      1     |
| 989898  | 2018-01-15 17:30:00 |      100   |
|...      |                     |            |
----------------------------------------------
I tried the following:

from pyspark.sql import functions as F

df.groupBy('User_ID').agg(F.min('Timestamp')).show()
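This returns the right timestamps, but the Article_ID column is lost, because the result contains only the grouping column and the aggregated value.
Can anyone help?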

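I found a solution that works, using F.struct. Spark compares structs field by field, so the minimum of a (Timestamp, Article_ID) struct is the one with the earliest Timestamp, and it carries its Article_ID along: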

(df.select('User_ID', F.struct('Timestamp', 'Article_ID').alias('TA'))
   .groupBy('User_ID')
   .agg(F.min('TA').alias('TA'))
   .select('User_ID', 'TA.Timestamp', 'TA.Article_ID')
   .orderBy('User_ID')
   .limit(10)
   .toPandas())
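
The idea behind this can be tried out without a Spark cluster. The following is only a plain-Python analogy, not Spark code: tuples stand in for the struct, and the built-in min() stands in for F.min. The function name earliest_per_user is made up for illustration, and the timestamps are kept as strings, which sort chronologically only because they all share the same fixed "YYYY-MM-DD HH:MM:SS" format.

```python
from collections import defaultdict

# Sample rows from the question: (User_ID, Timestamp, Article_ID).
rows = [
    (121212, "2018-01-15 10:00:00", 1),
    (121212, "2018-01-15 10:05:00", 11),
    (121212, "2018-01-15 10:10:00", 12),
    (989898, "2018-01-15 17:30:00", 100),
    (989898, "2018-01-15 17:40:00", 200),
    (989898, "2018-01-15 17:50:00", 1),
    (989898, "2018-01-15 17:55:00", 11),
]


def earliest_per_user(rows):
    """Return {User_ID: (Timestamp, Article_ID)} for each user's earliest row.

    Hypothetical helper: a plain-Python stand-in for the struct + min pattern.
    """
    groups = defaultdict(list)
    for user_id, timestamp, article_id in rows:
        # Pack both columns together, like F.struct('Timestamp', 'Article_ID').
        groups[user_id].append((timestamp, article_id))
    # min() compares tuples element by element, so Timestamp decides first
    # and Article_ID only breaks ties between equal timestamps.
    return {user_id: min(pairs) for user_id, pairs in groups.items()}


result = earliest_per_user(rows)
for user_id in sorted(result):
    print(user_id, *result[user_id])
```

Because the whole pair is compared, the field order matters: putting Article_ID first would select the smallest Article_ID instead of the earliest Timestamp. If two rows of the same user share the minimum Timestamp, the one with the smaller Article_ID wins.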

See also the original source:

If you use a solution from another answer, please don't forget to credit it.

Thanks for the hint. I added the original source...