Apache Spark: How to select by max(date) using the Spark DataFrame API


Given the following dataset:

id v  date
1  a1 1
1  a2 2
2  b1 3
2  b2 4

I want to select only the last value (with respect to date) for each id.

I came up with the following code:

scala> val df = sc.parallelize(List((1, "a1", 1), (1, "a2", 2), (2, "b1", 3), (2, "b2", 4))).toDF("id", "v", "date")
df: org.apache.spark.sql.DataFrame = [id: int, v: string, date: int]

scala> val agg = df.groupBy("id").max("date")
agg: org.apache.spark.sql.DataFrame = [id: int, max(date): int]

scala> val res = df.join(agg, df("id") === agg("id") && df("date") === agg("max(date)"))
16/11/14 22:25:01 WARN sql.Column: Constructing trivially true equals predicate, 'id#3 = id#3'. Perhaps you need to use aliases.
res: org.apache.spark.sql.DataFrame = [id: int, v: string, date: int, id: int, max(date): int]

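Is there a better (more idiomatic) way to do this?

Bonus: how can I compute the max on a date column and avoid this error?

Aggregation function can only be applied on a numeric column.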

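You can try agg() with the max function:

import static org.apache.spark.sql.functions.*;
df.groupBy("id").agg(max("date"));

(The static import is Java syntax; in Scala the equivalent is import org.apache.spark.sql.functions._)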
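A minimal Scala sketch of how this answer could be combined with the join from the question. It assumes Spark 2.x or later (a SparkSession rather than the question's sc); the names latest and res are illustrative only. Aliasing the aggregate and joining on column names is one way to avoid the "trivially true equals predicate" warning, since the join no longer compares df("id") with a column derived from the same DataFrame.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: Spark 2.x+. In spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("max-date-per-id").getOrCreate()
import spark.implicits._

// Same data as the table in the question.
val df = Seq((1, "a1", 1), (1, "a2", 2), (2, "b1", 3), (2, "b2", 4)).toDF("id", "v", "date")

// Latest date per id; alias the aggregate so the column is called "date"
// instead of "max(date)".
val latest = df.groupBy("id").agg(max("date").as("date"))

// Join on column names: each side is resolved separately, and the
// result keeps a single copy of id and date.
val res = df.join(latest, Seq("id", "date"))

res.orderBy("id").show()
```

For the sample data this keeps the rows (1, a2, 2) and (2, b2, 4). If several rows of one id share the maximum date, all of them are kept.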

For me, it only worked this way:

df = df.groupBy('CPF').agg({'DATA': 'max'})
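The line above is PySpark and uses the answerer's own column names. As a rough Scala counterpart, the grouped DataFrame also accepts aggregations given as column-name/function-name pairs. The sketch below assumes Spark 2.x or later and uses the question's columns; byPair and byMap are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+. In spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("agg-by-name").getOrCreate()
import spark.implicits._

val df = Seq((1, "a1", 1), (1, "a2", 2), (2, "b1", 3), (2, "b2", 4)).toDF("id", "v", "date")

// Pair form: column name -> aggregate function name.
val byPair = df.groupBy("id").agg("date" -> "max")

// Map form, the closest match to the Python dict.
val byMap = df.groupBy("id").agg(Map("date" -> "max"))

byPair.orderBy("id").show()
```

Both forms name the result column max(date), so an alias or withColumnRenamed is needed if a plain name is preferred.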

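You can try using the from_unixtime function to apply agg on the date field. I'm not sure whether this works, but it is worth trying in SQL:

SELECT max(date) AS mdate, id FROM tmp_table GROUP BY id;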
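On the bonus question: the error message comes from the shortcut groupBy(...).max("date"), which only accepts numeric columns, whereas the max function from org.apache.spark.sql.functions and SQL's max can also be applied to date columns. The sketch below assumes Spark 2.x or later; the dates are made-up sample values, and events, viaApi and viaSql are illustrative names. It runs the SQL route suggested above against a temporary view and does not use from_unixtime, because the column is already a date.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: Spark 2.x+. In spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("max-on-date-column").getOrCreate()
import spark.implicits._

// Hypothetical data: the question's ids and values, with real dates.
val events = Seq(
  (1, "a1", Date.valueOf("2016-11-01")),
  (1, "a2", Date.valueOf("2016-11-02")),
  (2, "b1", Date.valueOf("2016-11-03")),
  (2, "b2", Date.valueOf("2016-11-04"))
).toDF("id", "v", "date")

// DataFrame API: functions.max works on a DateType column.
val viaApi = events.groupBy("id").agg(max("date").as("mdate"))

// SQL route: register a temporary view and query it.
// The backticks quote the column name so it cannot be mistaken for the DATE keyword.
events.createOrReplaceTempView("tmp_table")
val viaSql = spark.sql("SELECT max(`date`) AS mdate, id FROM tmp_table GROUP BY id")

viaApi.orderBy("id").show()
viaSql.orderBy("id").show()
```

Both results contain one row per id with its latest date; joining them back to events on id and date, as in the question, recovers the matching v.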



