Apache Spark: What's the difference between the Spark SQL percentile function and the Spark DataFrame QuantileDiscretizer?

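I want to discretize double-valued scores into integers. I tried Spark's QuantileDiscretizer, but it is too slow: the discretization takes several hours to finish. When I use Spark SQL's percentile function instead, it is much faster than QuantileDiscretizer. So what is the difference between these two approaches, and what optimizations does Spark SQL implement?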

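The default percentile does not provide any optimizations. Internally it uses a naive TypedImperativeAggregate, which collects the counts for all values and then uses the result for the computation.

The only real advantage of this approach is that it is very simple. However, in the worst case (all values in the column of interest are unique) it requires O(N) local memory, so it is not scalable and can be applied only in quite limited cases.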
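The count-collecting idea described above can be sketched in plain Python. This is only an illustration of the technique, not Spark's actual code: the helper names add_value, merge_counts and percentile_from_counts are hypothetical, and the linear interpolation between neighbouring values is an assumption about how the final percentile is read off the counts.

```python
import math
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def add_value(counts, value):
    """Record one more occurrence of value (per-partition update step)."""
    counts[value] += 1
    return counts


def merge_counts(left, right):
    """Combine the counts of two partitions by adding them key by key."""
    left.update(right)  # Counter.update adds counts instead of replacing them
    return left


def percentile_from_counts(counts, p):
    """Exact percentile (0 <= p <= 1) computed from a value -> count map."""
    if not counts:
        return None
    keys = sorted(counts)
    # cumulative[i] = number of rows whose value is <= keys[i]
    cumulative = list(accumulate(counts[k] for k in keys))
    total = cumulative[-1]
    position = (total - 1) * p          # 0-based position in the sorted data
    lower, upper = math.floor(position), math.ceil(position)

    def value_at(index):
        # First key whose cumulative count exceeds the 0-based index.
        return keys[bisect_right(cumulative, index)]

    if lower == upper:
        return float(value_at(lower))
    # Assumed: linear interpolation between the two neighbouring values.
    return (upper - position) * value_at(lower) + (position - lower) * value_at(upper)
```

The map holds one entry per distinct value, so when every value in the column is unique it grows to N entries, which is the O(N) worst case mentioned above.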

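In contrast, QuantileDiscretizer applies a (modified) approximation algorithm. This approach is more computationally expensive, but unlike the brute-force approach above, it is scalable and is not limited by the cardinality of the data. In addition, its performance can be tuned by adjusting relativeError.

Much appreciated. Your answer cleared up a confusion I have had for a long time. Thanks!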
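As a rough illustration of what relativeError trades off, the sketch below checks a candidate quantile against the rank bound documented for DataFrame.approxQuantile, which is the method the QuantileDiscretizer documentation points to: for N rows, target probability p and error err, the returned sample x satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The function names are hypothetical, and clamping the window to 1..N and treating duplicates as a range of ranks are assumptions made here for illustration.

```python
import math
from bisect import bisect_left, bisect_right


def allowed_rank_range(n, p, err):
    """Rank window floor((p - err) * n) .. ceil((p + err) * n), clamped to 1..n."""
    low = max(1, math.floor((p - err) * n))   # clamping is an assumption
    high = min(n, math.ceil((p + err) * n))
    return low, high


def satisfies_error_bound(sorted_values, candidate, p, err):
    """True if candidate occurs in the data at a rank inside the allowed window."""
    n = len(sorted_values)
    # 1-based ranks occupied by candidate; duplicates occupy a range of ranks.
    first = bisect_left(sorted_values, candidate) + 1
    last = bisect_right(sorted_values, candidate)
    if last < first:
        return False                          # candidate is not a sample of the data
    low, high = allowed_rank_range(n, p, err)
    return first <= high and last >= low
```

A larger err widens the window of acceptable ranks, so the approximation has to keep less state; with err set to 0 only the exact rank is accepted.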



