Scala Spark: Read a table and filter by partition_Scala_Apache Spark - Fatal编程技术网


Scala Spark: Read a table and filter by partition

scala apache-spark


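I'm trying to understand how Spark evaluates a query.

There is a table table_name that is partitioned by partition_column. It is an external table stored in Parquet format. Now consider the following line: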

import org.apache.spark.sql.functions.col
val df = spark.read.table(table_name).filter(col(partition_column) === partition_value)

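Given Spark's lazy evaluation, will it apply predicate pushdown and scan only the folder where partition_column=partition_value? Or will it read the entire table and filter the rows afterwards?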

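Try .explain to see the result.
But with Parquet, the filter does indeed get pushed down.
Transformations, filters, maps and so on are all fused together. The lazy evaluation still applies, even though you wrote everything in a single statement.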

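So the answer is yes: Spark will generate code that filters at the source.
To add detail to the answer: if the table has several partition columns, you need to include all of the partition_column columns in the filter predicate. See the limitations of column filters.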
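
The .explain suggestion above can be tried on a small local table. The following is only a minimal sketch, not the asker's actual setup: the table name demo_events, the columns id and part, the helper names, and the local temporary warehouse directory are all assumptions made for illustration, and the table is a Spark-managed Parquet table rather than an external one. In the printed physical plan, the Parquet file scan node normally lists the predicates applied to partition columns under PartitionFilters, which is the part to look at when checking whether only the matching partition folders are scanned.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PartitionPruningCheck {

  // Writes a tiny Parquet table partitioned by the (hypothetical) column "part".
  def writeDemoTable(spark: SparkSession, tableName: String): Unit = {
    import spark.implicits._
    Seq((1, "a"), (2, "b"), (3, "a"))
      .toDF("id", "part")
      .write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("part")
      .saveAsTable(tableName)
  }

  // Same shape as the line in the question: read the table, filter on the partition column.
  def readPartition(spark: SparkSession,
                    tableName: String,
                    partitionColumn: String,
                    partitionValue: String): DataFrame =
    spark.read.table(tableName).filter(col(partitionColumn) === partitionValue)

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session with a temporary warehouse directory.
    val warehouse = Files.createTempDirectory("spark-warehouse").toString
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-pruning-check")
      .config("spark.sql.warehouse.dir", warehouse)
      .getOrCreate()

    writeDemoTable(spark, "demo_events")
    val df = readPartition(spark, "demo_events", "part", "a")

    // Nothing has been read yet; explain only prints the logical and physical plans.
    // Inspect the file scan node for its PartitionFilters entry.
    df.explain(true)

    spark.stop()
  }
}
```

Because explain only prints the plan, it is a cheap way to inspect what Spark intends to scan before any action such as count or show triggers the read.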
