Apache Spark: unable to write a Spark DataFrame to a GCS bucket

apache-spark google-cloud-platform google-cloud-storage

The job is submitted and runs successfully, but there is no data in the bucket. How should I fix this?

df = spark.createDataFrame([["Amy", "lily", 12], ["john", "tom", 34]]).toDF(*["first_name", "last_name", "age"])
df.write.format("parquet").partitionBy("age").option("path", "gs://my_bucket/my_table")

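The code in the question configures the write operation but never triggers the write itself.

To actually trigger the write, you need to call one of the save functions of the writer (DataFrameWriter) interface.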

For example, the following will do the job:

df.write.format("parquet").partitionBy("age").option("path", "gs://my_bucket/my_table").save()

Or even:

df.write.partitionBy("age").parquet("gs://my_bucket/my_table")

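In more detail:

df.write returns an instance of DataFrameWriter (see its API documentation).

The DataFrameWriter API is consistent in spirit with the rest of the Spark API: it is lazy. Nothing is executed unless an action is triggered. To that end, a DataFrameWriter instance behaves like a builder-pattern implementation: successive calls to format, option, mode, and so on only configure a write operation that may eventually be executed. Once the operation is configured, you trigger it by calling save or a similar method on that instance.

Similarly, DataFrameWriter lets you reuse a configured write operation several times (for example, configure a base set of options and then call it twice to write Parquet and CSV files, or to write to different locations).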
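The builder-like behaviour described above can be illustrated without Spark. The following is only a minimal sketch: LazyWriter is a hypothetical toy class, not Spark's DataFrameWriter, and the local temporary files and the json/csv formats are assumptions chosen so that the example runs anywhere. It shows that configuration calls write nothing, that save() is what triggers the write, and that a configured writer can be reused.

```python
import json
import os
import tempfile


class LazyWriter:
    """Hypothetical toy writer; NOT Spark's DataFrameWriter implementation."""

    def __init__(self, rows):
        self._rows = rows
        self._format = "json"
        self._options = {}

    def format(self, name):
        # Only records the setting; nothing is written here.
        self._format = name
        return self

    def option(self, key, value):
        # Only records the setting; nothing is written here.
        self._options[key] = value
        return self

    def save(self, path=None):
        # The only method that performs I/O.
        target = path or self._options["path"]
        with open(target, "w", encoding="utf-8") as fh:
            if self._format == "json":
                json.dump(self._rows, fh)
            else:
                # Any other format falls back to comma-separated lines.
                for row in self._rows:
                    fh.write(",".join(str(v) for v in row) + "\n")
        return target


rows = [["Amy", "lily", 12], ["john", "tom", 34]]
out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "my_table.json")
csv_path = os.path.join(out_dir, "my_table.csv")

# Configuring alone writes nothing, like the code in the question.
writer = LazyWriter(rows).format("json").option("path", json_path)
exists_before_save = os.path.exists(json_path)

# Calling save() triggers the write.
writer.save()
exists_after_save = os.path.exists(json_path)

# The same configured writer is reused for another format and location.
writer.format("csv").save(csv_path)

print(exists_before_save, exists_after_save, os.path.exists(csv_path))
# prints: False True True
```

With a real Spark session the same idea applies: keep the result of df.write in a variable, then call save, parquet, or csv on it to perform the write.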
