Azure Databricks: how to write a CSV to Azure Storage Gen2 with Databricks (Python)

I want to write a regular CSV file to storage, but what I get is a folder named "sample_file.csv" with four files under it. How can I create a plain csv file in Azure Storage Gen2 from a dataframe? I would be glad for any suggestions or links to articles. This is what I tried:

df.coalesce(1).write.option("header", "true").csv(TargetDirectory + "/sample_file.csv")
Below is a code snippet for writing (dataframe) CSV data directly to an Azure Blob Storage Gen2 container from an Azure Databricks notebook. For multiple csv files in a folder, you can try the following:
# Data available on DBFS (Databricks File System)
df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("dbfs:/myfolder/sample/*.csv")
df.show()
If you are using Spark 3.0, you can use the recursiveFileLookup option.

recursiveFileLookup – recursively scans a directory for files. Using this option disables partition discovery.
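As a sketch of the option mentioned above (assuming a Spark 3.0+ notebook where `spark` is the ambient SparkSession and the sample path from the earlier snippet exists):

```python
# Spark 3.0+: pick up CSVs from the directory AND all its sub-directories.
# Note: enabling recursiveFileLookup disables partition discovery, so
# partition columns encoded in the path (e.g. /year=2021/) are not inferred.
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("recursiveFileLookup", "true")
      .load("dbfs:/myfolder/sample/"))
```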
For a single csv file in a folder:
# Configure blob storage gen2 account access key globally
spark.conf.set("fs.azure.account.key.chepragen2.dfs.core.windows.net", "<<ACCESS KEY>>")
# Configure blob storage gen2 account folder
output_container_path = "abfss://<<filesystem>>@<<Storage_Name>>.dfs.core.windows.net/<<DirectoryName>>"
output_blob_folder = "%s/CSV_data_folder" % output_container_path
# write the dataframe as a single file to blob storage
(df
.coalesce(1)
.write
.mode("overwrite")
.option("header", "true")
   .format("csv")  # built-in CSV source (Spark 2.0+); replaces com.databricks.spark.csv
.save(output_blob_folder))