PySpark: how to upload a text file from a Databricks notebook to FTP
I tried to find a solution but came up with nothing. I'm new to this, so if you know a solution, please help me.

Thanks.

In Databricks, you can access files stored in ADLS using any of the methods described below. There are three ways to access Azure Data Lake Storage Gen2:

1. Mount an Azure Data Lake Storage Gen2 file system to DBFS using a service principal and OAuth 2.0
2. Use a service principal directly
3. Use the Azure Data Lake Storage Gen2 storage account access key directly
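For the third method (storage account access key), access is configured with a single Spark setting; a hedged sketch with placeholder names (the secret scope and key names are assumptions, and this runs only inside a Databricks notebook where `spark` and `dbutils` exist):

```python
# Placeholder names throughout; assumes a secret scope holding the account key.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope = "yourkeyvault", key = "storageAccountKey"))

# Files can then be read directly over abfss:// without mounting
df = spark.read.text("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1/sample.txt")
```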
To mount a folder in an Azure Data Lake Storage Gen2 account or container, so that files in the file system can be accessed as if they were local, use the following command:

Syntax:
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<appId>",
           "fs.azure.account.oauth2.client.secret": "<password>",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
    mount_point = "/mnt/flightdata",
    extra_configs = configs)
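The `source` argument above follows a fixed abfss:// pattern. A small illustrative helper (the function name and sample values are assumptions, not part of the original answer) shows how the pieces fit together:

```python
def abfss_uri(container, storage_account, folder=""):
    # Compose the abfss:// source URI passed to dbutils.fs.mount
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    return f"{base}/{folder}" if folder else base

print(abfss_uri("<container-name>", "<storage-account-name>", "folder1"))
# abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1
```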
Reference:
Hope this helps.

Comments:

- Could you elaborate on your question? What do you mean by "upload a text file from a Databricks notebook to FTP"?
- Yes. I have text files in ADLS, and I want to write (upload) them to FTP from a Databricks notebook using PySpark.
- Ok, thanks, but this isn't what I asked: it covers accessing ADLS, not how to upload a file from Databricks to FTP. I found a solution and will post it as an answer. Thanks anyway.
- Hi @MilosTodosijevic, please share your findings; it would benefit other community members. Thank you very much.
Ok, I found a solution.
# Copy a file from ADLS to an FTPS server (FTP over TLS; note this is FTPS, not SFTP)
from ftplib import FTP_TLS
from azure.datalake.store import core, lib

keyVaultName = "yourkeyvault"
# The Key Vault must be configured as the Databricks secret scope backing these secrets
# Set up authentication for ADLS
tenant_id = dbutils.secrets.get(scope = keyVaultName, key = "tenantId")
username = dbutils.secrets.get(scope = keyVaultName, key = "appRegID")
password = dbutils.secrets.get(scope = keyVaultName, key = "appRegSecret")
store_name = 'ADLSStoridge'
token = lib.auth(tenant_id = tenant_id, client_id = username, client_secret = password)
adl = core.AzureDLFileSystem(token, store_name = store_name)
# Create a secure connection to the FTP server
ftp = FTP_TLS('ftp.xyz.com')
# Add credentials
ftp.login(user = '', passwd = '')
# Switch the data channel to TLS
ftp.prot_p()
# Set the target directory on the FTP server
ftp.cwd('folder path on FTP')
# Open the source file in ADLS
f = adl.open('adls path of your file')
# Stream it to the FTP server
ftp.storbinary('STOR myfile.csv', f)
f.close()
ftp.quit()
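`storbinary` streams any binary file-like object by calling `.read(blocksize)` on it, which is why the ADLS handle can be passed in directly. A minimal local illustration of that read pattern, with an in-memory buffer standing in for the ADLS file (no server or ADLS account needed):

```python
import io

# BytesIO stands in for the file-like handle returned by adl.open()
buf = io.BytesIO(b"col1,col2\n1,2\n3,4\n")

# Re-create the access pattern ftplib's storbinary uses internally
chunks = []
while True:
    block = buf.read(8192)   # default blocksize used by storbinary
    if not block:
        break
    chunks.append(block)

print(b"".join(chunks))
```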