Writing data to Azure Data Lake Storage with a PowerShell script
I need to write data to Azure Data Lake Storage instead of the local D:\ drive. I am using PowerShell to fetch ADF trigger information, and I want to load that data into a directory in an Azure Data Lake container rather than into Blob storage.
ADF -> PowerShell -> Azure Data Lake
Inside the container, the data should land in a directory layout of YYYY (folder) -> MM (folder) -> DD (folder) -> data file in .CSV format.
Below is the code that currently writes the data to my local machine. I need to convert it so that it loads the data into Data Lake Storage. To hide the username and password, I use a password file with AES encryption.
Any help and suggestions would be greatly appreciated.
Code:
# 1- Connect to Azure Account
$username = "xyz@abc.com"
$password = Get-Content D:\Powershell\new\passwords\password.txt | ConvertTo-SecureString -Key (Get-Content D:\Powershell\new\passwords\aes.key)
$credential = New-Object System.Management.Automation.PsCredential($username,$password)
#Connect-AzureRmAccount -Credential $credential | out-null
Connect-AzAccount -Credential $credential | out-null
# 2 - Input Area
$subscriptionName = 'Data Analytics'
$resourceGroupName = 'DataLake-Gen2'
$dataFactoryName = 'dna-production-gen2'
# 3 - (All Triggers Information)
$ErrorActionPreference="SilentlyContinue"
Stop-Transcript | out-null
$ErrorActionPreference = "Continue"
Start-Transcript -path D:\Powershell\new\TriggerInfo.txt -append
Get-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName
Stop-Transcript
# read the file as a single, multiline string using the -Raw switch
$triggers = Get-Content "D:\Powershell\new\TriggerInfo.txt" -Raw
# split the text in 'trigger' text blocks on the empty line
# loop through these blocks (skip any possible empty textblock)
$triggers = ($triggers -split '(\r?\n){2,}' | Where-Object { $_ -match '\S' }) | ForEach-Object {
    # and parse the data into Hashtables
    $today = Get-Date
    $yesterday = $today.AddDays(-1)
    $data = $_ -replace ':', '=' | ConvertFrom-StringData
    $splat = @{
        ResourceGroupName       = $data.ResourceGroupName
        DataFactoryName         = $data.DataFactoryName
        TriggerName             = $data.TriggerName
        TriggerRunStartedAfter  = $yesterday
        TriggerRunStartedBefore = $today
    }
    Get-AzDataFactoryV2TriggerRun @splat
} | Export-Csv -Path 'D:\Powershell\new\Output.csv' -Encoding UTF8 -NoTypeInformation
# 4 - To extract the final output from the Output File.
Import-Csv D:\Powershell\new\Output.csv -DeLimiter "," |
Select-Object 'TriggerRunTimestamp', 'ResourceGroupName','DataFactoryName','TriggerName','TriggerRunId','TriggerType','Status' |
Export-Csv -Path 'D:\Powershell\new\Finalresult.csv' -Encoding UTF8 -NoTypeInformation -Force
# 5 - Upload a file from the local system to the Data Lake
$storageAccount = Get-AzStorageAccount -ResourceGroupName "DataLake-Gen2" -AccountName "dna2020gen2"
$ctx = $storageAccount.Context
$filesystemName = "dev"
$dirname = "triggers/"
# Create the target directory in the file system
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $dirname -Directory
# Upload the local file into that directory
$localSrcFile = "D:\Powershell\new\passwords\password.txt"
$destPath = $dirname + (Get-Item $localSrcFile).Name
New-AzDataLakeGen2Item -Context $ctx -FileSystem $filesystemName -Path $destPath -Source $localSrcFile -Force
The last part of the code above attempts to upload a file from the local system.
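I am able to upload the file, but I cannot write the command output to the data lake.

Regarding this issue, please refer to the following script: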
$username = "xyz@abc.com"
$password =ConvertTo-SecureString "" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PsCredential($username,$password)
#Connect-AzureRmAccount -Credential $credential | out-null
Connect-AzAccount -Credential $credential
$dataFactoryName=""
$resourceGroupName=""
# get dataFactory triggers
$triggers=Get-AzDataFactoryV2Trigger -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName
$datas = @()
foreach ($trigger in $triggers) {
    # get the trigger run history for the last 24 hours
    $today = Get-Date
    $yesterday = $today.AddDays(-1)
    $splat = @{
        ResourceGroupName       = $trigger.ResourceGroupName
        DataFactoryName         = $trigger.DataFactoryName
        TriggerName             = $trigger.Name
        TriggerRunStartedAfter  = $yesterday
        TriggerRunStartedBefore = $today
    }
    $historys = Get-AzDataFactoryV2TriggerRun @splat
    if ($null -ne $historys) {
        # create one data row per trigger run
        foreach ($history in $historys) {
            # property names become the CSV column headers, so no trailing spaces
            $obj = [PsCustomObject]@{
                'TriggerRunTimestamp' = $history.TriggerRunTimestamp
                'ResourceGroupName'   = $history.ResourceGroupName
                'DataFactoryName'     = $history.DataFactoryName
                'TriggerName'         = $history.TriggerName
                'TriggerRunId'        = $history.TriggerRunId
                'TriggerType'         = $history.TriggerType
                'Status'              = $history.Status
            }
            # add data to an array
            $datas += $obj
        }
    }
}
# convert data to csv string
$contents =(($datas | ConvertTo-Csv -NoTypeInformation) -join [Environment]::NewLine)
# upload to Azure Data Lake Store Gen2
#1. Create a sas token
$accountName="testadls05"
$fileSystemName="test"
$filePath="data.csv"
$account = Get-AzStorageAccount -ResourceGroupName andywin7 -Name $accountName
$sas= New-AzStorageAccountSASToken -Service Blob -ResourceType Service,Container,Object `
-Permission "racwdlup" -StartTime (Get-Date).AddMinutes(-10) `
-ExpiryTime (Get-Date).AddHours(2) -Context $account.Context
$baseUrl ="https://{0}.dfs.core.windows.net/{1}/{2}{3}" -f $accountName , $fileSystemName, $filePath, $sas
#2. Create file
$endpoint =$baseUrl +"&resource=file"
Invoke-RestMethod -Method Put -Uri $endpoint -Headers @{"Content-Length" = 0} -UseBasicParsing
#3 append data
$endpoint =$baseUrl +"&action=append&position=0"
Invoke-RestMethod -Method Patch -Uri $endpoint -Headers @{"Content-Length" = $contents.Length} -Body $contents -UseBasicParsing
#4 flush data
$endpoint =$baseUrl + ("&action=flush&position={0}" -f $contents.Length)
Invoke-RestMethod -Method Patch -Uri $endpoint -UseBasicParsing
#Check the result (get data)
Invoke-RestMethod -Method Get -Uri $baseUrl -UseBasicParsing
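One detail worth checking in the append and flush steps: the `position` value in the Data Lake Gen2 REST API is a byte offset, while `$contents.Length` counts characters. The two only match when the CSV is pure ASCII. A minimal sketch of one way to keep them in sync, assuming the body is sent as UTF-8 bytes (the sample string and the variable names `$bytes`, `$charCount` and `$byteCount` are illustrative, not part of the original script):

```powershell
# Sketch only: encode the CSV text once and reuse the byte array everywhere.
# $contents here is a stand-in for the CSV string built by the script.
$contents = 'TriggerName,Status' + "`n" + ('Zo' + [char]0x00EB + ',Succeeded')

# Encode explicitly so the upload size does not depend on the cmdlet's default encoding.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($contents)

$charCount = $contents.Length   # number of UTF-16 characters in the string
$byteCount = $bytes.Length      # number of bytes that would actually be sent

# Hypothetical usage with the requests from the script (not executed here):
# Invoke-RestMethod -Method Patch -Uri ($baseUrl + "&action=append&position=0") -Body $bytes
# Invoke-RestMethod -Method Patch -Uri ($baseUrl + ("&action=flush&position={0}" -f $byteCount))
```

With a non-ASCII character in the data, `$byteCount` is larger than `$charCount`, which is the case where a flush position based on the character count would be wrong.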
For more details, please refer to the documentation for the REST API and the commands used above.
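Since the question already uploads a local file successfully with `New-AzDataLakeGen2Item`, another option is to write the CSV to a temporary file and upload that file. The following is a rough sketch under that assumption. `Export-TempCsv` and `Send-RowsToDataLake` are hypothetical helper names introduced here for illustration, and the upload function is only defined, not called:

```powershell
# Hypothetical helper: write objects to a temporary CSV file and return its path.
function Export-TempCsv {
    param([Parameter(Mandatory)][object[]]$Rows)
    $tempFile = [System.IO.Path]::GetTempFileName()
    $Rows | Export-Csv -Path $tempFile -Encoding UTF8 -NoTypeInformation
    return $tempFile
}

# Hypothetical wrapper: upload the rows with New-AzDataLakeGen2Item (the cmdlet
# used in the question), then remove the temporary file again.
function Send-RowsToDataLake {
    param(
        [Parameter(Mandatory)]$Context,
        [Parameter(Mandatory)][string]$FileSystem,
        [Parameter(Mandatory)][string]$DestPath,
        [Parameter(Mandatory)][object[]]$Rows
    )
    $tempFile = Export-TempCsv -Rows $Rows
    try {
        New-AzDataLakeGen2Item -Context $Context -FileSystem $FileSystem `
            -Path $DestPath -Source $tempFile -Force | Out-Null
    }
    finally {
        # Clean up even if the upload fails.
        Remove-Item -Path $tempFile -ErrorAction SilentlyContinue
    }
}
```

A call would look something like `Send-RowsToDataLake -Context $account.Context -FileSystem 'test' -DestPath 'data.csv' -Rows $datas`, reusing the variables from the script. This still touches the local disk briefly, so it is a trade-off against the REST approach rather than a replacement for it.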
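Comments:

Export-Csv can only write to a local drive or a network drive, so I don't think we can write the content directly to Azure Data Lake Store. I suggest using the Azure Data Lake Gen2 REST API to store the CSV content directly.

Thanks @JimXu. I will try it and update here.

Please check my solution.

Hi Jim, thanks for writing such a nice script! It works for me, but I need to write the data file into a directory, say "Triggers", in the file system "Dev". Each day when I run this script, I need it to create folders like YYYY=2020 -> MM=10 -> DD=28, and inside them write "Data.csv" with the runs from the last 24 hours.

@SaurabhShakyawar You can try defining the file name as 2020/10/28/{}.csv.

But that would be hard-coded, right? I want it to create the folders for YYYY=2020 -> MM=10 -> DD=28 every day and put Data.csv inside. I am trying something like this to create them, but it gives me the full date rather than folders: New-Item -ItemType Directory -Path "\$(Get-Date).ToString('yyyy-MM-dd')"

@SaurabhShakyawar If you want to create a directory in Azure Data Lake Gen2, you can call the REST API https://{accountName}.{dnsSuffix}/{filesystem}/{path}?resource=directory.

Jim, I am not very familiar with PS, so I don't know how to combine it with the script shown above. Sorry for the inconvenience.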
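To avoid hard-coding the date, the path can be built from the current date at run time. The sketch below is one possible way to do that. `Get-DatedLakePath` is a hypothetical helper name, and the defaults "Triggers" and "Data.csv" are taken from the comment above:

```powershell
# Hypothetical helper: build a path such as Triggers/2020/10/28/Data.csv.
function Get-DatedLakePath {
    param(
        [string]$RootDir = 'Triggers',
        [string]$FileName = 'Data.csv',
        [datetime]$Date = (Get-Date)
    )
    # Data Lake paths use forward slashes; the invariant culture keeps "/" as the
    # date separator regardless of the regional settings of the machine.
    $datePart = $Date.ToString('yyyy/MM/dd', [cultureinfo]::InvariantCulture)
    '{0}/{1}/{2}' -f $RootDir.TrimEnd('/'), $datePart, $FileName
}

# Example with a fixed date so the result is predictable.
$examplePath = Get-DatedLakePath -Date ([datetime]'2020-10-28')
# $examplePath is now 'Triggers/2020/10/28/Data.csv'
```

In the script above, the result could be assigned to `$filePath` before `$baseUrl` is built (for example `$filePath = Get-DatedLakePath`), or passed as the `-Path` value to `New-AzDataLakeGen2Item`. Whether the service creates the missing year, month and day directories automatically when the file is created should be verified against your account; if it does not, each level can be created first with the `resource=directory` call mentioned in the comments.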