Python: File parsing using a Unix shell script
I am trying to make some transformations and I am stuck. Here is the problem description. Below is a pipe-delimited file. I have masked the data.
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
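In this file, the CalculatedCurrency field holds multiple comma-separated values. The file also has a CalculatedCurrencyAmount field, which likewise holds multiple comma-separated values. I only need to extract, from the CalculatedCurrency field, the currency value that matches BranchCurrency (another field in the file), together with the CalculatedCurrencyAmount that corresponds to that currency.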
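As a minimal sketch of the lookup described above: the two comma-separated fields can be paired position by position and then queried by currency code. The function name pick_branch_amount is made up for illustration, and the sketch assumes both fields always list the same number of entries.

```python
def pick_branch_amount(branch_currency, currencies, amounts):
    """Return (currency, amount) for the entry matching branch_currency, or None."""
    names = currencies.split(",")
    values = amounts.split(",")
    # Pair each currency code with the amount at the same position.
    by_currency = dict(zip(names, values))
    if branch_currency not in by_currency:
        return None
    return branch_currency, by_currency[branch_currency]


# Values taken from the sample record (BranchCurrency is EUR).
print(pick_branch_amount("EUR", "USD,INR,EUR", "35.20,2420.11,30.00"))
# prints ('EUR', '30.00')
```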
Desired output:
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp|ActualCurrency|ActualAmount
61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46|EUR|30.00
Please help.
SnapLogic Python script:
from com.snaplogic.scripting.language import ScriptHook
from com.snaplogic.scripting.language.ScriptHook import *

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            data = self.input.next()
            try:
                branch_currency = data['BranchCurrency']
                calc_currency = data['CalculatedCurrency'].split(',')
                calc_currency_amount = data['CalculatedCurrencyAmount'].split(',')
                # Keep only the currency/amount pair that matches BranchCurrency
                actual_currency = None
                actual_amount = None
                for i, name in enumerate(calc_currency):
                    if name == branch_currency:
                        actual_currency = name
                        actual_amount = calc_currency_amount[i]
                        break
                data["CalculatedCurrency"] = actual_currency
                data["CalculatedCurrencyAmount"] = actual_amount
                # Write inside the loop so that every record is emitted, not just the last one
                self.output.write(data)
            except Exception as e:
                data["errorMessage"] = str(e)
                self.error.write(data)
        self.log.info("Finished executing the Transform script")

# The Script snap looks for a global variable named "hook"
hook = TransformScript(input, output, error, log)
Using awk:
awk 'BEGIN{FS=OFS="|"}
NR==1{print $0,"ActualCurrency","ActualAmount";next}
{n=split($9,a,",");split($10,b,",");for(i=1;i<=n;i++) if(a[i]==$5) print $0,$5,b[i]}' file
Using bash and some arrays:
arr_find() {
    # Print the zero-based index of $1 within the remaining arguments
    echo $(( $(printf "%s\0" "${@:2}" | grep -Fnxz "$1" | cut -d: -f1) - 1 ))
}

# The first input line holds the column names; each one is used as a variable name below
IFS='|' read -r -a headers
while IFS='|' read -r "${headers[@]}"; do
    IFS=',' read -r -a CalculatedCurrency <<<"$CalculatedCurrency"
    IFS=',' read -r -a CalculatedCurrencyAmount <<<"$CalculatedCurrencyAmount"
    idx=$(arr_find "$BranchCurrency" "${CalculatedCurrency[@]}")
    echo "BranchCurrency is $BranchCurrency. Hence CalculatedCurrency will be ${CalculatedCurrency[$idx]} and CalculatedCurrencyAmount will have to be ${CalculatedCurrencyAmount[$idx]}."
done
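One thing that may be worth checking in the bash version: when BranchCurrency is not among the listed currencies, the arithmetic in arr_find appears to evaluate to -1, and in bash versions that accept negative subscripts an index of -1 selects the last array element instead of failing. Python lists behave the same way with -1, so the pitfall can be illustrated with a short sketch. The helper find_index is hypothetical and only mimics arr_find.

```python
def find_index(needle, items):
    """Mimic arr_find: zero-based index of needle, or -1 when it is absent."""
    for i, item in enumerate(items):
        if item == needle:
            return i
    return -1


currencies = ["USD", "INR", "EUR"]
amounts = ["35.20", "2420.11", "30.00"]

idx = find_index("GBP", currencies)  # GBP is not in the list
# Using the sentinel unchecked silently selects the last element.
unchecked = amounts[idx]
# Checking the sentinel first avoids reporting a wrong amount.
checked = amounts[idx] if idx >= 0 else None
print(idx, unchecked, checked)
# prints: -1 30.00 None
```

Guarding on the index before using it keeps a missing currency from being reported as the last amount in the list.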
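I know the OP asked for a Unix shell solution, but as an alternative I am showing some code that does it in Python. (Obviously, this code could also be improved a lot.) Its biggest advantage is readability: for example, you can address the data by name in the code, which is far more readable than doing it the awk way.
Save the data in data.psv and write the following script to the file main.py. I have tested it with both python3 and python2, and it works with both. Run the script with python main.py.
Update: I extended the script to parse all lines. In the sample data I set BranchCurrency to EUR in the first row and to USD in the second, as a dummy test.
File: main.py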
import csv


def parse_line(row):
    branch_currency = row['BranchCurrency']
    calc_currency = row['CalculatedCurrency'].split(',')
    calc_currency_amount = row['CalculatedCurrencyAmount'].split(',')
    result = None
    # Pick the amount whose currency matches the branch currency
    for i, name in enumerate(calc_currency):
        result = calc_currency_amount[i] if name == branch_currency else result
    return result


def main():
    with open('data.psv') as f:
        reader = csv.DictReader(f, delimiter='|')
        for row in reader:
            print(parse_line(row))


if __name__ == '__main__':
    main()
Sample data and example run:
[:~] $ cat data.psv
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||35.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
61470003||35.00|null|USD|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
[:~] $ python main.py
30.00
35.20
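The script above prints only the matching amount. A possible extension, sketched here under assumptions, writes each record back out unchanged with two extra fields at the end. The column names ActualCurrency and ActualAmount are borrowed from the awk answer, the function name append_actual_fields is made up, and SAMPLE is an abbreviated stand-in for data.psv that keeps only four of the real columns. The sketch targets Python 3.

```python
import csv
import io

# Abbreviated stand-in for data.psv: only a few of the real columns are kept.
SAMPLE = (
    "AccountancyNumber|BranchCurrency|CalculatedCurrency|CalculatedCurrencyAmount\n"
    "61470003|EUR|USD,INR,EUR|35.20,2420.11,30.00\n"
    "61470003|USD|USD,INR,EUR|35.20,2420.11,30.00\n"
)


def append_actual_fields(src, dst):
    """Copy every record from src to dst, adding ActualCurrency and ActualAmount."""
    reader = csv.DictReader(src, delimiter="|")
    fieldnames = list(reader.fieldnames) + ["ActualCurrency", "ActualAmount"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="|",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        pairs = dict(zip(row["CalculatedCurrency"].split(","),
                         row["CalculatedCurrencyAmount"].split(",")))
        branch = row["BranchCurrency"]
        # Leave both new fields empty when the branch currency is not listed.
        row["ActualCurrency"] = branch if branch in pairs else ""
        row["ActualAmount"] = pairs.get(branch, "")
        writer.writerow(row)


out = io.StringIO()
append_actual_fields(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")
```

With a real file, the two io.StringIO objects would be replaced by open file handles. Unlike the awk one-liner, this sketch keeps records that have no matching currency and leaves the two new fields empty for them.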
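You don't need to use the Script snap here at all. Writing scripts for transformations all the time hurts performance and defeats the purpose of an iPaaS tool. A Mapper should be enough.
I created the following test pipeline for this problem.
I saved the data provided in this question in a file and stored it in SnapLogic for testing. In the pipeline, I parse it with the CSV Parser.
Here is the parsed result.
Then I use a Mapper to do the required transformation.
Here is the expression for getting the actual amount: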
$CalculatedCurrency.split(',').indexOf($BranchCurrency) >= 0 ? $CalculatedCurrencyAmount.split(',')[$CalculatedCurrency.split(',').indexOf($BranchCurrency)] : null
The result is as follows.
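For readers who want to try the same logic outside SnapLogic, the Mapper expression can be rendered roughly as follows in Python. This is only a sketch: the function name actual_amount and the use of a plain dict as the record are assumptions made for illustration.

```python
def actual_amount(record):
    """Amount at the position of BranchCurrency in CalculatedCurrency, else None."""
    currencies = record["CalculatedCurrency"].split(",")
    amounts = record["CalculatedCurrencyAmount"].split(",")
    branch = record["BranchCurrency"]
    # The Mapper expression tests indexOf(...) >= 0; Python's list.index raises
    # ValueError for a missing value, so membership is tested first instead.
    if branch in currencies:
        return amounts[currencies.index(branch)]
    return None


record = {
    "BranchCurrency": "EUR",
    "CalculatedCurrency": "USD,INR,EUR",
    "CalculatedCurrencyAmount": "35.20,2420.11,30.00",
}
print(actual_amount(record))
# prints 30.00
```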
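Avoid writing scripts for problems that can be solved with a Mapper.
Would you mind writing it in a higher-level language?
@SubhasreeMitra: And you want to script it with POSIX constructs only??? Why not, at least, use awk or a "real" language as ferdy suggested? Even a shell with some pattern-matching features (Zsh, bash) is more useful than a plain POSIX shell. Python, JavaScript or Ruby would be fine too.
Why use the Script snap at all? It hurts performance and completely defeats the purpose of an iPaaS tool.
This is very helpful, oliv, thanks. But what if there are multiple records in the file? Should I use a for loop?
@SubhasreeMitra No need for a loop. awk applies this script to every line of the input file.
Can you help? Imagine there are multiple records.
@SubhasreeMitra I can't help much more... just try the exact same script on a bigger file.
Yes, I got it. But this is the output of sh Rydoo.sh: Calculated Currency is USD,INR,USD EUR,INR,EUR Calculated actual amount is 30.00 30.00. I don't want these values lumped together; I want them on separate lines, as in the records, possibly appended as additional fields at the end of each record. Everything else in the record should stay as it is.
Thanks for the script, but this doesn't work. The error is as follows. Reason: Script does not have a variable named "hook". Resolution: Add a global variable named "hook" to the script that is instantiated to a class that implements the ScriptHook interface.
Further extended to multiple lines. Still, this script could be made much more pythonic. But it also works with python2.
Thanks ferdy, it works. Actually I am calling it from SnapLogic, so the script was written in SnapLogic format. But this logic only works for the last r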