Python: Parsing a file with a Unix shell script

Tags: python, snaplogic, snaplogic-script-snap

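I am trying to do some transformations and I am stuck. Here is the problem description.

Below is a pipe-delimited file. I have masked the data.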

AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
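In this file we get the CalculatedCurrency field, which holds multiple comma-separated values. The file also has the CalculatedCurrencyAmount field, which likewise holds multiple comma-separated values. But I only need to pick, from the CalculatedCurrency field, the currency that matches BranchCurrency (another field in the file), and of course the CalculatedCurrencyAmount that corresponds to that currency.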

Required output:

AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp|ActualCurrency|ActualAmount
61470003||30.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46|EUR|30.00

Please help.

SnapLogic Python script:

from com.snaplogic.scripting.language import ScriptHook
from com.snaplogic.scripting.language.ScriptHook import *
import csv

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    def execute(self):
        self.log.info("Executing Transform script")

        while self.input.hasNext():
            data = self.input.next()
            branch_currency = data['BranchCurrency']
            calc_currency = data['CalculatedCurrency'].split(',')
            calc_currency_amount = data['CalculatedCurrencyAmount'].split(',')

            result = None
        for i, name in enumerate(calc_currency):
            result = calc_currency_amount[i] if name == branch_currency else result
            data["CalculatedCurrencyAmount"] = result
            result1 = calc_currency[i] if name == branch_currency else result
            data["CalculatedCurrency"] = result1
            try:
                data["mathTryCatch"] = data["counter2"].longValue() + 33
                self.output.write(data)
            except Exception as e:
                data["errorMessage"] = e.message
                self.error.write(data)
        self.log.info("Finished executing the Transform script") 
hook = TransformScript(input, output, error, log)
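
In the script above, the for loop and the try block are indented outside the while loop, so they would only ever see the last document that was read. In addition, result1 falls back to result (the amount) rather than to a currency. A minimal sketch of the per-record logic follows, written as a helper that could be called once per document inside the while loop. Plain dicts stand in for SnapLogic documents here, and the helper name select_branch_values and the records list are assumptions made for illustration, not part of the SnapLogic API.

```python
def select_branch_values(record):
    """Return (currency, amount) for the entry matching BranchCurrency.

    Returns (None, None) when BranchCurrency is not listed in CalculatedCurrency.
    """
    currencies = record["CalculatedCurrency"].split(",")
    amounts = record["CalculatedCurrencyAmount"].split(",")
    for currency, amount in zip(currencies, amounts):
        if currency == record["BranchCurrency"]:
            return currency, amount
    return None, None


# Hypothetical stand-in for the documents the Script snap hands over one by one
# (values taken from the sample record in the question).
records = [
    {"BranchCurrency": "EUR", "CalculatedCurrency": "USD,INR,EUR",
     "CalculatedCurrencyAmount": "35.20,2420.11,30.00"},
    {"BranchCurrency": "USD", "CalculatedCurrency": "USD,INR,EUR",
     "CalculatedCurrencyAmount": "35.20,2420.11,30.00"},
]

# In the snap this loop would be: while self.input.hasNext(): data = self.input.next()
for data in records:
    currency, amount = select_branch_values(data)
    data["CalculatedCurrency"] = currency
    data["CalculatedCurrencyAmount"] = amount
    print("%s %s %s" % (data["BranchCurrency"], currency, amount))
```

The point of the sketch is only that the lookup and the write happen once per document, inside the loop that reads the input; how the result is written back (self.output.write) stays as in the original script.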
Using awk:

awk 'BEGIN{FS=OFS="|"}
     NR==1{print $0,"ActualCurrency","ActualAmount";next}
     {n=split($9,a,",");split($10,b,",");for(i=1;i<=n;i++) if(a[i]==$5) print $0,$5,b[i]}' file

Using bash and some arrays:

arr_find() {
        echo $(( $(printf "%s\0" "${@:2}" | grep -Fnxz "$1" | cut -d: -f1) - 1 ))
}
IFS='|' read -r -a headers
while IFS='|' read -r "${headers[@]}"; do
        IFS=',' read -r -a CalculatedCurrency <<<"$CalculatedCurrency"
        IFS=',' read -r -a CalculatedCurrencyAmount <<<"$CalculatedCurrencyAmount"

        idx=$(arr_find "$BranchCurrency" "${CalculatedCurrency[@]}")
        echo "BranchCurrency is $BranchCurrency. Hence CalculatedCurrency will be ${CalculatedCurrency[$idx]} and CalculatedCurrencyAmount will have to be ${CalculatedCurrencyAmount[$idx]}."

done

I know the OP asked for a Unix shell solution, but as an alternative I am showing some code that does it in Python. (Obviously, this code could also be improved a lot.) Its biggest advantage is readability: for example, you can address the data by name, which reads much better than the awk-style approaches.

Save the data in data.psv and write the following script to a file main.py. I have tested it with python3 and python2; both work. Run the script with python main.py.

Update: I extended the script to parse all lines. In the sample data I set BranchCurrency to EUR in the first row and to USD in the second, as a dummy test.

File: main.py

import csv

def parse_line(row):
  branch_currency = row['BranchCurrency']
  calc_currency = row['CalculatedCurrency'].split(',')
  calc_currency_amount = row['CalculatedCurrencyAmount'].split(',')

  result = None
  for i, name in enumerate(calc_currency):
    result = calc_currency_amount[i] if name == branch_currency else result

  return result


def main():
  with open('data.psv') as f:
    reader = csv.DictReader(f, delimiter='|')
    for row in reader:
      print(parse_line(row))


if __name__ == '__main__':
  main()
Sample data and sample run:

[:~] $ cat data.psv 
AccountancyNumber|AccountancyNumberExtra|Amount|ApprovedBy|BranchCurrency|BranchGuid|BranchId|BranchName|CalculatedCurrency|CalculatedCurrencyAmount|CalculatedCurrencyVatAmount|ControllerBy|Country|Currency|CustomFieldEnabled|CustomFieldGuid|CustomFieldName|CustomFieldRequired|CustomFieldValue|Date|DateApproved|DateControlled|Email|EnterpriseNumber|ExpenseAccountGuid|ExpenseAccountName|ExpenseAccountStatus|ExpenseGuid|ExpenseReason|ExpenseStatus|ExternalId|GroupGuid|GroupId|GroupName|IBAN|Image|IsInvoice|MatchStatus|Merchant|MerchantEnterpriseNumber|Note|OwnerShip|PaymentMethod|PaymentMethodGuid|PaymentMethodName|ProjectGuid|ProjectId|ProjectName|Reimbursable|TravellerId|UserGUID|VatAmount|VatPercentage|XpdReference|VatCode|FileName|CreateTstamp
61470003||35.00|null|EUR|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
61470003||35.00|null|USD|168fcea9-17d4-45a1-8b6f-bfb249cdbea6|BEL|BEL|USD,INR,EUR|35.20,2420.11,30.00|null,null,null|null|BE|EUR|true|0d4b767b-0988-47e8-9144-05e607169284|careertitle|false|FE|2018-07-24T00:00:00|null|null|abc_def@xyz.com||c32f03c6-31df-4fd8-8cc2-1c5f3a580aad|Meals - In Office|true|781d10d2-2f3b-43bc-866e-a653fefacbbe||Approved|70926|40ac7117-c7e2-42ea-b34f-96330c9380b6|BEL-FSP-Users|BEL-FSP-Users|||false|None|in office meal #1|||Personal|Cash|1ee44666-f4c7-44b3-acd3-8ecd7127480a|Cash|2cb4ccb7-634d-4386-af43-b4572ec72098|00AA06|00AA06|true||6c5a835f-5152-46db-923a-3ebd08c7dad3|null|null|XPD012245802||1820711.xml|2018-08-07 05:42:10.46
[:~] $ python main.py 
30.00
35.20
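
The script above prints only the matching amount, while the question asks for each record unchanged with ActualCurrency and ActualAmount appended. One way this could be sketched with the same csv module is shown below (Python 3). The function name append_actual_fields and the trimmed four-column sample are assumptions for illustration; the real file has many more columns, which the code passes through untouched.

```python
import csv
import io


def append_actual_fields(src, dst):
    """Copy pipe-delimited records from src to dst, appending two columns."""
    reader = csv.DictReader(src, delimiter="|")
    fieldnames = list(reader.fieldnames) + ["ActualCurrency", "ActualAmount"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="|",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        currencies = row["CalculatedCurrency"].split(",")
        amounts = row["CalculatedCurrencyAmount"].split(",")
        lookup = dict(zip(currencies, amounts))  # currency -> amount
        branch = row["BranchCurrency"]
        # Leave both new fields empty when the branch currency is not listed.
        row["ActualCurrency"] = branch if branch in lookup else ""
        row["ActualAmount"] = lookup.get(branch, "")
        writer.writerow(row)


# Trimmed sample: only the columns the logic needs, values from data.psv above.
sample = (
    "AccountancyNumber|BranchCurrency|CalculatedCurrency|CalculatedCurrencyAmount\n"
    "61470003|EUR|USD,INR,EUR|35.20,2420.11,30.00\n"
    "61470003|USD|USD,INR,EUR|35.20,2420.11,30.00\n"
)
out = io.StringIO()
append_actual_fields(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For a real file, the two StringIO objects would be replaced by open('data.psv', newline='') and an output file opened for writing. If a currency code appeared twice in CalculatedCurrency, this sketch would keep the last amount, which differs from the awk answer (that one prints a line per match).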

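You do not need a Script snap here at all. Writing scripts for transformations all the time hurts performance and defeats the purpose of an iPaaS tool. A Mapper should be enough.

I created the following test pipeline for this problem.

I saved the data provided in this question in a file and stored it in SnapLogic for testing. In the pipeline, I parse it with a CSV Parser.

Here is the result of the parsing.

Then I use a Mapper to do the required transformation.

Here is the expression to get the actual amount: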

$CalculatedCurrency.split(',').indexOf($BranchCurrency) >= 0 ? $CalculatedCurrencyAmount.split(',')[$CalculatedCurrency.split(',').indexOf($BranchCurrency)] : null
The result is as follows.

Avoid writing scripts for problems that can be solved with a Mapper.
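
To sanity-check the Mapper expression outside SnapLogic, its logic can be mirrored in a few lines of Python. This is only a sketch under the assumption that split(',') and indexOf(...) in the expression behave like str.split and a list lookup that signals "not found"; the function name actual_amount is made up for illustration.

```python
def actual_amount(calculated_currency, calculated_currency_amount, branch_currency):
    """Mirror of the Mapper expression: amount at the branch currency's position, else None."""
    currencies = calculated_currency.split(",")
    amounts = calculated_currency_amount.split(",")
    # The membership test plays the role of indexOf(...) >= 0 in the expression.
    if branch_currency in currencies:
        return amounts[currencies.index(branch_currency)]
    return None


print(actual_amount("USD,INR,EUR", "35.20,2420.11,30.00", "EUR"))
print(actual_amount("USD,INR,EUR", "35.20,2420.11,30.00", "GBP"))
```

The first call uses the values from the sample record; the second uses a currency that is not in the list, which corresponds to the null branch of the expression.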

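Comments:

Would you mind writing it in a higher-level language?

@SubhasreeMitra: And you want to script this using POSIX constructs only??? Why not, at least, use awk or a "real" language as ferdy suggests. Even a shell with pattern-matching features (Zsh, bash) would be more useful than a pure POSIX shell.

Python, JavaScript or Ruby would be fine too.

Why use a Script snap at all? It hurts performance and completely defeats the purpose of an iPaaS tool.

This is very useful, oliv, thank you. But what if there are multiple records in the file? Should I use a for loop?

@SubhasreeMitra No need for a loop. awk applies this script to every line of the input file.

Can you help? Imagine there are multiple records.

@SubhasreeMitra There is not much I can help with... just try the exact same script on a bigger file.

Yes, I got it. But running it as sh Rydoo.sh gives: the calculated currency is USD,INR,USD EUR,INR,EUR and the calculated actual amount is 30.00 30.00. I do not want these values lumped together; I want them on separate lines, as in the records, possibly as extra fields appended at the end of each record. Everything else in the record should stay as it is.

Thanks for the script.. but this does not work.. the error is as follows - Reason: Script does not have a variable named 'hook', Resolution: Add a global variable named 'hook' to the script that is instantiated to a class that implements the ScriptHook interface

Further extended to multiple lines. Still, this script could be made much more pythonic. But it works with python2 as well.

Thanks ferdy, it works. Actually I am calling it from SnapLogic, so the script is written in SnapLogic format. But this logic only applies to the last r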