Awk/sed: processing OFX, extracting the payee from <MEMO> and inserting it in <NAME>


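I am processing OFX (bank transaction) files. My bank does not use the <NAME> tag to specify the payee, but this information is a substring of the <MEMO> tag.

So my file looks like this: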

...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
</STMTTRN>
...continues other transactions and end of file
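I want to extract the payee from <MEMO> and put it in a <NAME> tag. sed, or another tool such as awk, could be a solution.

With GNU sed: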

sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' file
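Since the question also mentions awk, here is a minimal sketch of the same idea in awk. It assumes, as in the sample, that each <MEMO> element sits on a single line and that the payee is whatever follows the last HH:MM time. The function name add_name_awk is hypothetical and only there for illustration.

```shell
# Hypothetical helper: reads OFX text on stdin and writes it to stdout,
# adding a <NAME> line after every <MEMO> line that contains an HH:MM time.
add_name_awk() {
    awk '
    { print }                                   # echo every input line unchanged
    /<MEMO>.* [0-9][0-9]:[0-9][0-9] .*<\/MEMO>/ {
        indent = $0
        sub(/[^ \t].*$/, "", indent)            # keep only the leading whitespace
        payee = $0
        sub(/<\/MEMO>.*$/, "", payee)           # drop the closing tag and what follows
        sub(/^.*<MEMO>/, "", payee)             # drop everything up to the opening tag
        sub(/^.* [0-9][0-9]:[0-9][0-9] /, "", payee)  # drop the text through the time
        printf "%s<NAME>%s</NAME>\n", indent, payee
    }'
}
```

It could be used as add_name_awk < file > file-with-names.ofx (the output file name is just an example). Unlike the sed command, this version reuses the indentation of the <MEMO> line instead of a fixed number of spaces.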

Complementing @Cyrus's answer to handle non-ASCII characters:

I gave up on keeping the non-ASCII characters, and now it works:

iconv -f "windows-1252" -t "UTF-8" file-ansi.ofx -o file-utf8.ofx
rm file-ansi.ofx
sed 'y/áÁàÀãÃâÂéÉêÊíÍóÓõÕôÔúÚüÜçÇ/aAaAaAaAeEeEiIoOoOoOuUuUcC/' -i file-utf8.ofx
sed -i -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n                  <NAME>\1<\/NAME>/' file-utf8.ofx 
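If keeping the accented characters matters, one possible alternative is to run the substitution in the C locale, where sed treats every byte as a character, so bytes that are invalid in the current locale cannot stop `.` from matching. This is only a sketch under the assumption that GNU sed is in use (as in the commands above); the function name add_name_bytes is hypothetical.

```shell
# Hypothetical helper: reads OFX text on stdin and writes it to stdout.
# LC_ALL=C makes sed work byte by byte, so windows-1252 accented characters
# in <MEMO> do not prevent the regex from matching.
# Assumes GNU sed (-r, and \n in the replacement text).
add_name_bytes() {
    LC_ALL=C sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/'
}
```

For example, add_name_bytes < file-ansi.ofx > file-with-names.ofx would leave the original encoding and accents untouched (the output file name is just an example).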

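Using sed -i -r with the expression above, the regex does not match lines that contain Latin (accented) characters, for example: Cartão de Crédito - 09/06 18:37 Walmart 2th street

I gave up on the non-ASCII characters. I posted this as a separate answer so that I could format the code.

GNU and BSD sed should both be able to handle UTF-8 input correctly (assuming your locale is UTF-8 based), so I don't think you need the rm and sed 'y/...' commands. For example: sed -E 's/[[:alpha:]]/a letter/'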

Resulting transaction, with the <NAME> line added after <MEMO>:

<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
    <NAME>Walmart 2th street</NAME>
</STMTTRN>
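
With the accents stripped, the Portuguese example line and its generated <NAME> line look like this:

<MEMO>Cartao de Credito - 09/06 18:37 Walmart 2th street</MEMO>
<NAME>Walmart 2th street</NAME>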