Converting a Mailgun report to CSV format with Perl

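I have a question. I want to write a Perl script that parses Mailgun output into CSV format. I assume the "split" and "join" functions would work well for this. Here is some sample data:

Sample data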

{
    "geolocation": {
        "city": "Random City",
        "region": "State",
        "country": "US"
    },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}

{
    "geolocation": {
        "city": "Random City2",
        "region": "State2",
        "country": "mEXICO"
    },
    "url": "https://www4.website2.com/register/ABCDE567",
    "timestamp": "1237854980723.0239847"
}

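Desired output

"city","region","country","url","timestamp"
"Random City","State","US","https://www4.website.com/register/1234567","1237854980723.0239847"
"Random City2","State2","mEXICO","www4.website2.com/ABCDE567","1237854980723.0239847"

My goal is to take the sample data and produce the desired output as a comma-delimited CSV file. I'm not sure how to go about it. Normally I would tackle this with a series of one-liners in a batch file, but I would prefer a Perl script. The real data will contain more information, but working out how to parse the general structure is enough.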

Here is what I have in a batch file.

Code

perl -p -i.bak -e "s/(,$|,+ +$|^.*?{$|^.*?}.*?$|^.*?],.*?$)//gi" file.txt
rem Removes all unnecessary characters and lines with { and }. ^

perl -p -i.bak -e "s/(^ +| +$)//gi" file.txt
perl -p -i.bak -e "s/^\n$//gi" file.txt
rem Removes all blank lines in initial file. Next one-liner takes care of trailing and beginning
rem whitespace.  The file is nice and clean now.

perl -p -e "s/(^\".*?\"):.*?$/$1/gi" file.txt > header.txt
rem retains only header info and puts into 'header.txt' ^

perl -p -e "s/^\".*?\": +(\".*?\"$)/$1/gi" file.txt > data.txt
rem retains only data that is associated with each field.

perl -p -i.bak -e "s/\n/,/gi" data.txt
rem replaces new line character with ',' delimiter.

perl -p -i.bak -e "s/^/\n/gi" data.txt
rem drops data down a line

perl -p -i.bak -e "s/\n/,/gi" header.txt
rem replaces new line character with ',' delimiter.

copy header.txt+data.txt report.txt
rem copies both files together.  Since there is the same amount of fields as there are data
rem delimiters, the columns and headers match.

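My output

"city","region","country","url","timestamp"
"Random City","State","US","https://www4.website.com/register/1234567","1237854980723.0239847"

This does work, but it would be better to condense the script. Different situations can break this batch script, so I need something more solid. Any suggestions?

You can use a single Perl script and one regular expression: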

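#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;    # enables say

# Sample record; see below for reading the real input instead.
$_ = <<'TXT';
{
    "geolocation": {
        "city": "Random City",
        "region": "State",
        "country": "US"
    },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}
TXT

# Capture every "key": "value" pair whose value is a quoted string.
# "geolocation" is skipped because its value starts with '{'.
my @matches = /("[^"]+")\s*:\s*("[^"]+")/gmx;

# Walk the flat (key, value, key, value, ...) list so the columns keep
# the input order; a plain hash would return them in arbitrary order.
my (@keys, @values);
while (my ($key, $value) = splice(@matches, 0, 2)) {
    push @keys,   $key;
    push @values, $value;
}

say join(",", @keys);
say join(",", @values);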

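Of course, if you want to read from STDIN (or from a file named on the command line), you can replace the string definition with: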

local $/ = undef;
$_ = <>;

Then call the script from a batch file:

rem How to call it...
@perl program.pl text.txt > report.txt
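
The sample data in the question holds more than one record, and the script above prints a single header and a single row. A minimal sketch of how the same regex idea could be stretched to several records follows. It is not the answerer's own code: the sub name records_to_csv is made up for illustration, and it assumes each record ends at a line that starts with "}" and that no value contains a double quote.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Turn text holding several Mailgun-style records into CSV lines.
# Assumption: a record ends at a line that begins with "}" and no
# field value contains a double quote.
sub records_to_csv {
    my ($text) = @_;
    my (@lines, @header);

    for my $record (split /^\}\s*$/m, $text) {
        my (@keys, @values);

        # Collect "key": "value" pairs in the order they appear.
        while ($record =~ /("[^"]+")\s*:\s*("[^"]*")/g) {
            push @keys,   $1;
            push @values, $2;
        }
        next unless @keys;    # skip chunks that hold only whitespace

        # The first record supplies the header row.
        if (!@header) {
            @header = @keys;
            push @lines, join(",", @header);
        }
        push @lines, join(",", @values);
    }
    return @lines;
}

my $sample = <<'TXT';
{
    "geolocation": {
        "city": "Random City",
        "region": "State",
        "country": "US"
    },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}

{
    "geolocation": {
        "city": "Random City2",
        "region": "State2",
        "country": "mEXICO"
    },
    "url": "https://www4.website2.com/register/ABCDE567",
    "timestamp": "1237854980723.0239847"
}
TXT

print "$_\n" for records_to_csv($sample);
```

The header is taken from the first record only, so records with a different set of fields would produce misaligned columns. That is one reason a real JSON parser is the sturdier option for production data.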

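Not to knock @coin's regex-fu, but the advantages of using CPAN modules include getting a more flexible solution that you can build on later, and benefiting from edge-case handling that other people have already worked out.

This solution uses a JSON module to parse the incoming data (I'm assuming it will continue to look like JSON) and a CSV module to produce high-quality CSV, which takes care of things such as quotes and commas embedded in your data.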

use warnings;
use strict;

use JSON qw/decode_json/;
use Text::CSV_XS;

my $json_data_as_string = <<EOL;
{
    "geolocation": {
        "city": "Random City", 
        "region": "State", 
        "country": "US"
    }, 
    "url": "https://www4.website.com/register/1234567", 
    "timestamp": "1237854980723.0239847"
}
EOL

my $s = decode_json($json_data_as_string);

my $csv = Text::CSV_XS->new({ binary => 1 });

$csv->combine(
    $s->{geolocation}{city},
    $s->{geolocation}{region},
    $s->{geolocation}{country},
    $s->{url},
    $s->{timestamp},
) || die $csv->error_diag;

print $csv->string, "\n";
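
To illustrate the point about embedded quotes and commas, here is a small sketch. The field values are invented for the demonstration. It also shows the always_quote option of Text::CSV_XS, which quotes every field and so comes closer to the fully quoted layout in the question's desired output.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical values chosen to contain a double quote and a comma.
my @fields = ('Random "City"', 'State, North', 'US');

# Default settings: only fields that need quoting are quoted, and an
# embedded double quote is escaped by doubling it.
my $csv = Text::CSV_XS->new({ binary => 1 });
$csv->combine(@fields) or die "combine failed: " . $csv->error_input;
print $csv->string, "\n";

# always_quote => 1 wraps every field in double quotes.
my $quoted = Text::CSV_XS->new({ binary => 1, always_quote => 1 });
$quoted->combine(@fields) or die "combine failed: " . $quoted->error_input;
print $quoted->string, "\n";
```

Joining the fields by hand with join(",", ...) would leave the comma inside 'State, North' looking like a field separator, which is the kind of edge case the module handles for you.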

To read the data from a file into $json_data_as_string, you can use the code from @coin's solution.

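I like your answers. They work the way I wanted, but please look at the edit I just made to my question. See the desired output and the re-edited sample data I provided. What if there are two sets of data? The CSV would then contain the header we extracted, followed by data row 1, data row 2, and so on. @coin

@JDE876 The second version of the script will output what you expect: two lines, one per city. However, I would recommend using a JSON parser rather than regular expressions to parse the data.

Is there any way you could provide an example that replaces the regex with a JSON parser?

The comment above was only meant as an applied example. Otherwise, the regex really does work wonders thanks to the modifiers you provided (i.e. gmx, global and multiline matching). :) @coin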
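
For the request above, here is one possible sketch of a JSON-parser version that handles several records. It is an illustration under stated assumptions, not code from either answerer: the sub names record_to_row and json_to_csv and the @columns list are made up, the input is assumed to be JSON objects separated only by whitespace (as in the sample data), and JSON::PP is used because it ships with Perl. Its incr_parse method returns every complete object it finds when called in list context.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;        # core module; chosen here so no extra JSON install is needed
use Text::CSV_XS;

# Column order for the report (assumed; adjust to the real data).
my @columns = qw(city region country url timestamp);

# Flatten one decoded record into values in @columns order.
sub record_to_row {
    my ($rec) = @_;
    my %flat = (%{ $rec->{geolocation} || {} }, %$rec);
    return map { $flat{$_} } @columns;
}

# Convert text holding one or more JSON objects into CSV lines.
sub json_to_csv {
    my ($text) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, always_quote => 1 })
        or die "Cannot create CSV object: " . Text::CSV_XS->error_diag;

    # Input is assumed to be UTF-8 bytes, e.g. as read from a file.
    my @records = JSON::PP->new->utf8->incr_parse($text);

    my @lines;
    for my $row ([@columns], map { [ record_to_row($_) ] } @records) {
        $csv->combine(@$row) or die "combine failed: " . $csv->error_input;
        push @lines, $csv->string;
    }
    return @lines;
}

my $sample = <<'JSON';
{
    "geolocation": {
        "city": "Random City",
        "region": "State",
        "country": "US"
    },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}

{
    "geolocation": {
        "city": "Random City2",
        "region": "State2",
        "country": "mEXICO"
    },
    "url": "https://www4.website2.com/register/ABCDE567",
    "timestamp": "1237854980723.0239847"
}
JSON

print "$_\n" for json_to_csv($sample);
```

To process a real file, the heredoc can be swapped for the slurp idiom from @coin's answer (local $/ = undef; my $text = <>;) and the script called as perl program.pl text.txt > report.txt. A field missing from a record comes out as an empty column rather than shifting the rest of the row.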
