
Regex: converting a postgres text log file to a CSV file


I'm trying to reformat a text log file as a CSV file. The text log format: each entry starts with the prefix ("t=%m p=%p h=%h db=%d u=%u x=%x"), and everything up to the next prefix is considered part of the same entry. Entries may contain \n and \r escape sequences.

t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 LOG:  duration: 0.011 ms  execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)
t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 DETAIL:  parameters: $1 = '9187372'
t=2020-08-25 15:00:00.001 +03 p=36001 h=127.0.0.1 db=test u=test_app x=0 LOG:  duration: 0.005 ms  bind S_1: COMMIT
t=2020-08-25 15:00:00.001 +03 p=36001 h=127.0.0.1 db=test u=test_app x=0 LOG:  duration: 0.004 ms  execute S_1: COMMIT
t=2020-08-25 15:00:00.001 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 LOG:  duration: 0.018 ms  bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)
t=2019-12-19 17:00:00.102 +03 p=58042 h= db= u= x=0 LOG:  automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0
    pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983
    buffer usage: 90 hits, 0 misses, 0 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
After the prefix, the content (usually SQL statements) is unpredictable.

If possible, it would also be perfect to drop the prefixes. Each output line should be formatted like this:

"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","LOG:"," duration: 0.011 ms  execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)"
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","DETAIL:"," parameters: $1 = '9187372'"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG:"," duration: 0.005 ms  bind S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG:"," duration: 0.004 ms  execute S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","16205","127.0.0.1","test","test_app","0","LOG:"," duration: 0.018 ms  bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)"
"2019-12-19 17:00:00.102 +03","58042","","","","0","LOG:"," automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen    tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983    buffer usage: 90 hits, 0 misses, 0 dirtied    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"

But I'm not sure about the last line of the expected output: it may cause problems when copying the CSV file into the db, because the table name is wrapped in double quotes:

" automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen    tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983    buffer usage: 90 hits, 0 misses, 0 dirtied    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"
Thanks, everyone.

Here you go:

The regex below will capture the fields into groups, which you can substitute like this:

"\1","\2","\3","\4","\5","\6","\7","\8"
An example on the command line using Perl:

cat file.csv|perl -pe 's/^t=(.* .*) p=(\d+) h=(.*) db=(\w+) u=(\w+) x=(\d+) (\w+:) (.*)/"\1","\2","\3","\4","\5","\6","\7","\8"/g'

Using GNU awk for FPAT, the 3rd arg to match(), and \s/\S as shorthand for [[:space:]]/[^[:space:]]:

$ cat tst.awk
BEGIN {
    # each prefix field is a name=value pair followed by a space
    FPAT = "[[:alnum:]]+=[^=]* "
    OFS = ","
}
/^\S/ { if (NR>1) prt() }   # a non-indented line starts a new entry: print the previous one
{ prev = prev $0 }          # accumulate continuation lines into one record
END { prt() }               # print the last accumulated entry

function prt(   orig, i, a) {
    orig = $0
    $0 = prev

    # split the record into prefix, LOG|DETAIL tag, and message (3rd arg = capture array)
    match($0,/(.* )(LOG|DETAIL): +(.*)/,a)

    $0 = a[1]               # assigning to $0 re-splits the prefix into fields using FPAT
    $(NF+1) = a[2]          # append the tag ...
    $(NF+1) = a[3]          # ... and the message as extra fields

    for (i=1; i<=NF; i++) {
        gsub(/^\s+|\s+$/,"",$i)   # trim surrounding whitespace
        sub(/^\S+=/,"",$i)        # drop the "name=" part of each prefix field
        gsub(/"/,"\"\"",$i)       # escape embedded quotes CSV-style
        printf "\"%s\"%s", $i, (i<NF ? OFS : ORS)
    }

    $0 = orig
    prev = ""
}

The last line of the expected output in the question contains

automatic vacuum of table "postgres.pgagent.pga_job": index...

but that's not valid CSV, since you can't have unescaped double quotes inside a double-quoted string. To be valid CSV it would have to be either

automatic vacuum of table ""postgres.pgagent.pga_job"": index...

or

automatic vacuum of table \"postgres.pgagent.pga_job\": index...

(which escape construct to use depends on which "standard" the tool that will consume the CSV adheres to). I decided to use "" in the script above since that's what MS Excel expects, but if you need \" instead it's a trivial tweak: just change gsub(/"/,"\"\"",$i) to gsub(/"/,"\\\\&",$i).
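To see the difference between the two escaping styles, here is a small stand-alone demo (POSIX awk, not part of the original answer):

```shell
v='automatic vacuum of table "x"'

# CSV-style doubling, as used in the script above:
echo "$v" | awk '{ gsub(/"/,"\"\""); print "\"" $0 "\"" }'
# prints: "automatic vacuum of table ""x"""

# backslash escaping instead (& in the replacement is the matched quote):
echo "$v" | awk '{ gsub(/"/,"\\\\&"); print "\"" $0 "\"" }'
# prints: "automatic vacuum of table \"x\""
```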

Thanks, but how do I use this regex? I need something like sed 'regex' input > output.

I added an example to the answer.

Can you add those lines to the sample log in the original question? Then we can improve the regex.

Nice use of gawk features! One question: if you call match() on $0 and then overwrite $0, why do you need FPAT? If I comment out FPAT the results come out different, but I don't understand why. Care to explain?

@vgersh99 I'm using match() to split $0 into 3 parts: the part before LOG|DETAIL, which is where FPAT gets used, then LOG or DETAIL, then the part after it. Does that make sense or is it still unclear?

I understand that part. The question is: why do you need FPAT defined if you only call match() against $0, which does no field splitting? And then you overwrite $0...

When I do $0=a[1] after calling match(), $0 gets the value t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0, and that is when field splitting using FPAT happens, setting $1 to t=2020-08-25 15:00:00.000 +03, $2 to p=16205, and so on.

Thanks so much @EdMorton. Could you explain it line by line, or point me to the documentation relevant to each line?
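To illustrate the FPAT behavior discussed in the comments, here is a stand-alone snippet (gawk only, since FPAT is a GNU extension) that splits just the prefix the same way assigning a[1] back to $0 does; the trailing space each field carries is stripped before printing:

```shell
# Split the prefix portion of a log line into name=value fields via FPAT:
echo 't=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 ' |
  gawk 'BEGIN { FPAT = "[[:alnum:]]+=[^=]* " }
        { for (i=1; i<=NF; i++) { f = $i; sub(/ $/, "", f); print i ": " f } }'
# prints one field per line: "1: t=2020-08-25 15:00:00.000 +03" through "6: x=0"
```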
$ awk -f tst.awk file
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","LOG","duration: 0.011 ms  execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)"
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","DETAIL","parameters: $1 = '9187372'"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG","duration: 0.005 ms  bind S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG","duration: 0.004 ms  execute S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","16205","127.0.0.1","test","test_app","0","LOG","duration: 0.018 ms  bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)"
"2019-12-19 17:00:00.102 +03","58042","","","","0","LOG","automatic vacuum of table ""postgres.pgagent.pga_job"": index scans: 0    pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen    tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983    buffer usage: 90 hits, 0 misses, 0 dirtied    avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"