Amazon S3: regex groups to match each record in a file where a newline does not indicate a new record

I have a file that looks like this:

'2021-05-26T09:06:29Z UTC [ db=dev user=rdsdb pid=4268 userid=1 xid=20341064 ]' LOG: SET statement_timeout TO 120000
'2021-05-26T09:06:29Z UTC [ db=dev user=rdsdb pid=4268 userid=1 xid=20341065 ]' LOG: select 'ConnectionCheckQuery'
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341067 ]' LOG: SET query_group to 'stmt.18679.sql'
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: BEGIN;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: SET datestyle TO ISO;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: SET TRANSACTION READ ONLY;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: SET STATEMENT_TIMEOUT TO 300000;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: /* hash: 720a01bd6ef3747b7f0585c0a70c01e9 */

select logtime, tbl_id, trim(tbl_name) as tbl_name, col_id, src_encode, tgt_encode, scan_rows,
case
when command_phase = 'Add shadow column complete' then 0
when command_phase = 'Reset Table Partition Manager complete' then 1
when command_phase like 'Shadow Col corrupt sorted regions%' then 2
when command_phase like 'shadow cols must contain same data%' then 3
when command_phase like 'Shadow Col not conform to range partition%' then 4
when command_phase = 'Data copy phase 1 complete' then 5
when command_phase = 'Data copy phase 2 complete' then 6
when command_phase = 'Drop existing shadow column complete' then 7
else -1
end as command_phase,
t2.metadatawritten as committed
from stl_alter_column_encode_events t1, stl_commit_stats t2
where logtime > getdate() - INTERVAL '1 day' and node = -1 and t1.xid = t2.xid;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: SELECT pg_catalog.stll_alter_column_encode_events.logtime AS logtime, pg_catalog.stll_alter_column_encode_events.tbl_id AS tbl_id, btrim( pg_catalog.stll_alter_column_encode_events.tbl_name ) AS tbl_name, pg_catalog.stll_alter_column_encode_events.col_id AS col_id, pg_catalog.stll_alter_column_encode_events.src_encode AS src_encode, pg_catalog.stll_alter_column_encode_events.tgt_encode AS tgt_encode, pg_catalog.stll_alter_column_encode_events.scan_rows AS scan_rows, CASE WHEN pg_catalog.stll_alter_column_encode_events.command_phase = 'Add shadow column complete'::Char(26) THEN 0 WHEN pg_catalog.stll_alter_column_encode_events.command_phase = 'Reset Table Partition Manager complete'::Char(38) THEN 1 WHEN pg_catalog.stll_alter_column_encode_events.command_phase LIKE 'Shadow Col corrupt sorted regions%' THEN 2 WHEN pg_catalog.stll_alter_column_encode_events.command_phase LIKE 'shadow cols must contain same data%' THEN 3 WHEN pg_catalog.stll_alter_column_encode_events.command_phase LIKE 'Shadow Col not conform to range partition%' THEN 4 WHEN pg_catalog.stll_alter_column_encode_events.command_phase = 'Data copy phase 1 complete'::Char(26) THEN 5 WHEN pg_catalog.stll_alter_column_encode_events.command_phase = 'Data copy phase 2 complete'::Char(26) THEN 6 WHEN pg_catalog.stll_alter_column_encode_events.command_phase = 'Drop existing shadow column complete'::Char(36) THEN 7 ELSE -1 END AS command_phase, pg_catalog.stll_commit_stats.metadatawritten AS committed FROM pg_catalog.stll_alter_column_encode_events, pg_catalog.stll_commit_stats WHERE pg_catalog.stll_alter_column_encode_events.xid = pg_catalog.stll_commit_stats.xid AND pg_catalog.stll_commit_stats.node = -1 AND pg_catalog.stll_alter_column_encode_events.logtime > getdate() - interval '1 day'::Interval;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341068 ]' LOG: COMMIT;
'2021-05-26T09:06:42Z UTC [ db=dev user=rdsdb pid=18771 userid=1 xid=20341069 ]' LOG: SET query_group to ''
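
Ultimately I want to convert the file to CSV with the columns timestamp, db, user, pid, xid, and query. As you can see, a newline does not indicate a new "record": a record can continue across several lines. How can I capture each record so that I can convert it into a CSV row with those columns? Do I need regex groups? If so, what would the regex groups look like?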
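
One possible approach, sketched here in Python under a few assumptions: the file contents have already been read into a string (for example after downloading the object from S3), every record starts at the beginning of a line with the quoted timestamp header shown in the sample, and no query text contains a line that begins with such a header. The names `HEADER`, `parse_records`, and `to_csv` are hypothetical and chosen for illustration. Instead of trying to match the whole record with one pattern, the sketch uses named groups to match only the header and treats everything up to the next header as the query.

```python
import csv
import io
import re

# Matches only the record header. Named groups capture the wanted columns;
# userid is matched but not captured because it is not a requested column.
# re.MULTILINE lets ^ match at the start of every line, not just the file.
HEADER = re.compile(
    r"^'(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) UTC "
    r"\[ db=(?P<db>\S+) user=(?P<user>\S+) pid=(?P<pid>\d+) "
    r"userid=\d+ xid=(?P<xid>\d+) \]' LOG:",
    re.MULTILINE,
)

FIELDS = ["timestamp", "db", "user", "pid", "xid", "query"]


def parse_records(text):
    """Split text into records; a record runs from one header to the next."""
    matches = list(HEADER.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # The query ends where the next header starts (or at end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        record = m.groupdict()
        record["query"] = text[m.end():end].strip()
        records.append(record)
    return records


def to_csv(records):
    """Render records as CSV text; multi-line queries are quoted by csv."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Calling `to_csv(parse_records(text))` returns the CSV as a string. Queries that span several lines keep their embedded newlines inside a quoted CSV field; if single-line rows are preferred, the whitespace in `record["query"]` could be collapsed before writing.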