How to delete lines matching a specific pattern from a TXT or CSV file on Linux

linux bash csv awk sed


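I have a txt file in the format shown below.

The goal is to delete the lines that start with "Subtotal Group 1", "Subtotal Group 2" or "Grand total" (these strings are always at the beginning of the line), but only when the rest of the line is empty (or padded with spaces).

This can probably be done with awk or sed in a single pass, but I am currently using 3 separate steps (one per text). A more generic syntax would be better. Thanks, everyone.

My txt file looks like this: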

Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_________________________________________________________________________
Subtotal Group 1
Subtotal Group 1
Subtotal Group 1
Subtotal Group 1                                                2123.82
Subtotal Group 1
Subtotal Group 1

========================================================================
GROUP 2
========================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2                                                 612.79
Subtotal Group 2
_______________________________________________________________________
Grand total
Grand total
Grand total
Grand total
Grand total
Grand total
Grand total                                                      1511.03

The output I am trying to achieve is:

Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_______________________________________________________________________
Subtotal Group 1                                                2123.82

=======================================================================
GROUP 2
=======================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2                                                 612.79
_______________________________________________________________________
Grand total                                                     1511.03

You can do:

grep -v -P "^(Subtotal Group \d+|Grand total)[,\s]*$" inputfile > outputfile
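Note that -P (Perl-compatible regular expressions) is a GNU grep extension. Where it is not available, a roughly equivalent pattern can be written with -E and POSIX character classes. The sketch below is only an illustration: the sample rows are made up, mixing the aligned-text and CSV forms of the same lines.

```shell
# Hypothetical sample rows (not the asker's real file).
sample=$(mktemp)
printf '%s\n' \
  'Subtotal Group 1   ' \
  'Subtotal Group 1      2123.82' \
  'Subtotal Group 2,,,' \
  'Grand total,,,1511.03' \
  'Grand total' > "$sample"

# [0-9] stands in for \d and [[:space:]] for \s, so plain -E is enough.
# Lines holding only the label plus commas/whitespace are dropped.
kept=$(grep -Ev '^(Subtotal Group [0-9]+|Grand total)[,[:space:]]*$' "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```

Only the two rows that carry an amount should survive the filter.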

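Edited per comments.

Second edit: adapted to the new specs.

The question is not quite clear on whether the goal is to keep the total/subtotal lines or to remove them.

It is also unclear whether the "#*" comments are an actual part of the input file or merely descriptive.

Fortunately, both of these are minor details. With perl, this is fairly simple: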

$ perl -n -e 'print if /^(Subtotal|Grand Total),(,| |#.*)*/' inputfile
Subtotal,,,                     #This is unuseful --> To be removed
Subtotal,,,                     #This is unuseful --> To be removed
Subtotal,,,125.40               #This is a good line
Subtotal,,,                     #This is unuseful --> To be removed
Grand Total,,,                  #This is unuseful --> To be removed
Grand Total,,,125.40            #This is a good line

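This assumes you want to keep the total and subtotal lines and remove all other lines.

To do it the other way around, removing the total/subtotal lines and keeping the others, replace the if keyword with unless.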

If the comments are not actually in the input file, just tweak the pattern slightly:

perl -n -e 'print if /^(Subtotal|Grand Total),(,| )*/' inputfile

This also ignores any extra whitespace. If you want whitespace to be significant, it becomes:

perl -n -e 'print if /^(Subtotal|Grand Total),(,)*/' inputfile

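As I said, even though your question is not 100% clear, the unclear parts are only minor details; perl will easily handle any of the possibilities.

As shown in the examples, perl prints the edited contents of inputfile on standard output. To replace inputfile with the edited contents, just add the -i option to the command (before the -e option).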

If the good lines always end with a digit, and the any-text lines never do, you can use:

sed -n '/^.*[0-9]$/p' file

where -n suppresses printing of the pattern space, so only the lines ending in [0-9] are output. Given the example file, the output is:

Subtotal                                         2123.82
Total                                             625.80
Any Word                                         9999.99
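One caveat worth checking: if the amounts are followed by trailing blanks, [0-9]$ will not match them. The sketch below compares the strict pattern with a blank-tolerant variant; the sample rows, including the one with trailing spaces, are assumptions made for illustration.

```shell
# Hypothetical sample; the third row ends in two trailing spaces.
sample=$(mktemp)
printf '%s\n' \
  'Subtotal' \
  'Subtotal                2123.82' \
  'Total                    625.80  ' \
  'Any Word                9999.99' > "$sample"

# Strict: the digit must be the very last character on the line.
strict=$(sed -n '/[0-9]$/p' "$sample")

# Tolerant: allow spaces or tabs between the last digit and end of line.
lenient=$(sed -n '/[0-9][[:blank:]]*$/p' "$sample")

printf 'strict:\n%s\nlenient:\n%s\n' "$strict" "$lenient"
rm -f "$sample"
```

The strict form misses the row padded with spaces, while the tolerant form keeps all three rows that carry an amount.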

Here is also an attempt at an awk solution:

awk -F, '{for(i=2;i<=NF;i++){if($i~/[0-9.-]+/){print $0;next}}}' falzone
Subtotal,,,125.40               
Grand Total,,,125.40            
Any other text,,,9999.99
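The test [0-9.-]+ is satisfied by any field that contains a digit, a dot or a dash anywhere, so a field holding just "-" would also count. A somewhat stricter sketch, assuming the amounts are plain decimals with an optional leading minus sign, could anchor the pattern to the whole field. The sample rows are made up.

```shell
# Hypothetical CSV sample, including a field that holds only a dash.
csv=$(mktemp)
printf '%s\n' \
  'Subtotal,,,' \
  'Subtotal,,,125.40   ' \
  'Grand Total,,,-' \
  'Any other text,,,9999.99' > "$csv"

# Print a line as soon as one field after the first is a whole number
# or decimal, optionally surrounded by spaces or tabs.
kept=$(awk -F, '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^[ \t]*-?[0-9]+(\.[0-9]+)?[ \t]*$/) { print; next }
}' "$csv")
printf '%s\n' "$kept"
rm -f "$csv"
```

With this sample, the row whose last field is a lone dash is filtered out together with the empty subtotal.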

This is the job grep was invented to do:

$ grep -Ev '^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$' file
Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_________________________________________________________________________
Subtotal Group 1                                                2123.82

========================================================================
GROUP 2
========================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2                                                 612.79
_______________________________________________________________________
Grand total                                                      1511.03

If you prefer, you can use the same regexp in awk or sed:

awk '!/^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$/' file
sed -E '/^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$/d' file
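Since the question asks for something more generic, one possible sketch is to keep the labels in a shell variable and build the regexp from it, so new labels can be added in one place. The variable names labels and re are hypothetical, and the sample rows are made up.

```shell
# Hypothetical sample in the aligned-text layout.
sample=$(mktemp)
printf '%s\n' \
  'Subtotal Group 1' \
  'Subtotal Group 1        2123.82' \
  'Subtotal Group 12   ' \
  'Grand total' \
  'Grand total             1511.03' \
  '530 Other Line text      157.32' > "$sample"

# Labels as an ERE alternation; extend with |Another label as needed.
labels='Subtotal Group [0-9]+|Grand total'

# Build the anchored regexp once and print only lines that do not match it.
kept=$(awk -v re="^(${labels})[ \t]*\$" '$0 !~ re' "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```

The [ \t] class is used instead of [[:blank:]] only because some older awk builds lack POSIX character classes; either should behave the same where both are supported.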

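What are the Field1,... numbers? Do the lines start with anything other than Subtotal or Total?

@David You are right, it is confusing; I am going to edit the question. Thanks.

@EdMorton I have a CSV (you helped me a lot a few days ago to format it and convert it into a readable, aligned txt). Now I have a txt that is almost ready to print, and the last problem to solve is removing the redundant, useless lines. A more efficient approach may have been possible earlier on, but I am not very good at working out all the steps in a single script, so I am going step by step. Thank you, Ed! PS: if I have to delete the post or rephrase it to avoid being "off-topic", I can do that.

@EdMorton I totally agree, I just rephrased it; sorry for the confusion!

There is no need to use cat here; just use: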

grep -v ",,$" infile > outfile
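A quick illustration of what that simpler filter does and does not catch. It drops every line ending in two commas, whatever the label, and it misses rows with trailing blanks. The sample rows and the blank-tolerant variant are assumptions of this sketch.

```shell
# Hypothetical CSV sample; the third row ends in two trailing spaces.
csv=$(mktemp)
printf '%s\n' \
  'Subtotal,,,' \
  'Subtotal,,,125.40' \
  'Grand Total,,,  ' \
  'Any other text,,' > "$csv"

# As suggested: drop lines whose last two characters are commas.
kept=$(grep -v ',,$' "$csv")

# Variant that also tolerates spaces or tabs after the commas.
kept_blank_tolerant=$(grep -Ev ',,[[:blank:]]*$' "$csv")

printf 'plain:\n%s\ntolerant:\n%s\n' "$kept" "$kept_blank_tolerant"
rm -f "$csv"
```

If ordinary data rows can also end in empty fields, a label-anchored pattern is the safer choice.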

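I guess you are the kind of person who reads code as if it were your native language. Very talented programmer, 5 stars!

I hope not, since it is spoken rather than written (when it is written down, the phonetic spelling makes it highly variable and hard to read!) :-).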
