Shell: how to remove extra newline characters from a pipe-delimited file, except at the end of each record?


shell unix awk sed scripting


I have a sample file containing the following data:

No|Name|sal  
1|abc|4500  
2|gkdjkh|554  
3|fgh  
cvb|678  
4|tyu|789  
5|ghl  
tyu|5677  
6|yyui  
tyui  
uui|780  
7|tpo|567

I need the output to look like this:

No|Name|sal  
1|abc|4500  
2|gkdjkh|554  
3|fgh cvb|678  
4|tyu|789  
5|ghl tyu|5677  
6|yyui tyui uui|780  
7|tpo|567

In my tests, Perl handled this better than sed:

$ perl -pe 's/^[0-9]+[|]/\0$&/g; s/\n/ /g; s/^\0/\n/g' file
No|Name|sal 
1|abc|4500 
2|gkdjkh|554 
3|fgh cvb|678 
4|tyu|789 
5|ghl tyu|5677 
6|yyui tyui uui|780 
7|tpo|567

An awk solution (based on processing the next line of the input file):

The rearrange_fields.awk script:

#!/bin/awk -f
BEGIN{ FS="|" } 
{
    if (NR == 1) {print $0}  # print the first header line as is
    else {
        if (NF == 3) { print $0 }
        else { 
            while ((getline nl) > 0) {     # processing each next line
                if (nl !~ /^[0-9]+\|/) {   # if it's not a regular line (with starting order digit i.e. `1|`)
                    if (prepend) { 
                        $0 = prepend" "$0  # prepend the last partial line if exists
                        prepend = ""       # use the saved partial line only once
                    }
                    $0 = $0" "nl;          # append to previous line 
                    gsub(/[[:space:]]+/," ",$0)  # remove redundant spaces
                } 
                else {
                    if (nl !~ /.+\|.+\|.+/) { # if a loop ends up with line which starts with order number 
                                              # but hasn't enough fields
                        prepend = nl
                        print $0
                    } 
                    else {
                        prepend = ""
                        print $0 RS nl        # next line is a regular valid line
                    } 
                    break
                }
            }
        }
    }
}

Usage:

awk -f rearrange_fields.awk yourfile

Output:

No|Name|sal  
1|abc|4500  
2|gkdjkh|554  
3|fgh cvb|678 
4|tyu|789  
5|ghl tyu|5677 
6|yyui tyui uui|780 
7|tpo|567

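A gawk-only solution that uses a regex for RS together with the gawk-only built-in variable RT. (For a different number of fields, change {2} to one less than the number of fields.)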
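A minimal sketch of the RS/RT idea described above. It is an illustration only and may differ from the code the answer actually used. It assumes gawk 4.0 or later (where interval expressions such as {2} are enabled by default) and three fields per record; the function name join_records_gawk is hypothetical.

```shell
# join_records_gawk: hypothetical helper name, not taken from the thread.
# RS is a regex matching one complete record: two "field|" groups followed by
# the rest of that physical line. gawk stores the text that matched RS in RT.
join_records_gawk() {
    gawk -v RS='([^|]*[|]){2}[^\n]*\n' '{
        rec = $0 RT                     # $0 only holds text that RS did not match
        sub(/\n$/, "", rec)             # drop the newline that ends the record
        gsub(/[[:space:]]+/, " ", rec)  # embedded newlines and space runs become one space
        sub(/ $/, "", rec)              # trim the trailing space, if any
        if (rec != "") print rec
    }' "$@"
}
```

Usage: join_records_gawk yourfile. Because the regex ends a record at the first newline after the second delimiter, this sketch assumes that line breaks never occur inside the last field. It also squeezes runs of spaces inside fields.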

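awk is a good fit for this problem, but I found a solution using sed and grep.
The tricky part is handling lines that contain no delimiter. These lines can be joined to the previous line by first wrapping them in marker characters (this assumes the input contains no \d008 or \r characters).

Now we can join all lines into a single string (replacing \n with \r, the marker that the following grep needs) and extract the desired substrings. grep needs -P for the special character \r.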

sed 's/^[^|]*$/\d008&\d008/' inputfile | tr '\n' '\r' |
   sed -r "s/\r\d008([^\d008]*)\d008/\1/g" |
   grep -Po "([^|]*\|){2}[^|\r]*" |
   tr -d '\r'

The solution above was too slow (and too complicated) for the OP, but it is much faster than using a while loop:
while IFS= read -r line; do
   # process header, determine nr of pipes
   if [ -z "${slashes}" ]; then           
      slashes=${line//[^|]}               
      n_slashes=${#slashes}               
      printf "%s\n" "${line}"             
      lastslashes=0                       
      continue
   fi
   # You have to print previous line when you have the required fields
   # and the next line has new fields
   new_slashes=${line//[^|]}
   n_new_slashes=${#new_slashes}
   if (( ${n_new_slashes} + ${lastslashes} > ${n_slashes} )); then
      printf "%s\n" "${last}"
      last="${line}"
      lastslashes=${n_new_slashes}
   else
      # Append new line to last one
      last="${last}${line}"
      ((lastslashes+=n_new_slashes))
   fi
done < inputfile
echo "${last}"
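The loop above decides where a record ends by counting delimiters: the bash expansion ${line//[^|]} deletes every character except |, and ${#slashes} gives the length of what remains. As a small illustration of that counting step, here is a portable stand-in that swaps the bash-only expansion for tr and wc. The helper name count_pipes is hypothetical and not part of the answer.

```shell
# count_pipes: hypothetical helper; prints how many '|' characters "$1" contains.
# Portable counterpart of the bash-only:  x=${line//[^|]}; echo "${#x}"
count_pipes() {
    # tr -cd '|' deletes every character except '|'; wc -c counts what is left.
    # The arithmetic expansion strips the padding that some wc implementations add.
    echo $(( $(printf '%s' "$1" | tr -cd '|' | wc -c) ))
}

count_pipes 'No|Name|sal'   # header line: 2 delimiters, so 3 fields
count_pipes 'tyui'          # continuation line without any delimiter: 0
```

A line whose count, added to the count already collected for the buffered record, exceeds the header's count must start a new record; otherwise it is a continuation.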

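You must add the code you tried and explain how it failed…

Thanks @WalterA.. but this solution doesn't work if there is a newline in the last field.

@Tarun Just ignore the partial solution I gave in the comments. It is hard to find any approach, and I understand that you had no starting point for a solution. You can always use a while loop that walks through the lines and uses a lot of variables, but that will be a slow solution. awk is the right tool, but I tried to solve it without awk. I failed the challenge.

Suppose you have three lines: a|b|c, d, and e|f|g. Should d be appended to the first line, or is it part of the second?

If the file has 3 fields separated by |, then this "d" should be appended as part of the first line.

This assumes _ is not part of the input data, e.g. _cvb|678 ... a better solution might be to rely only on the field separator: perl -pe 'while(!/\\\\\\\\\\//){chomp; $n=<>; $_.=$n} s/ +/ /g' ... the substitution is there to handle the extra spaces present in the OP's sample.

@Sundeep Indeed, the method assumes there is no _ in the data. How about using \0, which keeps working: perl -pe 's/^[0-9]+[|]/\0$&/g; s/\n/ /g; s/^\0/\n/g' file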

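@Sundeep By the way, I ran some tests with __ and even _6 in the data, and it works fine. I could not "break" my solution. The reason is the last expression, s/^_/\n/g, which saved the day.

Thanks George & Sundeep for the quick response… I think this works specifically for files with only 3 fields… what if we want to pass the number of fields as a parameter?

@TarunPant 1) If the input data is incorrect, the solution is incorrect too. 2) With alphabetic characters no solution can possibly work. If all lines start with alphanumerics, how would the script know which line must be joined to the previous one?

Thanks @Roman. But what if all 3 of my fields are strings?

@TarunPant, you didn't mention that in your question. That seems to be a different case. What about the next 3 lines: 1|abc|45, 2|gkdjkh|554?

Thanks @jas. But when I run this command on the server, it gives blank output. Do you know this problem? It works on my local machine but not on the server.

Please check the version (awk --version). This solution works only with gawk, and only with gawk version 3.0 or later.

GNU Awk 3.1.7 is my version.

That version should be fine. Is there any difference in the data between the two servers?

Thanks Walter. It works fine, but it takes a long time to execute on large files.

If you want it to run faster (and don't want to use awk/perl/C/python/), you need all the information: Is the first column (except in the header) always filled with digits? Does the last column also contain only digits (integers?) Do the newlines occur only in the second field?

I have multiple files to process, with different numbers of fields. I am looking for a generic solution that works for all of them. The solution you provided is generic and works regardless of the field data types. The only problem is the time it takes on large files. Newlines can appear in any field except the first. The first column may contain alphanumeric characters. The data type of the last column may vary from file to file.

I edited my solution and added a nice awk script. awk works fine and is hopefully fast enough for you. With my test file, it is 3 times faster than the original script using sed and grep.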

awk -F '|' 'NR==1 {
        nfields=NF;
        lastfields=0;
        print
        next
        }
   NF+lastfields-1 > nfields { print last;last=$0; lastfields=NF; next }
   {lastfields+=NF-1} # Joining two lines merges two fields into one, so subtract 1
   {last=last $0}
   END {print last}
   ' inputfile
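One comment in the thread asks about passing the number of fields as a parameter instead of reading it from the header. The following is a minimal sketch of that variant, assuming plain POSIX awk and that no field contains a literal |. The wrapper name join_records and the awk variable names are hypothetical. Unlike the script above, it trims trailing blanks from each physical line and joins the pieces with a single space.

```shell
# join_records NFIELDS [FILE...]: hypothetical wrapper, not taken from the thread.
join_records() {
    nfields=$1
    shift
    awk -F '|' -v want="$((nfields - 1))" '
        {
            sub(/[ \t\r]+$/, "")             # drop trailing blanks so joins stay tidy
            d = (NF > 0) ? NF - 1 : 0        # delimiters on this physical line
            if (NR > 1 && seen + d > want) { # line cannot fit into the buffered record
                print buf                    # so the buffered record is complete
                buf = $0
                seen = d
            } else {                         # first line, or a continuation line
                buf = (NR == 1) ? $0 : buf " " $0
                seen += d
            }
        }
        END { if (NR > 0) print buf }        # flush the last buffered record
    ' "$@"
}
```

Usage: join_records 3 inputfile. A line without any delimiter is always appended to the buffered record, which matches the rule stated in the comments for the a|b|c, d, e|f|g case.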