Bash: select a column from files and save the output

bash shell for-loop awk


I am new to programming. I have many files in a directory, named as shown below; each file consists of two columns of data.

TS.TST_X.1990-11-22
TS.TST_Y.1990-11-22
TS.TST_Z.1990-11-22

TS.TST_X.1990-12-30
TS.TST_Y.1990-12-30
TS.TST_Z.1990-12-30

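First, I want to select only the second column of all files that share the same name apart from the X, Y, Z part (TS.TST_X.1990-11-22, TS.TST_Y.1990-11-22, TS.TST_Z.1990-11-22) and save the output in a file named like TSTST19901122.

Likewise, for the files (TS.TST_X.1990-12-30, TS.TST_Y.1990-12-30, TS.TST_Z.1990-12-30), the output should be saved as TSTST19901230.

For example, if the files contain the following: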

TS.TST_X.1990-11-22                 TS.TST_Y.1990-11-22               TS.TST_Z.1990-11-22
1  2                                 1   3.4                          1    2.1
2  5                                 2   2.4                          2    4.2
3  2                                 3   1.2                          3    1.0
4  4                                 4   2.4                          4    3.5
5  8                                 5   6.3                          5    1.8

then the output file TSTST19901122 should look like this:

2   3.4    2.1
5   2.4    4.2
2   1.2    1.0
4   2.4    3.5
8   6.3    1.8

I tried this code:

#!/bin/sh
for file in /home/min/data/*
do
awk '{print $2}' $file 
done

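However, the code I wrote only prints the second column of every file, one file after another, and does not produce the expected output. I would therefore appreciate help from the experts.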

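I hope the example below helps you get started. The next time you post, please make sure the input is posted properly so that readers can help you easily.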


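Edit: The OP mentioned in the comments that the actual file names are slightly different, so here is a solution adapted to them (the OP has only three kinds of files, for different years and months).

Explanation: the solution with detailed comments.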

##Loop over the files named TS.TST_BHE*; the variable file holds the current file name.
for file in TS.TST_BHE*
do
      ##Remove everything up to and including the last dot, leaving the date (1990-11-22).
      year=${file/*\./}
      ##Remove every - from the date (19901122).
      year=${year//-/}
      ##Replace BHE with BHN in the file name and save the result in yfile.
      yfile=${file/BHE/BHN}
      ##Replace BHE with BHZ in the file name and save the result in zfile.
      zfile=${file/BHE/BHZ}
      ##Build the output file name: TSTST. followed by the date digits.
      outfile="TSTST.$year"
      ##echo $file $yfile $zfile
      ##Use paste to put the lines of the three files (BHE, BHN and BHZ) side by side, then print only the 2nd, 4th and 6th fields.
      paste "$file" "$yfile" "$zfile"  | awk '{print $2,$4,$6}' > "$outfile"
done
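
The parameter expansions used in the loop can be tried on a single name without touching any files. This is a minimal sketch; the sample file name is hypothetical, modelled on the BHE/BHN/BHZ names in the code above. Note that the variable called year actually holds the whole date.

```shell
#!/usr/bin/env bash
# Hypothetical sample name (an assumption for illustration only).
file="TS.TST_BHE.1990-11-22"

# ${var/pattern/} deletes the longest match of the pattern; "*\." matches
# everything up to and including the last dot, leaving the date.
year=${file/*\./}

# ${var//pattern/} deletes every match; here it removes all hyphens.
year=${year//-/}

# ${var/old/new} replaces the first occurrence of old with new.
yfile=${file/BHE/BHN}
zfile=${file/BHE/BHZ}

printf '%s\n' "$year" "$yfile" "$zfile"
# prints:
# 19901122
# TS.TST_BHN.1990-11-22
# TS.TST_BHZ.1990-11-22
```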

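Based on the OP's comments, you could try the following, which simply pastes the input files together without checking the values in the first column: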

for file in TS.TST_X*
do
      year=${file/*\./}
      year=${year//-/}
      yfile=${file/X/Y}
      zfile=${file/X/Z}
      outfile="TSTST.$year"
      ###echo $file $yfile $zfile ##Just to print variable values(optional)
      paste "$file" "$yfile" "$zfile"  | awk '{print $2,$4,$6}' > "$outfile"
done
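
The loop above assumes that every X file has matching Y and Z files. The following is only a sketch of one way to guard against an incomplete set, assuming it is acceptable to skip such dates. The scratch directory and the sample data are made up for the demonstration, and ${file##*.} is swapped in as an equivalent way to take the text after the last dot.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no real data is touched (demo assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small stand-in inputs; the Z file for 1990-12-30 is left out on purpose.
printf '1 2\n2 5\n'     > TS.TST_X.1990-11-22
printf '1 3.4\n2 2.4\n' > TS.TST_Y.1990-11-22
printf '1 2.1\n2 4.2\n' > TS.TST_Z.1990-11-22
printf '1 7\n2 9\n'     > TS.TST_X.1990-12-30
printf '1 0.5\n2 0.6\n' > TS.TST_Y.1990-12-30

for file in TS.TST_X.*; do
    yfile=${file/X/Y}
    zfile=${file/X/Z}
    # Skip this date when either companion file is missing or unreadable.
    if [[ ! -r $yfile || ! -r $zfile ]]; then
        echo "skipping $file: companion file missing" >&2
        continue
    fi
    year=${file##*.}      # text after the last dot, e.g. 1990-11-22
    year=${year//-/}      # drop the hyphens, e.g. 19901122
    paste "$file" "$yfile" "$zfile" | awk '{print $2, $4, $6}' > "TSTST.$year"
done
```

With these inputs only TSTST.19901122 is written; the 1990-12-30 set is reported on standard error and skipped.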

For the samples shown, the code above generates a file named TSTST.19901122 with the following content:

cat TSTST.19901122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8

The following recreates the input files:

cat <<EOF >TS.TST_X.2000-11-22 
1  2               
2  5              
3  2             
4  4          
5  8       
EOF
cat <<EOF >TS.TST_Y.2000-11-22
1   3.4
2   2.4
3   1.2
4   2.4
5   6.3
EOF
cat <<EOF >TS.TST_Z.2000-11-22
1    2.1
2    4.2
3    1.0
4    3.5
5    1.8
EOF

cat <<EOF >TS.TST_X.1990-11-22 
1  2               
2  5              
3  2             
4  4          
5  8       
EOF
cat <<EOF >TS.TST_Y.1990-11-22
1   3.4
2   2.4
3   1.2
4   2.4
5   6.3
EOF
cat <<EOF >TS.TST_Z.1990-11-22
1    2.1
2    4.2
3    1.0
4    3.5
5    1.8
EOF

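Running the find/sort/awk/sed/paste script listed further below on these files generates the following output: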

TSTST19901122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8
TSTST20001122
2 3.4 2.1
5 2.4 4.2
2 1.2 1.0
4 2.4 3.5
8 6.3 1.8

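I suggest researching basic shell commands. Read the documentation for find. Read an introduction to awk and sed scripting. Read an introduction to bash and learn how to iterate over, sort, merge, and filter lists of files in bash.

Are you using sh or bash? sh is usually not bash.

Welcome, and thank you for showing your attempt as code. Do you want to compare the first column of all files and match their values, or can we simply paste the files together without checking the first field? Please confirm.

@RavinderSingh13 I don't need the first column at all.

@glkshu, I am not talking about needing it in the output. I mean: do you want to compare the first-column values of the different files? For example, if line 1 of file 1 has 1 in the first column and line 2 of file 2 has 2 in the first column, should those lines still be joined?

No, sir, I don't need that.

Copy/paste your script into a shell script checker and it will tell you about some problems. One it may not tell you about is that for f in $(ls TS.TST* | …); do … done should be while IFS= read -r f; do … done < …

Thanks, I will correct it once I am back at my system; I am on mobile now.

@EdMorton I have edited it as you suggested, but I still have an unquoted variable.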
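
The advice in the comments about reading file names with while IFS= read -r instead of looping over the output of ls can be sketched as follows. This is an illustration under assumptions, not the commenter's exact code: the scratch directory and the file names are hypothetical, one name contains spaces to show why word splitting matters, and -printf requires GNU find.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical file names; the second one contains spaces on purpose.
touch "$workdir/TS.TST_X.1990-11-22" "$workdir/TS.TST_X.1990 12 30"

count=0
names=()
# Process substitution keeps the loop in the current shell, so count and
# names are still set after the loop. IFS= and -r make read return each
# line unchanged (no trimming, no backslash processing).
while IFS= read -r f; do
    names+=("$f")
    count=$((count + 1))
done < <(find "$workdir" -maxdepth 1 -name 'TS.TST_X*' -printf '%f\n' | sort)

printf 'found %d files\n' "$count"
printf '  [%s]\n' "${names[@]}"
```

A for f in $(ls …) loop would split the second name into three words; the read loop keeps it as one entry. Names containing newlines would still need a different approach.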
# Script for the second answer: run it in the directory containing the TS.TST* input files.
# List the matching file names without the leading ./ (-printf is a GNU find extension)
find . -maxdepth 1 -name "TS.TST*" -printf "%f\n" |
# Sort the names so the X, Y and Z files of each date arrive in that order
sort |
# Group the file names by the date suffix after the second dot
awk -F. '
    { a[$3]=a[$3]" "$0 }
    END{ for (i in a) print i, a[i] }
' |
# Each line is now: YYYY-MM-DD  filename1 filename2 filename3
# Rewrite the date as the output name: TSTSTYYYYMMDD filename1 filename2 filename3
sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})/TSTST\1\2\3/' |
while IFS=' ' read -r new f1 f2 f3; do
    # Put the three files side by side and keep the second column of each;
    # the names within a group keep the sorted input order
    paste "$f1" "$f2" "$f3" | awk '{print $2,$4,$6}' > "$new"
done

# just output
for i in TSTST*; do echo "$i"; cat "$i"; done

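
To see what the grouping stages of the script hand to the while loop, the awk and sed steps can be run on a list of names alone, without reading any files. A minimal sketch, assuming a single date so that the unspecified order of awk's for (i in a) does not matter:

```shell
#!/usr/bin/env bash
# awk splits each name on dots, so $3 is the date, and collects the names per date.
# sed then rewrites the leading YYYY-MM-DD as TSTSTYYYYMMDD.
grouped=$(
    printf '%s\n' TS.TST_X.1990-11-22 TS.TST_Y.1990-11-22 TS.TST_Z.1990-11-22 |
    awk -F. '{ a[$3] = a[$3] " " $0 } END { for (i in a) print i, a[i] }' |
    sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})/TSTST\1\2\3/'
)

# read splits on runs of spaces, so the double space after the first word is harmless.
read -r new f1 f2 f3 <<< "$grouped"
printf '%s <- %s %s %s\n' "$new" "$f1" "$f2" "$f3"
# prints: TSTST19901122 <- TS.TST_X.1990-11-22 TS.TST_Y.1990-11-22 TS.TST_Z.1990-11-22
```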