bash - add columns to a CSV, rewriting headers with the file name as a prefix
I'd prefer a solution using bash over converting to dataframes in Python, since the files are very large.

I have a folder of CSVs that I want to merge into a single CSV. The CSVs all have the same headers, with some exceptions, so I need to rewrite each added column's name with the file name as a prefix, to keep track of which file a column came from.
head globcover_color.csv glds00g.csv
==> file1.csv <==
id,max,mean,90
2870316.0,111.77777777777777
2870317.0,63.888888888888886
2870318.0,73.6
2870319.0,83.88888888888889
==> file2.csv <==
ogc_fid,id,_sum
"1","2870316",9.98795110916615
"2","2870317",12.3311055738527
"3","2870318",9.81535963468479
"4","2870319",7.77729743926775
I'm not quite sure how to do this, but I imagine I'll be using the paste command at some point. I'm surprised I couldn't find a similar question on Stack Overflow, though I suppose CSVs with matching ids on the same line numbers aren't that common.
Edit:

I figured out the first part:

paste -d, * > ../rasterjointest.txt

does what I want, but the headers need to be replaced.
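For reference, this is what plain paste leaves in the header row; a minimal sketch that recreates (truncated versions of) the two sample files from the question:

```shell
# Recreate the two sample files, then join them column by column; the
# header row keeps the original, un-prefixed column names, which is
# why it still needs rewriting afterwards.
printf 'id,max,mean,90\n2870316.0,111.77\n' > file1.csv
printf 'ogc_fid,id,_sum\n"1","2870316",9.98\n' > file2.csv
paste -d, file1.csv file2.csv | head -n 1   # id,max,mean,90,ogc_fid,id,_sum
```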
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    # First line of each file: prefix every column name with the
    # file name, minus its extension.
    fname = FILENAME
    sub(/\.[^.]+$/,"",fname)
    for (i=1; i<=NF; i++) {
        $i = fname "_" $i
    }
}
# Buffer every line, appending the fields of the 2nd and later files.
{ row[FNR] = (NR==FNR ? "" : row[FNR] OFS) $0 }
END {
    for (rowNr=1; rowNr<=FNR; rowNr++) {
        print row[rowNr]
    }
}
$ awk -f tst.awk file1.csv file2.csv
file1_id,file1_max,file1_mean,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
awk ran for a full 8 minutes and then ran out of memory. Is there a way to process and output only the headers with awk? paste seems to join the files correctly without using much RAM. Creating files containing only the headers worked for me:

for i in *; do head -n 1 "$i" > heads/$i; done
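Those header lines can be prefixed the same way without awk ever reading the data rows, since awk can exit after the first record; a sketch, where prefix_header is a helper name of my choosing:

```shell
# Prefix each column of a file's header line with the file name minus
# its extension; exit after the first record so memory use stays tiny.
prefix_header() {
  awk -v p="${1%.*}" 'BEGIN{FS=OFS=","}
    { for (i=1; i<=NF; i++) $i = p "_" $i; print; exit }' "$1"
}

printf 'id,max,mean,90\n' > file1.csv   # sample header, as in the question
prefix_header file1.csv                 # file1_id,file1_max,file1_mean,file1_90
```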
I've added an alternative that uses very little memory in the awk script, since almost all of the work is done by paste:
$ cat tst.awk
BEGIN {
    FS=OFS=","
    # Read only the first line of each file named on the command line,
    # prefix its column names with the file name minus extension, and
    # append the result to the combined header.
    for (fileNr=1; fileNr<ARGC; fileNr++) {
        filename = ARGV[fileNr]
        if ( (getline < filename) > 0 ) {
            fname = filename
            sub(/\.[^.]+$/,"",fname)
            for (i=1; i<=NF; i++) {
                $i = fname "_" $i
            }
        }
        row = (fileNr==1 ? "" : row OFS) $0
    }
    print row
    exit
}
$ awk -f tst.awk file1.csv file2.csv; paste -d, file1.csv file2.csv | tail -n +2
file1_id,file1_max,file1_mean,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
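The same two-pass idea (a header-only pass plus a bulk paste) can be combined into a single pipeline for a whole folder; a sketch with made-up sample data, where csvdir and merged.csv are names of my choosing:

```shell
# Scratch directory with two tiny sample files (stand-ins for the
# real folder of CSVs; names and values are illustrative only).
mkdir -p csvdir && cd csvdir
printf 'id,max\n2870316.0,111.7\n2870317.0,63.8\n' > file1.csv
printf 'ogc_fid,_sum\n1,9.98\n2,12.33\n' > file2.csv

# Build the prefixed header from each file's first line only, then let
# paste do the heavy lifting on the data rows (tail skips the old header).
{
  for f in *.csv; do
    awk -v p="${f%.csv}" 'BEGIN{FS=OFS=","}
      { for (i=1; i<=NF; i++) $i = p "_" $i; print; exit }' "$f"
  done | paste -sd, -
  paste -d, *.csv | tail -n +2
} > ../merged.csv
cd ..

head -n 1 merged.csv   # file1_id,file1_max,file2_ogc_fid,file2__sum
```

Writing the output one directory up, as in the question's paste command, keeps merged.csv from matching the *.csv glob it is built from.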