
bash - Add columns to a CSV, rewriting the headers with the filename as a prefix


I'd prefer a solution using bash rather than converting to dataframes in Python, since the files are very large.

I have a folder of CSVs that I want to merge into a single CSV. The CSVs all have the same headers, with a few exceptions, so I need to rewrite the name of each added column with the filename as a prefix to keep track of which file each column came from.

head globcover_color.csv glds00g.csv

==> file1.csv <==
id,max,mean,90
2870316.0,111.77777777777777
2870317.0,63.888888888888886
2870318.0,73.6
2870319.0,83.88888888888889


==> file2.csv <==
ogc_fid,id,_sum
"1","2870316",9.98795110916615
"2","2870317",12.3311055738527
"3","2870318",9.81535963468479
"4","2870319",7.77729743926775
I'm not quite sure how to do this, but I figured I'd use the `paste` command at some point. I was surprised not to find a similar question on Stack Overflow, though I suppose CSVs with matching ids on the same line numbers aren't that common.

Edit:

I figured out the first part:

paste -d, * > ../rasterjointest.txt
does what I want, but the headers need to be replaced.
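To illustrate what `paste` does here, a minimal sketch using trimmed sample rows from the question (the `file1.csv`/`file2.csv` names match the `head` output above):

```shell
# Recreate two small inputs with the question's sample rows (trimmed)
printf 'id,max,mean,90\n2870316.0,111.77777777777777\n' > file1.csv
printf 'ogc_fid,id,_sum\n"1","2870316",9.98795110916615\n' > file2.csv

# paste joins corresponding lines of the files; -d, separates them with a comma
paste -d, file1.csv file2.csv > joined.csv
cat joined.csv
```

The data rows line up as wanted, but the first output line is the two original headers glued together, with no indication of which file each column came from.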

$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    # Prefix every header field with the file name, extension stripped
    fname = FILENAME
    sub(/\.[^.]+$/,"",fname)
    for (i=1; i<=NF; i++) {
        $i = fname "_" $i
    }
}
# Append this file's fields to the accumulated row for this line number
{ row[FNR] = (NR==FNR ? "" : row[FNR] OFS) $0 }
END {
    for (rowNr=1; rowNr<=FNR; rowNr++) {
        print row[rowNr]
    }
}

$ awk -f tst.awk file1.csv file2.csv
file1_id,file1_max,file1_mean,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
awk ran for a full 8 minutes and then ran out of memory. Is there a way to process and output only the headers with awk?
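One way to answer that question: since the data rows can be handled by `paste`, awk only needs to read the first line of each file. A sketch using sample inputs from the question (note: `nextfile` is an assumption about the awk in use; it is supported by gawk, mawk, and other POSIX.1-2012 awks):

```shell
# Sample inputs with the question's headers (data rows trimmed)
printf 'id,max,mean,90\n2870316.0,111\n' > file1.csv
printf 'ogc_fid,id,_sum\n"1","2870316",9.98\n' > file2.csv

awk 'BEGIN { FS=OFS="," }
FNR==1 {
    # Prefix each header field with the file name, extension stripped
    fname = FILENAME
    sub(/\.[^.]+$/,"",fname)
    for (i=1; i<=NF; i++) $i = fname "_" $i
    hdr = (hdr == "" ? "" : hdr OFS) $0
    nextfile          # jump straight to the next file: only line 1 is read
}
END { print hdr }' file1.csv file2.csv > header.csv
cat header.csv
```

Because only one line per file is ever held in memory, this runs in constant space regardless of file size.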
`paste` seems to join the files correctly without using much RAM. Creating files containing only the headers worked for me:
for i in *; do head -n 1 "$i" > heads/"$i"; done
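That loop can be extended to also prefix each header field with its source filename, so the per-file header files can simply be pasted together. A sketch assuming a `heads/` directory and two of the question's files:

```shell
# Sample inputs with the question's headers (data rows shortened)
printf 'id,max\nx,y\n' > file1.csv
printf 'ogc_fid,_sum\na,b\n' > file2.csv
mkdir -p heads

# One header file per input, each column prefixed with the source filename
for i in file1.csv file2.csv; do
    head -n 1 "$i" |
    awk -v p="${i%.*}" 'BEGIN{FS=OFS=","} {for(j=1;j<=NF;j++) $j=p"_"$j; print}' \
        > "heads/$i"
done

# The prefixed headers now paste together into the combined header line
paste -d, heads/file1.csv heads/file2.csv > combined_header.csv
cat combined_header.csv
```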
I've added an alternative that uses very little memory in the awk script, since almost all of the work is done by `paste`:
$ cat tst.awk
BEGIN {
    FS=OFS=","
    for (fileNr=1; fileNr<ARGC; fileNr++) {
        filename = ARGV[fileNr]
        # Read only the first line of each file, so memory use stays tiny
        if ( (getline < filename) > 0 ) {
            fname = filename
            sub(/\.[^.]+$/,"",fname)
            for (i=1; i<=NF; i++) {
                $i = fname "_" $i
            }
        }
        row = (fileNr==1 ? "" : row OFS) $0
    }
    print row
    exit
}

$ awk -f tst.awk file1.csv file2.csv; paste -d, file1.csv file2.csv | tail -n +2
file1_id,file1_max,file1_mean,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
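Putting it together, a sketch that writes the header from awk and the `paste` body into a single output file (the `rasterjointest.txt` name comes from the question; the sample rows are shortened):

```shell
# Sample inputs with the question's headers (data rows shortened)
printf 'id,max\n1.0,2\n' > file1.csv
printf 'ogc_fid,_sum\n"1",3\n' > file2.csv

# The low-memory header-only script from the answer
cat > tst.awk <<'EOF'
BEGIN {
    FS=OFS=","
    for (fileNr=1; fileNr<ARGC; fileNr++) {
        filename = ARGV[fileNr]
        if ( (getline < filename) > 0 ) {
            fname = filename
            sub(/\.[^.]+$/,"",fname)
            for (i=1; i<=NF; i++) $i = fname "_" $i
        }
        row = (fileNr==1 ? "" : row OFS) $0
    }
    print row
    exit
}
EOF

# Header from awk, body from paste (skipping each file's header line)
{ awk -f tst.awk file1.csv file2.csv
  paste -d, file1.csv file2.csv | tail -n +2
} > rasterjointest.txt
```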