Bash? - Merge files into a CSV

bash csv text

I know (see …) that if each file contains a single column, I can use paste to merge several files into one .csv file, i.e.

paste -d, column1.dat column2.dat column3.dat ... > myDat.csv

will result in myDat.csv:

column1,   column2,   column3, ...
c1-1,      c2-1,      c3-1,    ...
c1-2,      c2-2,      c3-2,    ...
...        ...        ...

(without the tabs; I only inserted them to make it more readable)
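A small runnable version of the single-column case. The file contents are made-up sample values, and the script works in a scratch directory from mktemp -d so it does not touch existing files:

```bash
#!/usr/bin/env bash
# Sketch: merge single-column files side by side with paste.
# The contents of column1.dat .. column3.dat are hypothetical sample data.
set -eu
cd "$(mktemp -d)"                      # scratch directory for the sample files

printf '%s\n' column1 c1-1 c1-2 > column1.dat
printf '%s\n' column2 c2-1 c2-2 > column2.dat
printf '%s\n' column3 c3-1 c3-2 > column3.dat

# -d, makes paste join line N of every file with commas instead of tabs
paste -d, column1.dat column2.dat column3.dat > myDat.csv
cat myDat.csv
```

paste only pairs lines by position, so this assumes row N of every file belongs together.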

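What if I have multiple measurements, e.g.

file1.dat

has the format

<xvalue> <y1value>

file2.dat

has the format

<xvalue> <y2value>

file3.dat

has the format

<xvalue> <uvalue> <vvalue>

In the end I want a CSV like

<xvalue>, <y1value>, <y2value>, <empty column>, <uvalue>, <vvalue>

How do I merge the files now?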

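EDIT

Note that although each file is sorted (and could be sorted if it weren't), the files do not necessarily contain the same xvalues on the same lines.

If one file lacks an xvalue that another file has, its corresponding column entries should be empty.

(Actually, I think dropping the rows whose xvalues are not present in all files would also work.)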
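One tool for this edit that none of the answers uses is join from GNU coreutils, which merges sorted files on a key column. A minimal sketch, assuming GNU join (for -o auto) and space-separated .dat files; the sample data is the hypothetical x_/y_/u_/v_ data from the awk answer, where x_3, x_4 and x_5 each occur in only one file:

```bash
#!/usr/bin/env bash
# Sketch: key-based outer join of three files with GNU coreutils join.
# Assumes space-separated input; the sample data is hypothetical.
set -eu
cd "$(mktemp -d)"                       # scratch directory for the sample files

printf '%s\n' 'x_1 y1_1' 'x_2 y1_2' 'x_3 y1_3'          > file1.dat
printf '%s\n' 'x_1 y2_1' 'x_2 y2_2' 'x_4 y2_4'          > file2.dat
printf '%s\n' 'x_1 u_1 v_1' 'x_2 u_2 v_2' 'x_5 u_5 v_5' > file3.dat

export LC_ALL=C                         # join and sort must agree on collation

# Turn spaces into commas and sort on the key, the order join expects.
to_csv() { tr ' ' , < "$1" | sort -t, -k1,1; }

# -a1 -a2 also print keys found in only one input (outer join);
# -o auto -e '' fill the missing side with empty fields.
join -t, -a1 -a2 -e '' -o auto \
     <(join -t, -a1 -a2 -e '' -o auto <(to_csv file1.dat) <(to_csv file2.dat)) \
     <(to_csv file3.dat) |
  sed 's/,/,,/3' > myDat.csv            # add the empty column after y2
cat myDat.csv
```

Leaving out -a1 -a2 turns this into an inner join, which keeps only the xvalues present in every file. Because join needs lexically sorted keys, numeric xvalues come out in text order (10 before 2); piping the result through sort -t, -k1,1n restores numeric order if that matters.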

Just use process substitution:

paste -d, > myDat.csv \
  file1.dat \
  <(cut -d' ' -f2 file2.dat) \
  /dev/null \
  <(cut -d' ' -f2,3 file3.dat)
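If the .dat files are space-separated, as the cut -d' ' calls suggest, the command above keeps the space inside each line of file1.dat and between the two fields cut from file3.dat, giving lines like `1 10,100,,7 8`. A sketch of the same idea with tr converting those spaces, using made-up sample files whose xvalues line up:

```bash
#!/usr/bin/env bash
# Sketch: the process-substitution answer with tr turning the remaining
# spaces into commas. Sample files are hypothetical and space-separated.
set -eu
cd "$(mktemp -d)"                       # scratch directory for the sample files

printf '%s\n' '1 10'  '2 20'  > file1.dat
printf '%s\n' '1 100' '2 200' > file2.dat
printf '%s\n' '1 7 8' '2 9 6' > file3.dat

# /dev/null is empty, so paste emits an empty field for it on every line
paste -d, > myDat.csv \
  <(tr ' ' , < file1.dat) \
  <(cut -d' ' -f2 file2.dat) \
  /dev/null \
  <(cut -d' ' -f2,3 file3.dat | tr ' ' ,)
cat myDat.csv
```

Like the original, this still pairs lines purely by position.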

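You can use paste to combine all the files and then use awk to print only the columns you need (including the empty column):

Note that columns $3 and $5 are left out of the awk command because they are the same as column $1 (i.e. they all hold the xvalue).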
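A minimal sketch of the paste-then-awk approach as described, assuming space-separated files whose xvalues line up row by row (the sample values are made up):

```bash
#!/usr/bin/env bash
# Sketch: paste all files, then pick columns with awk.
# Sample files are hypothetical; their xvalues are in the same rows.
set -eu
cd "$(mktemp -d)"                       # scratch directory for the sample files

printf '%s\n' '1 10'  '2 20'  > file1.dat
printf '%s\n' '1 100' '2 200' > file2.dat
printf '%s\n' '1 7 8' '2 9 6' > file3.dat

# After paste, awk sees: $1 $2 (file1)  $3 $4 (file2)  $5 $6 $7 (file3).
# $3 and $5 repeat the xvalue in $1, so they are skipped;
# the literal "" produces the empty column.
paste file1.dat file2.dat file3.dat |
  awk -v OFS=',' '{ print $1, $2, $4, "", $6, $7 }' > myDat.csv
cat myDat.csv
```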
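
OK, here is my solution in GNU awk. It tries to be a more general solution and handles the extra empty column with external tools. It is GNU awk because it uses multidimensional arrays (arrays of arrays), but it could probably be generalized to other awks easily.
The program merges the files using the first field of each file as the key column. If no existing key is found to join on, it creates a new one, and on output the fields that don't exist are printed empty (note the keys x_3, x_4 and x_5 in the data files below).
First the data files: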
$ cat file[123].dat             # 3 files, separated by empty lines for clarity
x_1 y1_1
x_2 y1_2
x_3 y1_3

x_1 y2_1
x_2 y2_2
x_4 y2_4

x_1 u_1 v_1
x_2 u_2 v_2
x_5 u_5 v_5

And the code:
$ cat program.awk
BEGIN { OFS=", "; PROCINFO["sorted_in"]="@ind_str_asc" }  # gawk: visit keys in sorted order
FNR==1 { f++ }                                # counter of files
{
    a[0][$1]=$1                               # store the key for every record
    for(i=2;i<=NF;i++)                        # for each non-key element
        a[f][$1]=a[f][$1] $i ( i==NF?"":OFS ) # combine them to array element
}
END {                                         # in the end
    for(i in a[0])                            # go thru every key
        for(j=0;j<=f;j++)                     # and all related array elements
            printf "%s%s", a[j][i], (j==f?ORS:OFS)
}                                             # output them, nonexistent will output empty
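A sketch of the "other awks" generalization, assuming POSIX awk: v[f, key] (SUBSEP) replaces gawk's arrays of arrays, and the output is sorted because for-in order is unspecified. It also addresses one gap in the program above: when a key is missing from a file with two value columns (file3.dat), the gawk version prints a single empty field, so that row ends up one column short. Here a missing key is padded with as many empty fields as that file has value columns, and a plain comma is used as the separator, as the question asks. The sample data is the answer's:

```bash
#!/usr/bin/env bash
# Sketch: key-based merge in POSIX awk (no gawk arrays of arrays).
# A key absent from a file is padded to that file's number of value columns.
set -eu
cd "$(mktemp -d)"                       # scratch directory for the sample files

printf '%s\n' 'x_1 y1_1' 'x_2 y1_2' 'x_3 y1_3'          > file1.dat
printf '%s\n' 'x_1 y2_1' 'x_2 y2_2' 'x_4 y2_4'          > file2.dat
printf '%s\n' 'x_1 u_1 v_1' 'x_2 u_2 v_2' 'x_5 u_5 v_5' > file3.dat

awk -v OFS=',' '
FNR == 1 { f++ }                        # first line of a new input file
{
    key[$1] = 1                         # remember every key seen
    w[f] = NF - 1                       # number of value columns in file f
    for (i = 2; i <= NF; i++) {
        sep = (i == NF) ? "" : OFS
        v[f, $1] = v[f, $1] $i sep      # values of file f for this key
    }
}
END {
    for (k in key) {
        line = k
        for (j = 1; j <= f; j++) {
            if ((j, k) in v)
                cell = v[j, k]
            else {                      # key absent: w[j] empty fields
                cell = ""
                for (n = 1; n < w[j]; n++) cell = cell OFS
            }
            line = line OFS cell
        }
        print line
    }
}' file1.dat file2.dat file3.dat | LC_ALL=C sort > merged.csv   # for-in order is unspecified
cat merged.csv
```

Like the gawk program, this leaves out the empty column between y2 and u; for this output, sed 's/,/,,/3' can add it afterwards.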

This assumes the files agree on the xvalues, doesn't it?
Yes. If they don't, you have to buffer several files in memory and then look up the correct line.
How would I do that?

Is each file sorted? Does file2.dat contain an xvalue that is not in file1.dat, or the other way round?
@andlrc Yes to the files being sorted (if they weren't, sorting them before merging shouldn't be too hard). Unfortunately, also yes to the xvalues not being consistent.

Same problem as andlrc's answer above: it assumes the files agree on the xvalues, which isn't necessarily the case.