Regex: how to match similarly named files and merge them in a shell script?

I am trying to merge multiple files in a folder that share a similar naming pattern.

The files are named as follows:

20170219-A20-L1-AB1234_S1_R1_001.txt
20170211-B21-L3-AB1234-2_S1_R1_001.txt
20170210-C20-L1-AB1234-3_S1_R1_001.txt  
20170211-B21-L3-AB1234-2_S2_R1_001.txt
20170210-C20-L1-AB1234-3_S2_R1_001.txt
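
My criterion is to find the files containing _S1 and _S2, then merge all _S1 files into one new file and all _S2 files into another new file.

My expected output could be

20170219-B21-L3-AB1234-2_S1_R1_001_merge.txt
20170219-B21-L3-AB1234-2_S2_R1_001_merge.txt

I have no specific requirement for the merged file names, but I want the merged files to be in the same folder.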

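I have been trying to use the grep and cut commands, but my for loop is not working. I find it hard to understand regular expressions in the shell.

Please help me construct the logic.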

Something like this:

#!/bin/bash

for i in 'S1' 'S2'
do
    cat *_"$i"_R[0-9]*_[0-9]*.txt > "$i".txt
done
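
This loops over the list given in the for statement (S1 and S2), selects the files for each element with a filename pattern (strictly a shell glob rather than a regular expression), and writes the concatenated output to a single file per element. The merged output files will be S1.txt and S2.txt. You can make the pattern stricter if needed.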
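
The pattern in the loop is expanded by the shell as a glob. If an actual regular expression is wanted, bash's [[ string =~ regex ]] test can capture the sample tag from each name, so the list of tags does not have to be written out by hand. The following is a minimal sketch under assumptions: the scratch directory, the file contents, and the variable names demo and re are made up for illustration, and the regex assumes names ending in _S<number>_R<number>_<number>.txt.

```shell
#!/bin/bash
# Sketch: group files by the sample tag captured with a bash regular expression.
demo=$(mktemp -d)    # scratch directory with hypothetical sample files
printf 'a\n' > "$demo/20170219-A20-L1-AB1234_S1_R1_001.txt"
printf 'b\n' > "$demo/20170211-B21-L3-AB1234-2_S1_R1_001.txt"
printf 'c\n' > "$demo/20170211-B21-L3-AB1234-2_S2_R1_001.txt"

# Capture group 1 is the tag (S1, S2, ...); the rest anchors the name's tail.
re='_(S[0-9]+)_R[0-9]+_[0-9]+\.txt$'

rm -f "$demo"/S[0-9]*.txt    # start clean, because the loop below appends
for f in "$demo"/*.txt; do
    if [[ $f =~ $re ]]; then
        # BASH_REMATCH[1] holds the captured tag, e.g. S1
        cat -- "$f" >> "$demo/${BASH_REMATCH[1]}.txt"
    fi
done
```

The merged files S1.txt and S2.txt do not match the regex themselves, so they are never read back in as input.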

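The following will help:

cat *_S1* > 20170219-B21-L3-AB1234-2_S1_R1_001_merge.txt
cat *_S2* > 20170219-B21-L3-AB1234-2_S2_R1_001_merge.txt

Shell patterns are case-sensitive, so the pattern has to use _S1 and _S2 in upper case to match the file names in the question.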
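
One thing to watch with a broad pattern like this: the merged file name itself contains _S1, so on a rerun it would be among the files the pattern matches. The sketch below shows one way to keep the output out of the input list. It is an illustration only: merge_tag is a hypothetical helper name, the _merge.txt suffix is taken from the expected output in the question, and the demo files and their contents are made up.

```shell
#!/bin/bash
# Sketch of a rerun-safe merge. merge_tag is a hypothetical helper name.
# Usage: merge_tag DIR TAG OUTPUT_NAME
merge_tag() {
    local dir=$1 tag=$2 out=$3 f
    local inputs=()
    for f in "$dir"/*_"$tag"_*.txt; do
        [[ -f $f ]] || continue               # an unmatched glob stays literal; skip it
        [[ $f == *_merge.txt ]] && continue   # never feed merged output back in
        inputs+=("$f")
    done
    if (( ${#inputs[@]} == 0 )); then
        echo "no input files for $tag" >&2
        return 1
    fi
    cat -- "${inputs[@]}" > "$dir/$out"
}

# Demo in a scratch directory with made-up contents.
demo=$(mktemp -d)
echo one > "$demo/20170219-A20-L1-AB1234_S1_R1_001.txt"
echo two > "$demo/20170211-B21-L3-AB1234-2_S1_R1_001.txt"
merge_tag "$demo" S1 AB1234_S1_R1_001_merge.txt
merge_tag "$demo" S1 AB1234_S1_R1_001_merge.txt   # rerun: output is not re-read
```

After both runs the merged file still holds exactly the two input lines.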

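Both of the solutions above work if the files you are searching for are in the working directory, but they will not merge files in any other directories. To recreate your problem I did the following, and then tried to solve it based on your initial request:

Created files per your specification: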

$ touch $(date +%Y%m%d)_{A,B}{20,21}_L{1,3}_AB1234_{1,3}_S{1,2}_R1_001.txt
$ touch $(date +%Y%m%d)_{A,B}{20,21}_L{1,3}_AB1234_S{1,2}_R1_001.txt
$ ls | wc -l
48
Created a variable, myText, holding 48 lines of random text generated by Lorem Ipsum:

$ echo "${myText}" | wc -l
    48
Gave each file one line from myText:

$ ls -t1 | awk '{print NR" "$0}' | while read i j; do echo "${myText}" | awk -v var=${i} 'NR==var {print}' >> ${j}; done
$ for i in `ls -t1`; do echo -n " ${i}: "; cat ${i}; done
 20170219_B21_L3_AB1234_3_S1_R1_001.txt: This is additional line two
 20170219_B21_L3_AB1234_3_S2_R1_001.txt: line three
...
 20170219_A20_L3_AB1234_S1_R1_001.txt: Phasellus ut quam eu lacus aliquet vehicula.
 20170219_A20_L1_AB1234_S1_R1_001.txt: Proin nec orci accumsan, pharetra sapien sed, gravida arcu.
 20170219_B21_L3_AB1234_S2_R1_001.txt: Lorem ipsum dolor sit amet, consectetur adipiscing elit
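
The same kind of test setup can be done without parsing the output of ls. The following is a small sketch with made-up file names and text; note that it hands out lines in glob (sorted name) order, not in the modification-time order that ls -t1 gives above.

```shell
#!/bin/bash
# Sketch: give each file in a directory one line of text, without parsing ls.
demo=$(mktemp -d)                     # scratch directory, hypothetical files
touch "$demo"/f1.txt "$demo"/f2.txt "$demo"/f3.txt
myText=$'alpha\nbeta\ngamma'          # stand-in for the Lorem Ipsum lines

files=("$demo"/*.txt)                 # glob expands in sorted name order
i=0
# Read one line per iteration and stop when either lines or files run out.
while IFS= read -r line && (( i < ${#files[@]} )); do
    printf '%s\n' "$line" >> "${files[i]}"
    i=$((i + 1))
done <<< "$myText"
```

Quoting every expansion and using an array keeps the loop working for names that contain spaces.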
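Then I merged all of the ...S1... and ...S2... files (this finds any file matching my criteria from my home directory downward; to append rather than overwrite, use cat >> file instead of cat > file, depending on whether the files are cleaned up before the script needs to be rerun):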
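
A recursive merge of this kind could be sketched with find, as below. This is an illustration under assumptions, not necessarily the exact command used for the results that follow: merge_tree is a hypothetical helper name, the name pattern and output names are modeled on the listings in this answer, and the demo tree is made up. find does not guarantee any particular order, so the order of lines in the merged files may vary.

```shell
#!/bin/bash
# Sketch: merge matching files from a whole directory tree with find.
# merge_tree is a hypothetical helper. Usage: merge_tree ROOT OUT_DIR TAG...
merge_tree() {
    local root=$1 out_dir=$2 tag
    shift 2
    for tag in "$@"; do
        # -type f keeps regular files only; -exec cat {} + passes many
        # files to each cat call. Use >> instead of > to append.
        find "$root" -type f -name "*_${tag}_R1_001.txt" \
            -exec cat {} + > "$out_dir/AB1234_${tag}_R1_001_merged.txt"
    done
}

# Demo tree with made-up contents, including a nested directory.
demo=$(mktemp -d)
mkdir -p "$demo/run1/sub" "$demo/out"
echo a > "$demo/run1/20170219_A20_L1_AB1234_S1_R1_001.txt"
echo b > "$demo/run1/sub/20170219_B21_L3_AB1234_3_S1_R1_001.txt"
echo c > "$demo/run1/20170219_A20_L1_AB1234_S2_R1_001.txt"
merge_tree "$demo/run1" "$demo/out" S1 S2
```

The merged names end in _merged.txt, so they do not match the -name pattern and are not picked up again even when the output directory sits inside the search root.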

Result:

$ for i in `ls | grep merged`; do echo; echo "--- ${i} ---"; cat ${i}; done

--- AB1234_S1_R1_001_merged.txt ---
Donec et ante tempor, hendrerit est ut, egestas massa.
Donec laoreet erat a sapien finibus venenatis.
Etiam eget urna eu ipsum dapibus aliquet.
Phasellus ut quam eu lacus aliquet vehicula.
Phasellus sed lorem ac odio rutrum vehicula.
Aliquam ac eros ut risus fringilla fringilla.
Curabitur a purus ultricies sem venenatis auctor.
Praesent dignissim justo non diam ultrices, nec fermentum lectus dictum.
Donec imperdiet mi sit amet quam iaculis rhoncus.
Nam vitae neque vehicula, consectetur dui porttitor, placerat libero.
Nulla eget diam iaculis augue interdum posuere.
Fusce a diam ac neque accumsan sagittis.
Sed feugiat mi eget augue euismod, et laoreet urna dictum.
This is additional line two
Vestibulum egestas tellus non justo fringilla viverra eget eu neque.
Aliquam porttitor nisi nec laoreet vestibulum.
Donec congue diam ut leo commodo mattis.
Quisque egestas odio sit amet diam efficitur, non accumsan magna blandit.
Donec convallis metus at iaculis pellentesque.
Nam a ligula venenatis, consectetur lectus et, dictum erat.
Proin nec orci accumsan, pharetra sapien sed, gravida arcu.
Curabitur volutpat nibh nec leo tempus, at sagittis lacus euismod.
Mauris blandit sem ac lectus varius lobortis.
In eu ipsum et felis lobortis dictum.

--- AB1234_S2_R1_001_merged.txt ---
Aenean id orci sit amet lacus tincidunt molestie.
Duis pretium tellus dapibus lorem rhoncus, at tincidunt mauris pellentesque.
Integer hendrerit mauris sit amet nunc aliquam, id congue justo pulvinar.
Praesent dapibus augue ac enim consequat, vitae feugiat enim scelerisque.
This is additional line one
Sed sit amet dolor accumsan, commodo magna at, aliquet neque.
Quisque porttitor sapien sed orci vulputate, ac porta ante sollicitudin.
In malesuada leo sit amet purus accumsan porttitor commodo eu eros.
Integer ut odio elementum, viverra velit at, molestie nulla.
Suspendisse suscipit lorem id suscipit consectetur.
Donec vulputate nibh eget imperdiet volutpat.
Curabitur sit amet libero eget nulla viverra iaculis sit amet eget eros.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Maecenas imperdiet nisl quis arcu blandit, sed pretium mi auctor.
Sed sit amet nunc faucibus, ultricies elit quis, sodales magna.
Nulla pharetra mauris eu quam sollicitudin ornare in et metus.
Ut convallis nibh in tempus fringilla.
In ornare erat quis sodales hendrerit.
Phasellus molestie erat commodo est venenatis, ullamcorper tempus elit hendrerit.
Nam mollis ante in purus suscipit, quis facilisis risus efficitur.
Integer pellentesque sem eget diam ultrices, eget vulputate ante pharetra.
Mauris ac nisl vitae sapien lacinia ornare nec nec felis.
line three
Sed dapibus ipsum eu purus interdum, at varius libero ornare.

Does this answer the question?

If you post your code, we can better help you debug it. Also, "not working" is unhelpfully vague.

@anon That edit makes the HTML tags look like part of the file structure... probably not what you intended.