Bash: trimming sequence and quality in FASTQ files
Tags: bash, shell, unix, awk, fastq
I have a bunch of FASTQ files in a directory. For every read that is 51 bases long and ends in CTG or TTG, I want to trim 2 nucleotides from the sequence, together with the matching quality characters. Below is what I wrote as a shell script, but I am getting some errors and need help, as I am new to shell scripting.
Input:
@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTTTG
+
#0<BFFFFFFFFF<BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGCTG
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFIIFF
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFI
Not 100% sure I understand what you are doing, but I fixed a few things. Try the following:
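#!/bin/bash
# Process every FASTQ file in the current directory.
for sample in *.fastq; do
    # Strip the extension to build the output file name.
    name="${sample/.fastq/}"
    while read -r line; do
        if [[ $line == '@'* ]]; then
            # Header line: copy it as-is.
            head="$line" && echo "$head" >> "${name}_new.fq"
        elif [[ -n $head && ${#line} == 51 && $line =~ (CTG|TTG)$ ]]; then
            # 51-character line ending in CTG or TTG: keep the first 49 characters.
            sequence="${line:0:49}" && echo "$sequence" >> "${name}_new.fq"
        elif [[ $line == '+'* ]]; then
            # Separator line: copy it as-is.
            plus="$line" && echo "$line" >> "${name}_new.fq"
        else
            # Any other line (unmatched sequences, quality strings) is copied unchanged.
            quality="$line" && echo "$quality" >> "${name}_new.fq"
        fi
    done < "$sample"
done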
Example output
> cat sample_new.fq
> cat sample.fastq
@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTTTG
+
#0<BFFFFFFFFF<BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGCTG
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFIIFF
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFI
> ./abovescript
> cat sample_new.fq
@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTT
+
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGC
+
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
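The script in this answer shortens only the sequence line; its else branch copies every quality string unchanged. Since the question asks for the quality to be trimmed together with the sequence, one possible variation is to read each record as four lines, so that the quality string can be cut to the same length as its read. This is only a sketch under stated assumptions: it assumes strict four-line records (no wrapped sequences), and the function name trim_fastq_bash is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (name invented for this sketch).
# Reads FASTQ from stdin and writes FASTQ to stdout.
# Assumption: every record is exactly four lines
# (header, sequence, '+', quality).
trim_fastq_bash() {
    local header seq plus qual
    # Read one whole record per iteration; an incomplete trailing
    # record ends the loop and is not printed.
    while IFS= read -r header && IFS= read -r seq &&
          IFS= read -r plus && IFS= read -r qual; do
        # Trim only reads that are 51 bases long AND end in CTG or TTG.
        if [[ ${#seq} -eq 51 && $seq =~ (CTG|TTG)$ ]]; then
            seq="${seq:0:49}"     # drop the last 2 bases
            qual="${qual:0:49}"   # drop the matching 2 quality characters
        fi
        printf '%s\n' "$header" "$seq" "$plus" "$qual"
    done
}
```

Inside the existing for loop it could be called as trim_fastq_bash < "$sample" > "${name}_new.fq". Redirecting with > rather than >> means a second run overwrites the output file instead of appending to it.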
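Comments:

I get an error when creating the substr! Is there a way to split the line and save the pieces in variables?

Shell is an environment from which to call tools. It has programming-language constructs that let you sequence those calls. awk is the UNIX command for processing text files. So what you have done so far is entirely the wrong approach: the way to do this in shell is to write an awk script that parses your text file, and then call it from the shell.

It removes 2 nucleotides from every sequence!!! But I only want to remove 2 nucleotides if the sequence ends in CTG or TTG. @BroSlow

@user2243831 I guess I don't really understand. What do you want to do if a line starts with #? E.g. see the update, where a line that is not 51 characters long and matches none of the other conditions (e.g. a line starting with #) is simply printed.

I only want to trim the sequence by 2 nucleotides if both conditions hold (the length is 51 and it ends in CTG or TTG). There may be other sequences of length 51, but if they do not end in CTG or TTG I will not trim them. Depending on the condition, the line starting with # should also be removed. @BroSlow

@user2243831 Try again. If that is not it, you need to update the expected output.

I only need the second line (the sequence) modified according to the condition; the fourth line I can print from 0 to 49! Do we need the substr function? @BroSlow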
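The awk approach suggested in the comments could look roughly like the sketch below. It is not code from the thread: the function name trim_fastq is invented for illustration, and it assumes strict four-line records, so the record position (NR % 4) identifies sequence and quality lines rather than their first character. That matters because quality strings may themselves begin with @ or #.

```shell
#!/bin/bash
# Hypothetical wrapper (name invented for this sketch) around an awk program.
# Reads FASTQ from stdin and writes FASTQ to stdout.
# Assumption: every record is exactly four lines
# (header, sequence, '+', quality).
trim_fastq() {
    awk '
        NR % 4 == 2 {
            # Sequence line: remember whether this read qualifies.
            trim = (length($0) == 51 && $0 ~ /(CTG|TTG)$/)
            if (trim) $0 = substr($0, 1, 49)   # drop the last 2 bases
        }
        NR % 4 == 0 && trim {
            # Quality line of a trimmed read: drop the matching 2 characters.
            $0 = substr($0, 1, 49)
        }
        { print }
    '
}
```

From the shell it could be run per file as trim_fastq < sample.fastq > sample_new.fq. awk's substr is 1-based, so substr($0, 1, 49) keeps the same 49 characters as bash's ${line:0:49}.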