Bash variable is duplicated; use only one instance in the command


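In the bash below, I loop over pairs of .fastq files and use them in the commented-out command. The variable $pre holds the sample name and is extracted correctly, but what I can't figure out is how to use it only once in the commented-out command. In the example below, $pre is NA11111, but it is extracted twice, once for each file of the pair. Is there a way to use it only once in the command? I tried removing the duplicates with awk, without luck, and also tried cut. Thank you :)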


 for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
 sample=${file%.fastq}
 bname=`basename $sample`
 pre="$(echo $bname|cut -d- -f1,1)"

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam
   echo "$sample.fastq"
   echo "$sample"
   echo "$pre"
   done

Current output

/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001         `this is $sample`
NA11111                                                                   `this is $pre`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001         `this is $sample`
NA11111                                                                   `this is $pre`

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam

$sample.fastq = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq
$sample = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001
$pre = NA11111

Desired output

/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001         `this is $sample`
NA11111                                                                   `this is $pre`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001         `this is $sample`
NA11111                                                                   `this is $pre`

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam

$sample.fastq = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq
$sample = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001
$pre = NA11111

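The simplest approach is to keep track of the prefixes you have already seen and skip the current file if its prefix matches one of them.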

declare -A seen=()

for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
  sample=${file%.fastq}
  bname=$(basename "$sample")
  pre=${bname%%-*}

  # Go to the next file if $pre has already been seen
  [[ -v seen[$pre] ]] && continue

  # Remember that we've now seen $pre
  seen[$pre]=

  bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" "/home/cmccabe/Desktop/fastq/${pre}_aln.sam"
done
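The same bookkeeping can be tried without bwa installed. The following is only a sketch: unique_prefixes is a hypothetical helper name, the NA22222 file name is made up for illustration, and it assumes bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Sketch: print each sample prefix once, however many files share it.
# unique_prefixes is a hypothetical helper, not part of the original answer.
unique_prefixes() {
  local -A seen=()          # associative array: prefix -> marker (bash 4+)
  local file bname pre
  for file in "$@"; do
    bname=${file##*/}       # strip the directory part
    pre=${bname%%-*}        # keep everything before the first "-"
    # Skip this file if its prefix has already been handled
    [[ -n ${seen[$pre]+x} ]] && continue
    seen[$pre]=1            # remember the prefix
    printf '%s\n' "$pre"
  done
}

# Demo: two files for NA11111 and one for a made-up second sample
unique_prefixes \
  /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq \
  /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq \
  /home/cmccabe/Desktop/fastq/NA22222-100ng-E08A-C06_S6_L001_R1_001.fastq
```

The function only inspects the names it is given, so it can be called with a glob such as /home/cmccabe/Desktop/fastq/*.fastq in place of the explicit list.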


I think this is what you are trying to achieve:

for file in /home/cmccabe/Desktop/fastq/*_R1_*.fastq
do
    # Derive the R2 mate's path from the R1 path
    file2=$(echo "$file" | sed 's/_R1_/_R2_/')
    # Sample name: the part of the file name before the first "-"
    sample=$(basename "$file" .fastq | cut -d- -f1)

    bwa mem -M -t 16 -R "@RG\tID:$sample\tSM:$sample" /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$file" "$file2" > "/home/cmccabe/Desktop/fastq/${sample}_aln.sam"
done

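In my view, this is the most sensible way to process the data. I am assuming that you want both ends of each pair and that you will post-process the result, which is why the read group line is included.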
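The R1-to-R2 pairing can also be done with bash parameter expansion rather than sed, and it may be worth checking that the mate actually exists before aligning. The following is only a sketch: mate_of is a hypothetical helper name, the demo files are empty placeholders created in a temporary directory, and the echo stands in for the real bwa mem call.

```shell
#!/usr/bin/env bash
# mate_of is a hypothetical helper: given an R1 path, print the R2 path.
# Only the file name is rewritten, so a "_R1_" in a directory name is left alone.
mate_of() {
  local r1=$1 dir base
  base=${r1##*/}            # file name without the directory
  dir=${r1%"$base"}         # directory part with trailing slash (may be empty)
  printf '%s%s\n' "$dir" "${base/_R1_/_R2_}"
}

# Demo with empty placeholder files in a temporary directory
demo_dir=$(mktemp -d)
touch "$demo_dir/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq" \
      "$demo_dir/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq"

for r1 in "$demo_dir"/*_R1_*.fastq; do
  r2=$(mate_of "$r1")
  sample=${r1##*/}          # file name only
  sample=${sample%%-*}      # part before the first "-"
  if [[ -f $r2 ]]; then
    # The real bwa mem command would go here
    echo "would align $sample: ${r1##*/} + ${r2##*/}"
  else
    echo "skipping $sample: no R2 mate for ${r1##*/}" >&2
  fi
done

rm -rf "$demo_dir"
```

Looping over the R1 files only means each sample is visited once, so no separate duplicate tracking is needed.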


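The NA11111 value appears in both the R1 and R2 files. So what logic determines whether R1 or R2 is the file you want? Presumably, once you have found a file for NA11111, the other files with NA11111 should be set aside? If so, you can extract the current NA????? value, list the files with that prefix, and keep only the first one (head -1).

Are you deliberately ignoring the .fastq files that have "R2" in their names? Why are you trying to do a single-end alignment on a paired-end experiment? Is there any reason to throw away half of the data?

The bwa command uses R1 and R2 of the same sample for paired-end alignment, but NA11111 is duplicated. Am I missing something? Thank you :)

Thanks for the catch. I think I left something out of the command and need to revise it. This is a bit new to me; I am used to single-end alignment. Thank you :).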