Awk: compare two files and print every pattern that occurs in the second file

The title may not be self-explanatory, but I am trying to create a file after comparing two files.

File1.txt

GO:0016020
GO:0043065
GO:0003713
GO:0007090
File2.txt

Gene1 "GO:0016020,GO:0003713"
Gene2 "GO:0016020,GO:0003713,GO:0007090"
Gene3 "GO:0003713"
Output.txt

GO:0016020 Gene1
GO:0016020 Gene2
GO:0003713 Gene1
GO:0003713 Gene2
GO:0003713 Gene3
GO:0007090 Gene2
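Basically, for each entry in File1 I want to print that entry followed by the first column of every File2 line in which it occurs.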

I tried the following code:

awk 'FNR==NR{a[FNR]=$1; next}{print a[FNR],$1}' File1.txt File2.txt > output.txt
output.txt looks like this:

GO:0016020 Gene1
GO:0043065 Gene2
GO:0003713 Gene3
Only one gene is reported per line.


Can anyone help me?

Using GNU awk 4.* for true multidimensional arrays:

$ cat tst.awk
BEGIN { FS="[ \"]+" }
NR==FNR {
    split($2,a,/,/)
    for (i=1; i in a; i++) {
        genes[a[i]][$1]
    }
    next
}
{
    if ($0 in genes) {
        for (gene in genes[$0]) {
            print $0, gene
        }
    }
}

$ awk -f tst.awk file2 file1
GO:0016020 Gene1
GO:0016020 Gene2
GO:0003713 Gene1
GO:0003713 Gene2
GO:0003713 Gene3
GO:0007090 Gene2
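
For awks other than GNU awk 4.*, which lack arrays of arrays, roughly the same idea can be sketched by keeping one space-separated string of genes per GO term. This is a hedged sketch rather than part of the original answer: the function name `go_to_genes` and the array name `genes` are illustrative, and it assumes the File2 layout shown in the question (gene name, a space, then a quoted comma-separated list).

```shell
# Portable sketch: list every gene for each GO term in the first file.
# Usage: go_to_genes File1.txt File2.txt
go_to_genes() {
    awk '
        # File2 is read first: map each GO term to the genes that carry it.
        NR == FNR {
            list = $2
            gsub(/"/, "", list)            # strip the surrounding quotes
            n = split(list, terms, ",")
            for (i = 1; i <= n; i++)
                genes[terms[i]] = genes[terms[i]] " " $1
            next
        }
        # File1: print one line per gene recorded for this term.
        $1 in genes {
            m = split(genes[$1], g, " ")
            for (j = 1; j <= m; j++)
                print $1, g[j]
        }
    ' "$2" "$1"
}
```

Calling `go_to_genes File1.txt File2.txt > Output.txt` should list the genes for each term in the order they appear in File2, because the per-term string is built in input order rather than iterated with `for (x in array)`.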

I know this question is about awk, but I have worked out a working solution in PHP:

<?php
// Read File1.txt into a variable
$file1 = file_get_contents("File1.txt");
// Read File2.txt into a variable
$file2 = file_get_contents("File2.txt");
// Create an empty array to hold the output
$output = array();

// Match all GO terms in File1.txt
preg_match_all('/GO:\d+/i', $file1, $matches, PREG_PATTERN_ORDER);

// Loop over the GO terms found in File1.txt
foreach ($matches[0] as $gene) {
    // Find every gene name in File2.txt whose line contains this GO term
    preg_match_all("/(Gene\d+).*?$gene/i", $file2, $matches2, PREG_PATTERN_ORDER);
    // Loop over the matches from File2.txt
    for ($i = 0; $i < count($matches2[1]); $i++) {
        // Add the pair to the output array
        array_push($output, $gene." ".$matches2[1][$i]);
    }
}

// Remove duplicate pairs
$output = array_unique($output);
// Append the unique pairs to Output.txt
foreach ($output as $sortedGene) {
    file_put_contents('Output.txt', $sortedGene."\n", FILE_APPEND);
}
/*
GO:0016020 Gene1
GO:0016020 Gene2
GO:0003713 Gene1
GO:0003713 Gene2
GO:0003713 Gene3
GO:0007090 Gene2
*/

Alternatively, you can use Unix "join" after some preprocessing. join also requires sorted files:

sort file1.txt > file1.sort.txt
cat file2.txt|tr -d \"|tr , " "|awk '{for(i=2;i<=NF;++i)print $i,$1}'|sort|join - file1.sort.txt

An alternative awk solution without multidimensional arrays:

$ awk 'NR==FNR{a[$2]=$1;next} {for(r in a) if(r~$1) print $1,a[r]}' file2 file1
GO:0016020 Gene2
GO:0016020 Gene1
GO:0003713 Gene2
GO:0003713 Gene1
GO:0003713 Gene3
GO:0007090 Gene2

Thanks @EdMorton. When I tried the code, I got an error: awk: syntax error at source line 5 source file tst.awk context is >>> genes[a[i]][

You are not using GNU awk 4.*, which, as I said, is required for this answer. With any other awk you are missing out on a lot of useful functionality.

Good idea, as long as the GO:whatever strings are all the same length as in the example. +1

I don't think the length is relevant, but they should have a common prefix that is not repeated in the content. "GO:" will ensure that.

Length is relevant, because if GO:001602 exists in one file then it will match GO:001602[0-9]* in the other.
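
The prefix concern raised in these comments can be avoided by comparing whole terms with `==` instead of regex-matching with `~`. A minimal sketch, assuming the same file layouts as in the question; the function name `go_exact` and the arrays `lists` and `order` are illustrative and do not come from the original answers. It also keys the array by gene name, so two genes with identical GO lists do not overwrite each other.

```shell
# Sketch: exact-match variant of the single-array approach.
# Usage: go_exact File1.txt File2.txt
go_exact() {
    awk '
        # File2 is read first: remember each gene with its unquoted GO list.
        NR == FNR {
            list = $2
            gsub(/"/, "", list)
            lists[$1] = list
            order[++count] = $1          # keep genes in input order
            next
        }
        # File1: compare the term to every list element with ==, not ~.
        {
            for (k = 1; k <= count; k++) {
                n = split(lists[order[k]], terms, ",")
                for (i = 1; i <= n; i++)
                    if (terms[i] == $1)
                        print $1, order[k]
            }
        }
    ' "$2" "$1"
}
```

With this variant a truncated term such as `GO:001602` in File1 produces no output, whereas the `r~$1` test would pair it with every gene carrying `GO:0016020`.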