AWK: Parsing a directed graph / extracting records from a file based on fields that partially match an input file


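So I have an input file with three fields. It is basically a list describing a kind of directed graph. The first field is the start node, the second is the connection type (there are several kinds in this case), and the last is the node being projected onto.

The problem is that this is a very large and unwieldy directed graph, and I am only interested in some of the paths. So I want to supply an input file containing the names of the nodes I care about. If a node is mentioned in the first or third field of the graph file, I want the whole record (because the connection type may vary).

Question: How do I extract only certain records of a directed graph?

Additionally, how do I extract only those paths that connect the nodes of interest through at most one neighbor (i.e. nodes of interest may be second-nearest neighbors)?

Requirements: I am trying to improve my AWK programming, which is why 1) I would like to do this in AWK and 2) I would very much appreciate a detailed explanation of the code :)

Example

Input file: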

A  
C  
D

File to analyze:

A -> B  
A -> C  
A -> D  
B -> A  
B -> D  
C -> E  
D -> F  
E -> B  
E -> F  
F -> C

Output:

A -> B  
A -> C  
A -> D  
B -> A   
B -> D    
C -> E   
D -> F 
F -> C

Bonus example:

 A -> B -> D  -> F -> C

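If I understand your question correctly, this should do it:

awk 'NR==FNR { data[$1] = 1; next } $1 in data || $3 in data { print }' graph[12]

How it works: graph[12] is a shell glob that expands to the two files graph1 (the list of nodes of interest) and graph2 (the graph). NR==FNR is true only while the first file is being read, so during that pass every node of interest is added to the data array and next skips the remaining rule. While the second file is read, a line is printed only if its field 1 or field 3 is in data, i.e. is a node of interest.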
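Since the question asks for a detailed walk-through, here is the same filter spelled out as a commented sketch. The file names nodes.txt and graph.txt, the array name wanted, and the shell wrapper filter_edges are placeholders chosen for illustration; the sample data is the example from the question.

```shell
# Sample data from the question; nodes.txt and graph.txt are placeholder names.
printf '%s\n' A C D > nodes.txt
printf '%s\n' 'A -> B' 'A -> C' 'A -> D' 'B -> A' 'B -> D' \
              'C -> E' 'D -> F' 'E -> B' 'E -> F' 'F -> C' > graph.txt

# filter_edges NODES GRAPH: print every edge that touches a listed node.
filter_edges() {
    awk '
        # NR counts lines over all files, FNR restarts for each file,
        # so NR == FNR holds only while the first file is being read.
        NR == FNR { wanted[$1] = 1; next }

        # Second file: $1 is the start node, $2 the connection type,
        # $3 the target node. A pattern without an action prints the record.
        ($1 in wanted) || ($3 in wanted)
    ' "$1" "$2"
}

filter_edges nodes.txt graph.txt
```

On the sample data this prints the eight records listed under "Output" in the question. One caveat of the NR == FNR idiom: if the node file is empty, the condition also holds for the graph file, so nothing is printed.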
For the bonus (this needs GNU awk, since it relies on PROCINFO["sorted_in"]):
function left(str) { # returns the leftmost char of a given edge (A -> B)
    return substr(str,1,1)
}
function right(str) { # returns the rightmost...
    return substr(str,length(str),1)
}

function cmp_str_ind(i1, v1, i2, v2) # array traversal order function
{ # this forces the start from the node in the beginning of input file
    if(left(i1)==left(a)&&left(i2)!=left(a)) # or leftmost value in a
        return -1
    else if(left(i2)==left(a)&&left(i1)!=left(a))
        return 1
    else if(i1 < i2)
        return -1
    return (i1 != i2)
}

function trav(a,b,c,d) { # goes thru edges in AWK order
#   print "A:"a," C:"c," D:"d
    if(index(d,c)||index(d,right(c))) {
        return ""
    }
    d=d", "c  # c holds the current edge being examined
    if(index(a,right(c))) { # these edges affect a
#       print "3:"c
        sub(right(c),"",a)
        if(a=="") { # when a is empty, path is found
            print d # d has the traversed path
            exit
        }
        for (i in b) {
            if(left(i)==right(c)) # only try the ones that can be added to the end
                trav(a,b,i,d)
        }
        a=a""right(c)
    } else {
#   print "4:"c
        for (i in b)
            if(left(i)==right(c))
                trav(a,b,i,d)
    }
}
BEGIN { # playing with the traverse order
    PROCINFO["sorted_in"]="cmp_str_ind"
}  
NR==FNR {
    a=a""$0 # a has the input (ADC)
    next
}
{
    b[$0]=$0 # b has the edges
}
END {           # after reading in the data, recursively test every path
    for(i in b) # candidate pruning the unfit ones first. CLR or Dijkstra 
        if(index(a,left(i))) {        # were not consulted on that logic.
#           print "3: "i
            sub(left(i),"",a)
            trav(a,b,i,left(i))
            a=a""left(i)
        }
        else {
#           print "2: "i
            trav(a,b,i,left(i))
        }
}
$ awk -f graph.awk input parse
A, A -> D, D -> F, F -> C


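If you comment out the BEGIN section, you get A, A -> B, B -> D, D -> F, F -> C instead. I know I should have done more and commented it better, but it is midnight here. Maybe tomorrow.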
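For the second part of the question, read as "list paths that join two nodes of interest either directly or through exactly one intermediate node", a non-recursive sketch may be enough. That reading is an interpretation, not something the question confirms. Further assumptions made here: the intermediate node is not itself a node of interest, and paths that return to their own start node are skipped. All names (two_hop_paths, wanted, from, type, to, out, deg) are made up for this example.

```shell
# Sample data from the question; nodes.txt and graph.txt are placeholder names.
printf '%s\n' A C D > nodes.txt
printf '%s\n' 'A -> B' 'A -> C' 'A -> D' 'B -> A' 'B -> D' \
              'C -> E' 'D -> F' 'E -> B' 'E -> F' 'F -> C' > graph.txt

# two_hop_paths NODES GRAPH: print X -> Y and X -> M -> Y for listed X and Y.
two_hop_paths() {
    awk '
        # First file: remember the nodes of interest.
        NR == FNR { wanted[$1] = 1; next }

        # Second file: store each edge in input order and index it by
        # its start node, so outgoing edges can be looked up later.
        NF >= 3 {
            n++
            from[n] = $1; type[n] = $2; to[n] = $3
            out[$1, ++deg[$1]] = n
        }

        END {
            for (i = 1; i <= n; i++) {
                if (!(from[i] in wanted)) continue
                mid = to[i]
                if (mid in wanted) {            # direct connection
                    print from[i], type[i], mid
                    continue
                }
                # Otherwise try every edge leaving the intermediate node.
                for (k = 1; k <= deg[mid] + 0; k++) {
                    j = out[mid, k]
                    if ((to[j] in wanted) && to[j] != from[i])
                        print from[i], type[i], mid, type[j], to[j]
                }
            }
        }
    ' "$1" "$2"
}

two_hop_paths nodes.txt graph.txt
```

On the sample data this prints A -> B -> D, A -> C, A -> D and D -> F -> C, one per line. Unlike the recursive script above, it does not look for a single path that visits every node of interest; it only lists short connections between pairs of them, and it keeps the connection type of each hop.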