Most efficient way to gsub strings in awk when the strings come from a separate file

awk

I have a tab-separated file named cities, which looks like this:

Washington Washington N 3322 +Geo+Cap+US
Munich  München  N 3842  +Geo+DE
Paris Paris N 4948  +Geo+Cap+FR

I also have a text file named countries.txt, which looks like this:

US
DE
IT

I read this file into a Bash variable and pass it to an awk program like this:

#!/usr/bin/env bash

countrylist=$(<countries.txt)
awk -v countrylist="$countrylist" -f countries.awk cities

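I am stuck on the last part, because I do not know how to check whether $5 contains a value from the countries array and remove it only in that case.
Many thanks.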
[Edit]
The output should be tab-separated:
Washington  Washington  N   3322    +Geo+Cap
Munich  München N   3842    +Geo
Paris   Paris   N   4948    +Geo+Cap+FR

Here is one way to do it in awk:

$ awk '
BEGIN {
    FS=OFS="\t"                      # tab-separated input and output
}
NR==FNR {                            # first file: the countries list
    countries[$0]                    # store each country code as an array key
    next                             # skip to the next record; the rest is for cities only
}
{
    n=split($5,city,"+")             # split the 5th column on +
    if(city[n] in countries)         # is the last part a listed country?
        sub("[+]" city[n] "$","",$5) # if so, remove it and its leading + from the 5th column
}1' countries.txt cities             # print every record; the order of the files matters

(The data must contain actual tab characters.)

If I have understood your requirement correctly, could you try the following?

awk 'FNR==NR{a[$0]=$0;next} {for(i in a){if(index($5,a[i])){gsub(a[i],"",$5)}}} 1'  countries.txt  cities

The non-one-liner form is shown below (if the input file is tab-separated, FS and OFS can be set to \t):

awk '
FNR==NR{                    # first file: countries.txt
  a[$0]=$0                  # remember each country code
  next
}
{
  for(i in a){
    if(index($5,a[i])){     # the code occurs somewhere in field 5
      gsub(a[i],"",$5)      # remove it (the preceding + stays)
    }
  }
}
1
'  countries.txt  cities

The output is as follows:
Washington Washington N 3322 +Geo+Cap+
Munich München N 3842 +Geo+
Paris Paris N 4948  +Geo+Cap+FR
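
The question passes the list to awk in a shell variable rather than as a second input file. A minimal sketch of that variant follows, under a few assumptions: the list is handed over through the environment and read from ENVIRON instead of with -v (a deliberate swap, since some awk implementations are reported to reject literal newlines in a -v value), the sample files are recreated in a temporary directory, and the names tmp, list, tag, and result are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: the country list travels in an environment variable, not a second file.
# File names follow the question; the temp directory is an assumption for the demo.
tmp=$(mktemp -d)
printf 'US\nDE\nIT\n' > "$tmp/countries.txt"
printf 'Washington\tWashington\tN\t3322\t+Geo+Cap+US\n' >  "$tmp/cities"
printf 'Munich\tMünchen\tN\t3842\t+Geo+DE\n'            >> "$tmp/cities"
printf 'Paris\tParis\tN\t4948\t+Geo+Cap+FR\n'           >> "$tmp/cities"

result=$(countrylist=$(cat "$tmp/countries.txt") awk '
BEGIN {
    FS = OFS = "\t"                                # tab-separated in and out
    n = split(ENVIRON["countrylist"], list, "\n")  # one country code per line
    for (i = 1; i <= n; i++)
        if (list[i] != "") countries[list[i]]      # use the codes as array keys
}
{
    m = split($5, tag, "[+]")                      # tags in field 5 are separated by +
    if (tag[m] in countries)                       # only the last tag is checked
        sub("[+]" tag[m] "$", "", $5)              # drop it together with its +
}
1' "$tmp/cities")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Building the lookup array once in BEGIN means each record costs a single split and one array membership test, however long the country list is.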

Please post the expected output in code tags and then let us know.

Thanks for the quick reply. The one-liner works; to get rid of the plus sign, gsub(a[i],"",$5) should be gsub("+"a[i],"",$5), but that is a minor issue. The problem is that I do not know where to put FS="\t"; OFS="\t", because both my input and output files need to be tab-separated. I need to run this from Bash, and when I put FS="\t"; OFS="\t" in the BEGIN section of the awk file, the result is not correct.

@Tench Yes, FS and OFS should go in the BEGIN section. If \t does not work, perhaps your file does not contain actual tabs?

You are right: my test file did not have real tabs, but when I tried it on the actual file, it worked. Thanks!
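
The two points raised in the comments (where FS and OFS go when the program lives in an awk file, and how to drop the plus sign as well) can be sketched as follows. This is an illustration under assumptions, not the exact script from the thread: the contents of countries.awk are written here from the one-liner above, [+] is used to match a literal plus sign, the sample data is recreated in a temporary directory, and the NF check is an added diagnostic for files that lack real tabs.

```shell
#!/usr/bin/env bash
# Sketch only: temp directory and sample data are illustrative; file names follow the question.
tmp=$(mktemp -d)
printf 'US\nDE\nIT\n' > "$tmp/countries.txt"
printf 'Washington\tWashington\tN\t3322\t+Geo+Cap+US\n' >  "$tmp/cities"
printf 'Munich\tMünchen\tN\t3842\t+Geo+DE\n'            >> "$tmp/cities"
printf 'Paris\tParis\tN\t4948\t+Geo+Cap+FR\n'           >> "$tmp/cities"

# countries.awk: FS and OFS are set in BEGIN, before any input line is split.
cat > "$tmp/countries.awk" <<'EOF'
BEGIN { FS = OFS = "\t" }
FNR == NR { if ($0 != "") a[$0]; next }   # first file: remember each country code
{
    for (c in a)
        if (index($5, "+" c))             # literal substring test for +CODE
            gsub("[+]" c, "", $5)         # [+] matches a literal plus sign
}
1
EOF

# Diagnostic: count lines in cities that do not have exactly 5 tab-separated fields.
bad=$(awk -F'\t' 'NF != 5 { n++ } END { print n + 0 }' "$tmp/cities")

result=$(awk -f "$tmp/countries.awk" "$tmp/countries.txt" "$tmp/cities")
printf 'lines without 5 tab-separated fields: %s\n%s\n' "$bad" "$result"
rm -rf "$tmp"
```

A nonzero count from the diagnostic suggests the file is separated by spaces rather than tabs, which is the situation described in the last comment. Note that this variant removes +CODE wherever it occurs in field 5, so a longer tag that begins with a listed code would also be affected.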