macOS: Replace terms with the related abbreviations from another file where they match

macos unix awk terminal

I have two files:
1. A pattern file: pattern.txt
2. A file containing various terms: terms.txt

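pattern.txt contains two columns separated by a semicolon (;). The first column holds the terms, and the second column holds the abbreviation associated with the term on the same line.

terms.txt contains single words and terms made up of a single word, but also terms made up of several words.
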
pattern.txt
Berlin;Brln
Barcelona;Barcln
Checkpoint Charly;ChckpntChrl
Friedrichstrasse;Fridrchstr
Hall of Barcelona;HllOfBarcln
Paris;Prs
Yesterday;Ystrdy

terms.txt
Berlin  
The Berlinale ended yesterday  
Checkpoint Charly is still in Friedrichstrasse  
There will be a fiesta in the Hall of Barcelona  
Paris is a very nice city 

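The goal is to replace the terms with their standardized abbreviations and to find out which terms have no abbreviation.

So I would like to end up with two files.

The first file is a new terms file in which every term that can be replaced is replaced by its abbreviation.

The second file contains a list of all terms that have no abbreviation.

The output is case-insensitive: I do not distinguish between "The" and "the".
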
new_terms.txt
Brln  
The Berlinale ended Ystrdy  
ChckpntChrl is still in Fridrchstr  
There will be a fiesta in the HllOfBarcln  
Prs is a very nice city  

terms_without_abbreviations.txt
a  
be  
Berlinale  
city  
ended  
fiesta  
in  
is  
nice  
of  
still  
The  
There  
very  
will  

I would appreciate your help, and thank you in advance for your time and tips.

This does most of what you need:
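# Requires gawk: asorti() is a gawk extension.
BEGIN { FS=";"; }

# First file (pattern.txt): map each lowercased term to its abbreviation.
FNR==NR { dict[tolower($1)] = $2; next }

# Second file (terms.txt): replace every single word that has an abbreviation.
{
    line = "";
    count = split($0, words, / +/);
    for (i = 1; i <= count; i++) {
        key = tolower(words[i]);
        if (key in dict) {
            words[i] = dict[key];
        } else {
            result[key] = words[i];    # remember words without an abbreviation
        }
        line = line " " words[i];
    }
    print substr(line, 2);
}

# Finally, print the words without an abbreviation, ordered by lowercased key.
END {
    count = asorti(result, sorted);
    for (i = 1; i <= count; i++) {
        print result[sorted[i]];
    }
}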
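
Since asorti() is a gawk extension, the END block above is the part that will not run under a plain POSIX awk. Below is a minimal sketch of one way to build the second file without it, by letting sort do the ordering. It assumes the abbreviated lines have already been written to a file such as new_terms.txt, that words are separated by spaces or tabs with no punctuation attached, and that every word which is not itself an abbreviation from pattern.txt counts as "without abbreviation". The function name list_unabbreviated is made up for this example.

```shell
#!/bin/sh
# list_unabbreviated PATTERN_FILE NEW_TERMS_FILE
# Hypothetical helper: prints each word of NEW_TERMS_FILE that is not an
# abbreviation from PATTERN_FILE, once, sorted case-insensitively.
list_unabbreviated() {
    awk -F';' '
    # First file: remember every abbreviation (second column), lowercased.
    FNR == NR { isabbr[tolower($2)] = 1; next }

    # Second file: emit each word that is not an abbreviation, first spelling only.
    {
        n = split($0, words, /[ \t]+/)
        for (i = 1; i <= n; i++) {
            key = tolower(words[i])
            if (key == "" || (key in isabbr) || (key in seen))
                continue
            seen[key] = 1
            print words[i]
        }
    }
    ' "$1" "$2" | LC_ALL=C sort -f
}
```

It could be called as: list_unabbreviated pattern.txt new_terms.txt > terms_without_abbreviations.txt. Words that were swallowed by a multi-word term (such as "of" in "Hall of Barcelona") no longer appear in the abbreviated file, so they are not listed.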

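OK, so I had a bit of a crack at it, but I will explain the issues:
If several entries in pattern.txt could apply to the same line, the first one makes its change and the second one will not (e.g. Barcelona;Barcln and Hall of Barcelona;HllOfBarcln: obviously, if Barcln has already been substituted by the time the longer term is tried, the longer term is no longer present in the line, so no change is made).
Similarly, the word "Hall" has no abbreviation, so if we assume the above holds and only the first change is made, your new output file will include Hall, since Hall has no abbreviation.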
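#!/usr/bin/awk -f
# Note: IGNORECASE and the \< \> word-boundary operators are gawk extensions.

BEGIN{
    FS = ";"

    IGNORECASE = 1
}

# First file (pattern.txt): map each lowercased term to its abbreviation.
FNR == NR{
    abbr[tolower($1)] = $2
    next
}

# First line of the second file: switch to whitespace splitting and
# re-split the current record, which was already split on ";".
FNR == 1{ FS = " "; $0 = $0 }

{
    # Record each word that has no abbreviation, once.
    for(i = 1; i <= NF; i++){
        item = tolower($i)
        if(!(item in abbr) && !(item in twa)){
            twa[item]
            print item > "terms_without_abbreviations.txt"
        }
    }

    # Replace every term with its abbreviation (iteration order is unspecified).
    for(i in abbr)
        gsub("\\<"i"\\>", abbr[i])

    print > "new_terms.txt"

}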

There may be other pitfalls to look out for, but this is a rough direction. Not sure how you would get around the points I raised above?
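
One possible way around the ordering problem described in this answer is to apply the longest terms first, so that "Hall of Barcelona" is tried before "Barcelona". The following is only a sketch under a few assumptions: terms are plain ASCII, a "word" is bounded by anything other than letters, digits and underscore, and no abbreviation contains a shorter term as a whole word. It sticks to POSIX awk features (index, substr, tolower), so it does not depend on IGNORECASE or \< \>. The names abbreviate, replace_term and sort_longest_first are made up for this example.

```shell
#!/bin/sh
# abbreviate PATTERN_FILE TERMS_FILE
# Hypothetical helper: prints TERMS_FILE with every known term replaced by
# its abbreviation, trying the longest terms first.
abbreviate() {
    awk -F';' '
    # Replace whole-word, case-insensitive occurrences of term t in line.
    # t must already be lowercase; a is its abbreviation.
    function replace_term(line, t, a,    low, out, p, tl, before, after, prev) {
        out = ""; prev = ""; tl = length(t)
        low = tolower(line)
        while ((p = index(low, t)) > 0) {
            before = (p == 1) ? prev : substr(low, p - 1, 1)
            after  = substr(low, p + tl, 1)
            if (before !~ /[a-z0-9_]/ && after !~ /[a-z0-9_]/) {
                out = out substr(line, 1, p - 1) a      # whole word: abbreviate
            } else {
                out = out substr(line, 1, p + tl - 1)   # inside a longer word: keep
            }
            prev = substr(low, p + tl - 1, 1)
            line = substr(line, p + tl)
            low  = substr(low, p + tl)
        }
        return out line
    }

    # Insertion sort of the parallel arrays term[] and abbr[], longest term first.
    function sort_longest_first(    i, j, t, a) {
        for (i = 2; i <= n; i++) {
            t = term[i]; a = abbr[i]
            for (j = i - 1; j >= 1 && length(term[j]) < length(t); j--) {
                term[j + 1] = term[j]; abbr[j + 1] = abbr[j]
            }
            term[j + 1] = t; abbr[j + 1] = a
        }
    }

    # First file: collect the term;abbreviation pairs.
    FNR == NR {
        if ($1 != "") { n++; term[n] = tolower($1); abbr[n] = $2 }
        next
    }

    # Second file: sort once, then apply every term to each line.
    !sorted { sort_longest_first(); sorted = 1 }
    {
        line = $0
        for (k = 1; k <= n; k++)
            line = replace_term(line, term[k], abbr[k])
        print line
    }
    ' "$1" "$2"
}
```

It could be called as: abbreviate pattern.txt terms.txt > new_terms.txt. Because the matching is done on a lowercased copy of each line, "yesterday" and "Yesterday" are both replaced, while "Berlinale" is left alone since "Berlin" is not a whole word there.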
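
What have you tried yourself? Post your attempt here.

I only use the terminal a few times a year, and I am not very experienced. I tried, but it did not fit my case.

You have to keep in mind that the SO community will help you once you have posted your attempt, even if it failed (that is not a problem), so that we can help you. People cannot just write free code for you.

That was not meant disrespectfully! It is because I have great respect for the knowledge in this forum that I am asking for help here. I made several attempts but could not solve the problem. To give one attempt: I could replace some terms, but the problem is that I end up with only the abbreviations: awk -F ";" 'NR==FNR{a[$1]=2;next}{$1=a[$1];}1' pattern.txt terms.txt > new_terms.txt

OK, would it hurt to have no spaces in both files, i.e. Checkpoint Charly becoming CheckpointCharly? Do you have control over the files, or is that not possible?

Thank you very much! Both solutions work for me, yours and Michael's. I am testing a lot of examples. I understood the approach of your answer right away, but it did not work at first. After deciding to use gawk, it works very well!

Your assumptions are correct, I had to deal with these problems as well. The term is always in the first column. As soon as the term appears in terms.txt, it should be replaced. On my first attempt with awk I did not get the expected result, although the script looked logical to me, so I then tried gawk and got the expected result.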