Awk: how to read a dictionary and replace words in a file?

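We have a source file ("source-A") like the one below (if you see blue text, it comes from Stack Overflow, not from the text file):

Each sentence in "source-A" is on its own line and ends with a newline (\n).

We have a dictionary/conversion file ("converse-B") that looks like this: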

aluminium<tab>aluminum
analyse<tab>analyze
white spirit<tab>mineral spirits
stag night<tab>bachelor party
savoury<tab>savory
potato crisp<tab>potato chip
mashed potato<tab>mashed potatoes

The desired output is:

The container of mineral spirits was made of aluminum.
We will use an aromatic method to analyze properties of mineral spirits.
No one drank mineral spirits at bachelor party.
Many people think that a potato chip is savory, but some would rather eat mashed potatoes.
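The tricky part is the word "potato".

If a "simple" awk solution cannot handle the singular (potato) and plural (potatoes) forms, we will fall back to manual replacement; the awk solution can skip that use case.

In other words, the awk solution may stipulate that it only works for unambiguous words, or for terms made up of unambiguous words separated by spaces.

An awk solution will get us 90% of the way there; we will do the remaining 10% by hand.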

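sed is probably a better fit, since this is just phrase/word replacement. Note that if the same word appears in more than one phrase, it is first come, first served, so order your dictionary accordingly.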

$ sed -f <(sed -E 's_(.+)\t(.+)_s/\1/\2/g_' dict) content

The container of mineral spirits was made of aluminum.
We will use an aromatic method to analyze properties of mineral spirits.
No one drank mineral spirits at bachelor party.
Many people think that a potato chip is savory, but some would rather eat mashed potatoes.
...
more sentences
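
Since the question asks for awk, here is a minimal sketch of the same idea in awk, not a definitive solution. It assumes the dictionary is tab-separated with exactly one pair per line and that the terms are plain words or phrases. The sample sentence, the temporary directory, and the variable name converted are made up for illustration.

```shell
# Made-up sample data; file contents are illustrative only.
tmp=$(mktemp -d)
printf 'aluminium\taluminum\nwhite spirit\tmineral spirits\n' > "$tmp/converse-B"
printf 'The container of white spirit was made of aluminium.\n' > "$tmp/source-A"

# First file (NR == FNR): store each tab-separated pair in dictionary order.
# Second file: run every pair over the line with gsub(), then print it.
# Caveat: gsub() treats the term as a regex and "&" in the replacement
# as the matched text, so this assumes plain words and phrases.
converted=$(awk -F'\t' '
    NR == FNR { from[++n] = $1; to[n] = $2; next }
    {
        for (i = 1; i <= n; i++) gsub(from[i], to[i])
        print
    }
' "$tmp/converse-B" "$tmp/source-A")

printf '%s\n' "$converted"
```

As with the sed version, pairs are applied in dictionary order, so an earlier entry can change text that a later entry would otherwise have matched.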

Thanks for the sed explanation. Could you link to an explanation of how to handle word case and word boundaries?
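
One possible way to handle word boundaries and case, sketched under the assumption that GNU sed is available: \b (word boundary) and the I flag (case-insensitive match) are GNU extensions, not POSIX. The sample files and the name script.sed are made up for illustration, and the generated script is written to a temporary file instead of using process substitution.

```shell
# Made-up sample data; file contents are illustrative only.
tmp=$(mktemp -d)
printf 'analyse\tanalyze\naluminium\taluminum\n' > "$tmp/dict"
printf 'Aluminium is easy to analyse; we analysed it.\n' > "$tmp/content"

# Turn each "FROM<tab>TO" line into the sed command s/\bFROM\b/TO/gI
# (GNU sed: \b is a word boundary, I makes the match case-insensitive).
sed -E 's_(.+)\t(.+)_s/\\b\1\\b/\2/gI_' "$tmp/dict" > "$tmp/script.sed"

# Apply the generated script to the content.
result=$(sed -f "$tmp/script.sed" "$tmp/content")
printf '%s\n' "$result"
```

With the word boundaries in place, "analysed" is left alone because "analyse" is followed by another word character. The replacement text is taken from the dictionary as written, so a capitalized "Aluminium" comes out as lowercase "aluminum"; preserving the original case would need extra handling.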