Regex/sed: regular expression matches individual letters instead of words

I created a file that logs the training of a random forest classifier and a logistic regression model. Its contents are as follows:

Creating logistic regression model...
Done.
Creating random forest classifier model...
building tree 1 of 27
building tree 2 of 27
building tree 3 of 27
building tree 4 of 27
building tree 5 of 27
building tree 6 of 27
building tree 7 of 27
building tree 8 of 27
building tree 9 of 27
building tree 10 of 27
building tree 11 of 27
building tree 12 of 27
building tree 13 of 27
building tree 14 of 27
building tree 15 of 27
building tree 16 of 27
building tree 17 of 27
building tree 18 of 27
building tree 19 of 27
building tree 20 of 27
building tree 21 of 27
building tree 22 of 27
building tree 23 of 27
building tree 24 of 27
building tree 25 of 27
building tree 26 of 27
building tree 27 of 27
Train scores:
    Logistic Regression Recall: 0.6892336879192357
    Random Forest Recall: 0.5848905752422251
Test scores:
    Logistic Regression Recall: 0.6746186562629912
    Random Forest Recall: 0.5647724728982124
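I only want to extract the score lines. I tried `sed -n '/[Train|Test|Recall]/p' scores` (`scores` is the file name), but for some reason, even though `-n` should suppress all output except the lines that match the pattern, it still prints the entire file.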

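When I run `cat scores | grep "[Train|Test|Recall]" -`, the match highlighting marks every individual letter in each line that also appears in `[Train|Test|Recall]`, rather than the actual words. For example, `Creating logistic regression model...` is highlighted in scattered fragments such as `reatin`, `l`, `istic`, `n`, and `el`. The problem persists even when I add word boundaries: `cat scores | grep "[\bTrain\b|\bTest\b|\bRecall\b]" -`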


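My understanding of grep is that it should match these words in full, and that the pipe between the words should mark each word as its own pattern to check. How do I need to write this regex, and how do I pass whatever options I need to sed?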

Square brackets `[]` contain a list of possible characters to match, which is why you often see examples like `gr[ae]y` to match `gray` or `grey`.
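To make the difference concrete, here is a small sketch that counts matching lines for a bracket expression and for an alternation. The three-line sample and the variable names `sample`, `class_count`, and `alt_count` are made up for illustration; the `grep` options themselves are standard.

```shell
# Hypothetical three-line sample, not the full scores file.
sample='Creating logistic regression model...
Done.
Train scores:'

# Bracket expression: matches any line containing at least ONE of the
# characters T r a i n | e s t R c l, so all three lines match.
class_count=$(printf '%s\n' "$sample" | grep -c '[Train|Test|Recall]')

# Alternation (extended regex): matches only lines containing one of the words.
alt_count=$(printf '%s\n' "$sample" | grep -E -c 'Train|Test|Recall')

echo "bracket expression: $class_count lines, alternation: $alt_count lines"
```

This is also why the `sed -n` attempt printed the whole file: every line of the log contains at least one of those characters, so every line matched.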

You can drop the brackets, `Train|Test|Recall`, or use parentheses instead, `(Train|Test|Recall)`.

For `grep` in basic regex mode (where `\|` is a GNU extension), your command becomes:

cat scores | grep "\(Train\|Test\|Recall\)"
or, in extended regex mode, it becomes:

cat scores | grep -E "(Train|Test|Recall)"
or in `sed`:

cat scores | sed -E -n "/(Train|Test|Recall)/p"
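As a side note, `grep` and `sed` can read the file directly, so the `cat` is optional, and repeated `-e` patterns are another way to express "or" without relying on `\|`. The sketch below assumes a shortened, made-up version of the log written to a temporary file; `scores_file`, `grep_out`, `sed_out`, and `multi_out` are illustrative names only.

```shell
# Recreate a shortened, hypothetical version of the log for the demo.
scores_file=$(mktemp)
printf '%s\n' \
  'Creating logistic regression model...' \
  'building tree 1 of 27' \
  'Train scores:' \
  '    Logistic Regression Recall: 0.6892336879192357' \
  'Test scores:' > "$scores_file"

# grep and sed take the file name as an argument; no cat needed.
grep_out=$(grep -E 'Train|Test|Recall' "$scores_file")
sed_out=$(sed -E -n '/Train|Test|Recall/p' "$scores_file")

# Several -e patterns: a line matches if any one of them matches.
multi_out=$(grep -e 'Train' -e 'Test' -e 'Recall' "$scores_file")

printf '%s\n' "$grep_out"
rm -f "$scores_file"
```

All three commands should select the same three lines from this sample.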

To extract only the score header lines, use this extended grep command:

$ cat scores | egrep '^(Train|Test|Recall) scores:'
Train scores:
Test scores:
I assume you also want the numbers that follow the score headers:

$ cat scores | egrep '^(Train|Test|Recall) scores:|^ '
Train scores:
    Logistic Regression Recall: 0.6892336879192357
    Random Forest Recall: 0.5848905752422251
Test scores:
    Logistic Regression Recall: 0.6746186562629912
    Random Forest Recall: 0.5647724728982124
Explanation: the egrep pattern has `|^ ` appended, which means "the previous regex, or any line that starts with a space".
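If relying on leading spaces feels fragile, one alternative is a `sed` address range. This is only a sketch under the assumption that the score block runs from the first `Train scores:` line to the end of the file, as it does in the log above; the shortened sample and the names `log`, `by_indent`, and `by_range` are invented for the demo.

```shell
# Hypothetical shortened log; the real file has 27 "building tree" lines.
log=$(mktemp)
printf '%s\n' \
  'Creating random forest classifier model...' \
  'building tree 1 of 27' \
  'Train scores:' \
  '    Logistic Regression Recall: 0.6892336879192357' \
  'Test scores:' \
  '    Random Forest Recall: 0.5647724728982124' > "$log"

# Pattern from the answer: header lines, or any line starting with a space.
by_indent=$(grep -E '^(Train|Test|Recall) scores:|^ ' "$log")

# Assumption: the score block runs from "Train scores:" to end of file,
# so an address range from that line to the last line ($) selects it.
by_range=$(sed -n '/^Train scores:/,$p' "$log")

printf '%s\n' "$by_range"
rm -f "$log"
```

On this sample both commands print the same four lines; they would differ if the log contained other indented lines before the score block.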

For readability, you can then combine each header with its numbers:

$ cat scores | egrep '^(Train|Test|Recall) scores:|^ ' | perl -0777 -pe 's/(scores:)[^:]*: ([0-9\.]*)[^:]*: ([0-9\.]*)/$1 $2, $3/g'
Train scores: 0.6892336879192357, 0.5848905752422251
Test scores: 0.6746186562629912, 0.5647724728982124
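Explanation:

  • `-0777` in the perl command sets the input record separator to octal 777 instead of `\n`. No character has that value, so perl reads the whole input as a single record and the regex can match across newlines.
  • `-pe` combines two options: `-e` supplies the code on the command line, and `-p` runs it on each input record and prints the result.
  • `s/../../g` is a search and replace with the `g` (global) flag.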

Use `(…)` for grouping, not `[…]`: `sed -En '/(Train|Test|Recall)/p' scores`
@anubhava That currently outputs the bottom six lines of scores.