JavaScript: matching a large number of different sentences (parsing with regexp patterns)


javascript regex nlp


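I want to build a text-sentence classifier using regexps (for chatbot natural language processing).

I have a large number (e.g. more than 100) of different kinds of text sentences to match against regexp patterns.

When a sentence matches a regexp (say, an intent), a specific action (a function handler) is triggered.

I have preset specific regular expressions to match each of the different sets of sentences, for example: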

    // I have a long list of regexps (possibly several regexps per intent)

    const regexps = [
      /from (?<fromCity>.+)/,  // ---> actionOne()
      /to (?<toCity>.+)/,      // ---> actionTwo()
      /.../,                   // ---> anotherAction()
      /.../                    // ---> yetAnotherAction()
    ]

    // I have a long list of actions (function handlers), stored as
    // function references (not called): actions[i] handles regexps[i]

    const actions = [
      actionOne,
      actionTwo,
      // ...
      // ...
    ]
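Because `regexps` and `actions` are parallel arrays, the sequential check can be written as a loop that finds the first matching regexp and calls the handler at the same index. The following is a minimal runnable sketch. It assumes hypothetical handlers `actionOne`, `actionTwo` and `fallback` that just return a label, since the real handlers are not shown in the question:

```javascript
// Hypothetical handlers: they receive the captured named groups and return a label.
const actionOne = (groups) => `from:${groups.fromCity}`;
const actionTwo = (groups) => `to:${groups.toCity}`;
const fallback = () => 'fallback';

// Parallel arrays: regexps[i] triggers actions[i].
const regexps = [
  /from (?<fromCity>.+)/,
  /to (?<toCity>.+)/,
];
const actions = [actionOne, actionTwo];

function classify(sentence) {
  for (let i = 0; i < regexps.length; i++) {
    const match = regexps[i].exec(sentence);
    if (match) {
      // First match wins, exactly like the if / else if chain.
      return actions[i](match.groups);
    }
  }
  return fallback();
}

console.log(classify('from Genova')); // from:Genova
console.log(classify('hello'));       // fallback
```

The handlers receive `match.groups`, so a handler can read the captured city (`fromCity`, `toCity`) directly.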

How can I build the fastest (multi-regexp) classifier in JavaScript?

My current quick solution is to check each regexp in sequence:

    // at run time
    // ...
    const sentence = 'from Genova'
    // ...

    if (sentence.match(/from (?<fromCity>.+)/))
      actionOne()

    else if (sentence.match(/to (?<toCity>.+)/))
      actionTwo()

    // else if ...
    // else if ...
    else
      fallback()

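The if-then sequence approach above does not scale well and, above all, it is slow (although sorting the regexps so that the most frequent ones come first does help).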
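One way to get the "most frequent first" ordering without hard-coding it is to count hits per intent and re-sort the list from time to time. The sketch below is one possible version. The helper `createAdaptiveClassifier`, its `resortEvery` parameter and the `name` field are hypothetical. It also assumes the intents are mutually exclusive (here enforced with `^` anchors), because if two regexps can match the same sentence, reordering changes which one wins:

```javascript
// Hypothetical helper: a sequential classifier that periodically moves the most
// frequently matched intents to the front of the list.
// Assumption: the intents are mutually exclusive (here enforced with ^ anchors),
// so reordering them never changes which intent a sentence is assigned to.
function createAdaptiveClassifier(intents, resortEvery = 100) {
  // Copy the intents so the caller's array is not mutated; add a hit counter.
  const ordered = intents.map((intent) => ({ ...intent, hits: 0 }));
  let calls = 0;

  function classify(sentence) {
    calls++;
    if (calls % resortEvery === 0) {
      // Most hits first; Array.prototype.sort is stable, so ties keep their order.
      ordered.sort((a, b) => b.hits - a.hits);
    }
    for (const intent of ordered) {
      const match = intent.regexp.exec(sentence);
      if (match) {
        intent.hits++;
        return { name: intent.name, groups: match.groups };
      }
    }
    return null; // no intent matched: the caller runs its fallback
  }

  // Current evaluation order, exposed for inspection.
  classify.order = () => ordered.map((intent) => intent.name);
  return classify;
}

const classify = createAdaptiveClassifier(
  [
    { name: 'from', regexp: /^from (?<fromCity>.+)/ },
    { name: 'to', regexp: /^to (?<toCity>.+)/ },
  ],
  2 // re-sort every 2 calls, only to make the effect visible in this example
);

classify('to Roma');   // call 1: no re-sort yet, 'to' now has 1 hit
classify('to Milano'); // call 2: re-sort moves 'to' ahead of 'from'
console.log(classify.order()); // [ 'to', 'from' ]
```

Re-sorting only every `resortEvery` calls keeps the sorting cost out of the common path. A real chatbot would probably use a much larger interval.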

Another way to improve performance could be to create a single (big) regexp made of an alternation of named groups, one per matcher regexp.

As in this minimal example:

    const regexp = /(?<one>from (?<fromCity>.+))|(?<two>to (?<toCity>.+))/
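With a combined regexp, only the named groups of the alternative that actually matched are set. All the other groups are `undefined` in `match.groups`, so the intent can be recovered by looking for a defined outer group. The inner groups also need distinct names (`fromCity` vs `toCity`), because engines without support for duplicate named groups reject a repeated name with a `SyntaxError`. Here is a minimal sketch. It assumes a hypothetical `intentNames` list that is kept in sync with the outer group names by hand:

```javascript
// Combined classifier: one outer named group per intent, separated by |.
const regexp = /(?<one>from (?<fromCity>.+))|(?<two>to (?<toCity>.+))/;

// Hypothetical list of the outer group names, kept in sync by hand.
const intentNames = ['one', 'two'];

function matchedIntent(sentence) {
  const match = regexp.exec(sentence);
  if (!match) return null;
  // Groups belonging to alternatives that did not participate are undefined.
  const name = intentNames.find((n) => match.groups[n] !== undefined);
  return { name, groups: match.groups };
}

const result = matchedIntent('to Roma');
console.log(result.name);            // two
console.log(result.groups.toCity);   // Roma
console.log(result.groups.fromCity); // undefined
```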

So my approach to building the regexp classifier is simple (please read the code below as JavaScript pseudocode):

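    // at build time

    // I collect all possible regexps, each one as a named group
    const intents = [
      '(?<one>from (?<fromCity>.+))',
      '(?<two>to (?<toCity>.+))',
      // '...',
      // '...'
    ]

    const classifier = new RegExp(intents.join('|'))

    // collection of function handlers, one for each regexp,
    // keyed by the name of the intent's outer group
    const actions = {
      one: actionOne,
      two: actionTwo,
      // ...
    }

    // at run time

    const match = sentence.match(classifier)

    // if there is a match, find which intent group participated in it
    // (groups of the alternatives that did not match are undefined)
    const intent = match && Object.keys(actions).find(name => match.groups[name] !== undefined)
    const action = intent && actions[intent]

    if (action)
      action()
    else
      fallback() // no match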
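To avoid keeping group names and handler keys in sync by hand, the combined pattern can also be generated from a list of intent objects. The outer group names (hypothetical `i0`, `i1`, ...) then map back to the list by index. The following is a sketch under two assumptions: the inner capture-group names are unique across all intents, and none of the regexps relies on flags, since `source` drops them:

```javascript
// Hypothetical intents: each has a regexp and a handler receiving the named groups.
const intents = [
  { regexp: /from (?<fromCity>.+)/, action: (g) => `from:${g.fromCity}` },
  { regexp: /to (?<toCity>.+)/, action: (g) => `to:${g.toCity}` },
];

// Build time: wrap each source in an outer named group i0, i1, ...
// Assumption: inner group names are unique across intents and no flags are needed.
const classifier = new RegExp(
  intents.map((intent, i) => `(?<i${i}>${intent.regexp.source})`).join('|')
);

function classify(sentence, fallback = () => 'fallback') {
  const match = classifier.exec(sentence);
  if (!match) return fallback();
  // Exactly one outer group is defined: the alternative that matched.
  const index = intents.findIndex((_, i) => match.groups[`i${i}`] !== undefined);
  return intents[index].action(match.groups);
}

console.log(classify('from Genova')); // from:Genova
console.log(classify('hello'));       // fallback
```

The two approaches differ in one respect. The combined regexp reports the match that starts earliest in the sentence, whereas the sequential check reports the first regexp in the list that matches anywhere. For `'go to Roma from Genova'`, this sketch returns `to:Roma from Genova`, while checking `/from .../` first would pick the "from" intent.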

Does this make sense?

Any suggestions for a better approach?

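This most likely depends on many things, such as each individual RegExp (e.g. how many capture groups it has), the actual size of the list, and the length of the input.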

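However, when I tested a large number of RegExps (10,000 simple RegExps), every variant of the big combined RegExp was much slower than executing the individual RegExps one by one.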
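Since the relative cost depends on the engine and on the patterns, it is worth measuring with your own intents and inputs. Below is a rough sketch of such a measurement using `performance.now()`, which is a global in browsers and in Node.js 16+. The generated `cmdN` patterns, `N`, the input sentences and the round count are arbitrary assumptions. The harness prints timings only, which will vary by engine and machine, and it checks that both approaches agree:

```javascript
// Hypothetical workload: N simple, mutually exclusive intents "cmd0 <arg>", "cmd1 <arg>", ...
const N = 1000;
const regexps = Array.from({ length: N }, (_, i) => new RegExp(`^cmd${i} (?<arg${i}>.+)`));

// Big combined regexp: one outer named group per intent, joined with |.
const combined = new RegExp(regexps.map((re, i) => `(?<i${i}>${re.source})`).join('|'));

// Approach 1: test the regexps one by one; returns the intent index or -1.
function viaSequential(sentence) {
  return regexps.findIndex((re) => re.test(sentence));
}

// Approach 2: one exec, then look up which outer group participated.
function viaCombined(sentence) {
  const match = combined.exec(sentence);
  if (!match) return -1;
  return regexps.findIndex((_, i) => match.groups[`i${i}`] !== undefined);
}

// Runs fn over the inputs several times and prints the elapsed wall-clock time.
function time(label, fn, inputs, rounds = 20) {
  let checksum = 0; // sum of returned indices, used to compare the two approaches
  const start = performance.now();
  for (let r = 0; r < rounds; r++) {
    for (const sentence of inputs) checksum += fn(sentence);
  }
  console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms`);
  return checksum;
}

const inputs = ['cmd0 a', `cmd${N - 1} z`, 'cmd500 b', 'unknown sentence'];
const a = time('sequential', viaSequential, inputs);
const b = time('combined', viaCombined, inputs);
console.log(a === b ? 'both approaches agree' : 'results differ');
```

A micro-benchmark like this ignores JIT warm-up and garbage collection. A tool that runs many iterations, as JSPerf does, gives more reliable numbers.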

Given this, and the fact that it makes the code simpler overall, I would advise against the big-RegExp approach.

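To make things easier to maintain, I suggest storing each trigger together with its action in the same place, for example in an array of objects. This also lets you add more to these objects later if needed (e.g. a name for the intent):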

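    const intents = [
      { regexp: /from (?<fromCity>.+)/, action: fromCity },
      { regexp: /to (?<toCity>.+)/, action: toCity },
      { regexp: /.../, action: anotherAction },
    ];

    // We use find() to stop as soon as we've got a result
    const result = intents.find(intent => {
      const match = sentence.match(intent.regexp);
      if (match) {
        // You can include a default action in case the action is not specified in the intent object
        // Decide what you send to your action function here
        (intent.action || defaultAction)(match, sentence, intent);
      }
      return match;
    });
    if (!result) {
      fallback();
    }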
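Here is a self-contained, runnable version of the array-of-objects approach, for illustration. The handlers `handleFrom` and `handleTo`, the `name` field, `defaultAction` and `fallback` are hypothetical stand-ins. This sketch uses a plain `for...of` loop instead of `find()` so that the handler's return value can be passed back to the caller. Both stop at the first match:

```javascript
// Hypothetical handlers; each receives the named groups, the sentence and the intent.
const handleFrom = ({ fromCity }) => `Leaving from ${fromCity}`;
const handleTo = ({ toCity }) => `Going to ${toCity}`;
const defaultAction = (groups, sentence, intent) => `Matched ${intent.name}`;
const fallback = () => 'Sorry, I did not understand';

// Each trigger lives next to its action; extra fields (like name) are easy to add.
const intents = [
  { name: 'from', regexp: /from (?<fromCity>.+)/, action: handleFrom },
  { name: 'to', regexp: /to (?<toCity>.+)/, action: handleTo },
  { name: 'greeting', regexp: /^(hi|hello)\b/i }, // no action: defaultAction is used
];

function respond(sentence) {
  for (const intent of intents) {
    const match = intent.regexp.exec(sentence);
    if (match) {
      // First matching intent wins; match.groups is undefined when the regexp
      // has no named groups, so pass an empty object instead.
      return (intent.action || defaultAction)(match.groups || {}, sentence, intent);
    }
  }
  return fallback();
}

console.log(respond('from Genova')); // Leaving from Genova
console.log(respond('Hello there')); // Matched greeting
console.log(respond('weather?'));    // Sorry, I did not understand
```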

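One improvement would be to create a function mapper and call the function based on the name of the matched group, instead of writing lots of if-else, right?

I updated the code.

I'm voting to close this question because it belongs on …

I disagree. My position is that I proposed a solution, but the question asks for other suggestions (and/or validation of my draft proposal).

Great answer! Let me read your JSPerf code carefully; it proves that the big regexp is the slowest solution! Thanks for your work. I completely agree with the array-of-objects approach. Sequential evaluation is also sensible if the regexps are sorted by probability/frequency (intents[0].regexp -> the most likely one). :)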
