Lookup table not working in Rasa NLU training data


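I have examples for a specific intent, with the entity annotated in them. I want the model to recognize other words that could be entities for that intent, but it fails to recognize them.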

## intent: frequency
* what is the frequency of [region](field)?
* what's the frequency of[region](field)?
* frequency of [region](field)?
* [region](field)s frequency?
* [region](field) frequency?
* frequency [region](field)?

## lookup: field
* price
* phone type
* region

So when I enter the text "What is the frequency of region?", I get this output:

{'intent': {'name': 'frequency', 'confidence': 0.9517087936401367},
'entities': [{'start': 17, 'end': 23, 'value': 'region', 
'entity': 'field', 'confidence': 0.9427971487440825, 
'extractor': 'CRFEntityExtractor'}], 'text': 'What is the frequency of region?'}


But when I enter the text "What is the frequency of price?", I get this output, with no entities:


{'intent': {'name': 'frequency', 'confidence': 0.9276150465011597},
'entities': [], 'text': 'What is the frequency of price?'}

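According to the Rasa NLU documentation, for a lookup table to work you need to include some examples from the lookup table in your training data.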
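One way to see why the examples matter: as far as I understand it, a lookup table does not tag entities on its own. Its entries are compiled into a pattern, and a match only becomes an extra feature for the entity extractor, which still has to learn from annotated examples that the feature signals a `field`. Here is a rough Python sketch of that idea. The names `lookup_to_pattern` and `lookup_feature` and the exact regex are my own illustration, not Rasa's actual code.

```python
import re


def lookup_to_pattern(entries):
    """Build one case-insensitive, word-bounded alternation from lookup entries.

    Hypothetical helper for illustration; not part of Rasa.
    """
    # Try longer entries first so multi-word entries win over shorter overlaps.
    ordered = sorted(entries, key=len, reverse=True)
    alternation = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


# Entries taken from the "## lookup: field" block in the question.
field_pattern = lookup_to_pattern(["price", "phone type", "region"])


def lookup_feature(text):
    """Return (start, end, matched_text) for every lookup entry found in text."""
    return [(m.start(), m.end(), m.group(0)) for m in field_pattern.finditer(text)]


print(lookup_feature("What is the frequency of price?"))
```

The match tells the model "a lookup entry occurs here", but nothing more. If no training example ever pairs that signal with the `field` label, the extractor has no reason to trust it.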

Also, you need to understand that "phone type" and "region" are different patterns, because "phone type" is two words and "region" is one. With that in mind, I expanded your dataset to:

## intent: frequency
* what is the frequency of [region](field)?
* what is the frequency of [city](field)?
* what is the frequency of [work](field)?
* what's the frequency of [phone type](field)?
* what is the frequency of [phone type](field)?
* frequency of [region](field)?
* frequency of [phone type](field)?
* [region](field)s frequency?
* [region](field) frequency?
* frequency [region](field)?

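Now all the examples you mentioned work when I try them. "price" is recognized even though it does not appear in the training examples, because its pattern is covered.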
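To check whether a dataset covers the patterns in a lookup table, something like the following sketch could help. It only compares word counts, which is the simplest reading of "pattern" in this answer, and the helper names `annotated_values` and `uncovered_word_counts` are made up for this example.

```python
import re

# Matches Rasa markdown entity annotations such as [phone type](field).
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def annotated_values(examples, entity):
    """Collect the lowercased annotated values for one entity name."""
    values = set()
    for line in examples:
        for value, name in ANNOTATION.findall(line):
            if name == entity:
                values.add(value.lower())
    return values


def uncovered_word_counts(lookup_entries, examples, entity):
    """Return lookup entries whose word count never occurs in the examples."""
    seen = {len(value.split()) for value in annotated_values(examples, entity)}
    return [entry for entry in lookup_entries if len(entry.split()) not in seen]


lookup = ["price", "phone type", "region"]
original = [
    "* what is the frequency of [region](field)?",
    "* frequency of [region](field)?",
]
expanded = original + ["* what is the frequency of [phone type](field)?"]

print(uncovered_word_counts(lookup, original, "field"))
print(uncovered_word_counts(lookup, expanded, "field"))
```

With only single-word examples, "phone type" is reported as uncovered. Once a two-word example is added, nothing is left uncovered, even though "price" itself never appears in the examples.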

Enter a message: What is the frequency of price?
{
  "intent": {
    "name": "frequency",
    "confidence": 0.966820478439331
  },
  "entities": [
    {
      "start": 25,
      "end": 30,
      "value": "price",
      "entity": "field",
      "confidence": 0.7227365687405007,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
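If you want to sanity-check a response like this one, the `start` and `end` values are character offsets into the input text, with `end` exclusive. A small sketch, assuming no synonym mapping has rewritten the `value` (the function name `entity_spans_consistent` is my own):

```python
def entity_spans_consistent(text, entities):
    """Check that each entity's value equals the slice text[start:end]."""
    return all(text[e["start"]:e["end"]] == e["value"] for e in entities)


# Input text and entity list copied from the response shown in this answer.
text = "What is the frequency of price?"
entities = [
    {
        "start": 25,
        "end": 30,
        "value": "price",
        "entity": "field",
        "confidence": 0.7227365687405007,
        "extractor": "CRFEntityExtractor",
    }
]

print(entity_spans_consistent(text, entities))
```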

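I would suggest using a dataset generation tool to build simple datasets. It makes this easier and can generate synonyms and similar variations automatically.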

Also, in case you didn't know, you can point to a file for large lookups, for example:

## lookup:city
  data/lookups/city_lookup.txt
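The lookup file itself is plain text with one entry per line. If your values come from somewhere else, a short script can write the file. This is only a sketch: `write_lookup_file` is a hypothetical helper, the city names are placeholder data, and it writes into a temporary directory rather than your real `data/lookups/` folder.

```python
import tempfile
from pathlib import Path


def write_lookup_file(path, entries):
    """Write unique, non-empty entries one per line, keeping first-seen order."""
    seen = set()
    cleaned = []
    for entry in entries:
        entry = entry.strip()
        # Skip blanks and case-insensitive duplicates.
        if entry and entry.lower() not in seen:
            seen.add(entry.lower())
            cleaned.append(entry)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Placeholder values; replace them with your own list of cities.
sample_cities = ["Berlin", "berlin", " Paris ", "", "New York"]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "lookups" / "city_lookup.txt"
    write_lookup_file(target, sample_cities)
    print(target.read_text(encoding="utf-8"))
```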

Use the following pipeline in config.yml:

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    entity_recognition: False
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

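If we provide a file for the lookup, do we still need to add examples for each lookup?

The file is just an alternative to the inline list, so it works the same way, but it is very useful for large lookups. I used lookups for cities and states, and adding a few examples was enough for Rasa to do its job.

I have tried it, and it seems that a lookup value is not detected unless there is at least one example of it. If that is the case, lookup tables become useless, because we would need to write an example for every distinct entity value...

I have over 3k cities in a lookup table and only had to add examples for 10 to 20 of them... it doesn't feel that useless...

Which pipeline are you using, tensorflow or spacy? It might be working because you are using spacy. I am using tensorflow and it doesn't seem to work.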