How to search for part of a string using Azure Cognitive Search
I'm new to Azure Cognitive Search, and I have successfully configured my index for autocomplete (using partial search). But now I have another use case: I have many files stored in an Azure Blob container, each with metadata. One of the metadata fields (on each file) is called PartNumber, and its value is a comma-separated string of product SKUs (e.g. "12345678900110293809876"). I built my index to store this information as Edm.String, as follows:
{
"name": "my-index",
"fields": [
{
"name": "partnumbers",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_content_type",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_last_modified",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_size",
"type": "Edm.Int64",
"facetable": true,
"filterable": true,
"retrievable": false,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "key",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "partialPartnumbers",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": "prefixCmAnalyzer",
"searchAnalyzer": "standardCmAnalyzer",
"synonymMaps": [],
"fields": []
}
],
"suggesters": [
{
"name": "my-index_suggester",
"searchMode": "analyzingInfixMatching",
"sourceFields": [
"partnumbers"
]
}
],
"scoringProfiles": [
{
"name": "exactFirst",
"functions": [],
"functionAggregation": null,
"text": {
"weights": {
"partnumbers": 2,
"partialPartnumbers": 1
}
}
}
],
"defaultScoringProfile": "exactFirst",
"corsOptions": null,
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "standardCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding"
],
"charFilters": []
},
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "prefixCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding",
"edgeNGramCmTokenFilter"
],
"charFilters": []
}
],
"charFilters": [],
"tokenFilters": [
{
"@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"name": "edgeNGramCmTokenFilter",
"minGram": 2,
"maxGram": 20,
"side": "front"
}
],
"tokenizers": [],
"@odata.etag": "\"0x8D8184F367A74XX\""
}
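As a rough illustration of what the edgeNGramCmTokenFilter in the definition above contributes at index time: with "side": "front", "minGram": 2 and "maxGram": 20, each token is expanded into its leading prefixes. The sketch below only mimics that behaviour in plain Python. The function name edge_ngrams is hypothetical and this is not the service's actual implementation.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Approximate a front-side edge n-gram filter: return every prefix of
    `token` whose length is between min_gram and max_gram (inclusive)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    # Each prefix is stored in the index, which is what makes
    # "type the first few characters" matching possible.
    print(edge_ngrams("102938"))
```

This is why the partialPartnumbers field suits prefix-style autocomplete, but it does not by itself split a comma-separated value into individual SKUs.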
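Now I'm struggling to find a way (through specific syntax? an analyzer? a tokenizer?) to find all files whose PartNumber metadata field contains a given SKU, so that I can retrieve all documents related to one product. I'd like to pass the SKU "102938" to Azure Search and have it return every file whose PartNumber metadata field contains this SKU (possibly alongside other SKUs).
But I'm having a hard time finding examples on Google, and the documentation seems a bit over my head right now (I'm not sure what analyzers, tokenizers, etc. are or how they work! This is my first deep dive into the world of "search"...)
So I'd really appreciate the community's help on this. I'd love to read articles that walk a beginner through each piece, or tutorials, or anything else that would help me move forward.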
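Thanks in advance.

OK, I just tried something that works: I set the pattern analyzer on my partnumbers field ("analyzer": "pattern" in the definition below), and when I tested it, it did split my SKUs into several tokens. After that, I could search for a single SKU and get back all the files I wanted! Here is my index JSON definition: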
{
"name": "my-index",
"fields": [
{
"name": "partnumbers",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_content_type",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_last_modified",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_size",
"type": "Edm.Int64",
"facetable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "key",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "partialPartnumbers",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": "prefixCmAnalyzer",
"searchAnalyzer": "standardCmAnalyzer",
"synonymMaps": [],
"fields": []
},
{
"name": "partialName",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": "prefixCmAnalyzer",
"searchAnalyzer": "standardCmAnalyzer",
"synonymMaps": [],
"fields": []
}
],
"suggesters": [
{
"name": "conformity-certificates-index_suggester",
"searchMode": "analyzingInfixMatching",
"sourceFields": [
"name"
]
}
],
"scoringProfiles": [
{
"name": "exactFirst",
"functions": [],
"functionAggregation": null,
"text": {
"weights": {
"partnumbers": 4,
"partialPartnumbers": 3,
"name": 2,
"partialName": 1
}
}
}
],
"defaultScoringProfile": "exactFirst",
"corsOptions": null,
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "standardCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding"
],
"charFilters": []
},
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "prefixCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding",
"edgeNGramCmTokenFilter"
],
"charFilters": []
}
],
"charFilters": [],
"tokenFilters": [
{
"@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"name": "edgeNGramCmTokenFilter",
"minGram": 2,
"maxGram": 20,
"side": "front"
}
],
"tokenizers": [],
"@odata.etag": "\"0x8D818EC80CXXXX\""
}
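To make the effect of "analyzer": "pattern" more concrete, here is a minimal sketch that approximates it in plain Python. It assumes the built-in pattern analyzer's default settings (split on runs of non-word characters, i.e. the regex \W+, and lowercase the result). The function name pattern_analyze and the sample value are hypothetical, and the service's real tokenization may differ in edge cases.

```python
import re


def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    """Approximate a pattern analyzer: lowercase the text, split it on the
    separator regex, and drop any empty tokens."""
    if lowercase:
        text = text.lower()
    return [tok for tok in re.split(pattern, text) if tok]


# Hypothetical comma-separated PartNumber metadata value.
sample = "12345678,90011,102938,09876"
tokens = pattern_analyze(sample)

if __name__ == "__main__":
    print(tokens)
    # A query for one whole SKU matches because that SKU is its own token.
    print("102938" in tokens)
```

Because each SKU becomes its own token, a plain search for 102938 can match the document without any wildcard.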
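
This should be achievable with regular-expression and wildcard search. It can be applied to any searchable field on an index that has a query analyzer configured.
"...The full Lucene query language, obtained by setting queryType=Full, extends the default simple query language by adding support for more operators and query types such as wildcard, fuzzy, regex, and field-scoped queries. For example, a regular expression sent with the simple query syntax would be interpreted as a query string and not as an expression. The example requests in this article use the full Lucene query language."
The syntax is fieldName:searchExpression, e.g. searchFields=partnumber&$select=partnumber&search=partnumber:102938*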
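For illustration, the query string from this answer could be assembled as below. This is only a sketch that builds the URL and sends nothing. The service name, index name and api-version value are assumptions, the helper name build_full_lucene_query_url is hypothetical, and queryType=full is added because the quoted documentation says wildcard queries need the full Lucene syntax.

```python
from urllib.parse import urlencode


def build_full_lucene_query_url(service, index, field, term,
                                api_version="2020-06-30"):
    """Build a GET URL for a field-scoped prefix (wildcard) query.
    Only string formatting happens here; no request is sent."""
    params = {
        "api-version": api_version,    # assumed API version
        "queryType": "full",           # enable full Lucene syntax
        "searchFields": field,
        "$select": field,
        "search": f"{field}:{term}*",  # fieldName:searchExpression
    }
    # urlencode percent-encodes reserved characters such as ':' and '*'.
    return (f"https://{service}.search.windows.net"
            f"/indexes/{index}/docs?{urlencode(params)}")


if __name__ == "__main__":
    # "my-service" is a placeholder service name.
    print(build_full_lucene_query_url("my-service", "my-index",
                                      "partnumber", "102938"))
```

A real request would also need an api-key header, which is omitted here.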

You can search for part numbers with a regular filter:
$filter=search.in(partnumber,'102938','
You will find more examples in the documentation.
Do not use wildcards or regular expressions for this use case. Your example has part numbers of variable length, so a wildcard search for 102938* would also inadvertently match 1029381, 10293810, 102938123, and so on.
Your data already lists a set of part numbers explicitly and exactly. You can query against that list.
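The over-matching problem described above can be seen in a small sketch. It compares a prefix match (what a trailing * does) with an exact match against a hypothetical list of indexed SKU tokens. The helper names and the sample list are made up for illustration.

```python
def prefix_matches(tokens, query):
    """Emulate a trailing-wildcard query such as 102938*."""
    return [t for t in tokens if t.startswith(query)]


def exact_matches(tokens, query):
    """Emulate matching a whole part number exactly."""
    return [t for t in tokens if t == query]


# Hypothetical SKU tokens of varying length.
indexed_skus = ["102938", "1029381", "10293810", "102938123", "09876"]

if __name__ == "__main__":
    print(prefix_matches(indexed_skus, "102938"))  # also returns longer SKUs
    print(exact_matches(indexed_skus, "102938"))   # only the requested SKU
```

With variable-length part numbers, only the exact comparison returns just the requested SKU.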