MongoDB: writing a regex query for a search engine

regex mongodb

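I am trying to write a search script in MongoDB but cannot figure out how to do it. What I want to do is as follows.

Suppose I have a string array XD = {"the", "new", "world"}.

Now I want to search MongoDB documents for the strings in XD (using regex) and get the matching documents. For example: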

{ _id: 1, _content: "there was a boy" }
{ _id: 2, _content: "there was a boy in a new world" }
{ _id: 3, _content: "a boy" }
{ _id: 4, _content: "there was a boy in world" }

Now I want results based on how many of the strings in XD the _content field contains:

{ _id: 2, _content: "there was a boy in a new world", times: 3 }
{ _id: 4, _content: "there was a boy in world", times: 2 }
{ _id: 1, _content: "there was a boy", times: 1 }

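The first document (_id: 2) contains all three ("the" inside "there", "new" as "new", "world" as "world"), so it gets times: 3.

The second document (_id: 4) contains only two ("the" inside "there", "world" as "world"), so it gets times: 2.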

Here is what you can do.

Create a regex that will be matched against _content:

XD = ["the","new","world"];
regex = new RegExp(XD.join("|"), "g");
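
If the search words can contain regex metacharacters (for example "c++" or "a.b"), joining them directly would change the meaning of the pattern. A minimal sketch of one way to guard against that; `escapeRegExp` and `buildPattern` are hypothetical helper names, not part of MongoDB:

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build an alternation pattern from a list of search words.
function buildPattern(words) {
  return words.map(escapeRegExp).join("|");
}

var XD = ["the", "new", "world"];
var regex = new RegExp(buildPattern(XD), "g");
```

For plain words such as the ones in the question the resulting pattern is unchanged, so the rest of the answer works the same way.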

Store a JS function on the server that matches _content against XD and returns the number of matches:

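db.system.js.save(
   {
     _id: "findMatchCount",
     // str: text to search; regexStr: pattern string (or RegExp) such as "the|new|world"
     value : function(str, regexStr) {
        // Rebuild the regex with the "g" flag so every occurrence is counted, not only the first
        var matches = str.match(new RegExp(regexStr.source || regexStr, "g"));
        return (matches !== null) ? matches.length : 0;
     }
   }
)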
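
The counting logic can be checked outside MongoDB before storing it on the server. A minimal sketch in plain JavaScript, using the sample documents from the question; `countMatches` is a hypothetical local copy of the logic that builds a global regex from a pattern string:

```javascript
// Hypothetical local copy of the counting logic, for testing without a database.
function countMatches(str, pattern) {
  var matches = str.match(new RegExp(pattern, "g"));
  return matches !== null ? matches.length : 0;
}

// Sample documents from the question.
var docs = [
  { _id: 1, _content: "there was a boy" },
  { _id: 2, _content: "there was a boy in a new world" },
  { _id: 3, _content: "a boy" },
  { _id: 4, _content: "there was a boy in world" }
];

var pattern = ["the", "new", "world"].join("|");

// One { _id, times } entry per document.
var counts = docs.map(function (doc) {
  return { _id: doc._id, times: countMatches(doc._content, pattern) };
});
```

This counts substring occurrences, so "the" is found inside "there", which is what the question asks for.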

Use the function with mapReduce:

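XD = ["the","new","world"];
regex = new RegExp(XD.join("|"), "g");

db.test.mapReduce(
    // map takes no arguments; "pattern" is made available through the scope option below
    function() {
       emit(this._id, findMatchCount(this._content, new RegExp(pattern, "g")));
    },
    // each _id is emitted once, so reduce is not expected to run; it must still return a single value
    function(key, values) {
        return Array.sum(values);
    },
    {
        "out": { "inline": 1 },
        // pass the pattern as a string: the "g" flag may not survive serialization of a RegExp
        "scope": { "pattern": regex.source }
    }
);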

This produces output like the following:

{
    "results" : [
        {
            "_id" : 1,
            "value" : 1
        },
        {
            "_id" : 2,
            "value" : 3
        },
        {
            "_id" : 3,
            "value" : 0
        },
        {
            "_id" : 4,
            "value" : 2
        }
    ],
    "timeMillis" : 1,
    "counts" : {
        "input" : 4,
        "emit" : 4,
        "reduce" : 0,
        "output" : 4
    },
    "ok" : 1
}
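
To get the ranking asked for in the question, the inline results can be filtered and sorted on the client. A minimal sketch, assuming the per-document counts that the question expects (1, 3, 0 and 2 for _id 1 to 4); `rankResults` is a hypothetical helper:

```javascript
// Hypothetical helper: drop documents without matches and sort by match count, highest first.
function rankResults(results) {
  return results
    .filter(function (r) { return r.value > 0; })
    .sort(function (a, b) { return b.value - a.value; })
    .map(function (r) { return { _id: r._id, times: r.value }; });
}

// Assumed input: the "results" array of an inline mapReduce run.
var sample = [
  { _id: 1, value: 1 },
  { _id: 2, value: 3 },
  { _id: 3, value: 0 },
  { _id: 4, value: 2 }
];

var ranked = rankResults(sample);
// ranked lists _id 2 (3 matches), then _id 4 (2 matches), then _id 1 (1 match)
```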

I am not sure how efficient this solution is, but it does work.

Hope this helps.

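Have you had a look? Usually, words like "the" are ignored (and arguably should be), but "new" and "world" would match and rank just as you already expect.

That was just an example... the words could be anything... Here I am trying to use a regex (for example, if I search for "exam", documents containing "example" or "examed" should be returned too)... Here I am talking about an aggregate function that can return these kinds of documents.

Just suggesting it might suit your needs better.

Thanks... but here I also want to know how many of the strings in the string array each document contains, so that I can sort them.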
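
Since the comments touch on aggregation and on sorting by match count, one possible alternative to mapReduce is an aggregation pipeline. A hedged sketch, assuming MongoDB 4.2 or later (where the `$regexFindAll` operator is available) and the `test` collection used in the answer; the field name `times` is taken from the question:

```javascript
var XD = ["the", "new", "world"];

var pipeline = [
  // Count every occurrence of any search word in _content.
  { $addFields: {
      times: { $size: { $regexFindAll: { input: "$_content", regex: XD.join("|") } } }
  } },
  // Drop documents without any match.
  { $match: { times: { $gt: 0 } } },
  // Most matches first.
  { $sort: { times: -1 } }
];

// In the mongo shell this would be run as: db.test.aggregate(pipeline)
```

This keeps the counting and the sorting on the server and avoids storing a function in system.js.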