Java: How to unwind two arrays in a MongoDB query

Tags: java, mongodb, mongodb-query, aggregation-framework

I have a collection in MongoDB with documents like this:

{
"_id" : ObjectId("5490a00879dc6a138dcefb0f"),
"Date" : 20141012,
"Type" : "Twitter",
"Entities" : [ 
    {
        "ID" : 2,
        "Name" : "test1",
        "Sentiment" : {
            "Value" : 0.1,
            "Neutral" : 12
        }
     }
],
"Topics" : [ 
    {
        "ID" : 1,
        "Name" : "Test2",
        "Sentiment" : {
            "Value" : 0.5,
            "Neutral" : 1
        }
    }
]
}
Now I need to unwind the Topics and Entities arrays, then group by Date and sum all the Sentiment values, so I did the following:

    DBObject unwind = new BasicDBObject("$unwind", "$Entities"); 
    unwind.put("$unwind", "$Topics");
    collectionG = db.getCollection("GraphDataCollection");
    DBObject groupFields = new BasicDBObject( "_id", "$Date");
    groupFields.put("value", new BasicDBObject( "$sum", "$Entities.Sentiment.Value"));
    DBObject groupBy = new BasicDBObject("$group", groupFields );
    AggregationOutput output = collectionG.aggregate(where,unwind, groupBy);
The problem is that the sum of the sentiment values always comes back as 0, but if I remove the following lines:

    unwind.put("$unwind", "$Topics");
    groupFields.put("value1", new BasicDBObject( "$sum", "$Topics.Sentiment.Value"));
it works fine. So my question is: how do I unwind two arrays in a single aggregation?

Update:

I changed the code as follows:

    DBObject unwind = new BasicDBObject("$unwind", "$Entities"); // "$unwind" converts object with array into many duplicate objects, each with one from array
    DBObject unwindT = new BasicDBObject("$unwind", "$Topics"); // "$unwind" converts object with array into many duplicate objects, each with one from array
    collectionG = db.getCollection("GraphDataCollection");
    DBObject groupFields = new BasicDBObject( "_id", "$Date");
    groupFields.put("value", new BasicDBObject( "$sum", "$Entities.Sentiment.Value"));
    groupFields.put("value1", new BasicDBObject( "$sum", "$Topics.Sentiment.Value"));
    DBObject groupBy = new BasicDBObject("$group", groupFields );
    List<DBObject> pipeline = Arrays.asList(unwind, unwindT);
    DBObject sort = new BasicDBObject("$sort", new BasicDBObject("_id", 1));
    AggregationOutput output = collectionG.aggregate(where,unwind,unwindT, groupBy,sort);
The numbers returned for value1 and value are incorrect; I think I am not unwinding correctly. Can anyone help?



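This is a query that is easy to get wrong. As with most queries, the devil is in the details, and you should test thoroughly. A good source of test cases is varied data under different conditions, and the obvious flaw here is that, as a sample, each array contains only a single item.

In the real world, these fields are arrays because you expect them to hold multiple entries. So simply running the two $unwind pipeline stages one after the other does not work, because for each document it multiplies the number of items in the first array by the number of items in the second array.

A better representation of test data would therefore be something like this: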

{
    "_id" : ObjectId("5490a00879dc6a138dcefb0f"),
    "Date" : 20141012,
    "Type" : "Twitter",
    "Entities" : [
            {
                    "ID" : 2,
                    "Name" : "test1",
                    "Sentiment" : {
                            "Value" : 0.1,
                            "Neutral" : 12
                    }
            }
    ],
    "Topics" : [
            {
                    "ID" : 1,
                    "Name" : "Test2",
                    "Sentiment" : {
                            "Value" : 0.5,
                            "Neutral" : 1
                    }
            },
            {
                    "ID" : 3,
                    "Name" : "Test3",
                    "Sentiment" : {
                            "Value" : 0.4,
                            "Neutral" : 1
                    }
            }
    ]
}
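To see the multiplication concretely, here is a minimal plain-Java sketch that mimics the two $unwind stages on the sample document above. It does not use the MongoDB driver at all; the class name DoubleUnwindDemo and its methods are made up for illustration, and only the Sentiment.Value numbers are taken from the sample.

```java
import java.util.ArrayList;
import java.util.List;

class DoubleUnwindDemo {

    /**
     * Mimics {$unwind: "$Entities"} followed by {$unwind: "$Topics"} on one
     * document: every Entities value ends up paired with every Topics value.
     */
    static List<double[]> unwindBoth(double[] entityValues, double[] topicValues) {
        List<double[]> rows = new ArrayList<>();
        for (double e : entityValues) {
            for (double t : topicValues) {
                rows.add(new double[] { e, t });
            }
        }
        return rows;
    }

    /** Sums one column of the unwound rows, like a $sum accumulator in $group. */
    static double sumColumn(List<double[]> rows, int column) {
        double total = 0.0;
        for (double[] row : rows) {
            total += row[column];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] entities = { 0.1 };     // Entities[].Sentiment.Value from the sample
        double[] topics = { 0.5, 0.4 };  // Topics[].Sentiment.Value from the sample

        List<double[]> rows = unwindBoth(entities, topics);
        System.out.println("rows after both unwinds: " + rows.size());
        System.out.println("naive Entities sum: " + sumColumn(rows, 0));
        System.out.println("naive Topics sum:   " + sumColumn(rows, 1));
    }
}
```

With one entity and two topics there are 1 x 2 = 2 rows, so the single entity value is counted twice (about 0.2 instead of 0.1). The Topics sum only looks right here because Entities happens to have a single member.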
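To work correctly with both arrays in the document, you need to distinguish the entries by type and add only the relevant member. First, the annotated JSON-serialized form, for readability: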

[
    // Unwind both arrays, produces duplicates
    { "$unwind": "$Entities" },
    { "$unwind": "$Topics" },

    // Add another field to discern type as an array
    { "$project": {
        "Date": 1,
        "Entities": 1,
        "Topics": 1,
        "select": { "$literal": [ "E", "T" ] }
    }},

    // Unwind that array as well
    { "$unwind": "$select" },


    // Group in documents by individual array ID values and per select condition
    // makes everything unique again
    { "$group": {
        "_id": {
            "_id": "$_id",
            "Date": "$Date",
            "innerId": {
               "$cond": [
                   { "$eq": [ "$select", "E" ] },
                   "$Entities.ID",
                   "$Topics.ID"
               ]
            }
        },
        "value": {
            "$first": {
                "$cond": [
                   { "$eq": [ "$select", "E" ] },
                   "$Entities.Sentiment.Value",
                   "$Topics.Sentiment.Value"
                ]
            }
        }
    }},

    //Now just sum the values per date grouping
    { "$group": {
        "_id": "$_id.Date",
        "value": { "$sum": "$value" }
    }}
]
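This is a somewhat long-winded approach, and it assumes that the inner array "ID" field values are unique, at least within a document, in which case it should work. The whole process is really about merging two separate document properties into a single field, while dealing with the fact that those properties are arrays.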

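So you pull the arrays apart, tag each document with another array of types, and unwind again so the documents are duplicated once more. Now, for essentially every document and every array member, you check the type and pick the value from the matching array. At this point there is one document per array member, with a single "value" field holding the corresponding value from *.Sentiment.Value, depending on which field was selected. Overall, every value is now present and nothing is duplicated. All that remains is to sum the value field of the results.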
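The tag-and-deduplicate idea can be traced in plain Java without a database. The sketch below is only an illustration of the logic described above, not the driver code: the class TaggedUnwindDemo, the Item type and the method sumOnce are hypothetical names, and a map keyed by the inner ID stands in for the first $group stage with its $first accumulator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TaggedUnwindDemo {

    /** One array member: its "ID" and its "Sentiment.Value". */
    static final class Item {
        final int id;
        final double value;

        Item(int id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Returns the sum for one document with each array member counted once. */
    static double sumOnce(Item[] entities, Item[] topics) {
        // Stands in for the first $group: key = innerId, value = $first.
        Map<Integer, Double> firstValuePerId = new LinkedHashMap<>();
        for (Item e : entities) {                                   // $unwind Entities
            for (Item t : topics) {                                 // $unwind Topics
                for (String select : new String[] { "E", "T" }) {   // $unwind select
                    Item chosen = select.equals("E") ? e : t;       // $cond on select
                    firstValuePerId.putIfAbsent(chosen.id, chosen.value);
                }
            }
        }
        // Stands in for the second $group: sum the surviving values.
        double total = 0.0;
        for (double v : firstValuePerId.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Item[] entities = { new Item(2, 0.1) };
        Item[] topics = { new Item(1, 0.5), new Item(3, 0.4) };
        System.out.println("sum with each member counted once: " + sumOnce(entities, topics));
    }
}
```

Note that the key does not record which array an ID came from, exactly as in the pipeline's grouping key, so an Entities ID equal to a Topics ID in the same document would collapse into a single entry. That is why the uniqueness assumption matters.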

Really, the main lesson here is that you should have stored this as a single array in the first place, structured like this:

{
    "_id" : ObjectId("5490a00879dc6a138dcefb0f"),
    "Date" : 20141012,
    "Type" : "Twitter",
    "Data" : [
            {
                    "ID" : 2,
                    "Name" : "test1",
                    "Sentiment" : {
                            "Value" : 0.1,
                            "Neutral" : 12
                    },
                    "Class": "Entity"
            },
            {
                    "ID" : 1,
                    "Name" : "Test2",
                    "Sentiment" : {
                            "Value" : 0.5,
                            "Neutral" : 1
                    },
                    "Class": "Topic"
            },
            {
                    "ID" : 3,
                    "Name" : "Test3",
                    "Sentiment" : {
                            "Value" : 0.4,
                            "Neutral" : 1
                    },
                    "Class": "Topic"
            }
    ]
}
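Then it is a simple matter of unwinding the single array once and summing all the values. If you want to work with a particular data "Class" separately, you can filter on it or use a conditional. But for most operations it is much easier to simply structure your data this way.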
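As a rough illustration of why the merged layout is easier, the following plain-Java sketch sums the "Data" array from the document above in one pass and, optionally, per "Class". It runs on in-memory values rather than through the driver, and the names SingleArrayDemo, DataItem, sumAll and sumByClass are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SingleArrayDemo {

    /** One member of the merged "Data" array: its "Class" and "Sentiment.Value". */
    static final class DataItem {
        final String dataClass;
        final double value;

        DataItem(String dataClass, double value) {
            this.dataClass = dataClass;
            this.value = value;
        }
    }

    /** Like a single $unwind on "Data" followed by one $sum over all members. */
    static double sumAll(List<DataItem> data) {
        double total = 0.0;
        for (DataItem item : data) {
            total += item.value;
        }
        return total;
    }

    /** Like grouping on the "Class" field to keep the two kinds apart. */
    static Map<String, Double> sumByClass(List<DataItem> data) {
        Map<String, Double> sums = new TreeMap<>();
        for (DataItem item : data) {
            sums.merge(item.dataClass, item.value, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<DataItem> data = Arrays.asList(
                new DataItem("Entity", 0.1),
                new DataItem("Topic", 0.5),
                new DataItem("Topic", 0.4));
        System.out.println("total: " + sumAll(data));
        System.out.println("per class: " + sumByClass(data));
    }
}
```

Because there is only one array, no member is ever paired with a member of another array, so nothing needs to be de-duplicated afterwards.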

Translating the two-array pipeline to Java is straightforward, but just in case you get lost in the conversion:

    DBObject unwind1 = new BasicDBObject("$unwind", "$Entities");
    DBObject unwind2 = new BasicDBObject("$unwind", "$Topics");

    DBObject project = new BasicDBObject("$project",
        new BasicDBObject( "Date", 1 )
            .append( "Entities", 1)
            .append( "Topics", 1)
            .append( "select", 
                new BasicDBObject( "$literal", new String[]{ "E", "T" })
            )
        );

    // The field path needs the "$" prefix, matching { "$unwind": "$select" } in the JSON form
    DBObject unwind3 = new BasicDBObject("$unwind", "$select");

    DBObject group1 = new BasicDBObject("$group",
        new BasicDBObject("_id",
           new BasicDBObject("_id","$_id")
                .append("Date", "$Date")
                .append("innerId",
                    new BasicDBObject("$cond",
                        new Object[]{
                            new BasicDBObject("$eq", new String[]{"$select", "E"}),
                            "$Entities.ID",
                            "$Topics.ID"
                        }
                    )
                )
        )
        .append("value",
            new BasicDBObject("$first",
                new BasicDBObject("$cond",
                    new Object[]{
                        new BasicDBObject("$eq", new String[]{"$select", "E"}),
                        "$Entities.Sentiment.Value",
                        "$Topics.Sentiment.Value"
                    }
                )
            )
        )
    );

    DBObject group2 = new BasicDBObject("$group",
        new BasicDBObject("_id", "$_id.Date")
            .append("value", new BasicDBObject("$sum","$value"))
    );

    AggregationOutput output = coll.aggregate(unwind1,unwind2,project,unwind3,group1,group2);
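One more note. Although you are probably there by now, the $literal operator was introduced in MongoDB 2.6. For earlier server versions there is an undocumented $const operator, which is effectively the same thing. If you must run against an earlier MongoDB server version, just swap it in the code.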

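Another approach:

  • Unwind the Entities array.
  • Group by _id to get the sum of the Entities values.
  • Unwind the Topics array.
  • Group by _id to get the sum of the Topics values.
  • Project a field showing the sum of the Topics and Entities values.
  • Group by Date to get the net sum.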
This way, the number of documents in each pipeline stage is kept to a minimum, and not much self-joining is involved.
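The steps above boil down to "sum each array per document first, then combine per date". Here is a small plain-Java sketch of that arithmetic, with no driver involved. The class PerDocumentSumsDemo, the Doc type and the second test document are assumptions made up for this illustration. One difference from the real pipeline: $unwind drops a document whose array is empty or missing, whereas this sketch would simply count such an array as 0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerDocumentSumsDemo {

    /** One document: its Date plus the Sentiment.Value numbers of both arrays. */
    static final class Doc {
        final int date;
        final double[] entityValues;
        final double[] topicValues;

        Doc(int date, double[] entityValues, double[] topicValues) {
            this.date = date;
            this.entityValues = entityValues;
            this.topicValues = topicValues;
        }
    }

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Returns Date -> { EntitiesSentimentSum, TopicsSentimentSum, netSentimentSum }. */
    static Map<Integer, double[]> sumsByDate(List<Doc> docs) {
        Map<Integer, double[]> byDate = new TreeMap<>();
        for (Doc d : docs) {
            double entitiesSum = sum(d.entityValues);  // unwind Entities, group by _id
            double topicsSum = sum(d.topicValues);     // unwind Topics, group by _id
            double perDocSum = entitiesSum + topicsSum; // project with $add
            double[] totals = byDate.computeIfAbsent(d.date, k -> new double[3]); // group by Date
            totals[0] += entitiesSum;
            totals[1] += topicsSum;
            totals[2] += perDocSum;
        }
        return byDate;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(20141012, new double[] { 0.1 }, new double[] { 0.5, 0.4 }),
                new Doc(20141012, new double[] { 0.2 }, new double[] { 0.3 })); // made-up second document
        for (Map.Entry<Integer, double[]> entry : sumsByDate(docs).entrySet()) {
            System.out.println(entry.getKey() + " -> " + Arrays.toString(entry.getValue()));
        }
    }
}
```

Since each array is reduced to a single number before the other one is touched, the two arrays never multiply each other, which is the point of interleaving the $unwind and $group stages.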

The aggregation code:

db.collection.aggregate([
{$unwind:"$Entities"},
{$group:{"_id":"$_id",
         "Date":{$first:"$Date"},
         "Topics":{$first:"$Topics"},
         "EntitiesSum":{$sum:"$Entities.Sentiment.Value"}}},
{$unwind:"$Topics"},
{$group:{"_id":"$_id",
         "Date":{$first:"$Date"},
         "EntitiesSum":{$first:"$EntitiesSum"},
         "TopicsSum":{$sum:"$Topics.Sentiment.Value"}}},
{$project:{"_id":0,"Date":1,"EntitiesSum":1,"TopicsSum":1,
           "indSum":{$add:["$EntitiesSum","$TopicsSum"]}}},
{$group:{"_id":"$Date",
         "EntitiesSentimentSum":{$sum:"$EntitiesSum"},
         "TopicsSentimentSum":{$sum:"$TopicsSum"},
         "netSentimentSum":{$sum:"$indSum"}}}
])
The Java equivalent:

     DBObject unwindEntities = new BasicDBObject("$unwind","$Entities");

     DBObject groupSameIdEntities = new BasicDBObject("_id","$_id");
     groupSameIdEntities.put("Date", new BasicDBObject("$first","$Date"));
     groupSameIdEntities.put("Topics", new BasicDBObject("$first","$Topics"));
     groupSameIdEntities.put("EntitiesSum", 
                    new BasicDBObject("$sum","$Entities.Sentiment.Value"));


     DBObject unwindTopics = new BasicDBObject("$unwind","$Topics");

     DBObject groupSameIdTopics = new BasicDBObject("_id","$_id");
     groupSameIdTopics.put("Date", new BasicDBObject("$first","$Date"));
     groupSameIdTopics.put("EntitiesSum", 
                         new BasicDBObject("$first","$EntitiesSum"));
     groupSameIdTopics.put("TopicsSum",
                        new BasicDBObject("$sum","$Topics.Sentiment.Value"));

     DBObject project = new BasicDBObject("_id",0);
     project.put("Date",1);
     project.put("EntitiesSum",1);
     project.put("TopicsSum",1);
     project.put("netSumPerId",
             new BasicDBObject("$add",
                   new String[]{"$EntitiesSum","$TopicsSum"}));

     DBObject groupByDate = new BasicDBObject("_id","$Date");
     groupByDate.put("EntitiesSentimentSum", 
                     new BasicDBObject("$sum","$EntitiesSum"));
     groupByDate.put("TopicsSentimentSum", 
                     new BasicDBObject("$sum","$TopicsSum"));
     groupByDate.put("netSentimentSum", 
                      new BasicDBObject("$sum","$netSumPerId"));

     AggregationOutput output = col.aggregate(unwindEntities,
                                new BasicDBObject("$group",
                                             groupSameIdEntities),
                                unwindTopics,
                                new BasicDBObject("$group",groupSameIdTopics),
                                new BasicDBObject("$project",project),
                                new BasicDBObject("$group",groupByDate));
Sample output (with two documents) is listed at the end of this page.


Please store the Date field as an ISODate().

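Thanks for your answer, but do you know how I can do this with the Java driver?

I am a C# developer, but I have updated the answer. It is not reliable and may contain some errors.

Unfortunately it does not work, but I got a clue from what you did and I will update the post (there is still a small problem).

@hamed minae, I installed Eclipse and tested the Java code; both versions work fine. I have updated my answer with both approaches. I hope it works for you now.

You fell into the trap. @HamedMinaee, the data and question as presented do not really highlight the problem, which is: what happens when the second array has more than one item after you have unwound the first array? There is a different way to handle that.

Thank you @BatScream, a quick question: why do you use this line: groupSameIdTopics.put("Type", new BasicDBObject("$first", "$Type")); and what is NetEntitySum?

If you want the sum of all Entities and Topics values for a date, you can use that field, or ignore it and remove it from the project stage. Type and the other values from the document were added only to show that you can include those fields too; they are not needed in this case and you can leave them out.

Thank you. When I try your code, I also get the following error: The method aggregate(DBObject, DBObject...) in the type DBCollection is not applicable for the arguments (List), on the line: AggregationOutput output = col.aggregate(stages); It seems the arguments do not match.

@HamedMinaee That is because you are using an older version of the Java MongoDB driver, in which the method aggregate(List param) had not yet been introduced. I have modified my answer. Please check now.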
Sample output for the two-document test:

{ "_id" : 2.0141012E7,
  "EntitiesSentimentSum" : 0.30000000000000004,
  "TopicsSentimentSum" : 1.2,
  "netSentimentSum" : 1.5 }