Sesam: splitting an entity into pieces
How can I safely split an entity into multiple parts? For example, I have a document that looks like this:
```json
{
  "_id": "Britney Spears",
  "hits": [
    {
      "title": "Crazy",
      "rating": 2
    },
    {
      "title": "Oops! I Did It Again",
      "rating": 3
    }
  ]
}
```
I want to split it into two entities, like this:
```json
[
  {
    "_id": "Britney Spears - Crazy",
    "artist": "Britney Spears",
    "title": "Crazy",
    "rating": 2
  },
  {
    "_id": "Britney Spears - Oops! I Did It Again",
    "artist": "Britney Spears",
    "title": "Oops! I Did It Again",
    "rating": 3
  }
]
```
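To process the flow safely with deletion tracking, you need two pipes. In the first pipe, use the `create-child` function to build the list of child entities (note that each child needs an `_id`). Then store the output in an intermediate dataset, and remember to set `track_children` to `true` on that dataset sink: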
```json
{
  "_id": "artists",
  "type": "pipe",
  "source": {
    "type": "embedded",
    "entities": [{
      "_id": "Britney Spears",
      "hits": [{
        "rating": 2,
        "title": "Crazy"
      }, {
        "rating": 3,
        "title": "Oops! I Did It Again"
      }]
    }]
  },
  "sink": {
    "type": "dataset",
    "dataset": "artists-with-hits",
    "track_children": true
  },
  "transform": {
    "type": "dtl",
    "rules": {
      "default": [
        ["copy", "_id"],
        ["create-child",
          ["apply", "song", "_S.hits"]
        ]
      ],
      "song": [
        ["add", "_id",
          ["concat", "_P._S._id", " - ", "_S.title"]
        ],
        ["add", "artist", "_P._S._id"],
        ["copy", "*"]
      ]
    }
  }
}
```
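To make the DTL above concrete, here is a minimal Python model of what the `artists` pipe does to one entity. It is only a sketch: the helper names `song` and `default_rule` just mirror the rule names, and keeping the children under a `$children` key is an assumption about Sesam's internals, not something stated in this answer.

```python
# Plain-Python model of the "artists" pipe's DTL (illustration only).
# Assumption: create-child collects the children under a "$children" key.


def song(parent, hit):
    """Mirror of the "song" rule: derive _id and artist from the parent."""
    return {
        # ["add", "_id", ["concat", "_P._S._id", " - ", "_S.title"]]
        "_id": f"{parent['_id']} - {hit['title']}",
        # ["add", "artist", "_P._S._id"]
        "artist": parent["_id"],
        # ["copy", "*"]: copy the hit's own fields (title, rating)
        **hit,
    }


def default_rule(source):
    """Mirror of the "default" rule: copy _id, then create one child per hit."""
    return {
        "_id": source["_id"],
        "$children": [song(source, hit) for hit in source.get("hits", [])],
    }


artist = {
    "_id": "Britney Spears",
    "hits": [
        {"rating": 2, "title": "Crazy"},
        {"rating": 3, "title": "Oops! I Did It Again"},
    ],
}

parent = default_rule(artist)
for child in parent["$children"]:
    print(child["_id"])
```

Note that the parent keeps only its `_id` plus the children; the original `hits` list is not copied, because the default rule copies nothing but `_id`.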
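In the second pipe, read the intermediate dataset and split each entity into its children with the `emit_children` transform: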
```json
{
  "_id": "hits",
  "type": "pipe",
  "source": {
    "type": "dataset",
    "dataset": "artists-with-hits"
  },
  "transform": {
    "type": "emit_children"
  }
}
```
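The effect of `emit_children` can be sketched the same way. This hypothetical Python function assumes the children sit in a `$children` list on each entity read from `artists-with-hits`, and yields them as standalone entities while dropping the parent itself:

```python
# Hypothetical model of the "hits" pipe's emit_children transform.
# Assumption: each incoming entity keeps its children in "$children".


def emit_children(entities):
    """Yield every child of every incoming entity; the parent is not emitted."""
    for entity in entities:
        yield from entity.get("$children", [])


artists_with_hits = [
    {
        "_id": "Britney Spears",
        "$children": [
            {"_id": "Britney Spears - Crazy", "artist": "Britney Spears",
             "title": "Crazy", "rating": 2},
            {"_id": "Britney Spears - Oops! I Did It Again",
             "artist": "Britney Spears", "title": "Oops! I Did It Again",
             "rating": 3},
        ],
    }
]

hits = list(emit_children(artists_with_hits))
print(len(hits))  # 2
```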
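Note that if you try to do this in a single pipe using several transforms, deletion tracking will not work. The two-pipe setup gives you the desired output in the `hits` dataset.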
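One way to see why the intermediate dataset matters: something has to remember which children a parent had last time, so that a child which disappears can be turned into a deletion. The sketch below is a simplified, hypothetical model of a dataset sink with `track_children` enabled; the class `TrackingSink` and the `$children` key are assumptions for illustration, not Sesam's code. When "Crazy" drops out of the artist's hits, the model sink adds a `_deleted` stub for it, and `emit_children` then passes that deletion on to `hits`.

```python
# Simplified, hypothetical model of a dataset sink with track_children=True.
# NOT Sesam's implementation: it only shows the idea that the sink remembers
# each parent's child _ids and adds a deleted stub for any child that vanished.


class TrackingSink:
    def __init__(self):
        self.previous_child_ids = {}  # parent _id -> set of child _ids
        self.dataset = {}             # parent _id -> latest stored parent

    def write(self, parent):
        children = list(parent.get("$children", []))
        current_ids = {child["_id"] for child in children}
        old_ids = self.previous_child_ids.get(parent["_id"], set())
        # Children present last time but missing now become deletion markers.
        for gone in sorted(old_ids - current_ids):
            children.append({"_id": gone, "_deleted": True})
        self.previous_child_ids[parent["_id"]] = current_ids
        stored = {**parent, "$children": children}
        self.dataset[parent["_id"]] = stored
        return stored


def emit_children(entities):
    """Yield every child (including deletion stubs) of each stored parent."""
    for entity in entities:
        yield from entity.get("$children", [])


sink = TrackingSink()
sink.write({"_id": "Britney Spears", "$children": [
    {"_id": "Britney Spears - Crazy", "title": "Crazy", "rating": 2},
    {"_id": "Britney Spears - Oops! I Did It Again",
     "title": "Oops! I Did It Again", "rating": 3},
]})
# Later, "Crazy" is removed from the artist's hits upstream.
second = sink.write({"_id": "Britney Spears", "$children": [
    {"_id": "Britney Spears - Oops! I Did It Again",
     "title": "Oops! I Did It Again", "rating": 3},
]})

for hit in emit_children([second]):
    print(hit)
```

In this model, emitting the children directly without the stored previous state would produce only the remaining hit on the second run, and nothing would tell `hits` that "Crazy" should be deleted.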