Sesam: split an entity into pieces

How can I safely split an entity into several parts? For example, I have a document that looks like this:

{
  "_id": "Britney Spears",
  "hits": [
    {
      "title": "Crazy",
      "rating": 2
    },
    {
      "title": "Oops! I Did It Again",
      "rating": 3
    }
  ]
}
I want to split it into two entities, like this:

[
  {
    "_id": "Britney Spears - Crazy",
    "artist": "Britney Spears",
    "title": "Crazy",
    "rating": 2
  },
  {
    "_id": "Britney Spears - Oops! I Did It Again",
    "artist": "Britney Spears",
    "title": "Oops! I Did It Again",
    "rating": 3
  }
]

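To handle this safely with deletion tracking, you need to create two pipes. In the first pipe, use the create-child function to build up a list of child entities (note that each child needs an _id). You must then store the output in an intermediate dataset, and remember to set track_children to true on the sink that writes to this dataset: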

{
  "_id": "artists",
  "type": "pipe",
  "source": {
    "type": "embedded",
    "entities": [{
      "_id": "Britney Spears",
      "hits": [{
        "rating": 2,
        "title": "Crazy"
      }, {
        "rating": 3,
        "title": "Oops! I Did It Again"
      }]
    }]
  },
  "sink": {
    "type": "dataset",
    "dataset": "artists-with-hits",
    "track_children": true
  },
  "transform": {
    "type": "dtl",
    "rules": {
      "default": [
        ["copy", "_id"],
        ["create-child",
          ["apply", "song", "_S.hits"]
        ]
      ],
      "song": [
        ["add", "_id",
          ["concat", "_P._S._id", " - ", "_S.title"]
        ],
        ["add", "artist", "_P._S._id"],
        ["copy", "*"]
      ]
    }
  }
}
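To make the DTL above easier to follow, here is a rough Python analogue of what its two rules compute for the example entity. It is only an illustration, not how Sesam executes DTL: the function names default_rule and song_rule are hypothetical, and keeping the children under a "$children" key is an assumption about the shape of the intermediate entity.

```python
from typing import Any, Dict

Entity = Dict[str, Any]


def song_rule(parent: Entity, hit: Entity) -> Entity:
    """Hypothetical Python analogue of the DTL "song" rule for one hit."""
    child: Entity = {
        # ["add", "_id", ["concat", "_P._S._id", " - ", "_S.title"]]
        "_id": parent["_id"] + " - " + hit["title"],
        # ["add", "artist", "_P._S._id"]
        "artist": parent["_id"],
    }
    # ["copy", "*"]: copy the hit's own fields (title, rating) without
    # overwriting the _id and artist computed above
    for key, value in hit.items():
        child.setdefault(key, value)
    return child


def default_rule(source: Entity) -> Entity:
    """Hypothetical Python analogue of the DTL "default" rule."""
    return {
        # ["copy", "_id"]: only the _id is kept, so "hits" is dropped
        "_id": source["_id"],
        # ["create-child", ["apply", "song", "_S.hits"]]
        # The "$children" key is an assumption made for this sketch.
        "$children": [song_rule(source, hit) for hit in source.get("hits", [])],
    }


artist = {
    "_id": "Britney Spears",
    "hits": [
        {"title": "Crazy", "rating": 2},
        {"title": "Oops! I Did It Again", "rating": 3},
    ],
}

result = default_rule(artist)
for child in result["$children"]:
    print(child)
```

Each child gets a stable _id derived from the artist and the title, which is what later lets the children be tracked individually.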
In the next pipe, you can split this entity up using the emit_children transform:

{
  "_id": "hits",
  "type": "pipe",
  "source": {
    "type": "dataset",
    "dataset": "artists-with-hits"
  },
  "transform": {
    "type": "emit_children"
  }
}
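As a rough illustration of the second pipe, the Python below flattens the children of each parent entity into a stream of standalone entities, which is the effect emit_children has in this setup. It is a model, not Sesam code: the emit_children function here is a hypothetical stand-in, and reading the children from a "$children" key is an assumption about how the intermediate dataset stores them.

```python
from typing import Any, Dict, Iterable, Iterator

Entity = Dict[str, Any]


def emit_children(parents: Iterable[Entity]) -> Iterator[Entity]:
    """Yield every child of every parent as a standalone entity.

    Hypothetical stand-in for the emit_children transform; assumes the
    children are stored under "$children" in the intermediate dataset.
    """
    for parent in parents:
        for child in parent.get("$children", []):
            yield child


# Assumed contents of the intermediate "artists-with-hits" dataset
intermediate = [
    {
        "_id": "Britney Spears",
        "$children": [
            {"_id": "Britney Spears - Crazy", "artist": "Britney Spears",
             "title": "Crazy", "rating": 2},
            {"_id": "Britney Spears - Oops! I Did It Again",
             "artist": "Britney Spears",
             "title": "Oops! I Did It Again", "rating": 3},
        ],
    }
]

hits = list(emit_children(intermediate))
print([h["_id"] for h in hits])
```

The parent itself is not emitted; only its children end up as entities in the output.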
If you try to do this in a single pipe with multiple transforms, deletion tracking will not work.

This gives you the desired output in the hits dataset.
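Roughly, deletion tracking for children means that when a hit is later removed from the artist, the matching song entity should be marked as deleted instead of silently lingering in hits. The Python below is a hypothetical model of that bookkeeping, not Sesam's implementation: diff_children and the minimal deleted-marker shape are assumptions, and only the _deleted flag follows Sesam's convention for deleted entities. Because it matches old and new children on _id, it also illustrates why each child needs one.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def diff_children(previous: List[Entity], current: List[Entity]) -> List[Entity]:
    """Children to write for a new version of a parent (hypothetical model).

    Children still present are written as-is; children that existed in the
    previous version but are missing now become minimal deleted markers.
    """
    current_ids = {child["_id"] for child in current}
    deleted = [
        {"_id": child["_id"], "_deleted": True}
        for child in previous
        if child["_id"] not in current_ids
    ]
    return list(current) + deleted


old_children = [
    {"_id": "Britney Spears - Crazy", "artist": "Britney Spears",
     "title": "Crazy", "rating": 2},
    {"_id": "Britney Spears - Oops! I Did It Again", "artist": "Britney Spears",
     "title": "Oops! I Did It Again", "rating": 3},
]
# Suppose "Oops! I Did It Again" is later removed from the artist's hits
new_children = old_children[:1]

for entity in diff_children(old_children, new_children):
    print(entity)
```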