Creating sub-documents in MongoDB using Kettle
I have two PostgreSQL tables containing the following data. Houses:
-# select * from houses;
id | address
----+----------------
1 | 123 Main Ave.
2 | 456 Elm St.
3 | 789 County Rd.
(3 rows)
and people:
-# select * from people;
id | name | house_id
----+-------+----------
1 | Fred | 1
2 | Jane | 1
3 | Bob | 1
4 | Mary | 2
5 | John | 2
6 | Susan | 2
7 | Bill | 3
8 | Nancy | 3
9 | Adam | 3
(9 rows)
In Spoon, I have two Table Input steps. The first, named House Input, uses this SQL:
SELECT
id
, address
FROM houses
ORDER BY id;
The second Table Input, named People Input, uses this SQL:
SELECT
"name"
, house_id
FROM people
ORDER BY house_id;
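I feed both Table Inputs into a Merge Join that uses House Input as the first step with a key of id, and People Input as the second step with a key of house_id.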
I then feed the result into a MongoDB Output step with database demo, collection houses, and Mongo document fields address and name (because I want MongoDB to assign the _id).
When I run the transformation and type db.houses.find() in the Mongo shell, I get:
{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "name" : "Fred" }
{ "_id" : ObjectId("52083706b251cc4be9813154"), "address" : "123 Main Ave.", "name" : "Jane" }
{ "_id" : ObjectId("52083706b251cc4be9813155"), "address" : "123 Main Ave.", "name" : "Bob" }
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "456 Elm St.", "name" : "Mary" }
{ "_id" : ObjectId("52083706b251cc4be9813157"), "address" : "456 Elm St.", "name" : "John" }
{ "_id" : ObjectId("52083706b251cc4be9813158"), "address" : "456 Elm St.", "name" : "Susan" }
{ "_id" : ObjectId("52083706b251cc4be9813159"), "address" : "789 County Rd.", "name" : "Bill" }
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "name" : "Nancy" }
{ "_id" : ObjectId("52083706b251cc4be981315b"), "address" : "789 County Rd.", "name" : "Adam" }
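The flat documents above follow from how a merge join over two sorted inputs behaves: it emits one row per matching person, and each row becomes its own document. A minimal Python sketch of that behavior, assuming unique house ids and inputs already sorted on the join key (which is what the ORDER BY clauses provide); the function name merge_join is illustrative and not part of Kettle:

```python
def merge_join(houses, people):
    """Inner merge join of two inputs that are pre-sorted on the join key.

    houses: list of (id, address) tuples sorted by id (ids assumed unique)
    people: list of (name, house_id) tuples sorted by house_id
    """
    rows = []
    i = 0  # cursor into the houses input
    for name, house_id in people:
        # Advance the house cursor until it catches up with this person's key.
        while i < len(houses) and houses[i][0] < house_id:
            i += 1
        # Emit one flat row per person whose key matches the current house.
        if i < len(houses) and houses[i][0] == house_id:
            rows.append({"address": houses[i][1], "name": name})
    return rows


houses = [(1, "123 Main Ave."), (2, "456 Elm St."), (3, "789 County Rd.")]
people = [
    ("Fred", 1), ("Jane", 1), ("Bob", 1),
    ("Mary", 2), ("John", 2), ("Susan", 2),
    ("Bill", 3), ("Nancy", 3), ("Adam", 3),
]

flat_rows = merge_join(houses, people)
for row in flat_rows:
    print(row)
```

Nine people yield nine rows, so a plain insert of each row produces nine documents rather than three houses with nested people.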
What I would like to get is:
{ "_id" : ObjectId("52083706b251cc4be9813153"), "address" : "123 Main Ave.", "people" : [
{ "_id" : ObjectId("52083706b251cc4be9813154"), "name" : "Fred"} ,
{ "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Jane" } ,
{ "_id" : ObjectId("52083706b251cc4be9813155"), "name" : "Bob" }
]
},
{ "_id" : ObjectId("52083706b251cc4be9813156"), "address" : "456 Elm St.", "people" : [
{ "_id" : ObjectId("52083706b251cc4be9813157"), "name" : "Mary"} ,
{ "_id" : ObjectId("52083706b251cc4be9813158"), "name" : "John" } ,
{ "_id" : ObjectId("52083706b251cc4be9813159"), "name" : "Susan" }
]
},
{ "_id" : ObjectId("52083706b251cc4be981315a"), "address" : "789 County Rd.", "people" : [
{ "_id" : ObjectId("52083706b251cc4be981315b"), "name" : "Bill"} ,
{ "_id" : ObjectId("52083706b251cc4be981315c"), "name" : "Nancy" } ,
{ "_id" : ObjectId("52083706b251cc4be981315d"), "name" : "Adam" }
]
}
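I understand why I get what I get, but I can't seem to find anything online or in the samples that gets me to where I want to be.
I'm hoping someone can nudge me in the right direction, point me to an example that is closer to what I'm trying to achieve, or tell me that this is beyond the scope of what Kettle is supposed to do (hopefully not the latter).

It turns out that creating the sub-documents is all done in the MongoDB Output step.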
First, make sure that "Upsert" and "Modifier update" are checked on the "Configure connection" tab.
Then enter the following on the Mongo document fields tab (the first row is the column names):
Now when I run db.houses.find() I get:
{ "_id" : ObjectId("520ccb8978d96b204daa029d"), "address" : "123 Main Ave.", "people" : [ { "name" : "Fred" }, { "name" : "Jane" }, { "name" : "Bob" } ] }
{ "_id" : ObjectId("520ccb8978d96b204daa029e"), "address" : "456 Elm St.", "people" : [ { "name" : "Mary" }, { "name" : "John" }, { "name" : "Susan" } ] }
{ "_id" : ObjectId("520ccb8a78d96b204daa029f"), "address" : "789 County Rd.", "people" : [ { "name" : "Bill" }, { "name" : "Nancy" }, { "name" : "Adam" } ] }
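To make the effect of upsert plus a modifier update concrete, here is a rough plain-Python emulation of what the step appears to do with each joined row. It assumes the step matches on address and appends each name to a people array; the exact field settings are not reproduced in this post, so that pairing is an assumption, and upsert_push is a hypothetical helper rather than a Kettle or MongoDB API:

```python
def upsert_push(collection, match_field, match_value, array_field, item):
    """Emulate an upsert that appends `item` to an array on the matched document.

    If no document has match_field == match_value, a new one is created
    (the "insert" half of upsert); otherwise the item is appended to the
    existing document's array (the "update" half).
    """
    for doc in collection:
        if doc.get(match_field) == match_value:
            doc.setdefault(array_field, []).append(item)
            return doc
    doc = {match_field: match_value, array_field: [item]}
    collection.append(doc)
    return doc


# Flat rows as they leave the Merge Join step: one row per person.
joined_rows = [
    ("123 Main Ave.", "Fred"), ("123 Main Ave.", "Jane"), ("123 Main Ave.", "Bob"),
    ("456 Elm St.", "Mary"), ("456 Elm St.", "John"), ("456 Elm St.", "Susan"),
    ("789 County Rd.", "Bill"), ("789 County Rd.", "Nancy"), ("789 County Rd.", "Adam"),
]

houses_collection = []  # stands in for the MongoDB collection
for address, name in joined_rows:
    upsert_push(houses_collection, "address", address, "people", {"name": name})

for doc in houses_collection:
    print(doc)
```

The first row for each address creates the house document, and every later row with the same address only grows its people array, which is why three documents come out instead of nine.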
I want to point out two things:
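This assumes that my addresses are unique and that names are unique within a house. If that were not the case, I would need to carry the id from the OLTP tables into an id (not _id) field in MongoDB and use the house id as the match field for the upsert.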
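As @G Gordon Worley III points out in the comments, if the two tables are in the same database, I could do the join in the Table Input step, which would make this a two-step transformation (and a faster one).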
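The first caveat can be illustrated with a small sketch. The two rows below are hypothetical (two distinct houses that happen to share an address), and upsert_push is again an illustrative stand-in for the upsert, not a real API. Matching on address merges the two houses into one document, while matching on the source id keeps them apart:

```python
def upsert_push(collection, match, extra_fields, array_field, item):
    """Emulate an upsert keyed on the `match` dict that appends to an array."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in match.items()):
            doc.setdefault(array_field, []).append(item)
            return
    # No match: insert a new document with the match keys and extra fields.
    collection.append({**match, **extra_fields, array_field: [item]})


# Hypothetical data: same address, different house ids.
rows = [
    {"house_id": 1, "address": "10 Oak St.", "name": "Ann"},
    {"house_id": 2, "address": "10 Oak St.", "name": "Ben"},
]

by_address = []  # upsert matched on address
by_id = []       # upsert matched on the OLTP id stored in an "id" field

for row in rows:
    person = {"name": row["name"]}
    upsert_push(by_address, {"address": row["address"]}, {}, "people", person)
    upsert_push(by_id, {"id": row["house_id"]}, {"address": row["address"]}, "people", person)

print(len(by_address), "document(s) when matching on address")
print(len(by_id), "document(s) when matching on id")
```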
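By the way, I don't think you should use the Merge Join step in your transformation. Instead, join the tables in the database and use the output of that join in Kettle. Your database will be able to perform the join better than Kettle can: Kettle's join steps are best suited to streams populated from data sources that have no native join, or from mixed sources.

Very good point, @GGordonWorleyIII. This is just simple data I'm using to illustrate what I'm trying to achieve in MongoDB. But if the data sources are in the same DB, doing the join in the SQL of the Table Input would be the best approach.

By the way, I'm getting close to a solution and hope to post something soon.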
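As a sketch of the suggestion in the comments, the two Table Input queries can be collapsed into one joined query. The example below uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for PostgreSQL so that it runs anywhere; the table aliases h and p are illustrative, and the query in JOIN_SQL is the part that would go into a single Table Input step:

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL source (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE houses (id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, house_id INTEGER);
INSERT INTO houses VALUES
    (1, '123 Main Ave.'), (2, '456 Elm St.'), (3, '789 County Rd.');
INSERT INTO people VALUES
    (1, 'Fred', 1), (2, 'Jane', 1), (3, 'Bob', 1),
    (4, 'Mary', 2), (5, 'John', 2), (6, 'Susan', 2),
    (7, 'Bill', 3), (8, 'Nancy', 3), (9, 'Adam', 3);
""")

# One query replaces House Input, People Input, and the Merge Join step.
JOIN_SQL = """
SELECT h.address, p."name"
FROM houses h
JOIN people p ON p.house_id = h.id
ORDER BY h.id, p.id
"""

joined = conn.execute(JOIN_SQL).fetchall()
conn.close()

for address, name in joined:
    print(address, name)
```

The rows come out in the same shape the Merge Join produced, so the MongoDB Output step downstream would not need to change.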