How to filter by _id in MongoDB using Pig
I have a Mongo document like this:
db.activity_days.findOne()
{
"_id" : ObjectId("54b4ee617acf9ce0440a3185"),
"aca" : 0,
"ca" : 0,
"cbdw" : true,
"day" : ISODate("2014-12-10T00:00:00Z"),
"dm" : 0,
"fbc" : 0,
"go" : 2500,
"gs" : [ ],
"its" : [
{
"_id" : ObjectId("551ac8d44f9f322e2b055d3a"),
"at" : 2000,
"atn" : "Running",
"cas" : 386.514909469507,
"dis" : 2.788989730832084,
"du" : 1472,
"ibr" : false,
"ide" : false,
"lcs" : false,
"pt" : 0,
"rpt" : 0,
"src" : 1001,
"stp" : 0,
"tcs" : [ ],
"ts" : 1418257729,
"u_at" : ISODate("2015-01-13T00:32:10.954Z")
}
],
"po" : 0,
"se" : 0,
"st" : 0,
"tap3c" : [ ],
"tzo" : -21600,
"u_at" : ISODate("2015-01-13T00:32:10.952Z"),
"uid" : ObjectId("545eb753ae9237b1df115649")
}
I want to filter on a specific _id range using Pig. I can write the Mongo query like this:
db.activity_days.find({_id: {$gt: ObjectId("54a48e000000000000000000"), $lt: ObjectId("54cd6c800000000000000000")}})
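But I don't know how to write this in Pig. Does anyone know?

You can try using the mongo-hadoop Pig connector.
Once the JARs are registered (core, pig, and the Java driver), for example with REGISTER /path-to/mongo-hadoop-pig-.jar, you can run: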
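-- Restrict the input to the _id range; "$" is escaped so Pig does not treat it as a parameter.
SET mongo.input.query '{"_id":{"\$gt":{"\$oid":"54a48e000000000000000000"},"\$lt":{"\$oid":"54cd6c800000000000000000"}}}';
-- Replace database.collection with your own database and collection names.
rangeActivityDay = LOAD 'mongodb://localhost:27017/database.collection' USING com.mongodb.hadoop.pig.MongoLoader();
DUMP rangeActivityDay;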
You may also want to use […] before dumping the data.
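The query string passed to mongo.input.query is easy to get wrong by hand (a missing quote or brace makes it invalid JSON). As a rough sketch, the SET line can be generated instead. The helper name build_set_statement is hypothetical and not part of mongo-hadoop; the "\$" escaping simply mirrors the Pig script above.

```python
import json
import re


def build_set_statement(lower_hex: str, upper_hex: str) -> str:
    """Return a Pig SET line selecting documents with lower < _id < upper.

    Hypothetical helper for illustration; it only formats text and does
    not talk to MongoDB or Pig.
    """
    for value in (lower_hex, upper_hex):
        # An ObjectId is 12 bytes, i.e. exactly 24 hexadecimal characters.
        if not re.fullmatch(r"[0-9a-fA-F]{24}", value):
            raise ValueError(f"not a 24-character hex ObjectId: {value!r}")

    # MongoDB Extended JSON represents an ObjectId as {"$oid": "<hex>"}.
    query = {"_id": {"$gt": {"$oid": lower_hex}, "$lt": {"$oid": upper_hex}}}
    text = json.dumps(query, separators=(",", ":"))

    # Escape "$" the same way the Pig script above does.
    return "SET mongo.input.query '" + text.replace("$", "\\$") + "';"


if __name__ == "__main__":
    print(build_set_statement("54a48e000000000000000000",
                              "54cd6c800000000000000000"))
```

Letting json.dumps produce the text guarantees balanced quotes and braces, which is the part most likely to break when the query is typed manually.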
The Pig script above was tested using: mongo-java-driver-3.0.0-rc1.jar, mongo-hadoop-pig-1.4.0.jar, and mongo-hadoop-core-1.4.0.jar.
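The boundary values in the query, 54a48e000000000000000000 and 54cd6c800000000000000000, are not arbitrary: the first four bytes of an ObjectId hold its creation time in seconds since the Unix epoch, so an _id range is effectively a creation-time range. A minimal sketch of how such boundaries can be derived from dates follows; the function names oid_boundary and oid_time are hypothetical, and only the Python standard library is used.

```python
from datetime import datetime, timezone


def oid_boundary(moment: datetime) -> str:
    """Return the smallest ObjectId hex string for the given instant.

    The first 8 hex characters are the timestamp in seconds; the
    remaining 16 are zero-filled so the value sorts before any real
    ObjectId created in that second.
    """
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    seconds = int(moment.timestamp())
    if not 0 <= seconds < 2**32:
        raise ValueError("timestamp does not fit in 4 bytes")
    return format(seconds, "08x") + "0" * 16


def oid_time(oid_hex: str) -> datetime:
    """Decode the creation time embedded in an ObjectId hex string."""
    return datetime.fromtimestamp(int(oid_hex[:8], 16), tz=timezone.utc)


if __name__ == "__main__":
    start = oid_boundary(datetime(2015, 1, 1, tzinfo=timezone.utc))
    end = oid_boundary(datetime(2015, 2, 1, tzinfo=timezone.utc))
    print(start, end)
```

With these inputs the two boundaries match the ones used in the question, so the range selects documents whose _id was generated during January 2015 (UTC).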