Python: Save Mongo document entries to CSV and format the ISODate

I have data in a Mongo collection called "hello". The documents look like this:

{ 
name: ..., 
size: ..., 
timestamp: ISODate("2013-01-09T21:04:12Z"), 
data: { text:..., place:...},
other: ...
}

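I want to export the timestamp and the text of every document to a CSV file, with the timestamp in the first column and the text in the second.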

I tried to create a new collection (hello2) whose documents contain only the timestamp and the text:

data = db.hello
for i in data:
    try:
        connection.me.hello2.insert(i["data"]["text"], i["timestamp"])
    except:
        print "Unable", sys.exc_info()

Then I wanted to use mongoexport:

mongoexport --db me --collection hello2 --csv --out /Dropbox/me/hello2.csv

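But this does not work, and I do not know how to proceed.

PS: I would also like to store only the time part of the ISODate in the CSV file, i.e. just 21:04:12 instead of ISODate("2013-01-09T21:04:12Z").

Thanks for your help.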

You can export directly from the data collection, without a temporary collection:

for r in db.hello.find(fields=['data', 'timestamp']):
    # the text is nested under "data"; time goes in the first column, text in the second
    print '"%s","%s"' % (r['timestamp'].strftime('%H:%M:%S'), r['data']['text'])

Or write to a file:

with open(output, 'w') as fp:
    for r in db.hello.find(fields=['data', 'timestamp']):
        # the text is nested under "data"; time goes in the first column, text in the second
        print >>fp, '"%s","%s"' % (r['timestamp'].strftime('%H:%M:%S'), r['data']['text'])
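
The string formatting above does not escape quotes or commas that may appear inside the text. A minimal Python 3 sketch using the standard csv module might look like the following. It assumes each document is a dict-like object with a datetime under timestamp and the text under data.text, as in the question. The helper name write_time_and_text and the sample documents are hypothetical, and the pymongo call mentioned in the comment assumes pymongo 3 or later, where the projection argument is used instead of fields.

```python
import csv
import io
from datetime import datetime


def write_time_and_text(docs, fp):
    """Write one CSV row per document: time of day first, then the text."""
    writer = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for doc in docs:
        # pymongo returns ISODate values as datetime.datetime objects
        time_only = doc["timestamp"].strftime("%H:%M:%S")
        writer.writerow([time_only, doc["data"]["text"]])


# Hypothetical stand-ins for documents from the "hello" collection.
# With pymongo 3+ (an assumption), the iterable could instead be:
#   db.hello.find({}, {"data.text": 1, "timestamp": 1})
sample_docs = [
    {"timestamp": datetime(2013, 1, 9, 21, 4, 12), "data": {"text": 'say "hi", now'}},
    {"timestamp": datetime(2013, 1, 9, 22, 30, 0), "data": {"text": "plain"}},
]

buffer = io.StringIO()
write_time_and_text(sample_docs, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file instead of an in-memory buffer, the csv documentation recommends opening it with open(path, "w", newline="").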

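To filter out duplicates and print only the most recent entry, the process should be split into two steps. First, accumulate the data in a dictionary: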

recs = {}  # maps each text to the latest timestamp seen for it
for r in db.hello.find(fields=['data', 'timestamp']):
    text, time = r['data']['text'], r['timestamp']
    if text not in recs or recs[text] < time:
        recs[text] = time
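
One possible way to complete the second step, writing the accumulated records out, is sketched below in Python 3. It assumes the same document shape as in the question. The function names latest_by_text and write_latest and the sample documents are hypothetical, and ordering the rows by timestamp is a choice made here for illustration, not something the question requires.

```python
import csv
import io
from datetime import datetime


def latest_by_text(docs):
    """Step 1: map each text to the most recent timestamp seen for it."""
    recs = {}
    for doc in docs:
        text, time = doc["data"]["text"], doc["timestamp"]
        if text not in recs or recs[text] < time:
            recs[text] = time
    return recs


def write_latest(recs, fp):
    """Step 2: write one row per unique text, ordered by timestamp."""
    writer = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for text, time in sorted(recs.items(), key=lambda item: item[1]):
        writer.writerow([time.strftime("%H:%M:%S"), text])


# Hypothetical sample documents; "a" appears twice with different timestamps.
sample_docs = [
    {"timestamp": datetime(2013, 1, 9, 21, 4, 12), "data": {"text": "a"}},
    {"timestamp": datetime(2013, 1, 9, 23, 0, 0), "data": {"text": "a"}},
    {"timestamp": datetime(2013, 1, 9, 22, 0, 0), "data": {"text": "b"}},
]

recs = latest_by_text(sample_docs)
buffer = io.StringIO()
write_latest(recs, buffer)
dedup_csv = buffer.getvalue()
```

Note that this keeps every unique text in memory, which is fine for a modest collection but may not suit a very large one.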

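Thank you very much. The text field is nested inside "data", so should I use db.hello.find(fields=['data.text', 'timestamp']) and % (r['data']['text'])?

@Julia: I think fields should be ['data', 'timestamp'], and r['data']['text'] is correct.

Is there a way to filter out duplicate r['data']['text'] values and, when two documents have the same text, keep only the most recent document? Thanks.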