Inserting rows of a Pandas DataFrame as individual documents into a MongoDB collection with Python

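I have been trying to insert the rows of a pandas DataFrame into a MongoDB collection as individual documents. I use pymongo to pull data from MongoDB, perform some transformations, run a scoring algorithm, and add the scores to the DataFrame as an additional column. The last step is to insert the rows as individual documents into a special collection in the MongoDB database, and this is where I am completely stuck. My example DataFrame df looks like this: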

    memberID                                       dxCodes  dxCount  score
0  856589080          [4280, 4293, 4241, 4240, 4242, 4243]        6    1.8 
1  906903383                                       [V7612]        1    2.6
2  837210554                           [4550, 4553, V1582]        3    3.1
3  935634391       [78791, 28860, V1582, 496, 25000, 4019]        6    1.1
4  929185103  [30500, 42731, 4280, 496, 59972, 4019, 3051]        7    2.8
memberID is a string, dxCodes is an array (in MongoDB terms), dxCount is an int, and score is a float. I have been playing with a piece of code that I found in an answer to a vaguely similar question:

import json
import datetime
import pandas
# db is assumed to be an existing pymongo Database handle
df = pandas.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
records = json.loads(df.T.to_json()).values()
db.temp.insert_many(records)
This is what I get in the collection:

{
    "_id" : ObjectId("565a8f206d8bc51a08745de0"),
    "A" : NumberLong(1448753856695)
}

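It's not much, but I feel I am close. I have spent a lot of time googling and taking shots in the dark, but I haven't cracked it yet. Any guidance would be greatly appreciated. Thanks in advance for your help.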
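
A likely explanation for the NumberLong value above, offered as a sketch rather than a confirmed diagnosis: to_json() writes datetimes as epoch milliseconds by default, so after json.loads() the document only carries an integer. The fixed timestamp stamp and the names via_json and direct are illustrative assumptions, not part of the original snippet, and no database is needed to run this.

```python
import datetime
import json

import pandas as pd

# Same shape as the snippet in the question: one column 'A', one row labelled 1.
# A fixed timestamp is used here (an assumption) so the run is reproducible.
stamp = datetime.datetime(2015, 11, 28, 23, 37, 36, 695000)
df = pd.DataFrame.from_dict({'A': {1: stamp}})

# to_json() serializes the datetime as an integer number of milliseconds
# since the epoch, and json.loads() hands back plain Python ints.
via_json = list(json.loads(df.T.to_json()).values())

# to_dict('records') skips the JSON round trip, so the value stays a
# datetime-compatible object instead of an integer.
direct = df.to_dict('records')

print(type(via_json[0]['A']))
print(type(direct[0]['A']))
```

Passing the first list to insert_many() would store an integer field, which matches the document shown in the question; the second list keeps the date type.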

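You need to convert the DataFrame to a list of dictionaries using the to_dict method, passing 'records' as the argument.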

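What is df = pandas.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})? You are creating a new DataFrame here.

I got that snippet from the accepted answer to the similar question I mentioned. The result I showed is the document that was inserted into the database after running the snippet. Did you get a different result? If so, I am curious what you got.

After looking at it again, I see that it creates a new DataFrame, which makes no sense for my case. Still, it is the accepted answer to the question I mentioned above, although I don't see how it achieves that OP's goal. I appreciate your feedback.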
>>> from pprint import pprint # to pretty print the cursor result.
>>> import pandas as pd
>>> import pymongo
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> memberID = ['856589080', '906903383', '837210554', '935634391', '929185103']
>>> dxCodes = [[4280, 4293, 4241, 4240, 4242, 4243], [7612], [4550, 4553, 1582],[78791, 28860, 1582, 496, 25000, 4019], [30500, 42731, 4280, 496, 59972, 4019, 3051]]
>>> dxCount = [6, 1, 3, 6, 7]
>>> score = [1.8, 2.6, 3.1, 1.1, 2.8]
>>> df = pd.DataFrame({'memberID': memberID, 'dxCodes': dxCodes, 'score': score})
>>> df
                                        dxCodes   memberID  score
0          [4280, 4293, 4241, 4240, 4242, 4243]  856589080    1.8
1                                        [7612]  906903383    2.6
2                            [4550, 4553, 1582]  837210554    3.1
3        [78791, 28860, 1582, 496, 25000, 4019]  935634391    1.1
4  [30500, 42731, 4280, 496, 59972, 4019, 3051]  929185103    2.8
>>> collection.insert_many(df.to_dict('records')) # you need to pass the 'records' as argument in order to get a list of dict.
<pymongo.results.InsertManyResult object at 0x7fcd7035d990>
>>> pprint(list(collection.find()))
[{'_id': ObjectId('565b189f0acf45181c69d464'),
  'dxCodes': [4280, 4293, 4241, 4240, 4242, 4243],
  'memberID': '856589080',
  'score': 1.8},
 {'_id': ObjectId('565b189f0acf45181c69d465'),
  'dxCodes': [7612],
  'memberID': '906903383',
  'score': 2.6},
 {'_id': ObjectId('565b189f0acf45181c69d466'),
  'dxCodes': [4550, 4553, 1582],
  'memberID': '837210554',
  'score': 3.1},
 {'_id': ObjectId('565b189f0acf45181c69d467'),
  'dxCodes': [78791, 28860, 1582, 496, 25000, 4019],
  'memberID': '935634391',
  'score': 1.1},
 {'_id': ObjectId('565b189f0acf45181c69d468'),
  'dxCodes': [30500, 42731, 4280, 496, 59972, 4019, 3051],
  'memberID': '929185103',
  'score': 2.8}]
>>>
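
The session above leaves out the dxCount column from the question. As a minimal sketch of the same approach with all four columns, the conversion and the insert can be wrapped in two small helpers. The names frame_to_documents and insert_frame are hypothetical, and the sample rows are taken from the question with the codes kept as strings so that values such as V1582 survive.

```python
import pandas as pd


def frame_to_documents(df):
    """Return one plain dict per DataFrame row (hypothetical helper)."""
    return df.to_dict('records')


def insert_frame(collection, df):
    """Insert each row of df as its own document and return the new ids.

    `collection` is assumed to be a pymongo Collection; nothing in this
    block calls it, so the example runs without a MongoDB server.
    """
    return collection.insert_many(frame_to_documents(df)).inserted_ids


df = pd.DataFrame({
    'memberID': ['906903383', '837210554'],
    'dxCodes': [['V7612'], ['4550', '4553', 'V1582']],
    'dxCount': [1, 3],
    'score': [2.6, 3.1],
})

# Each element is a dict keyed by column name, ready for insert_many().
documents = frame_to_documents(df)
print(documents)
```

With a live server, a call such as insert_frame(pymongo.MongoClient().test.collection, df) would write two documents, each with an automatically generated _id.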