Python: how to group/merge a DataFrame with various data types
I have a DataFrame with different data types (lists, dictionaries, lists of dictionaries, strings, etc.). I want to merge the two rows for Jon Snow and combine all the other fields, so that the result looks like this:
name category description connection
Jon Snow ['House Targaryen','House Stark','Nights Watch'] Jon Snow, born ...... his army to Daenerys Targaryen. ['Rhaena Targaryen',...,'Bran Stark']
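Working with lists of dictionaries can be a bit tricky. Since this is a toy example with only two rows, it would be easy to take it apart and combine the categories of the two rows by hand, but I don't think that is realistic for my actual dataset.
I also considered using df.groupby('name').aggregate({'category': func1, 'description': func2, 'connection': func3}), but I'm not sure whether there are built-in functions that fit my needs.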
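The per-column aggregation idea above could be sketched roughly as follows. The helper names (flatten, category_names, join_text) are hypothetical stand-ins for func1, func2 and func3, the sample descriptions are shortened, and joining descriptions with a single space is an assumption about the desired result.

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame([
    {"name": "Jon Snow",
     "category": [{"id": 1, "name": "House Targaryen"}],
     "description": "Jon Snow, born Aegon Targaryen.",
     "connection": ["Rhaena Targaryen", "Aegon Targaryen"]},
    {"name": "Jon Snow",
     "category": [{"id": 2, "name": "House Stark"}, {"id": 3, "name": "Nights Watch"}],
     "description": "Jon pledges himself to Daenerys Targaryen.",
     "connection": ["Robb Stark", "Bran Stark"]},
    {"name": "Some name",
     "category": [{"id": 4, "name": "Some house"}],
     "description": "some desc",
     "connection": ["connection 1"]},
])

def flatten(series):
    """Concatenate all lists of one group into a single list (hypothetical helper)."""
    return list(chain.from_iterable(series))

def category_names(series):
    """Flatten the lists of dicts and keep only each dict's 'name' value."""
    return [d["name"] for d in flatten(series)]

def join_text(series):
    """Join the strings of one group with a space (assumed separator)."""
    return " ".join(series)

result = (
    df.groupby("name")
      .agg({"category": category_names,
            "description": join_text,
            "connection": flatten})
      .reset_index()
)
print(result)
```

Each function receives the group's values for one column as a Series and returns a single object, so every column can have its own merging rule.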
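Thanks, y'all, for the help!
Looking at your data, you can first do a simple groupby followed by sum, then use a list comprehension to process the category column: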
import pandas as pd

# Sample data: two rows for Jon Snow plus one unrelated row
df = pd.DataFrame([
    {'category': [{'id': 1, 'name': 'House Targaryen'}],
     'name': 'Jon Snow',
     'description': 'Jon Snow, born Aegon Targaryen, is the son of Lyanna Stark and Rhaegar Targaryen, the late Prince of Dragonstone',
     'connection': ['Rhaena Targaryen', 'Aegon Targaryen']},
    {'category': [{'id': 2, 'name': 'House Stark'}, {'id': 3, 'name': 'Nights Watch'}],
     'name': 'Jon Snow',
     'description': 'After successfully capturing a wight and presenting it to the Lannisters as proof that the Army of the Dead are real, '
                    'Jon pledges himself and his army to Daenerys Targaryen.',
     'connection': ['Robb Stark', 'Sansa Stark', 'Arya Stark', 'Bran Stark']},
    {'category': [{'id': 4, 'name': 'Some house'}],
     'name': 'Some name',
     'description': 'some desc',
     'connection': ['connection 1']},
])

# Summing object columns concatenates the lists and strings within each group
result = df.groupby("name").sum()
# Keep only the "name" value of each category dictionary
result["category"] = [[item.get("name") for item in i] for i in result["category"]]
result.reset_index(inplace=True)
print(result)
# Output:
name category description connection
0 Jon Snow [House Targaryen, House Stark, Nights Watch] Jon Snow, born Aegon Targaryen, is the son of ... [Rhaena Targaryen, Aegon Targaryen, Robb Stark...
1 Some name [Some house] some desc [connection 1]
You can use groupby().apply(): perform any transformation on each group and return a DataFrame. Apply it like df.groupby("group_col").apply(func).
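As a minimal sketch of that suggestion, a function passed to apply can collapse each group into one row. The helper merge_group is hypothetical, the sample data is shortened, and joining descriptions with a space is an assumption.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "Jon Snow",
     "category": [{"id": 1, "name": "House Targaryen"}],
     "description": "Jon Snow, born Aegon Targaryen.",
     "connection": ["Rhaena Targaryen", "Aegon Targaryen"]},
    {"name": "Jon Snow",
     "category": [{"id": 2, "name": "House Stark"}, {"id": 3, "name": "Nights Watch"}],
     "description": "Jon pledges himself to Daenerys Targaryen.",
     "connection": ["Robb Stark", "Bran Stark"]},
    {"name": "Some name",
     "category": [{"id": 4, "name": "Some house"}],
     "description": "some desc",
     "connection": ["connection 1"]},
])

def merge_group(group):
    """Collapse all rows of one group into a single row (hypothetical helper)."""
    return pd.Series({
        # Flatten the lists of dicts and keep only the 'name' of each dict
        "category": [d["name"] for dicts in group["category"] for d in dicts],
        # Join the description strings with a space (assumed separator)
        "description": " ".join(group["description"]),
        # Flatten the lists of connections
        "connection": [c for conns in group["connection"] for c in conns],
    })

# Select the value columns explicitly so the grouping column is not passed to merge_group
cols = ["category", "description", "connection"]
result = df.groupby("name")[cols].apply(merge_group).reset_index()
print(result)
```

Because merge_group returns a Series with the same index for every group, the results are stacked into one row per name.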
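I think this solves the example case, but I realized that my df has missing values in almost every column, and sum() does not seem to know how to handle them.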
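One possible way around the missing values, sketched under the assumption that a missing cell shows up as NaN (or None) rather than as an empty list or string, is to aggregate with functions that skip anything of the wrong type. The helpers concat_lists, category_names and join_text are hypothetical, and the sample data with gaps is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with a missing value in every value column
df = pd.DataFrame([
    {"name": "Jon Snow",
     "category": [{"id": 1, "name": "House Targaryen"}],
     "description": "Jon Snow, born Aegon Targaryen.",
     "connection": np.nan},
    {"name": "Jon Snow",
     "category": [{"id": 2, "name": "House Stark"}, {"id": 3, "name": "Nights Watch"}],
     "description": np.nan,
     "connection": ["Robb Stark", "Bran Stark"]},
    {"name": "Some name",
     "category": np.nan,
     "description": "some desc",
     "connection": ["connection 1"]},
])

def concat_lists(series):
    """Concatenate the list cells of one group, skipping NaN/None cells."""
    merged = []
    for value in series:
        if isinstance(value, list):
            merged.extend(value)
    return merged

def category_names(series):
    """Like concat_lists, but keep only the 'name' of each category dict."""
    return [d["name"] for d in concat_lists(series)]

def join_text(series):
    """Join the string cells of one group with a space, skipping NaN/None cells."""
    return " ".join(v for v in series if isinstance(v, str))

result = (
    df.groupby("name")
      .agg({"category": category_names,
            "description": join_text,
            "connection": concat_lists})
      .reset_index()
)
print(result)
```

With this approach a group whose cells are all missing ends up with an empty list or an empty string instead of raising an error.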