Python: extracting an element from a dictionary inside a list in a DataFrame column
Suppose we have a DataFrame with the following format:
col1
[{'overall_prop': '0.812'}, {'overall_prop': '0.125'}, {'overall_prop': '0.062'}]
{}
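The original data is in JSON format. I want to extract the value of 'overall_prop' from the first element of the list in each row. Here is what I tried in order to get the first element: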
df['col1'].str[0]
That works fine. I then tried the following to extract 'overall_prop':
df['col1'].str[0].map(lambda x: x.get('overall_prop'))
But it complains:
{AttributeError}'float' object has no attribute 'get'
because {} (a Python dict object) has become NaN.
Then I tried:
df['col1'].where(df['col1'].notna(), lambda x: [{}]).str[0].map(lambda x: x.get('overall_prop'))
But this time:
{TypeError}argument of type 'NoneType' is not iterable
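In short, I am looking for a solution that extracts an element from a dictionary inside a list and can handle null values.

Edit version 1: col1 is a list of dicts and x[0] has overall_prop
You can do it like this. Use df.col1.apply(lambda x: x[0]['overall_prop']) to take the first element of the list and read the overall_prop value from that element's dictionary.
The assumption here is that every row in col1 is a list whose first element is a dictionary with the key overall_prop.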
import pandas as pd
df = pd.DataFrame({'col1':[[{'overall_prop': '0.001'},
{'overall_prop': '0.002'},
{'overall_prop': '0.003'}],
[{'overall_prop': '0.004'},
{'overall_prop': '0.005'},
{'overall_prop': '0.006'}],
[{'overall_prop': '0.007'},
{'overall_prop': '0.008'},
{'overall_prop': '0.009'}],
[{'overall_prop': '0.010'},
{'overall_prop': '0.011'},
{'overall_prop': '0.012'}],
[{'overall_prop': '0.013'},
{'overall_prop': '0.014'},
{'overall_prop': '0.015'}]]})
print (df)
df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'])
print (df)
The output will be:
col1 overall_prop
0 [{'overall_prop': '0.001'}, {'overall_prop': '... 0.001
1 [{'overall_prop': '0.004'}, {'overall_prop': '... 0.004
2 [{'overall_prop': '0.007'}, {'overall_prop': '... 0.007
3 [{'overall_prop': '0.010'}, {'overall_prop': '... 0.010
4 [{'overall_prop': '0.013'}, {'overall_prop': '... 0.013
Edit version 2: col1 is a list of dicts, and a dict in the list is empty
If your rows do not all have overall_prop as a key, you can use this:
df = pd.DataFrame({'col1':[[{'overall_prop': '0.001'},
{'overall_prop': '0.002'},
{'overall_prop': '0.003'}],
[{}],
[{'incorrect_key': '0.004'},
{'overall_prop': '0.005'},
{'overall_prop': '0.006'}],
[{'overall_prop': '0.007'},
{'overall_prop': '0.008'},
{'overall_prop': '0.009'}],
[{'overall_prop': '0.010'},
{'overall_prop': '0.011'},
{'overall_prop': '0.012'}],
[{'overall_prop': '0.013'},
{'overall_prop': '0.014'},
{'overall_prop': '0.015'}]]})
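import numpy as np
# Fall back to NaN when the first dict has no 'overall_prop' key.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'] if 'overall_prop' in x[0] else np.nan)
print (df)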
The output will be:
col1 overall_prop
0 [{'overall_prop': '0.001'}, {'overall_prop': '... 0.001
1 [{}] NaN
2 [{'incorrect_key': '0.004'}, {'overall_prop': ... NaN
3 [{'overall_prop': '0.007'}, {'overall_prop': '... 0.007
4 [{'overall_prop': '0.010'}, {'overall_prop': '... 0.010
5 [{'overall_prop': '0.013'}, {'overall_prop': '... 0.013
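The same fallback can also be written with dict.get, which returns a default when the key is absent. The following is a minimal sketch, not part of the original answer; the three-row sample is hypothetical, and it still assumes every row is a non-empty list whose first element is a dict.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every row is a non-empty list of dicts.
df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.001'}, {'overall_prop': '0.002'}],
    [{}],
    [{'incorrect_key': '0.004'}, {'overall_prop': '0.005'}],
]})

# dict.get returns the supplied default (NaN here) when the key is missing,
# so no explicit "in" check is needed.
df['overall_prop'] = df['col1'].apply(lambda x: x[0].get('overall_prop', np.nan))
print(df)
```

This variant still raises an error if a row is not a list (for example a bare {} or NaN), so it only covers the missing-key case.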
Edit version 3: col1 has different types of data
The output will be:
col1 overall_prop
0 [{'overall_prop': '0.001'}, {'overall_prop': '... 0.001
1 [{}] NaN
2 {'bad': '0.999'} NaN
3 {} NaN
4 just a bad string NaN
5 250 NaN
6 35.25 NaN
7 True NaN
8 False NaN
9 (10, 20) NaN
10 [{'incorrect_key': '0.004'}, {'overall_prop': ... NaN
11 [{'overall_prop': '0.007'}, {'overall_prop': '... 0.007
12 [{'overall_prop': '0.010'}, {'overall_prop': '... 0.010
13 [{'overall_prop': '0.013'}, {'overall_prop': '... 0.013
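A check that returns NaN for every malformed row, as in the output above, could look like the sketch below. It is an illustration under stated assumptions rather than the answer's exact code: the helper name first_overall_prop and the sample rows are hypothetical. It tests explicitly that the value is a non-empty list and that its first element is a dict before looking up the key.

```python
import numpy as np
import pandas as pd

def first_overall_prop(value, key='overall_prop'):
    """Return value[0][key] when value is a non-empty list whose first
    element is a dict containing key; otherwise return NaN.
    (Hypothetical helper name, used only for illustration.)"""
    if isinstance(value, list) and value and isinstance(value[0], dict):
        return value[0].get(key, np.nan)
    return np.nan

# Hypothetical sample mixing well-formed rows with other types.
df = pd.DataFrame({'col1': [
    [{'overall_prop': '0.001'}, {'overall_prop': '0.002'}],
    [{}],
    {'bad': '0.999'},
    {},
    'just a bad string',
    250,
    35.25,
    True,
    (10, 20),
    np.nan,
    [{'incorrect_key': '0.004'}, {'overall_prop': '0.005'}],
]})

df['overall_prop'] = df['col1'].apply(first_overall_prop)
print(df)
```

Because the type checks run before any indexing, a bare {} or a NaN in col1 no longer raises "'float' object is not subscriptable"; those rows simply get NaN.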
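Comments:
Have you tried df['col1'].str.map(lambda x: x[0]['overall_prop'])?
@JoeFerndz: StringMethods has no map.
My bad, I should have paid more attention. You cannot use map on str. Use df.col1.apply(lambda x: x[0]['overall_prop']) instead. See my answer below.
I said in the description that the problem is an empty dictionary object in one of the rows, which breaks everything.
That is handled as well. Please see the updated code at the bottom of my answer.
Thank you for updating the solution, but it still complains: {TypeError} 'float' object is not subscriptable. I think the reason is that in your example the empty dict is inside a list. In my case the empty dict is not in a list.
Got it. I now check explicitly whether the value is a list, whether the list contains a dictionary, and whether the first dictionary has the key overall_prop. See if that solves the problem.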