Python: using apply to insert values extracted from one column (JSON type) into another column
I have the following dataset:
userid sub_id event
1 NaN {'score':25, 'sub_id':5}
1 5 {'score':1}
When the sub_id column is NaN, I want to extract this information from the event column using the following code:
df['sub_id'] = df.apply(lambda row:
row['event'].split('sub_id')[1]
if pd.isnull(row['sub_id'])
else row['sub_id'])
However, I get this error: KeyError: ('sub_id', u'occurred at index')
I am trying to get this DataFrame:
userid sub_id event
1 5 {'score':25, 'sub_id':5}
1 5 {'score':1}
Any ideas about this error, or suggestions for a different solution?
Update
I need to extract a value from a nested dict element:
event
{u'POST': {u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}, u'GET': {}}
I am using the following code:
df['POST'] = df['event'].apply(pd.Series)['POST']
This creates the following column:
POST
{u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}
However, I need to get the overall_feedback value. Because of how the POST field is formatted, the following code does not work:
df['POST'].apply(pd.Series)['overall_feedback']
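It throws this error: KeyError: 'overall_feedback'
Any ideas?
You can use combine_first or fillna to fill the missing sub_id values from the event column.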
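A minimal sketch of that idea, assuming the event column already holds Python dicts (not strings) and using the sample data from the question. The names from_event, filled and row_wise are illustrative only. It also shows a row-wise variant: without axis=1, apply passes each column to the lambda, so row['sub_id'] is looked up in the row index, which is the likely cause of the KeyError above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "userid": [1, 1],
    "sub_id": [np.nan, 5],
    "event": [{"score": 25, "sub_id": 5}, {"score": 1}],
})

# Pull 'sub_id' out of each event dict; dicts without the key yield NaN.
from_event = df["event"].apply(lambda d: d.get("sub_id", np.nan))

# combine_first keeps existing sub_id values and fills only the NaN gaps.
filled = df["sub_id"].combine_first(from_event)

# Row-wise alternative: axis=1 makes the lambda receive one row at a time,
# so row["sub_id"] and row["event"] refer to columns.
row_wise = df.apply(
    lambda row: row["event"].get("sub_id", np.nan)
    if pd.isnull(row["sub_id"])
    else row["sub_id"],
    axis=1,
)

print(filled)
print(row_wise)
```

Assigning either result back with df['sub_id'] = filled gives the DataFrame the question asks for.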
Edit 1:
To convert the string values to dicts, you can use:
import ast, yaml
import numpy as np
import pandas as pd
df = pd.DataFrame({'userid':[1,1],
'sub_id':[np.nan, 5],
'event':[{'post':{'score':25, 'sub_id':5}},{'post':{'score':1}} ]})
df.event = df.event.astype(str)
print (type(df.loc[0, 'event']))
<class 'str'>
df['event'] = df['event'].apply(ast.literal_eval)
#df['event'] = df['event'].apply(yaml.safe_load)
print (df)
event sub_id userid
0 {'post': {'sub_id': 5, 'score': 25}} NaN 1
1 {'post': {'score': 1}} 5.0 1
print (type(df.loc[0, 'event']))
<class 'dict'>
Edit 2:
d = {u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}
d1 = {u'{"options_selected":{"Ideas":"2"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_2"}': [u'']}
df = pd.DataFrame({'userid':[1,1],
'sub_id':[np.nan, 5],
'event':[d,d1]})
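# yaml.load without a Loader argument fails on recent PyYAML versions; safe_load is enough here
df['event'] = df['event'].astype(str).apply(yaml.safe_load).apply(lambda x : yaml.safe_load(list(x.keys())[0]))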
print (type(df.event.iloc[0]))
<class 'dict'>
print (df.event.apply(pd.Series)['overall_feedback'])
0 Feedback_text_goes_here_1
1 Feedback_text_goes_here_2
Name: overall_feedback, dtype: object
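Since the key of the POST dict is itself a JSON document, the standard-library json module can decode it without going through YAML. The following is only a sketch under the assumption that each event dict has exactly one key and that the key is valid JSON. The helper name parse_first_key is hypothetical.

```python
import json

import pandas as pd

d = {'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': ['']}
d1 = {'{"options_selected":{"Ideas":"2"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_2"}': ['']}

df = pd.DataFrame({"userid": [1, 1], "event": [d, d1]})


def parse_first_key(event):
    """Decode the JSON document stored as the first (only) key of the dict."""
    return json.loads(next(iter(event)))


# Decode each key, then pick the field of interest from the resulting dict.
feedback = df["event"].apply(parse_first_key).apply(lambda x: x["overall_feedback"])
print(feedback)
```

Using a plain dict lookup instead of apply(pd.Series) avoids building an intermediate DataFrame when only one field is needed.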
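Thanks. Could you first explain what combine_first does?
It replaces NaN values, similar to fillna.
What is print (type(df.loc[0, 'event'])) for?
I see, much appreciated! What if the event values look like this: {post: {'score':25, 'sub_id':5}}?
With the fillna option I get an IndexError: list index out of range error. I think sub_id is filtered by NA values, so it is shorter than the event column.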
s = df['event'].apply(pd.Series)['post'].apply(pd.Series)['score']
print (s)
0 25.0
1 1.0
Name: score, dtype: float64
df['sub_id'] = df['sub_id'].combine_first(s)
print (df)
event sub_id userid
0 {'post': {'sub_id': 5, 'score': 25}} 25.0 1
1 {'post': {'score': 1}} 5.0 1
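The snippet above fills sub_id from the nested score field. If the goal is instead to fill it from the nested sub_id, as in the original question, a sketch along these lines may help. It assumes every event is a dict whose optional 'post' entry is also a dict. The helper nested_sub_id is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "userid": [1, 1],
    "sub_id": [np.nan, 5],
    "event": [{"post": {"score": 25, "sub_id": 5}}, {"post": {"score": 1}}],
})


def nested_sub_id(event):
    """Return event['post']['sub_id'], or NaN if either level is missing."""
    return event.get("post", {}).get("sub_id", np.nan)


s = df["event"].apply(nested_sub_id)

# Existing sub_id values win; only NaN entries are taken from s.
filled = df["sub_id"].combine_first(s)
print(filled)
```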