Python: unpacking an unknown object from a DataFrame
I'm new to this. I have a DataFrame that contains an object I can't identify, and I need to unpack it and convert it into a new, separate DataFrame to form a normalized structure. A simplified version of df:
   trasaction_id                            customer_details
0              1  <customer {id:'A123', name: 'Tina'} as x >
0              2  <customer {id:'B456', name: 'Tony'} as x >
0              3   <customer {id:'C789', name: 'Tim'} as x >
Name: customer_details, dtype: object
This is driving me crazy. Thanks for your help.

It seems your object/class has the attributes id and name, so you can try to get {'id': x.id, 'name': x.name} from each cell, that is:
df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
or put the values directly into separate columns:
df['id'] = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
Example code:
import pandas as pd

class customer:

    def __init__(self, id_, name):
        self.id = id_
        self.name = name

    def __str__(self):
        return '<customer {{id: {}, name: {}}} as x>'.format(self.id, self.name)

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        customer('A123', 'Tina'),
        customer('B456', 'Tony'),
        customer('C789', 'Tim')
    ],
}

df = pd.DataFrame(data)
print(df)

# ---

df['id'] = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
print(df)

df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
print(df)

#new_df = pd.DataFrame( df['customer_details'].to_list() )
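The question asks for a separate, normalized DataFrame rather than extra columns on df. A minimal sketch of one way to get there, assuming the cells really are objects exposing id and name attributes; the Customer class and the names customers, transactions and customer_id are hypothetical and used only for illustration:

```python
import pandas as pd


class Customer:
    """Hypothetical stand-in for the unknown object; only id and name are assumed."""

    def __init__(self, id_, name):
        self.id = id_
        self.name = name


df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [
        Customer('A123', 'Tina'),
        Customer('B456', 'Tony'),
        Customer('C789', 'Tim'),
    ],
})

# One row per distinct customer, built from the objects' attributes.
customers = pd.DataFrame(
    [{'id': c.id, 'name': c.name} for c in df['customer_details']]
).drop_duplicates().reset_index(drop=True)

# Transactions keep only a reference (foreign key) to the customer.
transactions = pd.DataFrame({
    'trasaction_id': df['trasaction_id'],
    'customer_id': df['customer_details'].apply(lambda c: c.id),
})

print(customers)
print(transactions)
```

Dropping duplicates matters once the same customer appears in several transactions; in this small sample every customer is already unique.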
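If it is an object with the attributes id and name, then how about {'id': st.id, 'name': st.name}? But if it is only a plain string, then you have to use string functions or a regular expression to cut the values out of the string. In more detail: df['customer_details'] = df['customer_details'].apply(lambda st: {'id': st.id, 'name': st.name})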
Thanks @furas - I tried df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name}) but it turns out the values are strings, and I got the error AttributeError: 'str' object has no attribute 'id'
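Before choosing between attribute access and string parsing, it may help to check what is actually stored in the column. A small sketch, using hypothetical sample data in which every cell is a plain string:

```python
import pandas as pd

# Hypothetical data: here the cells are plain strings, not objects.
df = pd.DataFrame({
    'trasaction_id': [1, 2],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
    ],
})

# Name of the Python type stored in each cell, counted per type.
type_counts = df['customer_details'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# True when every cell is a str, in which case attribute access like x.id fails.
all_strings = df['customer_details'].map(lambda v: isinstance(v, str)).all()
print(all_strings)
```

A column with dtype object can hold any mix of Python values, so the dtype alone does not tell strings and custom objects apart.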
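If you have strings, then you have to use re to get the values out of the string. I added an example for strings.
I tried applying myfunc and now get the following: TypeError: 'NoneType' object is not subscriptable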
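It seems that in some places you have empty cells (None) instead of strings. Or the string has different elements, so it can't find the pattern "id:'(.*)', name: '(.*)'}" and may need a different pattern.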
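For reference, 'NoneType' object is not subscriptable is what indexing r[1] raises when re.search finds no match and returns None. A defensive sketch that tolerates both non-string cells and strings that do not match; parse_customer and PATTERN are hypothetical names, and the pattern is an assumption that should be adjusted to the real data:

```python
import re
import pandas as pd

# Assumed pattern, tolerant of optional spaces; adjust it to the real strings.
PATTERN = re.compile(r"id:\s*'([^']*)',\s*name:\s*'([^']*)'")


def parse_customer(value):
    """Return {'id': ..., 'name': ...}, with None values when nothing can be parsed."""
    if not isinstance(value, str):      # None, NaN or any other non-string cell
        return {'id': None, 'name': None}
    match = PATTERN.search(value)
    if match is None:                   # a string, but the pattern was not found
        return {'id': None, 'name': None}
    return {'id': match.group(1), 'name': match.group(2)}


df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        None,
        "<customer unexpected format>",
    ],
})

parsed = df['customer_details'].apply(parse_customer)
new_df = pd.DataFrame(parsed.to_list(), index=df.index)
print(new_df)
```

Rows where new_df['id'] is missing point at the cells whose content needs a closer look or a different pattern.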
Output of the example code:

   trasaction_id                        customer_details
0              1  <customer {id: A123, name: Tina} as x>
1              2  <customer {id: B456, name: Tony} as x>
2              3   <customer {id: C789, name: Tim} as x>

   trasaction_id                        customer_details    id  name
0              1  <customer {id: A123, name: Tina} as x>  A123  Tina
1              2  <customer {id: B456, name: Tony} as x>  B456  Tony
2              3   <customer {id: C789, name: Tim} as x>  C789   Tim

   trasaction_id                customer_details    id  name
0              1  {'id': 'A123', 'name': 'Tina'}  A123  Tina
1              2  {'id': 'B456', 'name': 'Tony'}  B456  Tony
2              3   {'id': 'C789', 'name': 'Tim'}  C789   Tim
Example for strings:

import pandas as pd
import re

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        "<customer {id:'C789', name: 'Tim'} as x >",
    ]
}

df = pd.DataFrame(data)
print(df)

# ---

# pull each value out of the string into its own column
df['id'] = df['customer_details'].apply(lambda x: re.search("id:'(.*)',", x)[1])
df['name'] = df['customer_details'].apply(lambda x: re.search("name: '(.*)'}", x)[1])
print(df)

# or replace each string with a dictionary
def myfunc(x):
    r = re.search("id:'(.*)', name: '(.*)'}", x)
    return {'id': r[1], 'name': r[2]}

df['customer_details'] = df['customer_details'].apply(myfunc)
print(df)

# a list of dictionaries can be turned into a separate DataFrame
#new_df = pd.DataFrame( df['customer_details'].to_list() )
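As an alternative to apply with re.search, pandas can run the regular expression over the whole column with Series.str.extract. A sketch under the assumption that every string follows the format shown in the question; the pattern itself is an assumption and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        "<customer {id:'C789', name: 'Tim'} as x >",
    ],
})

# Named groups become the column names of the extracted DataFrame.
pattern = r"id:\s*'(?P<id>[^']*)',\s*name:\s*'(?P<name>[^']*)'"
new_df = df['customer_details'].str.extract(pattern)

# Optionally keep the transaction id next to the extracted fields.
new_df.insert(0, 'trasaction_id', df['trasaction_id'])
print(new_df)
```

Cells that do not match the pattern come back as NaN instead of raising an error, which makes malformed rows easy to spot.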