Python: unpacking an unknown object from a DataFrame


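I'm a newbie. I have a DataFrame that contains an object I can't identify. I need to unpack it and turn it into a new, separate DataFrame so that I get a new normalized structure.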

A simplified version of df is:

   trasaction_id   customer_details
0   1       <customer {id:'A123', name: 'Tina'} as x >
0   2       <customer {id:'B456', name: 'Tony'} as x >
0   3       <customer {id:'C789', name: 'Tim'} as x >

Name: customer_details, dtype: object

This is driving me crazy. Thanks for your help.

It seems your object/class has the attributes id and name, so you can try to get

{'id': x.id, 'name': x.name}

for every cell, that is:

df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
or put the values directly into separate columns:

df['id']   = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
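If the goal is a separate DataFrame rather than extra columns in df, the attributes can also be read once per object into a list of dicts. This is a minimal sketch, assuming the cells really are Python objects with id and name attributes; the Customer class below is a hypothetical stand-in for the unknown type.

```python
import pandas as pd

class Customer:
    """Hypothetical stand-in for the unknown object type in the question."""
    def __init__(self, id_, name):
        self.id = id_
        self.name = name

df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [Customer('A123', 'Tina'), Customer('B456', 'Tony'), Customer('C789', 'Tim')],
})

# Read each object's attributes once and keep the transaction id next to them
rows = [
    {'trasaction_id': t, 'id': c.id, 'name': c.name}
    for t, c in zip(df['trasaction_id'], df['customer_details'])
]
new_df = pd.DataFrame(rows)
print(new_df)
```

A list comprehension is used here only because it builds each row in one pass; the two apply() calls above give the same values.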

Example code:

import pandas as pd

class customer:
    def __init__(self, id_, name):
        self.id = id_
        self.name = name
    def __str__(self):
        return '<customer {{id: {}, name: {}}} as x>'.format(self.id, self.name)

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        customer('A123', 'Tina'),
        customer('B456', 'Tony'),
        customer('C789', 'Tim')
    ],
}

df = pd.DataFrame(data)
print(df)

# ---

df['id'] = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
print(df)

df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
print(df)

# optional: build a separate DataFrame from the dicts (one column per key)
#new_df = pd.DataFrame( df['customer_details'].to_list() )
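Since the question asks for a normalized structure, one option is to split the flattened data into a customers table (one row per customer) and a transactions table that refers to customers only by id. This is a sketch under the assumption that id uniquely identifies a customer; the names customers, transactions and customer_id are illustrative, and the fourth transaction is a hypothetical repeat customer added to show the de-duplication.

```python
import pandas as pd

# Flattened data, e.g. after extracting 'id' and 'name' into columns
df = pd.DataFrame({
    'trasaction_id': [1, 2, 3, 4],
    'id': ['A123', 'B456', 'C789', 'A123'],   # last row: hypothetical repeat customer
    'name': ['Tina', 'Tony', 'Tim', 'Tina'],
})

# One row per customer (assumes 'id' is a unique customer key; keeps first occurrence)
customers = (
    df[['id', 'name']]
    .drop_duplicates(subset='id')
    .reset_index(drop=True)
)

# Transactions keep only a reference to the customer
transactions = df[['trasaction_id', 'id']].rename(columns={'id': 'customer_id'})

print(customers)
print(transactions)
```

The original rows can be rebuilt with transactions.merge(customers, left_on='customer_id', right_on='id').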

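If it is an object with the attributes id and name, then how about {'id': st.id, 'name': st.name}? But if it is just a plain string, then you have to cut the values out of the string with string functions or a regular expression. In code:
df['customer_details'] = df['customer_details'].apply(lambda st: {'id': st.id, 'name': st.name})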
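For the string case, a pattern that collects every key:'value' pair avoids hard-coding the field order. This is a sketch that assumes every value is wrapped in single quotes and contains no quotes itself; parse_details is a hypothetical helper name, and the None cell is hypothetical sample data.

```python
import re

import pandas as pd

# Matches key:'value' pairs such as id:'A123' or name: 'Tina'
PAIR_RE = re.compile(r"(\w+):\s*'([^']*)'")

def parse_details(text):
    """Return a dict of all key:'value' pairs in text (empty dict for non-strings)."""
    if not isinstance(text, str):
        return {}
    return dict(PAIR_RE.findall(text))

s = pd.Series([
    "<customer {id:'A123', name: 'Tina'} as x >",
    "<customer {id:'B456', name: 'Tony'} as x >",
    None,  # hypothetical empty cell
])

# Each dict becomes a row; missing keys (and the empty dict) become NaN
details = pd.DataFrame(s.map(parse_details).tolist())
print(details)
```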
Thanks @furas - I tried
df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
but the values turned out to be strings, and I got the error
AttributeError: 'str' object has no attribute 'id'
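Printing a DataFrame shows only the text form of each cell, so an object and a string can look identical in the output. Checking the Python type of every cell tells which approach applies. A small sketch; the sample Series is hypothetical.

```python
import pandas as pd

# Hypothetical column: one string cell and one empty cell
s = pd.Series(["<customer {id:'A123', name: 'Tina'} as x >", None], name='customer_details')

# Count the type name of every cell; 'str' means the values are plain text,
# so attribute access such as x.id cannot work and string parsing is needed
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```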
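If you have strings, then you have to use re to get the values out of them. I added an example with strings.
I tried applying myfunc and now get the following error:
TypeError: 'NoneType' object is not subscriptable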
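It seems that some strings have a different layout, so re.search() doesn't find the pattern "id:'(.*)', name: '(.*)'}" and returns None; r[1] then raises this error. Those rows may need a different pattern. Empty cells are worth checking too, but a None or NaN cell fails earlier, inside re.search(), with TypeError: expected string or bytes-like object.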
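A defensive version of myfunc can return empty values instead of failing, so the bad rows can be found afterwards. This is a sketch: the more tolerant pattern (optional spaces, [^']* instead of greedy .*) is an assumption about how the real strings vary, and the sample rows are hypothetical.

```python
import re

import pandas as pd

PATTERN = re.compile(r"id:\s*'([^']*)',\s*name:\s*'([^']*)'")

def myfunc(x):
    # Non-string cells (None/NaN) have nothing to parse
    if not isinstance(x, str):
        return {'id': None, 'name': None}
    r = PATTERN.search(x)
    # re.search() returns None when the pattern is not found;
    # indexing None is what raises "'NoneType' object is not subscriptable"
    if r is None:
        return {'id': None, 'name': None}
    return {'id': r[1], 'name': r[2]}

s = pd.Series([
    "<customer {id:'A123', name: 'Tina'} as x >",
    "<customer {name: 'Tony'} as x >",  # hypothetical row without an id
    None,                               # hypothetical empty cell
])
result = s.apply(myfunc)
print(result.tolist())
```

Rows that come back with id None are the ones to inspect to find the pattern they need.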
Result of the object example:

   trasaction_id                        customer_details
0              1  <customer {id: A123, name: Tina} as x>
1              2  <customer {id: B456, name: Tony} as x>
2              3   <customer {id: C789, name: Tim} as x>

   trasaction_id                        customer_details    id  name
0              1  <customer {id: A123, name: Tina} as x>  A123  Tina
1              2  <customer {id: B456, name: Tony} as x>  B456  Tony
2              3   <customer {id: C789, name: Tim} as x>  C789   Tim

   trasaction_id                customer_details    id  name
0              1  {'id': 'A123', 'name': 'Tina'}  A123  Tina
1              2  {'id': 'B456', 'name': 'Tony'}  B456  Tony
2              3   {'id': 'C789', 'name': 'Tim'}  C789   Tim

Example with strings:

import pandas as pd
import re

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        "<customer {id:'C789', name: 'Tim'} as x >",
    ]
}

df = pd.DataFrame(data)
print(df)

# ---

df['id'] = df['customer_details'].apply(lambda x: re.search("id:'(.*)',", x)[1])
df['name'] = df['customer_details'].apply(lambda x: re.search("name: '(.*)'}", x)[1])
print(df)

def myfunc(x):
    # re.search() returns None if the pattern is not found, and r[1] then fails
    r = re.search("id:'(.*)', name: '(.*)'}", x)
    return {'id': r[1], 'name': r[2]}

df['customer_details'] = df['customer_details'].apply(myfunc)
print(df)

# optional: build a separate DataFrame from the dicts (one column per key)
#new_df = pd.DataFrame( df['customer_details'].to_list() )
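pandas can also do the regex step without apply(): Series.str.extract() turns named groups into columns and yields NaN for rows that don't match or are empty, instead of raising. A sketch using the same tolerant pattern as assumed above; the None cell is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'trasaction_id': [1, 2, 3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        None,  # hypothetical empty cell
    ],
})

# Named groups become column names; non-matching or missing cells give NaN
extracted = df['customer_details'].str.extract(
    r"id:\s*'(?P<id>[^']*)',\s*name:\s*'(?P<name>[^']*)'"
)

# Separate DataFrame: transaction id plus the extracted customer fields
new_df = pd.concat([df[['trasaction_id']], extracted], axis=1)
print(new_df)
```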