Python: converting data to attribute:value pairs while keeping the ID

python pandas

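I have a dataset of customer information. Each customer has a unique ID; an Identity variable describing their status, such as retired, student, etc.; and finally a Type variable describing the kind of customer, such as weekly, monthly, etc. A customer can have multiple identities and types, but only one ID.

From my dataset, I want to associate each identity with each type so that I can produce counts and a heatmap from the result. So, while preserving the customer ID, I need to break the data down and unpack it into attribute:value pair format. I don't even know how to approach this, so I'm not sure which documentation to read, and I can't provide any coding attempt.

Here is a reproducible example of my data and a sample of my expected output. Any suggestions are welcome.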

import pandas as pd
import numpy as np

# Sample input: one row per observation, with missing values as NaN
data = {'ID': ['1','1','1','2','2','2','3','3','3','3'],
        'Identity': ['Identity_1', 'Identity_2','Identity_3','Identity_1', 'Identity_4','Identity_2','Identity_4', 'Identity_5','Identity_6',np.nan],
        'Type': ['Type_1', 'Type_2',np.nan,'Type_3', 'Type_1',np.nan,'Type_4','Type_5','Type_1', 'Type_6']
        }

df_data = pd.DataFrame(data, columns=['ID','Identity','Type'])

# Expected output (sample for ID 1 only): every Identity paired with every Type
result = {'ID': ['1','1','1','1','1','1'],
          'Identity': ['Identity_1', 'Identity_1','Identity_2','Identity_2', 'Identity_3','Identity_3'],
          'Type': ['Type_1', 'Type_2','Type_1', 'Type_2','Type_1', 'Type_2']
          }

df_result = pd.DataFrame(result, columns=['ID','Identity','Type'])

One way is to use itertools.product:

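from itertools import product

def prod(data):
    # Cartesian product of the non-missing Identity and Type values of one group
    return pd.DataFrame(list(product(data["Identity"].dropna(), 
                                     data["Type"].dropna())), 
                        columns=["Identity", "Type"])

# Apply per customer ID, then drop the inner positional index level so only ID remains
new_df = df_data.groupby("ID").apply(prod).reset_index(level=1, drop=True)
print(new_df)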

Output:

      Identity    Type
ID                    
1   Identity_1  Type_1
1   Identity_1  Type_2
1   Identity_2  Type_1
1   Identity_2  Type_2
1   Identity_3  Type_1
1   Identity_3  Type_2
2   Identity_1  Type_3
2   Identity_1  Type_1
2   Identity_4  Type_3
2   Identity_4  Type_1
2   Identity_2  Type_3
2   Identity_2  Type_1
3   Identity_4  Type_4
3   Identity_4  Type_5
3   Identity_4  Type_1
3   Identity_4  Type_6
3   Identity_5  Type_4
3   Identity_5  Type_5
3   Identity_5  Type_1
3   Identity_5  Type_6
3   Identity_6  Type_4
3   Identity_6  Type_5
3   Identity_6  Type_1
3   Identity_6  Type_6
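
Since the question mentions counts and a heatmap, here is a minimal sketch of one possible follow-up. It is not part of the original answer: it builds the same pairs with a plain merge on ID instead of itertools.product, then tabulates them with pd.crosstab. The names identities, types, pairs, and counts are chosen here for illustration, and reading the counts as "number of customers" assumes each Identity and each Type appears at most once per customer.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df_data = pd.DataFrame({
    "ID": ["1", "1", "1", "2", "2", "2", "3", "3", "3", "3"],
    "Identity": ["Identity_1", "Identity_2", "Identity_3", "Identity_1", "Identity_4",
                 "Identity_2", "Identity_4", "Identity_5", "Identity_6", np.nan],
    "Type": ["Type_1", "Type_2", np.nan, "Type_3", "Type_1",
             np.nan, "Type_4", "Type_5", "Type_1", "Type_6"],
})

# Split into two long tables, dropping the missing values in each.
identities = df_data[["ID", "Identity"]].dropna()
types = df_data[["ID", "Type"]].dropna()

# Merging on ID pairs every Identity with every Type of the same customer.
pairs = identities.merge(types, on="ID")

# Count the Identity/Type combinations across all customers.
counts = pd.crosstab(pairs["Identity"], pairs["Type"])
print(counts)
```

The resulting counts table has one row per Identity and one column per Type, which is the shape a heatmap function such as seaborn.heatmap would typically expect.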

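Identity_1, Type_3 appears in the first dataset, but the result shows Identity_1, Type_2. Is this intentional, or is it a typo?

For my customer with ID 1, we have: Identity_1 has Type_1/Type_2, Identity_2 has Type_1/Type_2, and Identity_3 has Type_1/Type_2. For my customer with ID 2, we would have Identity_1 with Type_3/Type_1, Identity_4 with Type_3/Type_1, and Identity_2 with Type_3/Type_1. This pattern continues: check the customer ID, then pair all identities with all types.

Could you check the first dataset you posted by printing it locally? I don't see Identity_1 of ID 1 paired with Type_1/Type_2. Instead, I only see Type_1.

@AkshaySehgal Thanks for taking the time. I checked again locally, and the data looks fine to me.

Oh, okay, I probably didn't understand it correctly then. Sorry.

I would never have come up with that in a million years. I had never heard of itertools.product. Thank you very much, it works great.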