Python: Coercing columns of two different DataFrames to the same data type

python pandas

I have two DataFrames with the following structure:

print(product_combos1.head(n=5))
             product_id  count  Length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2

print(testing_df.head(n=5))
                     product_id  Length
transaction_id                         
001                       [P01]       1
002                  [P01, P02]       2
003             [P01, P02, P09]       3
004                  [P01, P03]       2
005             [P01, P03, P05]       3

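How can I coerce the product_id column in testing_df so that it has the same format as the product_id column in product_combos1? (That is, parentheses instead of square brackets.)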

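Python tuples are displayed in parentheses. Lists are displayed in square brackets. So the two columns hold different object types: product_combos1 holds tuples and testing_df holds lists.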
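Before converting anything, it can help to confirm which object types the two columns actually hold. The following is a minimal sketch, assuming the columns contain real Python lists and tuples rather than strings; the two small DataFrames are hypothetical samples modeled on the question's output, not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data modeled on the printed output in the question.
product_combos1 = pd.DataFrame(
    {
        "product_id": [("P06", "P09"), ("P01", "P06")],
        "count": [36340, 36337],
        "Length": [2, 2],
    }
)
testing_df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P02"]], "Length": [1, 2]},
    index=pd.Index(["001", "002"], name="transaction_id"),
)

# Collect the distinct Python types stored in each column.
combo_types = set(product_combos1["product_id"].map(type))
testing_types = set(testing_df["product_id"].map(type))

print(combo_types)
print(testing_types)
```

If the second set contained str instead of list, the string-based approach further down would be the one to use.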

Modify the DataFrame in place

testing_df['product_id'] = testing_df['product_id'].apply(tuple)
testing_df 

                     product_id  Length
transaction_id                         
1                        (P01,)       1
2                    (P01, P02)       2
3               (P01, P02, P09)       3
4                    (P01, P03)       2
5               (P01, P03, P05)       3

Make a copy

testing_df.assign(product_id=testing_df.product_id.apply(tuple))

                     product_id  Length
transaction_id                         
1                        (P01,)       1
2                    (P01, P02)       2
3               (P01, P02, P09)       3
4                    (P01, P03)       2
5               (P01, P03, P05)       3
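The difference between the two approaches above can be sketched as follows. This is an illustrative example on hypothetical sample data: column assignment changes testing_df itself, while assign returns a new DataFrame and leaves the original untouched. The name converted is introduced here only for illustration.

```python
import pandas as pd

# Hypothetical sample data modeled on the question.
testing_df = pd.DataFrame(
    {
        "product_id": [["P01"], ["P01", "P02"], ["P01", "P02", "P09"]],
        "Length": [1, 2, 3],
    },
    index=pd.Index(["001", "002", "003"], name="transaction_id"),
)

# assign returns a new DataFrame; testing_df still holds lists afterwards.
converted = testing_df.assign(product_id=testing_df["product_id"].apply(tuple))

original_first = testing_df["product_id"].iloc[0]
converted_first = converted["product_id"].iloc[0]
converted_columns = list(converted.columns)

print(original_first)     # the original list is unchanged
print(converted_first)    # the copy holds a tuple
print(converted_columns)  # all columns are carried over to the copy
```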

Unless, of course, these are actually strings. In that case, replace the square brackets with parentheses.

testing_df.assign(product_id=testing_df.product_id.str.replace(r'\[(.*)\]', r'(\1)', regex=True))

                     product_id  Length
transaction_id                         
1                         (P01)       1
2                    (P01, P02)       2
3               (P01, P02, P09)       3
4                    (P01, P03)       2
5               (P01, P03, P05)       3
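Note that the string replacement only changes how the values look; they remain strings and will not compare equal to the tuples in product_combos1. The sketch below shows both the replacement and one possible way to turn such strings into real tuples. It assumes the strings look exactly like "[P01, P02]" (items separated by a comma and a space, no quotes); str_df and parsed are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical frame where product_id holds strings, not lists.
str_df = pd.DataFrame({"product_id": ["[P01]", "[P01, P02]"], "Length": [1, 2]})

# Cosmetic fix: swap the brackets for parentheses; the values are still strings.
# regex=True is passed explicitly because recent pandas versions default to
# literal (non-regex) replacement.
replaced = str_df["product_id"].str.replace(r"\[(.*)\]", r"(\1)", regex=True)
print(replaced.tolist())

# Structural fix: strip the brackets, split on ", ", and build real tuples.
parsed = str_df["product_id"].str.strip("[]").str.split(", ").apply(tuple)
print(parsed.tolist())
```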

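The only problem is that the first row of my df has changed from ['P01'] to ('P01',). I'm not sure why the ',' was added in the first row.

So the column elements were lists and you applied tuple. And yes, the other DataFrame had no tuples of length one, while this one did have lists of length one. Python displays a tuple of length one with a trailing comma to distinguish it from the parenthesized expression (x).

Will this cause any complications when I try to compare the two DataFrames?

Please see if you can help. When I run the code above, the Length column disappears from testing_df, and when I try to add it back I get the following error: KeyError: 'product_id'

There is code above that you can run. Which code are you running? For example, testing_df.product_id.str.replace(r'\[(.*)\]', r'(\1)', regex=True) only produces the new product_id column. The assign method, however, produces a new copy of the DataFrame in which the new column overwrites the old one. You must assign the result to a variable. That variable can be the same as the old one if you choose.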
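The two points from the comments can be sketched together. This is a minimal example on hypothetical sample data: reassigning the result of assign keeps the whole DataFrame (including Length), and a one-element tuple such as ('P01',) behaves like any other tuple when compared. The membership check through a plain Python set is just one possible way to compare the two columns, and known_combos and in_combos are names invented for this illustration.

```python
import pandas as pd

# Hypothetical sample data modeled on the question.
product_combos1 = pd.DataFrame(
    {
        "product_id": [("P06", "P09"), ("P01", "P06")],
        "count": [36340, 36337],
        "Length": [2, 2],
    }
)
testing_df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P06"]], "Length": [1, 2]},
    index=pd.Index(["001", "002"], name="transaction_id"),
)

# Reassign the result of assign so the full frame, including Length, is kept.
testing_df = testing_df.assign(product_id=testing_df["product_id"].apply(tuple))
columns_after = list(testing_df.columns)

# The trailing comma is only how Python prints a tuple of length one.
single = testing_df["product_id"].iloc[0]

# Tuples are hashable, so they can be collected in a set and tested for membership.
known_combos = set(product_combos1["product_id"])
in_combos = testing_df["product_id"].map(lambda combo: combo in known_combos)

print(columns_after)
print(single, len(single))
print(in_combos.tolist())
```

Running only the str.replace expression by itself returns a single Series, which would explain a result with no Length column.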