Python: how to split pipe-separated row values into columns in Pandas

python pandas

I have the following DataFrame in pandas:

import pandas as pd

data = {'order_id': [123, 221, 234],
        'cust_id': [12, 13, 15],
        'order_total': [2345, 232, 1002],
        'prod_name': ['Chicken wings | Mashroom | Coriander', 'Chicken wings', 'Mashroom | Fish | Garlic']}

order_df = pd.DataFrame(data)

   order_id  cust_id  order_total                             prod_name
0       123       12         2345  Chicken wings | Mashroom | Coriander
1       221       13          232                         Chicken wings
2       234       15         1002              Mashroom | Fish | Garlic

The DataFrame I want is:

 order_id    cust_id    order_total   Chicken wings   Mashroom   Coriander    Fish    Garlic
 123         12         2345          1               1          1            0       0      
 221         13         232           1               0          0            0       0
 234         15         1002          0               1          0            1       1

I can split it into separate products, but I cannot produce the format above:

split_product_df = order_df.prod_name.str.split("|", expand=True).add_prefix('Product_')

How can I do this in Pandas?

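You can use Pandas apply over the rows and write to each row by its index. Here is a similar, simple example. Note that it does not handle repeated labels in the pipe-separated string: a label that occurs twice in one row is still recorded as 1.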

import pandas as pd

df = pd.DataFrame({
    'order_id': [123, 456],
    'cust_id': [12, 13],
    'order_total': [2345, 6789],
    'prod_name': ["Chicken wings | Mashroom | Coriander", "Mashroom | Fish | Garlic"]
})


def process(row):
    index = row.name  # index label of the current row
    for word in row['prod_name'].split('|'):
        # strip surrounding spaces from each product name, then set the column
        # of that name to 1 for this row (the column is created if missing)
        w = word.strip()
        df.loc[index, w] = 1


df.apply(process, axis=1)  # run process on each row; it fills df in place
df = df.drop('prod_name', axis=1)  # drop the original prod_name column
df = df.fillna(0)  # rows that lack a product get 0 instead of NaN
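
If repeated labels should be counted rather than flagged, one possible variation is sketched below. It is not part of the answer above; the `orders` frame and its repeated "Mashroom" entry are made-up data for illustration, and the approach (explode, one-hot encode, sum per original row) is only one of several ways to do this.

```python
import pandas as pd

# Hypothetical data: "Mashroom" is repeated in the first row on purpose.
orders = pd.DataFrame({
    "order_id": [123, 456],
    "prod_name": ["Chicken wings | Mashroom | Mashroom", "Mashroom | Fish"],
})

# One product per row; the original row index is repeated for each product.
products = orders["prod_name"].str.split("|").explode().str.strip()

# One-hot encode each product, then sum per original row to get counts.
counts = pd.get_dummies(products).groupby(level=0).sum()

# Attach the counts to the remaining columns (aligned on the row index).
result = orders.drop(columns="prod_name").join(counts)
print(result)
```

Because the sum runs over the exploded rows, a product listed twice in one order ends up with a count of 2 instead of 1.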

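pandas has a helper for this: Series.str.get_dummies.

@Neil, there seem to be spaces around the |, so try the following, which removes any whitespace next to | before building the indicator columns:

pd.concat(
    (df.iloc[:, :-1], df.prod_name.str.replace(r"\s*\|\s*", "|", regex=True).str.get_dummies()),
    axis=1,
)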
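
For reference, here is a self-contained sketch of the str.get_dummies approach applied to the data from the question. The `wanted` list and the explicit column reordering are additions for illustration (get_dummies returns its columns in sorted order), and the whitespace handling assumes the separators look like those in the question.

```python
import pandas as pd

data = {'order_id': [123, 221, 234],
        'cust_id': [12, 13, 15],
        'order_total': [2345, 232, 1002],
        'prod_name': ['Chicken wings | Mashroom | Coriander',
                      'Chicken wings',
                      'Mashroom | Fish | Garlic']}
order_df = pd.DataFrame(data)

# Collapse " | " to "|" so labels carry no stray spaces,
# then build one 0/1 indicator column per distinct product.
dummies = (
    order_df["prod_name"]
    .str.replace(r"\s*\|\s*", "|", regex=True)
    .str.strip()
    .str.get_dummies(sep="|")
)

# Optional: reorder the indicator columns to match the layout in the question.
wanted = ["Chicken wings", "Mashroom", "Coriander", "Fish", "Garlic"]
result = pd.concat([order_df.drop(columns="prod_name"), dummies[wanted]], axis=1)
print(result)
```

Dropping prod_name by name rather than with iloc[:, :-1] avoids depending on it being the last column.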

@Neil, check the updated code.

I'm just curious: does this approach work if the pipe-separated string contains repeated labels?

@AnimeshMukherkjee What do you mean by repeated labels? Can you share an example?

For example, if the string were a | b | c | a, as in the str.get_dummies documentation, would the column's value be 1 or 2?

Column "a" would have two rows, since it appears twice.
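
The repeated-label question can be checked directly with a small experiment. The series below is made-up data; as far as I can tell, str.get_dummies produces presence indicators, so a label that repeats within one string should still come out as 1.

```python
import pandas as pd

# Hypothetical input: "a" occurs twice in the first string.
s = pd.Series(["a|b|c|a", "b"])

# Each column is a 0/1 indicator of whether the label occurs in that row.
dummies = s.str.get_dummies(sep="|")
print(dummies)
```

If actual occurrence counts are needed, the labels have to be counted separately, since the indicator columns only record presence.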