Python: How to split rows into pipe-delimited columns in Pandas

I have the following DataFrame in pandas:

import pandas as pd

data = {'order_id': [123, 221, 234],
        'cust_id': [12, 13, 15],
        'order_total': [2345, 232, 1002],
        'prod_name': ['Chicken wings | Mashroom | Coriander', 'Chicken wings', 'Mashroom | Fish | Garlic']}

order_df = pd.DataFrame(data)

   order_id  cust_id  order_total                             prod_name
0       123       12         2345  Chicken wings | Mashroom | Coriander
1       221       13          232                         Chicken wings
2       234       15         1002              Mashroom | Fish | Garlic

The DataFrame I want is:

 order_id    cust_id    order_total   Chicken wings   Mashroom   Coriander    Fish    Garlic
 123         12         2345          1               1          1            0       0      
 221         13         232           1               0          0            0       0
 234         15         1002          0               1          0            1       1

I can split it into separate products, but I cannot produce the format above:

 split_product_df = order_df.prod_name.str.split("|",expand=True).add_prefix('Product_')

How can I do this in Pandas?

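You can use Pandas apply row by row, keyed on each row's index. Here is a similar, simple example. Note that it will not handle repeated labels within a pipe-delimited string: a label that appears twice in one row is still just set to 1.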

import pandas as pd

df = pd.DataFrame({
    'order_id': [123, 456],
    'cust_id': [12, 13],
    'order_total': [2345, 6789],
    'prod_name': ["Chicken wings | Mashroom | Coriander", "Mashroom | Fish | Garlic"]
})


def process(row):
    index = row.name # get the index of row
    for word in row['prod_name'].split('|'):
        # for each word separated by |, trim the surrounding spaces,
        # then create a column for it and set it to 1 at this row's index
        w = word.strip()
        df.loc[index, w] = 1


df.apply(process, axis=1) # apply the process on each row
df.drop('prod_name', axis=1, inplace=True) # drop the prod_name column
df = df.fillna(0) # fill nans with zero
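
If repeated labels do need to be counted, one possible variation (a sketch, not part of the answer above) is to put one product on each row with explode and then sum the indicator columns per original row. The sample data below is hypothetical: the second row repeats "Mashroom" on purpose to show the count.

```python
import pandas as pd

# Hypothetical sample data; the repeated "Mashroom" in the second row
# is added purely for illustration.
df = pd.DataFrame({
    'order_id': [123, 456],
    'cust_id': [12, 13],
    'order_total': [2345, 6789],
    'prod_name': ["Chicken wings | Mashroom | Coriander",
                  "Mashroom | Fish | Mashroom"],
})

# One product per row; the original row index is repeated for each product.
products = df['prod_name'].str.split('|').explode().str.strip()

# One indicator column per product, summed back per original row,
# so a label that appears twice in a row is counted as 2.
counts = pd.get_dummies(products).groupby(level=0).sum()

# Attach the counts to the remaining columns, matching on the index.
result = df.drop(columns='prod_name').join(counts)
print(result)
```

This builds a new frame instead of adding columns to df from inside apply, so it leaves the original df untouched.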


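Pandas str.get_dummies helps here.

@Neil, there seem to be spaces around the |, so try the following. We match each | together with any surrounding whitespace and replace it with a bare |, so that labels such as "Mashroom" and " Mashroom" do not end up in separate columns:

pd.concat(
    (
        df.iloc[:, :-1],  # every column except the last one (prod_name)
        df.prod_name.str.replace(r"\s*\|\s*", "|", regex=True).str.get_dummies(),
    ),
    axis=1,
)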
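
For reference, here is a self-contained sketch of the same idea applied to the question's order_df. It assumes the separator may have whitespace on either side, and it drops prod_name by name rather than by position, which is a small deviation from the snippet above. The variable names dummies and result are only illustrative.

```python
import pandas as pd

data = {'order_id': [123, 221, 234],
        'cust_id': [12, 13, 15],
        'order_total': [2345, 232, 1002],
        'prod_name': ['Chicken wings | Mashroom | Coriander',
                      'Chicken wings',
                      'Mashroom | Fish | Garlic']}
order_df = pd.DataFrame(data)

# Normalise " | " to "|" so the labels carry no stray whitespace,
# then build one 0/1 indicator column per product.
dummies = (order_df['prod_name']
           .str.replace(r'\s*\|\s*', '|', regex=True)
           .str.get_dummies(sep='|'))

# Drop the source column by name and attach the indicators by index.
result = order_df.drop(columns='prod_name').join(dummies)
print(result)
```

The indicator columns should come out in sorted order rather than in the order shown in the question; reorder them with ordinary column selection if the exact layout matters.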

@Neil, check the updated code.

I'm just curious: does this approach work if the pipe-delimited string contains repeated labels?

@AnimeshMukherkjee What do you mean by repeated labels? Can you share an example?

For example, if it were a | b | c | a, as in the example from the str.get_dummies documentation, would the column's value be 1 or 2?

Column "a" would have two, because it appears twice.
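
The repeated-label question is easy to check directly. As far as I can tell, str.get_dummies produces 0/1 indicator columns rather than counts, so a label that appears twice in one string should still give 1. A minimal sketch, using a one-row Series made up for the test:

```python
import pandas as pd

# A single made-up string in which the label "a" appears twice.
s = pd.Series(['a | b | c | a'])

# Strip the whitespace around each pipe, then build indicator columns.
dummies = s.str.replace(r'\s*\|\s*', '|', regex=True).str.get_dummies()
print(dummies)

# The value for "a" is an indicator (present or not), not a count.
print(dummies.loc[0, 'a'])
```

If actual counts are needed, the labels have to be counted separately, for example by splitting, exploding and summing per row.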