Concatenating text across a DataFrame in Python

python pandas

I am trying to concatenate the text in each row into a single string and then group the result by ID.

My dataset looks like this:

import pandas as pd

data = pd.DataFrame(data={'ID':['1','1','2','2','2','3','3','3','3'],
                    'Text1':['Apple','','','Laptop','','Pens','','Ruler',''],
                    'Text2': ['Bananas','Grape','Mouse','','DVD Player','','Pencils','',''],
                    'Text3':['Cherry','','','Headphones','','','','','Eraser'],
                    'Text4':['Mango','Strawberries','','','Cell phone','','Sticky Notes','','']
                   })

data = data.set_index('ID')

ID  Text1   Text2       Text3       Text4
1   Apple   Bananas     Cherry      Mango
1           Grape                   Strawberries
2           Mouse
2   Laptop              Headphones
2           DVD Player              Cell phone
3   Pens
3           Pencils                 Sticky Notes
3   Ruler
3                       Eraser

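The operations I want:

First, concatenate the text across each row.

Then, group by ID to get a single set of words per ID, separated by a delimiter.

Any ideas on how to achieve this output?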

I suggest using a combination of DataFrame.groupby, DataFrame.apply, and str.join. Based on what you provided, you could use something like the following. This is only one example.

import pandas as pd
import re

data = pd.DataFrame(data={'ID':['1','1','2','2','2','3','3','3','3'],
                    'Text1':['Apple','','','Laptop','','Pens','','Ruler',''],
                    'Text2': ['Bananas','Grape','Mouse','','DVD Player','','Pencils','',''],
                    'Text3':['Cherry','','','Headphones','','','','','Eraser'],
                    'Text4':['Mango','Strawberries','','','Cell phone','','Sticky Notes','','']
                    })


cols = [x for x in data.columns if re.search("^Text", x)] # list of all columns
                                                          # that start with "Text"

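# function applied to each group: it receives the group's sub-DataFrame
# and the list of columns to concatenate
def concat_text(group, cols):
    # Walk the group row by row, skip empty cells, and join the rest with ";".
    # Flattening before joining avoids stray ";;" if a whole row is empty.
    return ";".join(str(x) for y in group[cols].values for x in y if x)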

result = data.groupby("ID").apply(concat_text, cols) # groupby and apply

This leaves you with

ID
1    Apple;Bananas;Cherry;Mango;Grape;Strawberries
2    Mouse;Laptop;Headphones;DVD Player;Cell phone
3           Pens;Pencils;Sticky Notes;Ruler;Eraser
dtype: object
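
The two steps listed in the question (join across each row first, then join within each ID) can also be written out explicitly. The following is only a minimal sketch, assuming empty cells are empty strings as in the sample data; the names `text_cols` and `row_text` are illustrative and not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    "ID": ["1", "1", "2", "2", "2", "3", "3", "3", "3"],
    "Text1": ["Apple", "", "", "Laptop", "", "Pens", "", "Ruler", ""],
    "Text2": ["Bananas", "Grape", "Mouse", "", "DVD Player", "", "Pencils", "", ""],
    "Text3": ["Cherry", "", "", "Headphones", "", "", "", "", "Eraser"],
    "Text4": ["Mango", "Strawberries", "", "", "Cell phone", "", "Sticky Notes", "", ""],
})

# Columns to concatenate (assumption: they all start with "Text").
text_cols = [c for c in data.columns if c.startswith("Text")]

# Step 1: join the non-empty cells of each row, left to right.
row_text = data[text_cols].apply(lambda r: ";".join(v for v in r if v), axis=1)

# Step 2: drop rows that ended up empty, then join the row strings within each ID.
result = row_text[row_text != ""].groupby(data["ID"]).agg(";".join)

print(result)
```

Keeping the row-level strings in `row_text` makes the intermediate result easy to inspect, at the cost of a row-wise `apply`, which may be slow on large frames.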

This is fairly simple, but it is customary to give input and output examples that people can copy and paste. Embedding an image means that anyone who wants to reproduce your input has to type it out by hand. Sorry, my first answer missed the key part of your question. I suggest looking into groupby, apply, and join.
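
One way to produce input that others can paste, shown here only as a sketch on a shortened, hypothetical version of the data, is to print the frame as a plain dict with `DataFrame.to_dict` and include that text in the question instead of an image.

```python
import pandas as pd

# Shortened, illustrative version of the question's data.
data = pd.DataFrame({"ID": ["1", "1", "2"], "Text1": ["Apple", "", "Laptop"]})

# A dict of column -> list of values; its printed form can be pasted
# straight into pd.DataFrame(...).
as_dict = data.to_dict(orient="list")
print(as_dict)

# Round trip: rebuilding from the dict gives back the same frame.
rebuilt = pd.DataFrame(as_dict)
```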