Concatenating text in a DataFrame in Python
I am trying to concatenate text into a single row, grouped by ID. My dataset looks like this:
import pandas as pd

data = pd.DataFrame(data={'ID': ['1','1','2','2','2','3','3','3','3'],
                          'Text1': ['Apple','','','Laptop','','Pens','','Ruler',''],
                          'Text2': ['Bananas','Grape','Mouse','','DVD Player','','Pencils','',''],
                          'Text3': ['Cherry','','','Headphones','','','','','Eraser'],
                          'Text4': ['Mango','Strawberries','','','Cell phone','','Sticky Notes','','']
                          })
data = data.set_index('ID')
ID  Text1   Text2       Text3       Text4
1   Apple   Bananas     Cherry      Mango
1           Grape                   Strawberries
2           Mouse
2   Laptop              Headphones
2           DVD Player              Cell phone
3   Pens
3           Pencils                 Sticky Notes
3   Ruler
3                       Eraser
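The output I want:
Any ideas on how to achieve this output?
I suggest using a combination of DataFrame.groupby, DataFrame.apply, and str.join. Based on what you provided, you could use something like the following. It is only an example.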
import pandas as pd
import re

data = pd.DataFrame(data={'ID': ['1','1','2','2','2','3','3','3','3'],
                          'Text1': ['Apple','','','Laptop','','Pens','','Ruler',''],
                          'Text2': ['Bananas','Grape','Mouse','','DVD Player','','Pencils','',''],
                          'Text3': ['Cherry','','','Headphones','','','','','Eraser'],
                          'Text4': ['Mango','Strawberries','','','Cell phone','','Sticky Notes','','']
                          })

# list of all columns whose names start with "Text"
cols = [x for x in data.columns if re.search("^Text", x)]

# Function applied to each group: it receives the group's sub-DataFrame
# and the list of columns to concatenate
def concat_text(group, cols):
    # The real work is done here: walk the selected cells row by row,
    # skip empty strings, and join the rest with ";"
    return ";".join(str(x) for y in group[cols].values for x in y if x)

result = data.groupby("ID").apply(concat_text, cols)  # groupby and apply
This leaves you with:
ID
1 Apple;Bananas;Cherry;Mango;Grape;Strawberries
2 Mouse;Laptop;Headphones;DVD Player;Cell phone
3 Pens;Pencils;Sticky Notes;Ruler;Eraser
dtype: object
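As a variation on the approach above, here is a minimal sketch that assumes you would rather end up with a DataFrame holding a named column than a Series. The column name "Text" and the helper name join_non_empty are illustrative choices, not part of the original answer. Selecting the text columns on the groupby object means the function only ever sees those columns.

```python
import pandas as pd

data = pd.DataFrame({
    'ID': ['1', '1', '2', '2', '2', '3', '3', '3', '3'],
    'Text1': ['Apple', '', '', 'Laptop', '', 'Pens', '', 'Ruler', ''],
    'Text2': ['Bananas', 'Grape', 'Mouse', '', 'DVD Player', '', 'Pencils', '', ''],
    'Text3': ['Cherry', '', '', 'Headphones', '', '', '', '', 'Eraser'],
    'Text4': ['Mango', 'Strawberries', '', '', 'Cell phone', '', 'Sticky Notes', '', ''],
})

# Columns whose names start with "Text"
text_cols = [c for c in data.columns if c.startswith("Text")]

def join_non_empty(group):
    # Hypothetical helper: flatten the group row by row, drop empty strings,
    # and join what is left with ";"
    return ";".join(v for row in group.to_numpy() for v in row if v)

result = (
    data.groupby("ID")[text_cols]   # restrict each group to the text columns
        .apply(join_non_empty)      # one joined string per ID
        .rename("Text")             # assumed column name for the output
        .reset_index()              # turn the ID index back into a column
)
print(result)
```

The joined strings should be the same as in the output shown above, only laid out as two columns, ID and Text.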
This is simple enough, but it is customary to give input and output examples that people can copy and paste. Embedding an image means that anyone who wants to reproduce your input has to type it out by hand. Sorry, my first answer missed a key part of your question. I suggest looking into groupby, apply, and join.
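Since the desired output is not reproduced in text form, here is a hedged sketch of a different reading of the question: each Text column is collapsed separately per ID, so the four columns are kept. This is an assumption about what was wanted, and the helper name join_non_empty is illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'ID': ['1', '1', '2', '2', '2', '3', '3', '3', '3'],
    'Text1': ['Apple', '', '', 'Laptop', '', 'Pens', '', 'Ruler', ''],
    'Text2': ['Bananas', 'Grape', 'Mouse', '', 'DVD Player', '', 'Pencils', '', ''],
    'Text3': ['Cherry', '', '', 'Headphones', '', '', '', '', 'Eraser'],
    'Text4': ['Mango', 'Strawberries', '', '', 'Cell phone', '', 'Sticky Notes', '', ''],
})

def join_non_empty(values):
    # Hypothetical helper: join the non-empty strings of one column
    # within one group, separated by ";"
    return ";".join(v for v in values if v)

# agg calls the helper once per (ID, column) pair, so the result has
# one row per ID and one column per Text column
per_column = data.groupby("ID").agg(join_non_empty)
print(per_column)
```

With this reading, ID 1 would hold "Bananas;Grape" in Text2, for example, instead of one long string covering all four columns.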