Python/Pandas: combining multiple columns whose names match a string into a single string column

python pandas

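My dataset comes from Jira, and the label data is split across multiple columns, one column per label in a row. Depending on how many labels a given issue uses, the number of Labels columns can vary from 1 to 5.

The CSV might look like: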

Issue Type  Issue key   Labels    Labels    Labels    Labels    Labels
Story       123         #label1,  #label2,  #label6,  #label7,  #label9,
Story       124                 
Story       125         #label3,  #label1,          
Bug         126                 
Story       127         #label5,

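Because each label in a row gets its own new Labels column, the number of columns can vary. There doesn't seem to be a way to fix the export so that the values are enclosed as a single string.

What I need to do is concatenate them into a single column, "Tags". I don't care about cleaning up the trailing commas.

I tried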

df['Tags'] = [col for col in df.columns if 'Label' in col]

but this throws the error "Length of values does not match length of index".

Is there a simple way to do this while reading the CSV into a DataFrame?
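For context, the attempt above fails because the list comprehension produces a list of column names (one entry per matching column), and assigning it to `df['Tags']` treats that list as the column's values, which need one entry per row. A minimal sketch reproducing this on made-up data (the sample DataFrame and the names `label_cols` and `error_message` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 label columns.
df = pd.DataFrame({
    "Issue key": [123, 124, 125],
    "Labels": ["#label1,", None, "#label3,"],
    "Labels.1": ["#label2,", None, None],
})

# This is a list of column NAMES, not of per-row values.
label_cols = [col for col in df.columns if "Label" in col]

try:
    # 2 names cannot fill a column of 3 rows, so pandas raises ValueError.
    df["Tags"] = label_cols
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The list is still useful as a column selector, for example `df[label_cols]`, which is what the answers build on.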

You can use the agg function:

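# Select the label columns (pandas names repeated CSV headers Labels, Labels.1, ...)
columns = [col for col in df.columns if col.startswith('Label')]
# astype(str) turns empty (NaN) cells into the string 'nan'
df['Tags'] = df[columns].agg(lambda x: '-'.join(x.astype(str)), axis=1)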
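A self-contained sketch of the agg approach on made-up data resembling the table in the question. The sample DataFrame, the name `label_cols`, the space separator, and the choice to skip empty cells with `dropna()` are assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; empty cells from the CSV show up as NaN.
df = pd.DataFrame({
    "Issue Type": ["Story", "Story", "Story"],
    "Issue key": [123, 124, 125],
    "Labels": ["#label1,", np.nan, "#label3,"],
    "Labels.1": ["#label2,", np.nan, "#label1,"],
})

label_cols = [col for col in df.columns if col.startswith("Label")]

# Join the non-missing labels of each row; rows without labels yield "".
df["Tags"] = df[label_cols].agg(
    lambda row: " ".join(row.dropna().astype(str)), axis=1
)
```

Dropping the missing values before joining avoids both the `TypeError` caused by float NaN values and literal `nan` strings in the result.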

I think what you actually want to do is:

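import pandas as pd

tags = [col for col in df.columns if 'Label' in col]
nontags = [col for col in df.columns if 'Label' not in col]
tagdf = df[tags].fillna('').astype(str)  # empty cells are NaN (float); str.join needs strings
tagcol = tagdf.apply(" ".join, axis=1).rename('Tags')
newdf = pd.concat([df[nontags], tagcol], axis=1)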
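To tie this back to reading the CSV, here is a rough end-to-end sketch. The CSV text is invented to mimic the Jira export (without trailing commas, to keep the CSV simple), and `csv_text`, `tag_col`, and `new_df` are illustrative names. It relies on `pd.read_csv` renaming repeated headers to `Labels`, `Labels.1`, `Labels.2`, and so on:

```python
import io

import pandas as pd

# Hypothetical CSV text with a repeated "Labels" header.
csv_text = """Issue Type,Issue key,Labels,Labels,Labels
Story,123,#label1,#label2,#label6
Story,124,,,
Story,125,#label3,#label1,
"""

df = pd.read_csv(io.StringIO(csv_text))

# After reading, the label columns are Labels, Labels.1, Labels.2.
tags = [col for col in df.columns if "Label" in col]

# Replace missing labels with "" so that str.join only sees strings,
# then strip the padding left behind by the empty cells.
tag_col = df[tags].fillna("").astype(str).apply(" ".join, axis=1).str.strip()

# Drop the separate label columns and keep the combined one.
new_df = df.drop(columns=tags).assign(Tags=tag_col)
```

The combining step happens right after `read_csv` rather than inside it, since the columns only exist once the file has been parsed.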

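Nice, I have never aggregated on axis=1 :)

This raises an error: TypeError: ('sequence item 1: expected str instance, float found', 'occurred at index 0')

@J.P. I edited the answer to handle columns that are not of object dtype.

Error: name 'join' is not defined. -.join is invalid syntax.

@J.P. That was a typo, which you should be able to fix in about 20 seconds.

Please share your expected output, in DataFrame format.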