Python pandas: combine multiple rows into one row
I read my dataset like this:
dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 3)
When I print the dataset, it looks like this:
lyrics,classification
0 "I should have known better with a girl like you
1 That I would love everything that you do
2 And I do, hey hey hey, and I do
3 Whoa, whoa, I
4 Never realized what I kiss could be
5 This could only happen to me
6 Can't you see, can't you see
7 That when I tell you that I love you, oh
8 You're gonna say you love me too, hoo, hoo, ho...
9 And when I ask you to be mine
10 You're gonna say you love me too
11 So, oh I never realized what I kiss could be
12 Whoa whoa I never realized what I kiss could be
13 You love me too
14 You love me too",0
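But what I really need is for everything between the double quotes to end up in a single row. How can I do this conversion in pandas?
Solution that worked for the OP (from the comments):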
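Fix the problem at its source (in read_csv):
@nbeuchat is probably right, just try
dataset = pd.read_csv('lyrics.csv', quoting=2)
This should give you a DataFrame with one row and two columns: lyrics (with the line breaks embedded in the string) and classification (0).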
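To see why the read_csv arguments matter here, the following is a minimal sketch that reads the same text two ways. The two-line csv_text is a hypothetical stand-in for lyrics.csv (the real file was never posted), and the second call relies on the default quoting rather than quoting=2, so treat it as an illustration of the idea and not as the exact call that worked for the OP.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv: one quoted field that spans
# several physical lines, followed by the classification.
csv_text = (
    'lyrics,classification\n'
    '"I should have known better with a girl like you\n'
    'That I would love everything that you do",0\n'
)

# Quote handling disabled and a tab delimiter, as in the question:
# every physical line becomes its own row in a single column.
split_rows = pd.read_csv(io.StringIO(csv_text), delimiter='\t',
                         quoting=csv.QUOTE_NONE)

# Default comma delimiter with quote handling left on: the quoted field
# is read as one value with the line break embedded in it.
one_row = pd.read_csv(io.StringIO(csv_text))

print(split_rows.shape)  # (2, 1)
print(one_row.shape)     # (1, 2)
```

In this sketch the difference comes from two things at once: the delimiter (comma instead of tab) and letting the parser honor the double quotes instead of passing quoting=3.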
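General solution for collapsing a Series of strings:
You want to use Series.str.cat, as in the happy birthday example at the end of this post.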
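The default sep is None, which would give you 'happy birthday to youhappy birthday to you...', so pick the sep value that works for you. In the example I used a slash (padded with spaces), since that is what you typically see in quotations of songs and poems.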
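The effect of the sep argument can be sketched as follows. The two-line Series is made up for illustration, and the plain '\n'.join call is only shown as a comparison, it is not part of the original answer.

```python
import pandas as pd

# Hypothetical two-line Series standing in for the lyrics column
lines = pd.Series(['You love me too', 'You love me too'])

no_sep = lines.str.cat()             # default sep=None: strings run together
slashed = lines.str.cat(sep=' / ')   # slash padded with spaces
multiline = lines.str.cat(sep='\n')  # keeps the line breaks in one string

# For a Series without missing values, plain str.join gives the same result
joined = '\n'.join(lines)

print(no_sep)   # You love me tooYou love me too
print(slashed)  # You love me too / You love me too
```

Whichever separator is chosen, the result is a single Python string, so it can be placed in a one-row DataFrame next to the classification if needed.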
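You could also try print(dataset['lyrics'].str.cat(sep='\n')), which keeps the line breaks but stores all the lines in one string instead of one string per line.

Is this the entire dataset, or are there multiple songs in the same csv? Do you want slashes to mark the line breaks (you love me too / you love me too) or just spaces (you love me too you love me too)?

Your delimiter seems to be a comma and not a tab, doesn't it?

@nbeuchat is probably right, just try dataset = pd.read_csv('lyrics.csv', quoting=2). This should give you a DataFrame with one row and two columns: lyrics (with the line breaks embedded in the string) and classification (0). But it is hard to tell without the dataset.

This worked. Thanks a lot.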
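# Example for the general solution: collapse a Series of strings with str.cat
import pandas as pd

# One lyric line per row, as in the question
dataset = pd.DataFrame({'lyrics': pd.Series(['happy birthday to you',
                                             'happy birthday to you',
                                             'happy birthday dear outkast',
                                             'happy birthday to you'])})

# Join the whole column into a single string, separated by ' / '
dataset['lyrics'].str.cat(sep=' / ')
# 'happy birthday to you / happy birthday to you / happy birthday dear outkast / happy birthday to you'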