Python: how to group consecutive rows of a CSV file that share the same key
I am trying to parse col3 whenever col1 has the same value as in the previous row, and then write the output to a new file. I have a CSV file that looks like this:
col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
The output I want:
col1,col3
a,"hello good day nice weather"
b,"cat dog and cat"
c,"animals are cute"
This is what I tried:
import csv
with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    for row in reader:
        while row[0]==row[0]:
            concat_text=" ".join(row[2])
            print concat_text
        writer.writerow((row[0],concat_text))
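It runs, but I get no output. Any help is appreciated.

If you are open to using pandas, you can group the DataFrame by col1 and then output the unique values of col3: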
import pandas as pd
df = pd.read_csv('test.txt')
print(df)
Original DataFrame:
col1 col2 col3
0 a 12 hello
1 a 13 good day
2 a 14 nice weather
3 b 1 cat
4 b 2 dog and cat
5 c 2 animals are cute
df2 = df.groupby(df['col1'])
df2 = df2['col3'].unique()
df2 = df2.reset_index()
print(df2)
This results in:
col1 col3
0 a [hello , good day, nice weather]
1 b [cat, dog and cat]
2 c [animals are cute]
To join the values in the third column into one string, use apply:
df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))
col1 col3
0 a hello good day nice weather
1 b cat dog and cat
2 c animals are cute
Full code:
import pandas as pd
df = pd.read_csv('test.txt')
df2 = df.groupby(df['col1'])
df2 = df2['col3'].unique()
df2 = df2.reset_index()
df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))
# index=False keeps the row index out of the file, so the columns are just col1,col3
df2.to_csv('output.csv', index=False)
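Two caveats about the pandas approach above are worth keeping in mind: unique() drops a text that appears twice within the same group, and groupby on col1 merges every row with the same key, not only adjacent ones. The following is a minimal sketch of one way around both, assuming the goal is to merge only consecutive runs. The names CSV_TEXT, join_consecutive and run_id are made up for this illustration, and the sample data is inlined so the snippet runs without a file.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''


def join_consecutive(df):
    """Merge col3 over runs of adjacent rows that share the same col1."""
    # A new run starts wherever col1 differs from the row above it;
    # the cumulative sum turns those start flags into a run number.
    run_id = (df["col1"] != df["col1"].shift()).cumsum().rename("run")
    return (
        df.assign(col3=df["col3"].str.strip())  # drop stray spaces first
        .groupby(run_id)
        .agg(col1=("col1", "first"), col3=("col3", " ".join))
        .reset_index(drop=True)
    )


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = join_consecutive(df)
print(result.to_csv(index=False))
```

Because the grouping key is the run number rather than col1 itself, a key that shows up again later in the file starts a new output row instead of being folded into the earlier one.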
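The problem is that you are comparing a row with itself. This version compares the previous row with the current row. The output is not quote-delimited, but it is correct. Contents of script.py: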
#!/usr/bin/env python
import csv
with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    lastRow = None
    # assumes data is in order on first column
    for row in reader:
        if not lastRow:
            # start processing line with the first column and third column
            concat_text = row[2].strip()
            lastRow = row
            print concat_text
        else:
            if lastRow[0]==row[0]:
                # add to line
                concat_text = concat_text + ' ' + row[2].strip()
                print concat_text
            else:
                # end processing
                print concat_text
                writer.writerow((lastRow[0],concat_text))
                # start processing
                concat_text = row[2].strip()
                print concat_text
            lastRow = row
    # write out last element
    print concat_text
    writer.writerow((lastRow[0],concat_text))
Contents of outputfile.csv after running ./script.py:
a,hello good day nice weather
b,cat dog and cat
c,animals are cute
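The script above is written for Python 2 (print statements, files opened in binary mode). As a hedged alternative for Python 3, the same "compare with the previous row" idea can be sketched with itertools.groupby from the standard library, which groups adjacent items that share a key. The function name merge_consecutive and the in-memory buffers are assumptions made for this illustration; with real files you would open them in text mode with newline='' as the csv module documentation recommends.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = '''col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"
'''


def merge_consecutive(reader, writer):
    """Write one row per run of adjacent input rows sharing the first column."""
    header = next(reader)
    writer.writerow((header[0], header[2]))
    rows = (row for row in reader if row)  # skip blank lines
    # groupby only merges neighbours, so the input must already be
    # ordered by the first column (same assumption as script.py).
    for key, group in groupby(rows, key=itemgetter(0)):
        writer.writerow((key, " ".join(row[2].strip() for row in group)))


source = io.StringIO(CSV_TEXT)
target = io.StringIO()
merge_consecutive(csv.reader(source), csv.writer(target))
print(target.getvalue())
```

As with script.py, the default writer only quotes fields that need it, so the merged text is written without surrounding quotes.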
while row[0]==row[0]: ... never advances; it is an infinite loop.
That is because hello has a trailing space in the original data.
@Leb Remember to add df2.to_csv('somefile.csv').
@Ilja Added, thanks.
Thanks, I think pandas is another good way to do it.
You're welcome. This answer is just meant as an option for you and any possible future viewers. If you can't use pandas, the other answers here will be spot on.
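For future readers, here is a small sketch of the two problems in the attempt from the question, using one sample row. The variable names are chosen only for this illustration. The first part shows that str.join iterates over the characters when it is handed a single string; the second shows why the while condition can never become false.

```python
row = ["a", "12", "hello "]

# str.join iterates over its argument. Given a single string, it walks the
# characters, so a space ends up between every letter.
spaced = " ".join(row[2])
print(repr(spaced))

# Comparing a value with itself is always true, so a loop guarded by
# `while row[0] == row[0]` never exits on its own.
always_true = row[0] == row[0]
print(always_true)

# Joining a list of whole fields gives the intended text.
fields = ["hello ", "good day", "nice weather"]
merged = " ".join(field.strip() for field in fields)
print(merged)
```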