Python: How to group subsequent rows with the same key in a CSV file

python string csv

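I am trying to combine the col3 values of consecutive rows whenever col1 equals the value in the previous row, and then write the output to a new file. I have a CSV file that looks like this: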

col1,col2,col3
a,12,"hello "
a,13,"good day"
a,14,"nice weather"
b,1,"cat"
b,2,"dog and cat"
c,2,"animals are cute"

The output I want:

col1,col3
a,"hello good day nice weather"
b,"cat dog and cat"
c,"animals are cute"

This is what I have tried:

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    for row in reader:
        while row[0]==row[0]:
            concat_text=" ".join(row[2])
        print concat_text
        writer.writerow((row[0],concat_text))

It runs, but I get no output. Any help is appreciated.

If you are open to using a pandas DataFrame, you can group the DataFrame and then output the unique values:

import pandas as pd

df = pd.read_csv('test.txt')
print(df)

Original DataFrame:

  col1  col2              col3
0    a    12            hello 
1    a    13          good day
2    a    14      nice weather
3    b     1               cat
4    b     2       dog and cat
5    c     2  animals are cute

df2 = df.groupby(df['col1'])
df2 = df2['col3'].unique()
df2 = df2.reset_index()

print(df2)

This results in:

  col1                              col3
0    a  [hello , good day, nice weather]
1    b                [cat, dog and cat]
2    c                [animals are cute]

To join the values in the third column into a single string, use apply:

df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))

  col1                          col3
0    a   hello good day nice weather
1    b               cat dog and cat
2    c              animals are cute

Full code:

import pandas as pd

df = pd.read_csv('test.txt')
df2 = df.groupby(df['col1'])

df2 = df2['col3'].unique()
df2 = df2.reset_index()

df2['col3'] = df2['col3'].apply(lambda x: ' '.join(s.strip() for s in x))

# index=False keeps the row index out of the file, so only col1 and col3 are written
df2.to_csv('output.csv', index=False)
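Two caveats about the approach above: unique() drops repeated col3 values within a group, and groupby on col1 merges every row with the same key, not only adjacent ones. If only consecutive rows should be merged, one possible sketch is to label each run of equal keys first. The helper name join_consecutive and the column name run are made up for this illustration, and the sample data is inlined so the snippet is self-contained:

```python
import io
import pandas as pd


def join_consecutive(df, key="col1", text="col3"):
    """Join `text` over runs of adjacent rows that share the same `key`."""
    # A new run starts wherever the key differs from the row above it.
    run = (df[key] != df[key].shift()).cumsum().rename("run")
    out = df.groupby(run).agg(
        {key: "first", text: lambda s: " ".join(v.strip() for v in s)}
    )
    return out.reset_index(drop=True)


# Sample data from the question, inlined instead of read from a file.
sample = io.StringIO(
    'col1,col2,col3\n'
    'a,12,"hello "\n'
    'a,13,"good day"\n'
    'a,14,"nice weather"\n'
    'b,1,"cat"\n'
    'b,2,"dog and cat"\n'
    'c,2,"animals are cute"\n'
)
result = join_consecutive(pd.read_csv(sample))
print(result.to_csv(index=False))
```

Because nothing is deduplicated here, a value that appears twice in a run is kept twice, which may or may not be what you want.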

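The problem is that you are comparing the row with itself. This version compares the previous row with the current one. The output is not quoted, but it is otherwise correct. Contents of script.py: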

#!/usr/bin/env python

import csv

with open('myfile.csv', 'rb') as inputfile, open('outputfile.csv','wb') as outputfile:
    reader=csv.reader(inputfile)
    writer=csv.writer(outputfile)
    next(reader)
    lastRow = None
    # assumes data is in order on first column
    for row in reader:
        if not lastRow:
            # start processing line with the first column and third column
            concat_text = row[2].strip()
            lastRow = row
            print concat_text
        else:
            if lastRow[0]==row[0]:
                # add to line
                concat_text = concat_text + ' ' + row[2].strip()
                print concat_text
            else:
                # end processing
                print concat_text
                writer.writerow((lastRow[0],concat_text))
                # start processing
                # strip here too, so every group starts without stray whitespace
                concat_text = row[2].strip()
                print concat_text
            lastRow = row
    # write out last element
    print concat_text
    writer.writerow((lastRow[0],concat_text))

Contents of outputfile.csv after running ./script.py:

a,hello good day nice weather
b,cat dog and cat
c,animals are cute
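The script above is written for Python 2 (print statement, files opened in 'rb'/'wb' mode). For Python 3, the same "compare with the previous row" idea can be sketched with itertools.groupby, which yields runs of consecutive items sharing a key. The function name group_consecutive is made up for this illustration, and like the script above it assumes rows with the same key are already adjacent:

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def group_consecutive(infile, outfile):
    """Write one row per run of consecutive input rows sharing column 0."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    header = next(reader)
    writer.writerow((header[0], header[2]))
    # Skip blank lines, which csv.reader returns as empty lists.
    rows = (row for row in reader if row)
    for key, run in groupby(rows, key=itemgetter(0)):
        writer.writerow((key, " ".join(row[2].strip() for row in run)))


# Demonstration on in-memory data; with real files, open both with newline=''
# as the csv module documentation recommends, e.g.
#   with open('myfile.csv', newline='') as f, open('outputfile.csv', 'w', newline='') as g:
#       group_consecutive(f, g)
source = io.StringIO(
    'col1,col2,col3\n'
    'a,12,"hello "\n'
    'a,13,"good day"\n'
    'a,14,"nice weather"\n'
    'b,1,"cat"\n'
    'b,2,"dog and cat"\n'
    'c,2,"animals are cute"\n'
)
target = io.StringIO()
group_consecutive(source, target)
print(target.getvalue())
```

Unlike the original script, this sketch also writes a header row. The joined text is still not quoted, because the csv module only quotes fields that need it by default.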

while row[0]==row[0]: ... never advances; it is an infinite loop.

This is because hello has a trailing space in the original data.

@Leb Remember to add df2.to_csv('somefile.csv').

@Ilja Added. Thanks.

Thanks, I think pandas is another good way to do it.

You're welcome. This answer is just meant as an option for you and any future viewers. If you can't use pandas, the other answers here will be spot on.
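To make the two problems in the question's loop concrete, here is a small standalone check. The variable names are made up for this illustration and the row is taken from the sample data:

```python
row = ["b", "1", "cat"]

# Comparing a value with itself is always true,
# so `while row[0] == row[0]` can never exit.
always_true = row[0] == row[0]

# str.join iterates over its argument; given a single string,
# it joins that string's characters rather than several fields.
joined_chars = " ".join(row[2])
joined_fields = " ".join([row[2], "dog and cat"])

print(always_true)    # True
print(joined_chars)   # c a t
print(joined_fields)  # cat dog and cat
```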