Python 2: replace a specific column in a CSV file

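I have some CSV files in the format ID, timestamp, customerID, email, etc. I want to blank out the Email column and leave the other columns unchanged. I am using Python 2.7 and cannot use Pandas. Can anyone help? Thanks for any help.

My code is below, but it is neither efficient nor reliable: if a row contains unusual characters, the logic breaks.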

new_columns = [

    '\xef\xbb\xbfID', 'timestamp', 'CustomerID', 'Email', 'CountryCode', 'LifeCycle', 'Package', 'Paystatus', 'NoUsageEver', 'NoUsage', 'VeryLowUsage',
    'LowUsage', 'NormalUsage', 'HighUsage', 'VeryHighUsage', 'LastStartDate', 'NPS 0-8', 'NPS Score (Q2)', 'Gender(Q38)', 'DOB(Q39)',
    'Viaplay users(Q3)', 'Primary Content (Q42)', 'Primary platform(Q4)', 'Detractor (strong) (Q5)', 'Detractor open text(Q22)',
    'Contact Detractor (Q21)', 'Contact Detractor (Q20)', 'Contact Detractor (Q43)', 'Contact Detractor(Q26)', 'Contact Detractor(Q27)',
    'Contact Detractor(Q44)', 'Improvement areas(Q7)', 'Improvement areas (Q40)', 'D2 More value for money(Q45)', 'D2 Sport content(Q8)',
    'D2 Series content(Q9)', 'D2 Film content(Q10)', 'D2 Children content(Q11)', 'D2 Easy to start and use(Q12)',
    'D2 Technical and quality(Q13)',
    'D2 Platforms(Q14)', 'D2 Service and support(Q15)', 'D3 Sport content(Q16)', 'Missing Sport Content (Q41)',
    'D3 Series and films content(Q17)',
    'NPS 9-10', 'Recommendation drivers(Q28)', 'R2 Sport content(Q29)', 'R2 Series content(Q30)', 'R2 Film content(Q31)',
    'R2 Children content(Q32)', 'R2 Easy to start and use(Q33)', 'R2 Technical and quality(Q34)', 'R2 Platforms(Q35)',
    'R2 Service and support(Q36)',
    'Promoter open text(Q37)'

]

        with open(file_path, 'r') as infile:
            print file_path
            reader = csv.reader(infile, delimiter=";")
            first_row = next(reader)
            for row in reader:
                output_row = []
                for column_name in new_columns:
                    ind = first_row.index(column_name)
                    data = row[ind]
                    if ind == first_row.index('Email'):
                        data = ''
                    output_row.append(data)
                writer.writerow(output_row)

File format before:

File format after:

So you are reordering the columns and clearing the Email column:

    with open(file_path, 'r') as infile:
        print file_path
        reader = csv.reader(infile, delimiter=";")
        first_row = next(reader)
        for row in reader:
            output_row = []
            for column_name in new_columns:
                ind = first_row.index(column_name)
                data = row[ind]
                if ind == first_row.index('Email'):
                    data = ''
                output_row.append(data)
            writer.writerow(output_row)

I suggest moving the lookups first_row.index(column_name) and first_row.index('Email') out of the per-row loop, so they run once per file instead of once per cell:

    with open(file_path, 'r') as infile:
        print file_path
        reader = csv.reader(infile, delimiter=";")
        first_row = next(reader)

        email = first_row.index('Email')       
        indexes = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            indexes.append(ind)

        for row in reader:
            output_row = []
            for ind in indexes:
                data = row[ind]
                if ind == email:
                    data = ''
                output_row.append(data)
            writer.writerow(output_row)

email is the index of the Email column in the input, and indexes is the list of input column indices, in the order specified by new_columns.


Untested.
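
The same idea can be sketched as two small helpers that work on plain lists, which makes the logic easy to check without touching any files. The names build_indexes and project_row are hypothetical (they do not appear in the code above), and the sample header and row are made up for illustration. The sketch also reports missing columns in one clear error rather than failing on the first bad lookup; it is written to run on both Python 2.7 and Python 3.

```python
def build_indexes(header, wanted):
    """Return the input position of each wanted column, in wanted order."""
    # Record every header name once; later rows reuse these positions.
    position = {}
    for i, name in enumerate(header):
        position.setdefault(name, i)  # keep the first occurrence, like list.index
    missing = [name for name in wanted if name not in position]
    if missing:
        raise ValueError("columns not found in header: %r" % (missing,))
    return [position[name] for name in wanted]


def project_row(row, indexes, blank_index):
    """Copy the selected cells in order, emptying the one to be cleared."""
    return ["" if ind == blank_index else row[ind] for ind in indexes]


# Made-up sample data, only to show the call pattern.
header = ["ID", "timestamp", "CustomerID", "Email"]
wanted = ["ID", "Email", "timestamp"]
indexes = build_indexes(header, wanted)
email = header.index("Email")
print(project_row(["1", "2019-01-01", "c9", "a@example.com"], indexes, email))
# prints ['1', '', '2019-01-01']
```

In the loop above, project_row(row, indexes, email) would take the place of the inner for loop, and its result would be passed to writer.writerow.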

You can use the dict versions of the csv reader and writer to access columns by name. Something like this:

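    import csv

    # Python 2's csv module expects files opened in binary mode.
    with open('./test.csv', 'rb') as infile:
        reader = csv.DictReader(infile, delimiter=";")
        with open('./output.csv', 'wb') as outfile:
            # Pass the same delimiter so the output stays semicolon-separated.
            writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames, delimiter=";")
            writer.writeheader()
            for row in reader:
                row['Email'] = ''  # blank the column, keep everything else
                writer.writerow(row)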
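
If the columns also need to be reordered, as in the question, one possible variation is to pass the desired order as fieldnames and set extrasaction="ignore" so that input columns not listed are dropped instead of raising an error. The sketch below is an illustration under assumptions: blank_column and LineCollector are hypothetical names, the sample lines are made up, and the data is kept in memory so the snippet runs the same way on Python 2.7 and Python 3.

```python
import csv


class LineCollector(object):
    """Minimal file-like object: a csv writer only needs a write() method."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)


def blank_column(lines, out, column, fieldnames=None, delimiter=";"):
    """Copy CSV lines to out, emptying one column and optionally reordering."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    header = reader.fieldnames or []
    if column not in header:
        raise KeyError("column %r not found in header" % (column,))
    # Default to the input order; pass fieldnames to reorder or drop columns.
    names = fieldnames or header
    writer = csv.DictWriter(out, fieldnames=names, delimiter=delimiter,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = ""
        writer.writerow(row)


# Made-up sample input, only to show the call pattern.
sample = ["ID;Email;CountryCode", "1;a@example.com;SE", "2;b@example.com;NO"]
collected = LineCollector()
blank_column(sample, collected, "Email", fieldnames=["CountryCode", "ID", "Email"])
print(collected.lines)
# prints ['CountryCode;ID;Email\n', 'SE;1;\n', 'NO;2;\n']
```

With real files, the list of lines would be replaced by the open input file and LineCollector by the open output file, as in the snippet above.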
