Python: how to remove annoying data from a CSV file
I want to remove some strings ("Description", "This is a simulation") from a CSV file, and also remove the "=" that appears in the data and the "," at the end of each data row. The file looks like this:
"time","student","items"
="09:00:00","Tim","apple",
="09:00:10","Jason","orange",
"09:10:10","Emily","grape",
"09:22:10","Ivy","kiwi",
"Description"
"This is a simulation"
I tried the following, but it does not work:
ff = []
import csv
with open('file.csv') as f:
    for row in csv.DictReader(f):
        row.replace(',','')
        ff.append(row)
I want this:
"time","student","items"
"09:00:00","Tim","apple"
"09:00:10","Jason","orange"
"09:10:10","Emily","grape"
"09:22:10","Ivy","kiwi"
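You may want to read the file as a raw text file rather than as a CSV, so that string operations are easier to apply to it.
Edit: I assume `tmp` is the path to the CSV file and that the result should be the list of dictionaries produced by `csv.DictReader`. You can then write `convert(tmp)` in two main steps. The first reformats the file and writes the result to a temporary file; the second reads the temporary file into a list of dictionaries with `csv.DictReader`. Once the data has been read, the temporary file is deleted with the `os` module: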
import csv
import os

def convert(tmp):
    new_lines = []
    temp_file = tmp + '.tmp'
    with open(tmp) as fd:
        for line in fd:
            # remove newline characters
            line = line.replace('\n', '').replace('\r', '')
            # delete the unwanted strings (note: this removes every '=' in the line)
            line = line.replace('=', '').replace('"Description"', '').replace('"This is a simulation"', '')
            # skip lines that are now empty
            if line.strip() == '':
                continue
            # remove the trailing comma
            if line[-1] == ',':
                line = line[:-1]
            new_lines.append(line)
    # write the cleaned data to a temporary csv file
    with open(temp_file, 'w') as fd:
        fd.write('\n'.join(new_lines))
    # read the cleaned data back as a list of dictionaries
    with open(temp_file) as f:
        ff = list(csv.DictReader(f))
    # delete the temporary file
    os.remove(temp_file)
    return ff

print(convert('./file.csv'))
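The temporary file is not strictly required, because `csv.DictReader` accepts any iterable of text lines. The following is a minimal sketch of the same cleanup done in memory. The names `clean_lines` and `convert_in_memory` are hypothetical and not part of the answer above. It assumes that `=` only appears at the start of a line and that the two trailer lines appear exactly as in the sample.

```python
import csv

def clean_lines(lines):
    """Yield cleaned CSV lines from an iterable of raw text lines."""
    for line in lines:
        line = line.strip()
        # skip blank lines and the two trailer lines from the sample file
        if not line or line in ('"Description"', '"This is a simulation"'):
            continue
        # drop a leading '=' (assumed to occur only at the start of a line)
        if line.startswith('='):
            line = line[1:]
        # drop a single trailing comma
        if line.endswith(','):
            line = line[:-1]
        yield line

def convert_in_memory(path):
    """Return the cleaned rows of the file at `path` as a list of dicts."""
    with open(path, newline='') as f:
        # DictReader consumes the generator lazily while the file is open
        return list(csv.DictReader(clean_lines(f)))
```

Calling `convert_in_memory('./file.csv')` should give the same kind of result as `convert`, without creating or deleting a file on disk.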
This approach relies mainly on the built-in `str` methods and assumes that the first line is always a valid header row:
ff = []
with open('file.csv') as f:
    for row in f:
        # strip surrounding whitespace, then leading/trailing '=' and ','
        line = row.strip().strip('=').strip(',')
        # skip empty lines
        if not line:
            continue
        # assume first row is always a valid header row
        # split by comma to see if it matches header row
        if not len(ff) or (len(line.split(',')) == len(ff[0].split(','))):
            ff.append(line)
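If `=` only appears at the start and/or end of a line, you can clean the line with the `str.strip` method, then use the `str.split` method to split each line on the comma `,` and check whether it yields the same number of elements as the header row (if it does not, drop or skip the line).
Not a solution, but a (dirty) heuristic: every "good" line after the header contains a colon. Add a line `if ':' not in line: continue` to skip the lines that have no colon. Also, although the `csv` module is powerful, it is less flexible than simply reading the file line by line as strings, especially when the cleanup needs string manipulation. So just iterate with `for line in f` and do the rest from there.
How can I use this in a function such as `def convert(tmp)` that returns the result? @9898
I have updated the answer to use a `convert(tmp)` function that returns a list of dictionaries, one per row.
Thank you for answering this question.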
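The colon heuristic from the comments can be sketched as follows. The function name `keep_timed_rows` is hypothetical. The sketch assumes that the first non-empty line is the header and that every valid data row contains a time such as `09:00:00`, so it would wrongly drop valid rows in a file whose data has no colon.

```python
def keep_timed_rows(lines):
    """Keep the header plus every later line that contains a colon."""
    kept = []
    for raw in lines:
        # strip whitespace, then leading/trailing '=' and ','
        line = raw.strip().strip('=').strip(',')
        if not line:
            continue
        # the first non-empty line is assumed to be the header;
        # after that, skip any line without a colon
        if kept and ':' not in line:
            continue
        kept.append(line)
    return kept
```

For example, `keep_timed_rows(open('file.csv'))` on the sample file keeps the header and the four timed rows, and `'\n'.join(...)` of the result can be written back to a new file.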