Case/If-Else statement in Python on a CSV file
I have a CSV file (original.csv) with a unique ID column (uid) and several columns I want to run a calculation on. I then want to create a new file (result.csv) that contains the unmodified uid plus new columns based on the result of the calculation.
My original file looks like this:
uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3
I want to perform a calculation with the same logic as this SQL: case when var01 = 1 then 1 else 0 end as var01_new, case when var02 = 1 then 1 else 0 end as var02_new, ...
The result would look like this:
uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
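Given the size of the actual file (about 20 million rows and 50+ columns), I would like to keep the solution in base Python rather than in memory-bound packages such as Pandas and Numpy. I tried adapting another question's answer, but I could not get it to work for my use case.
I tried this code, but it did not work: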
>>> import csv
>>>
>>> sourcepath = "/Users/me/python_case_statement.csv"
>>> destpath = "/Users/me/python_case_statement_flat.csv"
>>>
>>> with open(sourcepath, "rb") as source, open(destpath, "wb") as dest:
...     reader = csv.reader(source, delimiter = ',', quotechar='"')
...     writer = csv.writer(dest, delimiter = ',', quotechar='"')
...     headers = reader.next()
...     writer.writerow(headers)
...     for rownum, row in enumerate(reader):
...         'uid' = 'uid'
...         if 'var01' == 1:
...             'var01_new' == 1
...         else:
...             'var01_new' == 0
...         row.append(result)
...         writer.writerow(row)
...
File "<stdin>", line 7
SyntaxError: can't assign to literal
>>>
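Python is not a purely declarative language like SQL. It is procedural, so you have to describe the control flow yourself, although it does offer many declarative constructs. So: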
>>> import csv, io
>>> s = """uid,var01,var02,var03,var04,var05
... 1,2,3,2,3,1
... 2,2,2,2,2,1
... 3,,2,2,1,1
... 4,2,2,2,1,1
... 5,1,2,2,1,2
... 6,3,,2,3,2
... 7,3,,1,1,1
... 8,2,3,1,,3
... 9,3,1,,3,
... 10,,3,2,3,3"""
>>> reader = csv.reader(io.StringIO(s))
>>> result = io.StringIO()
>>> writer = csv.writer(result)
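The setup above uses in-memory streams (io.StringIO) to stand in for files. With real files you can use the with statement in the way you already did. Now, the crux of the problem: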
>>> header = next(reader)
>>> writer.writerow(["{}_new".format(v) for v in header])
59
>>> for row in reader:
...     new_row = [row[0]] # uid the same
...     new_row.extend(1 if c == '1' else 0 for c in row[1:])
...     writer.writerow(new_row)
...
13
13
13
13
13
13
13
13
13
14
>>> print(result.getvalue())
uid_new,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
>>>
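I used comprehension constructs and conditional expressions, which allow a nicer, more declarative way of transforming data. You can do the same thing without them, using if-else statements and building up the row step by step: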
>>> result = io.StringIO()
>>> reader = csv.reader(io.StringIO(s))
>>> writer = csv.writer(result)
>>> header = next(reader)
>>> new_header = []
>>> for name in header:
...     new_header.append("{}_new".format(name))
...
>>> writer.writerow(new_header)
59
>>> for row in reader:
...     new_row = [row[0]]  # keep the uid unchanged
...     for c in row[1:]:
...         if c == '1':
...             new_row.append(1)
...         else:
...             new_row.append(0)
...     writer.writerow(new_row)
...
13
13
13
13
13
13
13
13
13
14
>>> print(result.getvalue())
uid_new,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
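The transcripts above work on io.StringIO objects. For a real file of the size mentioned in the question, the same row-by-row loop can be wrapped in a with statement so that only one row is held in memory at a time. The following is a minimal sketch under some assumptions: the function name flag_ones and its parameters are made up for illustration, the first column is taken to be uid, and the uid header is left unsuffixed to match the result requested in the question.

```python
import csv

def flag_ones(src_path, dest_path):
    """Stream src_path row by row and write 1/0 flags to dest_path."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.reader(src)
        writer = csv.writer(dest)
        header = next(reader)
        # Keep the uid column name, suffix every other column with _new
        writer.writerow([header[0]] + [name + "_new" for name in header[1:]])
        for row in reader:
            if not row:  # skip completely blank lines
                continue
            # csv yields strings, so the comparison is against "1"
            writer.writerow([row[0]] + [1 if c == "1" else 0 for c in row[1:]])
```

Calling flag_ones("original.csv", "result.csv") would then process the file without ever loading it as a whole. Note that csv.writer ends each line with \r\n unless a different lineterminator is passed.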
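In your code you wrote 'uid' = 'uid', which tries to assign to a string literal. That is what raises SyntaxError: can't assign to literal. Likewise, 'var01_new' == 0 is a comparison between two literals rather than an assignment, so it has no effect.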
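The attempt in the question reads as if the columns should be addressed by name. One way to get that in the standard library is csv.DictReader together with csv.DictWriter. The sketch below is only an illustration: the tiny sample string stands in for original.csv, and the variable names are assumptions, not part of the original code.

```python
import csv
import io

# Small in-memory sample standing in for original.csv (assumption for the demo)
sample = "uid,var01,var02\n1,2,1\n2,1,\n"

reader = csv.DictReader(io.StringIO(sample))
out = io.StringIO()

# Every column except uid gets a *_new counterpart
value_cols = [name for name in reader.fieldnames if name != "uid"]
writer = csv.DictWriter(out, fieldnames=["uid"] + [c + "_new" for c in value_cols])
writer.writeheader()

for row in reader:
    new_row = {"uid": row["uid"]}  # assignment needs a name or subscript on the left
    for col in value_cols:
        # Values arrive as strings, so compare with "1", not the integer 1
        new_row[col + "_new"] = 1 if row[col] == "1" else 0
    writer.writerow(new_row)

print(out.getvalue())
```

Building a dict per row is likely somewhat slower than the plain csv.reader loop, which may matter at 20 million rows, so this is more about readability than speed.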
Otherwise, you can also solve your problem without using the csv module, as in the following example.
I assume your input file is named id_input.csv and your output file is named new.csv:
condition = True
with open("id_input.csv", 'r') as src, open("new.csv", 'w') as f:
    for k in src:
        if condition:
            # Replace the original header line with the new column names
            f.write("uid,var01_new,var02_new,var03_new,var04_new,var05_new\n")
            condition = False
        else:
            # Drop the trailing newline so the last field compares correctly
            dd = k.rstrip('\n').split(",")
            f.write(dd[0] + ',' + ",".join('1' if j == '1' else '0' for j in dd[1:]) + '\n')
So, with the above code and this input:
uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3
the output file new.csv will contain the following data:
uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
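Splitting on commas by hand is fine for purely numeric data like the sample above, but it treats every comma as a separator. If the real file ever contains quoted fields, the csv module parses them differently. The row below is hypothetical and only serves to show the difference:

```python
import csv

line = '7,"a,b",1'  # hypothetical row with a quoted field that contains a comma

naive = line.split(",")            # cuts the quoted field in two
parsed = next(csv.reader([line]))  # respects the quotes

print(naive)
print(parsed)
```

Here the manual split yields four pieces, while csv.reader yields the three intended fields.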
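So what is your question? What happened when you adapted the SO question? What errors did you get?
What do you mean by "keep the solution in base Python rather than memory-bound packages"? In any case, this seems like a relatively simple thing to do, especially given the question you linked. What exactly have you tried?
Well, your syntax error seems pretty self-explanatory... Maybe you should start with a basic Python tutorial?
Sorry @juanpa.arrivillaga and Alfabravo for not posting my attempt.
You are trying to assign 'uid' = 'uid'? And the same goes for 'var01_new' == 1?
Thanks Chiheb, this will work for similar use cases.
Thanks, and you're welcome :-)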