Case/if-else statements in Python on a CSV file


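I have a CSV file (original.csv) with a unique ID column (uid) and several columns I want to compute on. I then want to create a new file (result.csv) that keeps uid unmodified and adds new columns based on the result of the computation.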

My original file looks like this:

uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3

I'd like to do a calculation with the same logic as this (written in SQL):

case when var01 = 1 then 1 else 0 end as var01_new, case when var02 = 1 then 1 else 0 end as var02_new, ...

The result should look like this:

uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
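
Given the size of the actual file (about 20 million rows and more than 50 columns), I'd like to keep the solution in base Python rather than in memory-limited packages such as Pandas or Numpy. I have tried, but I couldn't get it to work for my use case.

I tried this code, but it didn't work: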

>>> import csv
>>> 
>>> sourcepath = "/Users/me/python_case_statement.csv"
>>> destpath =  "/Users/me/python_case_statement_flat.csv"
>>> 
>>> with open(sourcepath, "rb") as source, open(destpath, "wb") as dest:
...     reader = csv.reader(source, delimiter = ',', quotechar='"')
...     writer = csv.writer(dest,  delimiter = ',', quotechar='"')
...     headers = reader.next()
...     writer.writerow(headers)
...     for rownum, row in enumerate(reader):
...         'uid' = 'uid'
...         if 'var01' == 1:
...             'var01_new' == 1
...         else:
...             'var01_new' == 0
...         row.append(result)
...         writer.writerow(row)
... 
  File "<stdin>", line 7
SyntaxError: can't assign to literal
>>> 
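
Python is not a purely declarative language like SQL. It is procedural, so you have to describe the control flow, although it does have many declarative constructs. So: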

>>> import csv, io
>>> s = """uid,var01,var02,var03,var04,var05
... 1,2,3,2,3,1
... 2,2,2,2,2,1
... 3,,2,2,1,1
... 4,2,2,2,1,1
... 5,1,2,2,1,2
... 6,3,,2,3,2
... 7,3,,1,1,1
... 8,2,3,1,,3
... 9,3,1,,3,
... 10,,3,2,3,3"""
>>> reader = csv.reader(io.StringIO(s))
>>> result = io.StringIO()
>>> writer = csv.writer(result)
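
The above uses streams (io.StringIO) so that we can pretend we are working with files. You can do the same with real files by using the with statement the way you already have. Now, the crux of the matter: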

>>> header = next(reader)
>>> writer.writerow(["{}_new".format(v) for v in header])
59
>>> for row in reader:
...     new_row = [row[0]] # uid the same
...     new_row.extend(1 if c == '1' else 0 for c in row[1:])
...     writer.writerow(new_row)
...
13
13
13
13
13
13
13
13
13
14
>>> print(result.getvalue())
uid_new,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0

>>>
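I used comprehensions and conditional expressions, which allow a nicer, more declarative way to transform data. But you can do the same thing without them, using if-else statements and building up the rows: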

>>> result = io.StringIO()
>>> reader = csv.reader(io.StringIO(s))
>>> writer = csv.writer(result)
>>> header = next(reader)
>>> new_header = []
>>> for s in header:
...     new_header.append("{}_new".format(s))
...
>>> writer.writerow(new_header)
59
>>> for row in reader:
...     new_row = [row[0]] # uid the same
...     for c in row[1:]:
...         if c == '1':
...             new_row.append(1)
...         else:
...             new_row.append(0)
...     writer.writerow(new_row)
...
13
13
13
13
13
13
13
13
13
14
>>> print(result.getvalue())
uid_new,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
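
For real files instead of io.StringIO, the same loop could be sketched roughly as below. This is a minimal sketch assuming Python 3; the function name flag_ones and the file names in the usage line are made up for illustration, and unlike the session above it leaves the uid header name unchanged, as the question asks. Rows are read and written one at a time, so the whole file is never held in memory.

```python
import csv

def flag_ones(src_path, dest_path):
    """Stream src_path row by row and write 1/0 flags to dest_path.

    Hypothetical helper: the first column is copied as is, every other
    column becomes 1 when the field is exactly "1" and 0 otherwise.
    """
    # newline="" is what the csv module documentation recommends for file objects
    with open(src_path, newline="") as source, \
         open(dest_path, "w", newline="") as dest:
        reader = csv.reader(source)
        writer = csv.writer(dest)

        header = next(reader)
        # keep "uid" unchanged, add the _new suffix to the remaining names
        writer.writerow([header[0]] + ["{}_new".format(n) for n in header[1:]])

        for row in reader:
            # same logic as: case when col = 1 then 1 else 0 end
            writer.writerow([row[0]] + [1 if c == "1" else 0 for c in row[1:]])
```

It would be called as flag_ones("original.csv", "result.csv"). Empty fields compare unequal to "1" and therefore come out as 0, which matches the expected result in the question.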

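In your code you are trying to assign 'uid' = 'uid', and you wrote 'var01_new' == 0 where an assignment was intended. That is not valid: both sides are string literals rather than variables, so your code raises the exception SyntaxError: can't assign to literal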
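
To make the difference concrete, here is a small sketch. The header and row values are made up for illustration. It shows that a quoted name such as 'var01' is just a string and never equals the number 1, that the value has to be taken from the row itself, and that the csv module hands back every field as a string.

```python
header = ["uid", "var01", "var02"]   # illustrative header
row = ["5", "1", "2"]                # csv.reader yields every field as a string

# A string literal compared with a number is never equal.
literal_matches = ("var01" == 1)
print(literal_matches)               # False

# Look the value up in the row and compare it with the string "1".
idx = header.index("var01")
var01_new = 1 if row[idx] == "1" else 0   # assignment uses a single =
print(var01_new)                     # 1
```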

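Otherwise, you can also solve your problem without using the csv module, as in the following example.

I assume your input file is named id_input.csv and your output file is named new.csv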

with open("id_input.csv", 'r') as src, open("new.csv", 'w') as f:
    next(src)  # skip the original header line
    f.write("uid,var01_new,var02_new,var03_new,var04_new,var05_new\n")
    for line in src:
        # drop the trailing newline first, otherwise the last field never equals '1'
        dd = line.rstrip('\n').split(",")
        # keep uid as is; every other field becomes '1' or '0'
        f.write(dd[0] + ',' + ",".join('1' if j == '1' else '0' for j in dd[1:]) + '\n')

So, with the above code and this input:

uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3

the output file new.csv will contain the following data:

uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
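
One caveat about splitting on commas by hand: it only works as long as no field is quoted. The sample data has no quoted fields, so this may not matter here, but the attempt in the question passes quotechar to the csv module. The following sketch, with a made-up row, shows how the two approaches differ if a field contains a comma.

```python
import csv
import io

line = '7,"1,5",1\n'   # hypothetical row whose second field contains a comma

by_split = line.rstrip("\n").split(",")
by_csv = next(csv.reader(io.StringIO(line)))

print(by_split)   # ['7', '"1', '5"', '1']  (the quoted field is cut in two)
print(by_csv)     # ['7', '1,5', '1']
```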

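So what is your question? What happened when you adapted the SO question? What errors did you get? And what does "keep the solution in base Python rather than in memory-limited packages" mean?
In any case, this seems like a relatively simple thing to do, especially given the question you linked. What exactly have you tried?
Well, your syntax error seems pretty self-explanatory... maybe you should start with a basic Python tutorial?
Sorry @juanpa.arrivillaga and Alfabravo for not posting my attempt.
You are trying to assign 'uid' = 'uid'? The same goes for 'var01_new' == 1?
Thanks Chiheb, this will work for similar use cases.
Thanks and welcome :-)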