How do I iterate over two columns in Python?


python csv pandas


I'm trying to iterate over two columns of a CSV file in Python. I've heard you have to import pandas for this, but I'm stuck on the coding part.

import csv as csv
import numpy as np
import pandas as pd

csv_file_object = csv.reader(open('train.csv', 'rb'))  # Load in the csv file
header = csv_file_object.next()                   # Skip the first line as it is a header
data=[]                                     # Create a variable to hold the data

for row in csv_file_object:                      # Skip through each row in the csv file,
    data.append(row[0:])                        # adding each row to the data variable
data = np.array(data)   



def number_of_female_in_class_3(data):
    for row in data.iterow:
        if row[2] == 'female' and row[4] == '3':
            sum += 1

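The problem is in the function number_of_female_in_class_3, where I want to go through two columns: I want to check column 2 to see whether the row contains the string 'female', and column 4 to see whether the class is '3'. If both are true, I want to add 1 to sum.
I was wondering if someone could post some simple code showing how to do this.
This is the train.csv file I'm trying to read: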

**PassengerID** | **Survived** | **Pclass** | **Name** | **Sex** |
1 | 0 | 3 | mary | Female |
2 | 1 | 2 | james | Male |
3 | 1 | 3 | Tanya | Female |

Thank you.

I think this is what you need:

import csv

def number_of_female_in_class_3(data):
    # initialize sum variable
    sum = 0
    for row in data:
        if row[4] == 'Female' and row[2] == '3':
            # match
            sum += 1
    # return the result
    return sum

# Load in the csv file
csv_file_object = csv.reader(open('train.csv', 'rb'), delimiter='|')
# skip the header
header = csv_file_object.next()
data = []
for row in csv_file_object:
    # add each row of data to the data list, stripping excess whitespace
    data.append(map(str.strip, row))

# print the result
print number_of_female_in_class_3(data)
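Some explanation:
First, 'Female' is written with a capital F in your file. Second, your column numbers are backwards: the fifth column is the sex and the third column is the class, which are indexes 4 and 2 when counting from zero. Third, the sum variable has to be initialized to 0 before you start incrementing it.
numpy and pandas aren't needed here. You do need to apply the strip function to every element of each row to remove the excess whitespace (`map(str.strip, row)`), and to pass `delimiter='|'` to `csv.reader`, because the default delimiter is a comma. Finally, you need to return sum at the end of the function.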
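The code above is written for Python 2 (`print` statement, `.next()`, `'rb'` mode, and a `map` that returns a list). As a minimal sketch of the same approach on Python 3, assuming the file looks like the rows posted in the question: the sample text and the helper name `load_rows` are illustrative and not part of the original answer.

```python
import csv
import os
import tempfile


def number_of_female_in_class_3(rows):
    """Count rows where Sex is 'Female' and Pclass is '3'."""
    total = 0
    for row in rows:
        # Zero-based positions: row[4] is Sex, row[2] is Pclass.
        if row[4] == "Female" and row[2] == "3":
            total += 1
    return total


def load_rows(path):
    """Read a pipe-delimited file, skip its header, and strip every cell."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="|")
        next(reader)  # skip the header row
        # A list comprehension replaces map(), which is lazy in Python 3.
        return [[cell.strip() for cell in row] for row in reader if row]


# Hypothetical sample file mirroring the rows posted in the question.
SAMPLE = (
    "PassengerID | Survived | Pclass | Name | Sex |\n"
    "1 | 0 | 3 | mary | Female |\n"
    "2 | 1 | 2 | james | Male |\n"
    "3 | 1 | 3 | Tanya | Female |\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.csv")
    with open(path, "w", newline="") as f:
        f.write(SAMPLE)
    rows = load_rows(path)
    female_class_3 = number_of_female_in_class_3(rows)

print(female_class_3)
```

The values stay strings here, which is why the class is compared against `'3'` rather than the integer 3.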

In fact, pandas can help you here.
I started with a cleaner CSV:

PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female
If your CSV actually looks like what you posted (which isn't really a CSV), you'll have some wrangling to do (see below). But if you can get pandas to read it:

>>> import pandas as pd
>>> df = pd.DataFrame.from_csv('data.csv')
>>> result = df[(df.Sex=='female') & (df.Survived==False)]
which produces a new `DataFrame`:

>>> result
             Survived  Pclass  Name     Sex
PassengerID
1                   0       3  mary  female
You can then call `len(result)` to get the count you're after.
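Two caveats about the snippet above: `DataFrame.from_csv` was removed in pandas 1.0, and the example filters on `Survived`, whereas the question asks about class 3. A minimal sketch for current pandas, assuming the cleaned-up CSV shown earlier (inlined here through `io.StringIO` so it runs without a file on disk):

```python
import io

import pandas as pd

# Same rows as the cleaner CSV, inlined so the sketch is self-contained.
CLEAN_CSV = """PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female
"""

# read_csv replaces the removed DataFrame.from_csv.
df = pd.read_csv(io.StringIO(CLEAN_CSV), index_col="PassengerID")

# One boolean mask over both columns; Pclass is parsed as an integer.
mask = (df["Sex"] == "female") & (df["Pclass"] == 3)
result = df[mask]
count = len(result)

print(count)
```

With a real file, passing the path (for example `'data.csv'`) to `pd.read_csv` in place of the `StringIO` object should behave the same way.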

Loading that CSV

If you're stuck with that nasty CSV, you can get your `df` like this:

# Load using a different delimiter.
df = pd.DataFrame.from_csv('data.csv', sep="|")
# Rename the index.
df.index.names = ['PassID']
# Rename the columns, using X for the bogus one.
df.columns = ['Survived', 'Pclass', 'Name', 'Sex', 'X']
# Remove the 'extra' column.
del df['X']
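On current pandas, where `DataFrame.from_csv` no longer exists, one possible way to load the pipe-delimited file is sketched below. It assumes the file has plain column names and a trailing `|` on every line, as in the question; the regex separator and the `dropna` cleanup are choices made for this sketch, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical file contents mirroring the rows posted in the question.
MESSY = """PassengerID | Survived | Pclass | Name | Sex |
1 | 0 | 3 | mary | Female |
2 | 1 | 2 | james | Male |
3 | 1 | 3 | Tanya | Female |
"""

# The regex separator swallows the spaces around each pipe.
# Regex separators require the Python parsing engine.
df = pd.read_csv(io.StringIO(MESSY), sep=r"\s*\|\s*", engine="python")

# The trailing pipe produces one extra, entirely empty column; drop it.
df = df.dropna(axis=1, how="all")

# Compare case-insensitively, since the file mixes 'Female' and 'female' styles.
mask = (df["Sex"].str.lower() == "female") & (df["Pclass"] == 3)
count = int(mask.sum())

print(count)
```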

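You import pandas but don't use it. This is a good use case for it, though, so it might be worth looking into.
Take a look at my answer. Hopefully you can easily reformat your data into a cleaner CSV and everything will "just work".
It keeps giving me 0.
@Mr_Shoryuken It prints 2 for me using the data you posted. Did you copy and paste the code above and try it?
It's also worth noting that using pandas is a lot slower (100 times slower for the small dataset you posted, and much slower still as the dataset grows).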