How to check whether a CSV dataset contains multiple values using Python?

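I am trying to check whether a CSV file contains all the states of the US. As you can see in my code, I have imported the CSV file as a Python list. I am trying to solve this without using pandas or any other module.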

I have created a list of states, but I would like to know the most efficient way to check how many of those states the CSV dataset contains.

import csv

with open('president_county_candidate.csv', newline='', encoding='utf_8') as file:
    reader = csv.reader(file)
    data = list(reader)

print(data)

[['state', 'county', 'candidate', 'party', 'votes'], ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'], ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'], ['Delaware', 'Kent County', 'Jo Jorgensen', 'LIB', '1044'], ['Delaware', 'Kent County', 'Howie Hawkins', 'GRN', '420'], ['Delaware', 'Kent County', ' Write-ins', 'WRI', '0'], ['Delaware', 'New Castle County', 'Joe Biden', 'DEM', '194245'], ['Delaware', 'New Castle County', 'Donald Trump', 'REP', '87687'], ['Delaware', 'New Castle County', 'Jo Jorgensen', 'LIB', '2932'], ['Delaware', 'New Castle County', 'Howie Hawkins', 'GRN', '1277'], ['Delaware', 'New Castle County', ' Write-ins', 'WRI', '0'], ['Delaware', 'Sussex County', 'Donald Trump', 'REP', '71196'], ['Delaware', 'Sussex County', 'Joe Biden', 'DEM', '56657'], ['Delaware', 'Sussex County', 'Jo Jorgensen', 'LIB', '1003'], ['Delaware', 'Sussex County', 'Howie Hawkins', 'GRN', '437'], ['District of Columbia', 'District of Columbia', 'Joe Biden', 'DEM', '31723'], ['District of Columbia', 'District of Columbia', 'Donald Trump', 'REP', '1239'], ['District of Columbia', 'District of Columbia', ' Write-ins', 'WRI', '206'], ['District of Columbia', 'District of Columbia', 'Howie Hawkins', 'GRN', '192'], ['District of Columbia', 'District of Columbia', 'Jo Jorgensen', 'LIB', '147'], ['District of Columbia', 'District of Columbia', 'Gloria La Riva', 'PSL', '77'], ['District of Columbia', 'District of Columbia', 'Brock Pierce', 'IND', '28'], ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'], ['District of Columbia', 'Ward 2', 'Donald Trump', 'REP', '2466'], ['District of Columbia', 'Ward 2', ' Write-ins', 'WRI', '298'], ['District of Columbia', 'Ward 2', 'Jo Jorgensen', 'LIB', '229'], ['District of Columbia', 'Ward 2', 'Howie Hawkins', 'GRN', '96'], ['District of Columbia', 'Ward 2', 'Gloria La Riva', 'PSL', '37'], ['District of Columbia', 'Ward 2', 'Brock Pierce', 'IND', '32']]

states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Idaho', 'Hawaii', 
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 
'Maine', 'Maryland', 'Massachusetts','Michigan','Minnesota','Mississippi',
'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico',
'New York', 'North Carolina','North Dakota','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas',
'Utah','Vermont','Virginia','Washington','West Virginia', 'Wisconsin','Wyoming']

This example counts how many rows each state has in the data list:

counter = {}
for state, *_ in data:
    if state in states:
        counter.setdefault(state, 0)
        counter[state] += 1

for state in states:
    print("{:<20} {}".format(state, counter.get(state, 0)))

print()
print("Total states found:", len(counter))


Note: to speed things up, you can convert the states list to a set beforehand.
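
As a rough sketch of that note, the version below builds the set once and then counts rows per state. The two-entry states list and the short data list are stand-ins (an assumption made to keep the example self-contained) for the full list and the CSV rows shown in the question.

```python
# Shortened stand-ins for the real inputs (assumption: same shape as in the question).
states = ['Delaware', 'Florida']
data = [
    ['state', 'county', 'candidate', 'party', 'votes'],
    ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'],
    ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'],
    ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'],
]

# Build the set once; membership tests on a set are O(1) on average.
state_set = set(states)

counter = {}
for state, *_ in data:
    if state in state_set:
        counter[state] = counter.get(state, 0) + 1

print(counter)
print("Total states found:", len(counter))
```

The header row is skipped implicitly here, because the string 'state' is not a member of the set.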

import csv

with open('president_county_candidate.csv', newline='', encoding='utf_8') as file:
    reader = csv.reader(file)
    data = list(reader)

states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Idaho', 'Hawaii', 
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 
'Maine', 'Maryland', 'Massachusetts','Michigan','Minnesota','Mississippi',
'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico',
'New York', 'North Carolina','North Dakota','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas',
'Utah','Vermont','Virginia','Washington','West Virginia', 'Wisconsin','Wyoming']

for row in data[1:]:  # skip the header row
    state = row[0]    # the state name is the first column of each row
    if state in states:
        print(state)

If your goal is just to check whether the CSV file contains all the states of the US, you can find the unique set of states in the file and make sure there are exactly 50 of them:

number = len(set(record[0].lower() for record in data[1:]))
# Expected: number should be 50
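
If the count is not what you expect, a set difference can show which names are missing or unexpected. This is a minimal sketch, assuming data and states look like the ones in the question; both are shortened stand-ins here. Note that the sample data also contains 'District of Columbia', which is not in the states list and would push a plain count of unique names higher.

```python
# Shortened stand-ins for the real inputs (assumption).
states = ['Alabama', 'Delaware', 'Wyoming']
data = [
    ['state', 'county', 'candidate', 'party', 'votes'],
    ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'],
    ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'],
]

found = {record[0] for record in data[1:]}  # unique names, header row skipped
missing = set(states) - found               # expected but absent from the file
extra = found - set(states)                 # in the file but not in the list

print("All states present:", not missing)
print("Missing:", sorted(missing))
print("Not in the states list:", sorted(extra))
```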

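First, a tip: csv.DictReader may be easier to use in this case, because it gives you labeled rows and automatically consumes the header row. It is not required, but it makes the code easier to read.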

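import csv

with open('test.csv', newline='') as f:
    data = list(csv.DictReader(f))
print(data)
# prints: [
#   {'state': 'Delaware', 'county': 'Kent County', 'candidate': 'Joe Biden', 'party': 'DEM', 'votes': '44518'},
#   {'state': 'Delaware', 'county': 'Kent County', 'candidate': 'Donald Trump', 'party': 'REP', 'votes': '40976'},
#   ...
# ]

You can then use this expression to get all the states mentioned in the CSV file: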

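states_in_csv = set(line['state'] for line in data)
print(states_in_csv)
# {'Delaware', 'District of Columbia', ...}

line['state'] for line in data is a generator expression that extracts just the 'state' field from each row. set() builds a set from those values, which removes all duplicates.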

You can then easily test how many states are represented in the table. For example:

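num_states = 0
for state in states:
    if state in states_in_csv:
        num_states += 1
print("Number of states:", num_states)

This is very efficient because checking whether a value is in a set is a constant-time operation on average, so you do not have to search the whole table for each state.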


Your states list seems to contain every state. If you only want to know how many states are in the table, you can simply use len(states_in_csv).
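
The same check can also be written with set operators instead of a loop. The sketch below is only an illustration: io.StringIO stands in for the real file so the example runs on its own, and the states list is shortened (both are assumptions, not part of the original answer).

```python
import csv
import io

# Stand-in for the real CSV file so the sketch is self-contained (assumption).
sample = io.StringIO(
    "state,county,candidate,party,votes\n"
    "Delaware,Kent County,Joe Biden,DEM,44518\n"
    "District of Columbia,Ward 2,Joe Biden,DEM,25228\n"
)
states = ['Alabama', 'Delaware']  # shortened stand-in for the full list

# DictReader uses the first row as field names, so each row is a dict.
states_in_csv = {line['state'] for line in csv.DictReader(sample)}

present = states_in_csv & set(states)        # intersection: states found in the file
all_present = set(states) <= states_in_csv   # subset test: is every state in the file?

print("Number of states:", len(present))
print("All states present:", all_present)
```

The intersection ignores entries such as 'District of Columbia' that are in the file but not in the states list, which a bare len(states_in_csv) would count.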

You can use a hash map, which in Python is a dictionary; it is an efficient data structure for this job. This snippet may help you:

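counts = {}  # maps each state name to the number of rows for that state
for row in data:
    # only count rows whose first column is a known state
    if row[0] in states:
        if row[0] in counts:
            counts[row[0]] += 1
        else:
            counts[row[0]] = 1

# report only the states that appear more than once
for state, n in counts.items():
    if n > 1:
        print(f"The state {state} appears {n} times")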
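
The same tally can be written more compactly with collections.Counter from the standard library, if importing it is acceptable given the "no other modules" constraint in the question. A sketch with shortened stand-in inputs (an assumption, to keep it self-contained):

```python
from collections import Counter

# Shortened stand-ins for the real inputs (assumption).
states = {'Delaware', 'Florida'}
data = [
    ['state', 'county', 'candidate', 'party', 'votes'],
    ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'],
    ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'],
    ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'],
]

# Count rows per state, skipping the header and any name not in the states set.
counts = Counter(row[0] for row in data[1:] if row[0] in states)

for state, n in counts.items():
    print(f"{state}: {n} rows")
print("Total states found:", len(counts))
```

A Counter returns 0 for keys it has never seen, so counts['Florida'] can be read without an explicit existence check.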


Have you tried writing a simple for loop? What exactly is the problem?