How can I check whether a CSV dataset contains multiple values using Python?


python list csv


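I'm trying to check whether a CSV file contains all of the US states. As you can see in my code, I have imported the CSV file as a Python list. I'm trying to solve this without using pandas or other modules.

I have created a list of states, but I'd like to know the most efficient way to check how many of those states the CSV dataset contains.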

import csv

with open('president_county_candidate.csv', newline='', encoding='utf_8') as file:
    reader = csv.reader(file)
    data = list(reader)

print(data)

[['state', 'county', 'candidate', 'party', 'votes'], ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'], ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'], ['Delaware', 'Kent County', 'Jo Jorgensen', 'LIB', '1044'], ['Delaware', 'Kent County', 'Howie Hawkins', 'GRN', '420'], ['Delaware', 'Kent County', ' Write-ins', 'WRI', '0'], ['Delaware', 'New Castle County', 'Joe Biden', 'DEM', '194245'], ['Delaware', 'New Castle County', 'Donald Trump', 'REP', '87687'], ['Delaware', 'New Castle County', 'Jo Jorgensen', 'LIB', '2932'], ['Delaware', 'New Castle County', 'Howie Hawkins', 'GRN', '1277'], ['Delaware', 'New Castle County', ' Write-ins', 'WRI', '0'], ['Delaware', 'Sussex County', 'Donald Trump', 'REP', '71196'], ['Delaware', 'Sussex County', 'Joe Biden', 'DEM', '56657'], ['Delaware', 'Sussex County', 'Jo Jorgensen', 'LIB', '1003'], ['Delaware', 'Sussex County', 'Howie Hawkins', 'GRN', '437'], ['District of Columbia', 'District of Columbia', 'Joe Biden', 'DEM', '31723'], ['District of Columbia', 'District of Columbia', 'Donald Trump', 'REP', '1239'], ['District of Columbia', 'District of Columbia', ' Write-ins', 'WRI', '206'], ['District of Columbia', 'District of Columbia', 'Howie Hawkins', 'GRN', '192'], ['District of Columbia', 'District of Columbia', 'Jo Jorgensen', 'LIB', '147'], ['District of Columbia', 'District of Columbia', 'Gloria La Riva', 'PSL', '77'], ['District of Columbia', 'District of Columbia', 'Brock Pierce', 'IND', '28'], ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'], ['District of Columbia', 'Ward 2', 'Donald Trump', 'REP', '2466'], ['District of Columbia', 'Ward 2', ' Write-ins', 'WRI', '298'], ['District of Columbia', 'Ward 2', 'Jo Jorgensen', 'LIB', '229'], ['District of Columbia', 'Ward 2', 'Howie Hawkins', 'GRN', '96'], ['District of Columbia', 'Ward 2', 'Gloria La Riva', 'PSL', '37'], ['District of Columbia', 'Ward 2', 'Brock Pierce', 'IND', '32']]

states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Idaho', 'Hawaii', 
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 
'Maine', 'Maryland', 'Massachusetts','Michigan','Minnesota','Mississippi',
'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico',
'New York', 'North Carolina','North Dakota','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas',
'Utah','Vermont','Virginia','Washington','West Virginia', 'Wisconsin','Wyoming']

This example counts each state found in the data list:
counter = {}
for state, *_ in data:
    if state in states:
        counter.setdefault(state, 0)
        counter[state] += 1

for state in states:
    print("{:<20} {}".format(state, counter.get(state, 0)))

print()
print("Total states found:", len(counter))


Note: to speed things up, you can convert states from a list to a set beforehand.
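As a minimal sketch of that note, the version below builds the set once and then tests membership against it. The three-row data sample and the shortened states list are stand-ins for illustration only, and the name state_set is an assumption, not something from the original answer.

```python
# Hypothetical small sample standing in for the rows read from the CSV file.
data = [
    ['state', 'county', 'candidate', 'party', 'votes'],
    ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'],
    ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'],
    ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'],
]
states = ['Alabama', 'Alaska', 'Delaware']  # shortened list, for illustration only

state_set = set(states)  # build the set once, before the loop

counter = {}
for state, *_ in data:
    if state in state_set:  # set membership is O(1) on average
        counter[state] = counter.get(state, 0) + 1

print(counter)
```

With only 50 names the difference is small, but the set avoids scanning the whole list for every row of a large file.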
For reference, the nested-loop version looks like this (the import, the file loading, and the states list are the same as above):

for state in data:
    for i in states:
        # state is a whole row (a list), so it never equals a state name;
        # compare state[0] with i instead
        if state == i:
            print(state)

If your goal is just to check whether the CSV file contains all of the US states, you can find the unique set of states in the file and make sure it has exactly 50 members:
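number = len(set(record[0].lower() for record in data[1:]))
# expected: number should be 50
# (51 if the file also lists District of Columbia, as the sample output above does)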
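Counting alone does not say which states are absent. A set difference can list them, as in the sketch below. The helper name missing_states, the sample rows, and the shortened expected list are all hypothetical. Unlike the one-liner above, this comparison is case-sensitive.

```python
def missing_states(rows, expected):
    """Return the expected names that never appear in column 0 of rows."""
    # rows[0] is the header; empty rows are ignored
    found = {row[0] for row in rows[1:] if row}
    return sorted(set(expected) - found)


# Hypothetical sample data, for illustration only.
rows = [
    ['state', 'county', 'candidate', 'party', 'votes'],
    ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'],
    ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'],
]
expected = ['Delaware', 'Maryland', 'Virginia']

missing = missing_states(rows, expected)
print(missing)  # an empty list would mean every expected state is present
```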
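First, a tip: in this case it may be easier to use csv.DictReader, because it gives you labeled rows and automatically skips the header row. It is not required, but it makes the code easier to read.
import csv
with open('test.csv') as f:
    data = list(csv.DictReader(f))
print(data)
# prints: [
#   {'state': 'Delaware', 'county': 'Kent County', 'candidate': 'Joe Biden', 'party': 'DEM', 'votes': '44518'},
#   {'state': 'Delaware', 'county': 'Kent County', 'candidate': 'Donald Trump', 'party': 'REP', 'votes': '40976'}
#    ...
# ]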

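Then you can use this expression to get all the states mentioned in the CSV file:
states_in_csv = set(line['state'] for line in data)
print(states_in_csv)
# {'Delaware', 'District of Columbia', ...}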

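line['state'] for line in data is a generator expression (it works like a list comprehension) that extracts just the 'state' field from each row. set() then builds a set from those states, which removes all duplicates.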
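To try this without the real file, the sketch below feeds a few inline rows to csv.DictReader through io.StringIO. The sample text is made up from the rows shown in the question; the column names match its header.

```python
import csv
import io

# Inline stand-in for the CSV file, for illustration only.
sample = """state,county,candidate,party,votes
Delaware,Kent County,Joe Biden,DEM,44518
Delaware,Kent County,Donald Trump,REP,40976
District of Columbia,Ward 2,Joe Biden,DEM,25228
"""

reader = csv.DictReader(io.StringIO(sample))  # the first line supplies the keys

# Set comprehension: keep only the 'state' field and drop duplicates.
states_in_csv = {line['state'] for line in reader}

print(sorted(states_in_csv))
```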
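Then you can easily test how many states are represented in the table. For example:
num_states = 0
for state in states:  # the list of state names from the question
    if state in states_in_csv:
        num_states += 1
print("Number of states:", num_states)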

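This is very efficient because checking whether a value is in a set is a constant-time operation on average, so you don't have to search the whole table for each state.
Your states list appears to contain every state. If you only want to know how many distinct values the state column has, you can simply use len(states_in_csv). Keep in mind that this also counts entries such as 'District of Columbia', which is not in your list.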
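If entries outside the list matter, set operators can separate them from the matching states. The following is a rough sketch with a shortened, hypothetical states list and a hand-written states_in_csv set; the names matched, extra, and all_present are assumptions for illustration.

```python
states = ['Delaware', 'Maryland', 'Virginia']  # shortened list, for illustration only
states_in_csv = {'Delaware', 'District of Columbia'}  # stand-in for the set built from the file

expected = set(states)
matched = states_in_csv & expected     # names that are both in the file and in the list
extra = states_in_csv - expected       # names in the file that are not in the list
all_present = expected <= states_in_csv  # True only if every listed state appears

print(len(matched), sorted(extra), all_present)
```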
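You can use a hash map, which in Python is a dictionary; it is an efficient data structure for this job. This snippet may help you:
counts = {}
for row in data:
    # count the row only if its first column is a known state
    if row[0] in states:
        if row[0] in counts:
            counts[row[0]] += 1
        else:
            counts[row[0]] = 1

# report every state that appears more than once
for state, n in counts.items():
    if n > 1:
        print(f"The state {state} appears {n} times")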
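If importing from the standard library is acceptable (the question only rules out pandas and "other modules", so this is an assumption), collections.Counter does the same tallying in one expression. The sample rows and the shortened set of state names below are hypothetical.

```python
from collections import Counter

# Hypothetical sample data, for illustration only.
data = [
    ['state', 'county', 'candidate', 'party', 'votes'],
    ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'],
    ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'],
    ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'],
]
states = {'Delaware', 'Maryland'}  # shortened set of state names

# Count rows per state, skipping the header and anything not in states.
counts = Counter(row[0] for row in data[1:] if row[0] in states)

for state, n in counts.most_common():
    print(f"{state}: {n} rows")
```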


Have you tried writing a simple for loop? What exactly is the problem?