Is there a way to label a region with its corresponding continent? (Python)

Tags: python, pandas, numpy, if-statement

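I was given a CSV containing multiple regions and countries. My goal is to create a new column that holds the corresponding continent for each country/region. I created an array for each continent listing all of its countries. However, this approach leaves regions and cities unaccounted for. Is there a better way? None of the mapping information I found in my research corresponds to continents. Here is my current, inefficient approach: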

import pandas as pd
import numpy  as np
import matplotlib.pyplot as plt
import psycopg2 as ps

#Part 1
url = 'https://raw.githubusercontent.com/laurenaxon/ding_INFO5502_SPRING2020/master/indicator%20hiv%20estimated%20prevalence%25%2015-49.csv'
df = pd.read_csv(url)

Africa = ('Algeria','Angola','Benin','Botswana','Burkina','Burundi','Cameroon','Cape Verde','Central African Republic','Chad','Comoros','Congo','Congo, Democratic Republic of','Djibouti','Egypt','Equatorial Guinea','Eritrea','Ethiopia','Gabon','Gambia','Ghana','Guinea','Guinea-Bissau','Ivory Coast','Kenya','Lesotho','Liberia','Libya','Madagascar','Malawi','Mali','Mauritania','Mauritius','Morocco','Mozambique','Namibia','Niger','Nigeria','Rwanda','Sao Tome and Principe','Senegal','Seychelles','Sierra Leone','Somalia','South Africa','South Sudan','Sudan','Swaziland','Tanzania','Togo','Tunisia','Uganda','Zambia','Zimbabwe')
Asia = ('Afghanistan','Bahrain','Bangladesh','Bhutan','Brunei','Burma (Myanmar)','Cambodia','China','East Timor','India','Indonesia','Iran','Iraq','Israel','Japan','Jordan','Kazakhstan','Korea, North','Korea, South','Kuwait','Kyrgyzstan','Laos','Lebanon','Malaysia','Maldives','Mongolia','Nepal','Oman','Pakistan','Philippines','Qatar','Russian Federation','Saudi Arabia','Singapore','Sri Lanka','Syria','Tajikistan','Thailand','Turkey','Turkmenistan','United Arab Emirates','Uzbekistan','Vietnam','Yemen')
Europe = ('Albania','Andorra','Armenia','Austria','Azerbaijan','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czech Republic','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Liechtenstein','Lithuania','Luxembourg','Macedonia','Malta','Moldova','Monaco','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','San Marino','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Ukraine','United Kingdom','Vatican City')
North_America = ('Antigua and Barbuda','Bahamas','Barbados','Belize','Canada','Costa Rica','Cuba','Dominica','Dominican Republic','El Salvador','Grenada','Guatemala','Haiti','Honduras','Jamaica','Mexico','Nicaragua','Panama','Saint Kitts and Nevis','Saint Lucia','Saint Vincent and the Grenadines','Trinidad and Tobago','United States')
South_America = ('Argentina','Bolivia','Brazil','Chile','Colombia','Ecuador','Guyana','Paraguay','Peru','Suriname','Uruguay','Venezuela')
Australia_Oceania = ('Australia','Fiji','Kiribati','Marshall Islands','Micronesia','Nauru','New Zealand','Palau','Papua New Guinea','Samoa','Solomon Islands','Tonga','Tuvalu','Vanuatu')

country = df["Estimated HIV Prevalence% - (Ages 15-49)"]

def GetConti(country):
    if country in Africa:
        return "Africa"
    elif country in Asia:
        return "Asia"
    elif country in Europe:
        return "Europe"
    elif country in North_America:
        return "North America"
    elif country in South_America:
        return "South America"
    elif country in Australia_Oceania:
        return "Australia/Oceania"
    else:
        return "Other"



df['Continent']=country.apply(GetConti)
df.to_csv('url', sep='\t')


print(df)

To do this lookup in O(1) time, you need a hash table (a dictionary) in which each country is a key mapped to its continent, like this:

{
    'Algeria': 'Africa',
    'Angola': 'Africa',
    ...

    'Afghanistan': 'Asia',
    ...
}
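As a small illustration of why the dictionary helps, here is a minimal sketch of the lookup itself. The three-entry mapping is an abbreviated stand-in rather than the full data, and the names `country_to_continent` and `get_continent` are chosen here for illustration.

```python
# Abbreviated, illustrative mapping; the real one would cover every country.
country_to_continent = {
    'Algeria': 'Africa',
    'Angola': 'Africa',
    'Afghanistan': 'Asia',
}

def get_continent(name):
    """Return the continent for a country name, or 'Other' if it is unknown."""
    # dict.get is an average-case O(1) hash lookup, regardless of dict size.
    return country_to_continent.get(name, 'Other')

print(get_continent('Angola'))      # Africa
print(get_continent('California'))  # Other
```

Anything that is not a key in the dictionary, such as a state or a city, falls through to `'Other'`, which matches the behaviour of the `else` branch in the question.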
Of course, this is very tedious to write out by hand, so you can convert your data into this format with the following:

class continents:
    Africa = ('Algeria','Angola','Benin','Botswana','Burkina','Burundi','Cameroon','Cape Verde','Central African Republic','Chad','Comoros','Congo','Congo, Democratic Republic of','Djibouti','Egypt','Equatorial Guinea','Eritrea','Ethiopia','Gabon','Gambia','Ghana','Guinea','Guinea-Bissau','Ivory Coast','Kenya','Lesotho','Liberia','Libya','Madagascar','Malawi','Mali','Mauritania','Mauritius','Morocco','Mozambique','Namibia','Niger','Nigeria','Rwanda','Sao Tome and Principe', 'Senegal','Seychelles','Sierra Leone','Somalia','South Africa','South Sudan','Sudan','Swaziland','Tanzania','Togo','Tunisia','Uganda','Zambia','Zimbabwe')

    Asia = ('Afghanistan','Bahrain','Bangladesh','Bhutan','Brunei','Burma (Myanmar)','Cambodia','China','East Timor','India','Indonesia','Iran','Iraq','Israel','Japan','Jordan','Kazakhstan','Korea, North','Korea, South','Kuwait','Kyrgyzstan','Laos','Lebanon','Malaysia','Maldives','Mongolia','Nepal','Oman','Pakistan','Philippines','Qatar','Russian Federation','Saudi Arabia','Singapore','Sri Lanka','Syria','Tajikistan','Thailand','Turkey','Turkmenistan','United Arab Emirates','Uzbekistan','Vietnam','Yemen')

    Europe = ('Albania','Andorra','Armenia','Austria','Azerbaijan','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czech Republic','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Liechtenstein','Lithuania','Luxembourg','Macedonia','Malta','Moldova','Monaco','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','San Marino','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Ukraine','United Kingdom','Vatican City')

    North_America = ('Antigua and Barbuda','Bahamas','Barbados','Belize','Canada','Costa Rica','Cuba','Dominica','Dominican Republic','El Salvador','Grenada','Guatemala','Haiti','Honduras','Jamaica','Mexico','Nicaragua','Panama','Saint Kitts and Nevis','Saint Lucia','Saint Vincent and the Grenadines','Trinidad and Tobago','United States')
    South_America = ('Argentina','Bolivia','Brazil','Chile','Colombia','Ecuador','Guyana','Paraguay','Peru','Suriname','Uruguay','Venezuela')
    Australia_Oceania = ('Australia','Fiji','Kiribati','Marshall Islands','Micronesia','Nauru','New Zealand','Palau','Papua New Guinea','Samoa','Solomon Islands','Tonga','Tuvalu','Vanuatu')

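# Invert the per-continent tuples into a single country -> continent dict.
country_to_continent_map = {}
for name in dir(continents):
    # Skip dunder attributes such as __module__ and __doc__.
    if name.startswith('_'):
        continue

    country_set = getattr(continents, name)

    for country in country_set:
        # The label is the attribute name, e.g. 'North_America'.
        country_to_continent_map[country] = name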
Then write it out to a file:

file = f'''
country_to_continent_map = {str(country_to_continent_map)}
'''
with open('some_file.py', 'w+') as f:
    f.write(file)
Now you can simply change your code to the following:

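from some_file import country_to_continent_map

def get_continent(country):
    # Names missing from the map (e.g. states or cities) fall back to 'Other'.
    try:
        return country_to_continent_map[country]
    except KeyError:
        return 'Other'

col = 'Estimated HIV Prevalence% - (Ages 15-49)'

# df is the DataFrame loaded from the CSV in the question.
country = df[col]
# apply() returns a new Series, so assign it to a column to keep the result.
df['Continent'] = country.apply(get_continent)
print(df.to_csv())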
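
As a possible variation on the code above, pandas can do the dictionary lookup directly with `Series.map`, and the rows that fall back to `'Other'` can be listed for review. This is only a sketch: the three-row DataFrame and the two-entry mapping are made-up stand-ins for the real CSV and the full `country_to_continent_map`.

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV column.
col = 'Estimated HIV Prevalence% - (Ages 15-49)'
df = pd.DataFrame({col: ['Algeria', 'Afghanistan', 'California']})

# Abbreviated stand-in for the full country_to_continent_map.
country_to_continent_map = {'Algeria': 'Africa', 'Afghanistan': 'Asia'}

# Series.map looks each value up in the dict; names not found become NaN,
# which fillna then replaces with the 'Other' label.
df['Continent'] = df[col].map(country_to_continent_map).fillna('Other')

# Collect the names that did not match, so they can be handled separately.
unmatched = sorted(set(df.loc[df['Continent'] == 'Other', col]))
print(unmatched)
```

Listing the unmatched names does not assign them a continent by itself, but it shows exactly which entries in the column are not countries.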

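This doesn't necessarily answer your question, but you should use Python sets rather than tuples. A set does the lookup in O(1), instead of the O(n) you get with a tuple or list.

That still leaves me with the problem of unlabeled regions that aren't countries, such as states and cities. My current code is enough to return the correct continent for each country; my problem is with the non-country regions.

@LaurenA. Sorry, I misunderstood your question. Do you have data for the non-country regions?

They are listed in the same column as the countries; the data is intermingled. I'm wondering whether anyone knows how to retrieve this information without manually looking up every region the way I did for the countries. This is for a homework assignment, and I assume there is an automated way I haven't found, since we were specifically asked not to enter the information by hand.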
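
To illustrate the suggestion in the comments about using sets rather than tuples, here is a minimal sketch. The sets are abbreviated to three countries each, and the names `AFRICA`, `ASIA`, and `CONTINENTS` are chosen here for illustration only.

```python
# Abbreviated, illustrative sets; frozenset membership is a hash lookup,
# so `name in countries` is O(1) on average instead of a linear scan.
AFRICA = frozenset({'Algeria', 'Angola', 'Benin'})
ASIA = frozenset({'Afghanistan', 'Bahrain', 'Bangladesh'})

CONTINENTS = {'Africa': AFRICA, 'Asia': ASIA}

def get_continent(name):
    """Return the first continent whose set contains name, else 'Other'."""
    for continent, countries in CONTINENTS.items():
        if name in countries:
            return continent
    return 'Other'

print(get_continent('Benin'))  # Africa
print(get_continent('Texas'))  # Other
```

This keeps the shape of the original `if`/`elif` chain while making each membership test a hash lookup; the single dictionary from the answer avoids even the loop over continents.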