How can I use Python to convert country codes to full names, and infer the country name from the city name in an Excel file?


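I'm a beginner in Python.

My Excel file currently has two columns: a country column and a city column.

In the country column, most values are country codes, some are full country names, some are US state codes, and fewer than 1% are blank.

In the city column, the values are full city names (not city codes), and nearly 20% are blank.

How can I use Python to create a new column that shows the full country name based on the country code, keeps the value unchanged where the country column already holds a full country name, and shows United States where the value is a US state code?

The tricky part is that a value such as CO in the country column can stand for either Colombia or Colorado. I can't tell from the code alone whether it is a country or a state, but I can tell once I check the corresponding city name (for example, Longmont is in Colorado and Bogota is in Colombia). How can I handle this ambiguity and infer the full country name in the new column from the corresponding city name?

Thanks for your help.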

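One suggestion is to create dictionaries (e.g. dic={'CO':'Colombia',…} and dic_state={'CO':'Colorado',…}). Then you could use an if statement to check whether the country is the US; if it is, use dic_state. Finally, you can create the new column with the appropriate command (which depends on the package/module you are using).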
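
A minimal sketch of this dictionary idea, assuming abbreviated tables and a hypothetical `in_us` flag that says whether a row is already known to be in the US (the answer does not say how that would be determined). The function name `full_name` is also made up for illustration:

```python
# Abbreviated, hypothetical lookup tables; a real run needs the full lists.
dic = {'CO': 'Colombia', 'DZ': 'Algeria'}
dic_state = {'CO': 'Colorado', 'AL': 'Alabama'}

def full_name(code, in_us):
    """Return the state name if the row is known to be in the US, else the country name."""
    code = code.strip().upper()
    table = dic_state if in_us else dic
    # Fall back to the normalized input when the code is not in the table
    return table.get(code, code)

print(full_name('co', in_us=True))
print(full_name('co', in_us=False))
```

The open question with this approach is where `in_us` comes from, which is what the city-based answer further down tries to address.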


Good luck.

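OK, you can use a JSON file of the form {key (state): values (cities belonging to that state)}, read it with Python, and match each entry in your list to the corresponding city and state.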
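
One way this could look, as a rough sketch: the JSON content below is a made-up two-state sample (in practice it would be loaded from a file with `json.load`), and `state_for_city` is a hypothetical helper name.

```python
import json

# Hypothetical JSON content mapping each state to its cities
raw = '{"Colorado": ["Longmont", "Denver"], "Alabama": ["Huntsville"]}'
state_cities = json.loads(raw)

# Invert to city -> state so each row needs only one dictionary lookup
city_to_state = {
    city.lower(): state
    for state, cities in state_cities.items()
    for city in cities
}

def state_for_city(city):
    """Return the US state a city belongs to, or None if the city is not listed."""
    return city_to_state.get(city.strip().lower())

print(state_for_city('Longmont'))
print(state_for_city('Bogota'))
```

A city that appears in the mapping suggests the two-letter code is a state code; a city that is missing leaves the question open, since the list may simply be incomplete.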

Explanation

The task is coded using the following logic:

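  • Handle simple abbreviations such as U.S.
  • Country length greater than 3
    • Have both country and city
      • Find the closest country/city pair in the list of cities
    • Country only
      • Find the closest country match among the country names in the two-letter country code list
  • Country length equal to 3
    • Look up the country by its 3-letter country code
  • Country length equal to 2 (could be a country code or a state code)
    • Code is not in the list of states
      • Must be a country code, so look up the country by its two-letter country code
    • Code is not in the list of countries
      • Must be a US state code, so the country is the United States
    • Could be either a country code or a state code
      • Check the city with this value treated as a state code
      • Check the city with this value treated as a country code
      • Take the better match of these two possibilities
  • Note: string matching uses fuzzy matching to allow flexibility in how names are spelled. The rapidfuzz library is used in preference to FuzzyWuzzy because it is an order of magnitude faster.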
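
The branching above can be sketched without any fuzzy matching, just to show which branch a raw value falls into. The function name `classify`, the labels it returns, and the abbreviated tables are all hypothetical and not part of the answer's code:

```python
# Abbreviated, hypothetical tables for illustration only
STATES = {'CO', 'AL', 'TX'}
COUNTRIES_2 = {'CO', 'AL', 'DZ', 'US'}

def classify(value):
    """Return which branch of the logic a raw country value falls into."""
    value = value.strip()
    if value.lower() == 'u.s.':
        # Normalize the common abbreviation before checking lengths
        value = 'US'
    if len(value) > 3:
        return 'full name'
    if len(value) == 3:
        return 'alpha-3 code'
    if len(value) == 2:
        code = value.upper()
        if code not in STATES:
            return 'country code'
        if code not in COUNTRIES_2:
            return 'state code'
        # In both tables: the city is needed to decide
        return 'ambiguous'
    return 'unknown'

for raw in ['u.s.', 'DZ', 'TX', 'co', 'AFG', 'Korea', '']:
    print(repr(raw), '->', classify(raw))
```

Only the `'ambiguous'` branch needs the city; every other branch can be resolved from the country value alone.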

Code

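    # Requires the lookup tables from the Data section (states, alpha2, alpha3, cities) to be defined first
    import pandas as pd
    from rapidfuzz import fuzz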
    
    def find_closest_country(country):
        ' Country with the closest name in list of countries in country code '
        ratios = [fuzz.partial_ratio(country, x) for x in alpha2.values()]
        rated_countries = [(info, r) for info, r in zip(alpha2.values(), ratios)]
        
        # Best match with shortest name
        return sorted(rated_countries, key = lambda x: (x[1], -len(x[0])), reverse = True)[0]
        
    def check_city_country(city, country):
        ' City, Country pair closest in list of cities '
        ratios = [fuzz.partial_ratio(city, x['name']) * fuzz.partial_ratio(country, x['country']) for x in cities]
        rated_cities = [(info, r) for info, r in zip(cities, ratios)]
        
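        # Best match, breaking ties in favor of the shortest city name
        # (x[0] is a row dict, so compare the length of its 'name' field)
        return sorted(rated_cities, key = lambda x: (x[1], -len(x[0]['name'])), reverse = True)[0]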
        
    def check_city_subregion(city, subregion):
        ' City, subregion pair closest in list of cities '
        ratios = [fuzz.partial_ratio(city, x['name']) * fuzz.partial_ratio(subregion, x['subcountry']) for x in cities]
        rated_cities = [(info, r) for info, r in zip(cities, ratios)]
        
        # Best match, breaking ties in favor of the shortest city name
        # (x[0] is a row dict, so compare the length of its 'name' field)
        return sorted(rated_cities, key = lambda x: (x[1], -len(x[0]['name'])), reverse = True)[0]
        
    def lookup(country, city):
        '''
            Finds country based upon country and city
            country - country name or country code
            city - name of city
        '''
        if country.lower() == 'u.s.':
            # Picks up common US acronym
            country = "US"
       
        if len(country) > 3:
            # Must be country since too long for abbreviation
            if city:
                # Find closest city country pair in list of cities
                city_info = check_city_country(city, country)
                if city_info:
                    return city_info[0]['country']
           
            # No city, so find closest country in list of countries (2 code abbreviations reverse lookup)
            countries = find_closest_country(country)
            if countries:
                return countries[0]
            
            return None
        elif len(country) == 3:
            # 3 letter abbreviation
            country = country.upper()
            return alpha3.get(country, None)
        
        elif len(country) == 2:
            # Two letter country abbreviation
            country = country.upper()
            if country not in states:
                # Not a state code, so look up country from code
                return alpha2.get(country, None)
            
            if country not in alpha2:
                # Not a country code, so must be state code for US
                return "United States of America"
            
            # Could be country or state code
            
            if city:
                # Have 2-letter code (could be country or state)
                pos_country = alpha2[country]  # possible country
                pos_state = states[country]    # possible state (not used below)
                
                # check closest country with this city
                pos_countries = check_city_country(city, pos_country)
                
                # If state code, country would be United States
                pos_us = check_city_country(city, "United States")
                
                if pos_countries[1] > pos_us[1]:
                    # Provided better match as country code
                    return pos_countries[0]['country']
                else:
                    # Provided better match as state code (i.e. "United States")
                    return pos_us[0]['country']
            else:
                return alpha2[country]
                 
        else:
            return None
       
    
    df = pd.read_excel('country_test.xlsx') # Load Excel File
    df.fillna('', inplace=True)
    
    # Get name of country based upon country and city
    df['country_'] = df.apply(lambda row: lookup(row['country'], row['city']), axis = 1)
    
Data

    # State Codes
    # https://gist.github.com/rugbyprof/76575b470b6772ce8fa0c49e23931d97
    states = {"AL":"Alabama","AK":"Alaska","AZ":"Arizona","AR":"Arkansas","CA":"California","CO":"Colorado","CT":"Connecticut","DE":"Delaware","FL":"Florida","GA":"Georgia","HI":"Hawaii","ID":"Idaho","IL":"Illinois","IN":"Indiana","IA":"Iowa","KS":"Kansas","KY":"Kentucky","LA":"Louisiana","ME":"Maine","MD":"Maryland","MA":"Massachusetts","MI":"Michigan","MN":"Minnesota","MS":"Mississippi","MO":"Missouri","MT":"Montana","NE":"Nebraska","NV":"Nevada","NH":"New Hampshire","NJ":"New Jersey","NM":"New Mexico","NY":"New York","NC":"North Carolina","ND":"North Dakota","OH":"Ohio","OK":"Oklahoma","OR":"Oregon","PA":"Pennsylvania","RI":"Rhode Island","SC":"South Carolina","SD":"South Dakota","TN":"Tennessee","TX":"Texas","UT":"Utah","VT":"Vermont","VA":"Virginia","WA":"Washington","WV":"West Virginia","WI":"Wisconsin","WY":"Wyoming"}
    
    # two letter country codes
    # https://gist.github.com/carlopires/1261951/d13ca7320a6abcd4b0aa800d351a31b54cefdff4
    alpha2 = {
        'AD': 'Andorra',
        'AE': 'United Arab Emirates',
        'AF': 'Afghanistan',
        'AG': 'Antigua & Barbuda',
        'AI': 'Anguilla',
        'AL': 'Albania',
        'AM': 'Armenia',
        'AN': 'Netherlands Antilles',
        'AO': 'Angola',
        'AQ': 'Antarctica',
        'AR': 'Argentina',
        'AS': 'American Samoa',
        'AT': 'Austria',
        'AU': 'Australia',
        'AW': 'Aruba',
        'AZ': 'Azerbaijan',
        'BA': 'Bosnia and Herzegovina',
        'BB': 'Barbados',
        'BD': 'Bangladesh',
        'BE': 'Belgium',
        'BF': 'Burkina Faso',
        'BG': 'Bulgaria',
        'BH': 'Bahrain',
        'BI': 'Burundi',
        'BJ': 'Benin',
        'BM': 'Bermuda',
        'BN': 'Brunei Darussalam',
        'BO': 'Bolivia',
        'BR': 'Brazil',
        'BS': 'Bahamas',
        'BT': 'Bhutan',
        'BU': 'Burma (no longer exists)',
        'BV': 'Bouvet Island',
        'BW': 'Botswana',
        'BY': 'Belarus',
        'BZ': 'Belize',
        'CA': 'Canada',
        'CC': 'Cocos (Keeling) Islands',
        'CF': 'Central African Republic',
        'CG': 'Congo',
        'CH': 'Switzerland',
        'CI': 'Côte D\'ivoire (Ivory Coast)',
        'CK': 'Cook Islands',
        'CL': 'Chile',
        'CM': 'Cameroon',
        'CN': 'China',
        'CO': 'Colombia',
        'CR': 'Costa Rica',
        'CS': 'Czechoslovakia (no longer exists)',
        'CU': 'Cuba',
        'CV': 'Cape Verde',
        'CX': 'Christmas Island',
        'CY': 'Cyprus',
        'CZ': 'Czech Republic',
        'DD': 'German Democratic Republic (no longer exists)',
        'DE': 'Germany',
        'DJ': 'Djibouti',
        'DK': 'Denmark',
        'DM': 'Dominica',
        'DO': 'Dominican Republic',
        'DZ': 'Algeria',
        'EC': 'Ecuador',
        'EE': 'Estonia',
        'EG': 'Egypt',
        'EH': 'Western Sahara',
        'ER': 'Eritrea',
        'ES': 'Spain',
        'ET': 'Ethiopia',
        'FI': 'Finland',
        'FJ': 'Fiji',
        'FK': 'Falkland Islands (Malvinas)',
        'FM': 'Micronesia',
        'FO': 'Faroe Islands',
        'FR': 'France',
        'FX': 'France, Metropolitan',
        'GA': 'Gabon',
        'GB': 'United Kingdom (Great Britain)',
        'GD': 'Grenada',
        'GE': 'Georgia',
        'GF': 'French Guiana',
        'GH': 'Ghana',
        'GI': 'Gibraltar',
        'GL': 'Greenland',
        'GM': 'Gambia',
        'GN': 'Guinea',
        'GP': 'Guadeloupe',
        'GQ': 'Equatorial Guinea',
        'GR': 'Greece',
        'GS': 'South Georgia and the South Sandwich Islands',
        'GT': 'Guatemala',
        'GU': 'Guam',
        'GW': 'Guinea-Bissau',
        'GY': 'Guyana',
        'HK': 'Hong Kong',
        'HM': 'Heard & McDonald Islands',
        'HN': 'Honduras',
        'HR': 'Croatia',
        'HT': 'Haiti',
        'HU': 'Hungary',
        'ID': 'Indonesia',
        'IE': 'Ireland',
        'IL': 'Israel',
        'IN': 'India',
        'IO': 'British Indian Ocean Territory',
        'IQ': 'Iraq',
        'IR': 'Islamic Republic of Iran',
        'IS': 'Iceland',
        'IT': 'Italy',
        'JM': 'Jamaica',
        'JO': 'Jordan',
        'JP': 'Japan',
        'KE': 'Kenya',
        'KG': 'Kyrgyzstan',
        'KH': 'Cambodia',
        'KI': 'Kiribati',
        'KM': 'Comoros',
        'KN': 'St. Kitts and Nevis',
        'KP': 'Korea, Democratic People\'s Republic of',
        'KR': 'Korea, Republic of',
        'KW': 'Kuwait',
        'KY': 'Cayman Islands',
        'KZ': 'Kazakhstan',
        'LA': 'Lao People\'s Democratic Republic',
        'LB': 'Lebanon',
        'LC': 'Saint Lucia',
        'LI': 'Liechtenstein',
        'LK': 'Sri Lanka',
        'LR': 'Liberia',
        'LS': 'Lesotho',
        'LT': 'Lithuania',
        'LU': 'Luxembourg',
        'LV': 'Latvia',
        'LY': 'Libyan Arab Jamahiriya',
        'MA': 'Morocco',
        'MC': 'Monaco',
        'MD': 'Moldova, Republic of',
        'MG': 'Madagascar',
        'MH': 'Marshall Islands',
        'ML': 'Mali',
        'MN': 'Mongolia',
        'MM': 'Myanmar',
        'MO': 'Macau',
        'MP': 'Northern Mariana Islands',
        'MQ': 'Martinique',
        'MR': 'Mauritania',
        'MS': 'Montserrat',
        'MT': 'Malta',
        'MU': 'Mauritius',
        'MV': 'Maldives',
        'MW': 'Malawi',
        'MX': 'Mexico',
        'MY': 'Malaysia',
        'MZ': 'Mozambique',
        'NA': 'Namibia',
        'NC': 'New Caledonia',
        'NE': 'Niger',
        'NF': 'Norfolk Island',
        'NG': 'Nigeria',
        'NI': 'Nicaragua',
        'NL': 'Netherlands',
        'NO': 'Norway',
        'NP': 'Nepal',
        'NR': 'Nauru',
        'NT': 'Neutral Zone (no longer exists)',
        'NU': 'Niue',
        'NZ': 'New Zealand',
        'OM': 'Oman',
        'PA': 'Panama',
        'PE': 'Peru',
        'PF': 'French Polynesia',
        'PG': 'Papua New Guinea',
        'PH': 'Philippines',
        'PK': 'Pakistan',
        'PL': 'Poland',
        'PM': 'St. Pierre & Miquelon',
        'PN': 'Pitcairn',
        'PR': 'Puerto Rico',
        'PT': 'Portugal',
        'PW': 'Palau',
        'PY': 'Paraguay',
        'QA': 'Qatar',
        'RE': 'Réunion',
        'RO': 'Romania',
        'RU': 'Russian Federation',
        'RW': 'Rwanda',
        'SA': 'Saudi Arabia',
        'SB': 'Solomon Islands',
        'SC': 'Seychelles',
        'SD': 'Sudan',
        'SE': 'Sweden',
        'SG': 'Singapore',
        'SH': 'St. Helena',
        'SI': 'Slovenia',
        'SJ': 'Svalbard & Jan Mayen Islands',
        'SK': 'Slovakia',
        'SL': 'Sierra Leone',
        'SM': 'San Marino',
        'SN': 'Senegal',
        'SO': 'Somalia',
        'SR': 'Suriname',
        'ST': 'Sao Tome & Principe',
        'SU': 'Union of Soviet Socialist Republics (no longer exists)',
        'SV': 'El Salvador',
        'SY': 'Syrian Arab Republic',
        'SZ': 'Swaziland',
        'TC': 'Turks & Caicos Islands',
        'TD': 'Chad',
        'TF': 'French Southern Territories',
        'TG': 'Togo',
        'TH': 'Thailand',
        'TJ': 'Tajikistan',
        'TK': 'Tokelau',
        'TM': 'Turkmenistan',
        'TN': 'Tunisia',
        'TO': 'Tonga',
        'TP': 'East Timor',
        'TR': 'Turkey',
        'TT': 'Trinidad & Tobago',
        'TV': 'Tuvalu',
        'TW': 'Taiwan, Province of China',
        'TZ': 'Tanzania, United Republic of',
        'UA': 'Ukraine',
        'UG': 'Uganda',
        'UM': 'United States Minor Outlying Islands',
        'US': 'United States of America',
        'UY': 'Uruguay',
        'UZ': 'Uzbekistan',
        'VA': 'Vatican City State (Holy See)',
        'VC': 'St. Vincent & the Grenadines',
        'VE': 'Venezuela',
        'VG': 'British Virgin Islands',
        'VI': 'United States Virgin Islands',
        'VN': 'Viet Nam',
        'VU': 'Vanuatu',
        'WF': 'Wallis & Futuna Islands',
        'WS': 'Samoa',
        'YD': 'Democratic Yemen (no longer exists)',
        'YE': 'Yemen',
        'YT': 'Mayotte',
        'YU': 'Yugoslavia',
        'ZA': 'South Africa',
        'ZM': 'Zambia',
        'ZR': 'Zaire',
        'ZW': 'Zimbabwe',
        'ZZ': 'Unknown or unspecified country',
    }
    
    # Three letter codes
    #https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3#Uses_and_applications
    alpha3 = """ABW  Aruba
    AFG  Afghanistan
    AGO  Angola
    AIA  Anguilla
    ALA  Åland Islands
    ALB  Albania
    AND  Andorra
    ARE  United Arab Emirates
    ARG  Argentina
    ARM  Armenia
    ASM  American Samoa
    ATA  Antarctica
    ATF  French Southern Territories
    ATG  Antigua and Barbuda
    AUS  Australia
    AUT  Austria
    AZE  Azerbaijan
    BDI  Burundi
    BEL  Belgium
    BEN  Benin
    BES  Bonaire, Sint Eustatius and Saba
    BFA  Burkina Faso
    BGD  Bangladesh
    BGR  Bulgaria
    BHR  Bahrain
    BHS  Bahamas
    BIH  Bosnia and Herzegovina
    BLM  Saint Barthélemy
    BLR  Belarus
    BLZ  Belize
    BMU  Bermuda
    BOL  Bolivia (Plurinational State of)
    BRA  Brazil
    BRB  Barbados
    BRN  Brunei Darussalam
    BTN  Bhutan
    BVT  Bouvet Island
    BWA  Botswana
    CAF  Central African Republic
    CAN  Canada
    CCK  Cocos (Keeling) Islands
    CHE  Switzerland
    CHL  Chile
    CHN  China
    CIV  Côte d'Ivoire
    CMR  Cameroon
    COD  Congo, Democratic Republic of the
    COG  Congo
    COK  Cook Islands
    COL  Colombia
    COM  Comoros
    CPV  Cabo Verde
    CRI  Costa Rica
    CUB  Cuba
    CUW  Curaçao
    CXR  Christmas Island
    CYM  Cayman Islands
    CYP  Cyprus
    CZE  Czechia
    DEU  Germany
    DJI  Djibouti
    DMA  Dominica
    DNK  Denmark
    DOM  Dominican Republic
    DZA  Algeria
    ECU  Ecuador
    EGY  Egypt
    ERI  Eritrea
    ESH  Western Sahara
    ESP  Spain
    EST  Estonia
    ETH  Ethiopia
    FIN  Finland
    FJI  Fiji
    FLK  Falkland Islands (Malvinas)
    FRA  France
    FRO  Faroe Islands
    FSM  Micronesia (Federated States of)
    GAB  Gabon
    GBR  United Kingdom of Great Britain and Northern Ireland
    GEO  Georgia
    GGY  Guernsey
    GHA  Ghana
    GIB  Gibraltar
    GIN  Guinea
    GLP  Guadeloupe
    GMB  Gambia
    GNB  Guinea-Bissau
    GNQ  Equatorial Guinea
    GRC  Greece
    GRD  Grenada
    GRL  Greenland
    GTM  Guatemala
    GUF  French Guiana
    GUM  Guam
    GUY  Guyana
    HKG  Hong Kong
    HMD  Heard Island and McDonald Islands
    HND  Honduras
    HRV  Croatia
    HTI  Haiti
    HUN  Hungary
    IDN  Indonesia
    IMN  Isle of Man
    IND  India
    IOT  British Indian Ocean Territory
    IRL  Ireland
    IRN  Iran (Islamic Republic of)
    IRQ  Iraq
    ISL  Iceland
    ISR  Israel
    ITA  Italy
    JAM  Jamaica
    JEY  Jersey
    JOR  Jordan
    JPN  Japan
    KAZ  Kazakhstan
    KEN  Kenya
    KGZ  Kyrgyzstan
    KHM  Cambodia
    KIR  Kiribati
    KNA  Saint Kitts and Nevis
    KOR  Korea, Republic of
    KWT  Kuwait
    LAO  Lao People's Democratic Republic
    LBN  Lebanon
    LBR  Liberia
    LBY  Libya
    LCA  Saint Lucia
    LIE  Liechtenstein
    LKA  Sri Lanka
    LSO  Lesotho
    LTU  Lithuania
    LUX  Luxembourg
    LVA  Latvia
    MAC  Macao
    MAF  Saint Martin (French part)
    MAR  Morocco
    MCO  Monaco
    MDA  Moldova, Republic of
    MDG  Madagascar
    MDV  Maldives
    MEX  Mexico
    MHL  Marshall Islands
    MKD  North Macedonia
    MLI  Mali
    MLT  Malta
    MMR  Myanmar
    MNE  Montenegro
    MNG  Mongolia
    MNP  Northern Mariana Islands
    MOZ  Mozambique
    MRT  Mauritania
    MSR  Montserrat
    MTQ  Martinique
    MUS  Mauritius
    MWI  Malawi
    MYS  Malaysia
    MYT  Mayotte
    NAM  Namibia
    NCL  New Caledonia
    NER  Niger
    NFK  Norfolk Island
    NGA  Nigeria
    NIC  Nicaragua
    NIU  Niue
    NLD  Netherlands
    NOR  Norway
    NPL  Nepal
    NRU  Nauru
    NZL  New Zealand
    OMN  Oman
    PAK  Pakistan
    PAN  Panama
    PCN  Pitcairn
    PER  Peru
    PHL  Philippines
    PLW  Palau
    PNG  Papua New Guinea
    POL  Poland
    PRI  Puerto Rico
    PRK  Korea (Democratic People's Republic of)
    PRT  Portugal
    PRY  Paraguay
    PSE  Palestine, State of
    PYF  French Polynesia
    QAT  Qatar
    REU  Réunion
    ROU  Romania
    RUS  Russian Federation
    RWA  Rwanda
    SAU  Saudi Arabia
    SDN  Sudan
    SEN  Senegal
    SGP  Singapore
    SGS  South Georgia and the South Sandwich Islands
    SHN  Saint Helena, Ascension and Tristan da Cunha
    SJM  Svalbard and Jan Mayen
    SLB  Solomon Islands
    SLE  Sierra Leone
    SLV  El Salvador
    SMR  San Marino
    SOM  Somalia
    SPM  Saint Pierre and Miquelon
    SRB  Serbia
    SSD  South Sudan
    STP  Sao Tome and Principe
    SUR  Suriname
    SVK  Slovakia
    SVN  Slovenia
    SWE  Sweden
    SWZ  Eswatini
    SXM  Sint Maarten (Dutch part)
    SYC  Seychelles
    SYR  Syrian Arab Republic
    TCA  Turks and Caicos Islands
    TCD  Chad
    TGO  Togo
    THA  Thailand
    TJK  Tajikistan
    TKL  Tokelau
    TKM  Turkmenistan
    TLS  Timor-Leste
    TON  Tonga
    TTO  Trinidad and Tobago
    TUN  Tunisia
    TUR  Turkey
    TUV  Tuvalu
    TWN  Taiwan, Province of China
    TZA  Tanzania, United Republic of
    UGA  Uganda
    UKR  Ukraine
    UMI  United States Minor Outlying Islands
    URY  Uruguay
    USA  United States of America
    UZB  Uzbekistan
    VAT  Holy See
    VCT  Saint Vincent and the Grenadines
    VEN  Venezuela (Bolivarian Republic of)
    VGB  Virgin Islands (British)
    VIR  Virgin Islands (U.S.)
    VNM  Viet Nam
    VUT  Vanuatu
    WLF  Wallis and Futuna
    WSM  Samoa
    YEM  Yemen
    ZAF  South Africa
    ZMB  Zambia
    ZWE  Zimbabwe"""
    
    # Convert to dictionary: each line is "<code>  <name>", separated by two or more spaces
    import re
    alpha3 = dict(tuple(re.split(r" {2,}", s.strip(), maxsplit=1)) for s in alpha3.split('\n'))
    
    # List of World Cities & Country
    # cities https://pkgstore.datahub.io/core/world-cities/world-cities_csv/data/6cc66692f0e82b18216a48443b6b95da/world-cities_csv.csv
    # Online CSV File
    
    import csv
    import urllib.request
    import io
    
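    def csv_import(url):
        ' Download the CSV and return its rows as a list of dicts '
        with urllib.request.urlopen(url) as url_open:
            text = url_open.read().decode('utf-8')
        # Materialize the rows: a bare DictReader is a one-shot iterator and
        # would be exhausted after the first pass over the cities
        return list(csv.DictReader(io.StringIO(text), delimiter=','))
    
    url = 'https://pkgstore.datahub.io/core/world-cities/world-cities_csv/data/6cc66692f0e82b18216a48443b6b95da/world-cities_csv.csv'
    
    cities = csv_import(url)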
    
Test

Excel file (input): country_test.xlsx, with the country and city columns that appear in the result below

    
Resulting DataFrame

           country        city                  country_
    0            u.s.              United States of America
    1              DZ                               Algeria
    2              AS                        American Samoa
    3              co    Longmont             United States
    4              co      Bogota                  Colombia
    5              AL                               Albania
    6              AL  Huntsville             United States
    7             usa              United States of America
    8             AFG                           Afghanistan
    9             BLR       Minsk                   Belarus
    10            AUS                             Australia
    11  united states              United States of America
    12          Korea       seoul               South Korea
    13          Korea   Pyongyang               North Korea
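
Rows 3 and 4 show the ambiguous code `co` being resolved by the city. A self-contained sketch of that comparison follows. It is not the answer's code: it swaps in the standard library's `difflib.SequenceMatcher` for rapidfuzz, uses a hypothetical two-row stand-in for the world-cities table, and the names `similarity`, `best_pair`, and `resolve_co` are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical two-row stand-in for the world-cities table
CITIES = [
    {'name': 'Longmont', 'country': 'United States'},
    {'name': 'Bogota', 'country': 'Colombia'},
]

def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stdlib stand-in for fuzz.partial_ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_pair(city, country):
    """Return (row, score) for the row whose city and country jointly match best."""
    scored = [
        (row, similarity(city, row['name']) * similarity(country, row['country']))
        for row in CITIES
    ]
    return max(scored, key=lambda item: item[1])

def resolve_co(city):
    """Decide whether 'CO' means Colombia or a US state, judging by the city."""
    as_country = best_pair(city, 'Colombia')
    as_state = best_pair(city, 'United States')
    winner = as_country if as_country[1] > as_state[1] else as_state
    return winner[0]['country']

print(resolve_co('Longmont'))
print(resolve_co('bogota'))
```

Multiplying the city score by the country score means a row only wins when both parts match well, which is the same idea as the product used in `check_city_country`.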