Binary search by name in Python


I have a list of countries in a separate file (countries.txt), and I need to do a binary search to find a country and report the information given for it.

My file:

Afghanistan,    647500.0,   25500100

Albania,    28748.0,    2821977

Algeria,    2381740.0,  38700000

American Samoa, 199.0,  55519

Andorra,    468.0,  76246

Angola, 1246700.0,  20609294

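If I want to find the area and population of Albania, I type

getCountry(Albania)

in the shell. How do I get it to report the information provided?

This is what I have so far: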

def getCountry(key):

    start = "%s" #index
    end = len("%s")-1 #index
    while start<=end:
        mid = (start + end) / 2
        if '%s'[mid] == key: #found it!
            return True
        elif "%s"[mid] > key:
            end = mid -1
        else:
            start = mid + 1
    #end < start 
    return False


As Ashwini suggested in his comment, you can use a dictionary in Python. It would look like this:

# Each value is an (area, population) tuple.
countries = {'Afghanistan': (647500.0, 25500100),
    'Albania': (28748.0, 2821977),
    'Algeria': (2381740.0, 38700000),
    'American Samoa': (199.0, 55519),
    'Andorra': (468.0, 76246),
    'Angola': (1246700.0, 20609294)}

# Index 0 of the tuple is the area.
print(countries['Angola'][0])
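
As a small illustration of the lookup, the sketch below wraps the dictionary access in a helper. The name `describe_country` and the wording of the messages are made up for this example and are not part of the original answer. It uses `dict.get` so that an unknown name produces a message instead of raising `KeyError`.

```python
# Sample data copied from the question; each value is (area, population).
countries = {
    'Afghanistan': (647500.0, 25500100),
    'Albania': (28748.0, 2821977),
    'Algeria': (2381740.0, 38700000),
}


def describe_country(name):
    """Return a short description of a country, or a not-found message.

    describe_country is a hypothetical helper written for this example.
    """
    info = countries.get(name)  # None when the name is not a key
    if info is None:
        return "%s not found" % name
    area, population = info
    return "%s: area %s, population %s" % (name, area, population)


print(describe_country('Albania'))
print(describe_country('Atlantis'))
```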

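You can learn more about dictionaries and tuples in the Python documentation.

The other answer is correct: you should use a dictionary. But I'm guessing this is an assignment, so the first thing you need is a list: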
with open("countries.txt") as f:
     # Skip blank lines, split each remaining line on "," and strip the
     # whitespace around every field.
     # A list comprehension is used because in Python 3 filter() and map()
     # return lazy iterators, which cannot be indexed by a binary search.
     country_list = [[field.strip() for field in line.split(",")]
                     for line in f if line.strip()]
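
Binary search only works on sorted data, and the values read from a file arrive as strings. The sketch below shows one way to handle both points. It is an illustration under assumptions: `load_sorted_countries` is a hypothetical name, and an in-memory `io.StringIO` stands in for the real `countries.txt`.

```python
import io

# Stand-in for open("countries.txt"); the rows are deliberately out of order.
sample_file = io.StringIO(
    "Albania,    28748.0,    2821977\n"
    "\n"
    "Afghanistan,    647500.0,   25500100\n"
    "Algeria,    2381740.0,  38700000\n"
)


def load_sorted_countries(f):
    """Parse lines of 'name, area, population' into a list sorted by name."""
    rows = []
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        name, area, population = [field.strip() for field in line.split(",")]
        rows.append((name, float(area), int(population)))
    rows.sort(key=lambda row: row[0])  # binary search needs sorted input
    return rows


country_list = load_sorted_countries(sample_file)
print(country_list)
```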

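Then you just search your sorted list as you would with any other binary search.
To do the binary search, you can write something like the following (recursive version):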
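def bin_search(a_sorted_list, target):
    if not a_sorted_list:
        return None  # empty slice: target is not in the list
    mid_pt = len(a_sorted_list) // 2
    if target < a_sorted_list[mid_pt]:
        return bin_search(a_sorted_list[:mid_pt], target)
    elif target > a_sorted_list[mid_pt]:
        # Search above the midpoint, then shift the result back to this list's indexing.
        found = bin_search(a_sorted_list[mid_pt + 1:], target)
        return None if found is None else mid_pt + 1 + found
    else:
        return mid_pt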

In your case you will need some small modifications.
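
One way those modifications could look, offered as a sketch and not as the answerer's own code: compare only the name field of each row, and pass index bounds instead of slicing so that no copies of the list are made. The name `find_country` and the `(name, area, population)` tuple layout are assumptions made for this example.

```python
def find_country(rows, name, lo=0, hi=None):
    """Recursively binary-search rows (sorted by name) for a matching name.

    rows is a list of (name, area, population) tuples. Returns the matching
    row, or None when the name is absent. find_country is a hypothetical name.
    """
    if hi is None:
        hi = len(rows) - 1
    if lo > hi:
        return None  # empty range: not found
    mid = (lo + hi) // 2
    mid_name = rows[mid][0]
    if name < mid_name:
        return find_country(rows, name, lo, mid - 1)
    if name > mid_name:
        return find_country(rows, name, mid + 1, hi)
    return rows[mid]


rows = [
    ("Afghanistan", 647500.0, 25500100),
    ("Albania", 28748.0, 2821977),
    ("Algeria", 2381740.0, 38700000),
    ("American Samoa", 199.0, 55519),
    ("Andorra", 468.0, 76246),
    ("Angola", 1246700.0, 20609294),
]
print(find_country(rows, "Albania"))
print(find_country(rows, "Zimbabwe"))
```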
I would use a dictionary:
def get_countries(filename):
    with open(filename) as f:
        country_iter = (line.strip().split(',') for line in f)
        return {
            country: {"area": area, "population": population}
            for country, area, population in country_iter
        }

if __name__ == '__main__':
    d = get_countries("countries.csv")
    print(d)
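
The loader above keeps every field as a string, including any spaces that follow the commas in the file. If numeric values are wanted, a variant along these lines could be used. This is a sketch: `parse_countries` is a hypothetical name, and `io.StringIO` stands in for the real file.

```python
import io


def parse_countries(f):
    """Build {name: {"area": float, "population": int}} from an open file."""
    result = {}
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        name, area, population = (field.strip() for field in line.split(","))
        result[name] = {"area": float(area), "population": int(population)}
    return result


sample = io.StringIO(
    "Afghanistan,    647500.0,   25500100\n"
    "Albania,    28748.0,    2821977\n"
)
d = parse_countries(sample)
print(d["Albania"]["area"], d["Albania"]["population"])
```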

If you really have your heart set on a binary search, it would look more like this:
def get_countries(filename):
    with open(filename) as f:
        return [line.strip().split(',') for line in f]

def get_country_by_name(countries, name):
    lo, hi = 0, len(countries) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countries[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return country
    # The loop ended without a match. Indexing countries[lo] here could
    # raise IndexError when name sorts after every entry.
    return None

if __name__ == '__main__':
    a = get_countries("countries.csv")
    print(a)
    c = get_country_by_name(a, "Albania")
    print(c)

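from bisect import bisect_left

def get_country_by_name(countries, name):
    # Alternative using the standard library: bisect_left returns the
    # leftmost position where name could be inserted to keep the list sorted.
    country_names = [country[0] for country in countries]
    i = bisect_left(country_names, name)
    # That position only holds a match if it is in range and the names are equal.
    if i < len(countries) and country_names[i] == name:
        return countries[i]
    return None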
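
Building the list of names inside the function costs O(n) on every call, which outweighs the O(log n) search when many lookups are made. A possible refinement, sketched here with a hypothetical `CountryIndex` class, builds the name list once and reuses it.

```python
from bisect import bisect_left


class CountryIndex:
    """Keeps a sorted list of rows plus a parallel list of names for bisect."""

    def __init__(self, countries):
        self.countries = sorted(countries, key=lambda c: c[0])
        self.names = [c[0] for c in self.countries]  # built once, reused

    def find(self, name):
        """Return the row whose first field equals name, or None."""
        i = bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.countries[i]
        return None


index = CountryIndex([
    ["Albania", "28748.0", "2821977"],
    ["Afghanistan", "647500.0", "25500100"],
    ["Angola", "1246700.0", "20609294"],
])
print(index.find("Angola"))
print(index.find("Zimbabwe"))
```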

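Solve this problem in steps:
1. Start with a sorted list and write a function that performs a binary search on it.
2. Make sure it works for an empty list, a list of one item, and so on.
3. Write a function that takes an unsorted list, sorts it, and returns the result from the first function.
4. Write a function that takes a list of tuples, with a string as the key and other strings as the data. It should sort the data on the key and return what you want.
5. Write a function that reads a file, constructs data compatible with 4, and returns the selected item.
6. Pat yourself on the back for solving your more complex problem in easy-to-understand steps.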
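
The steps above can be sketched roughly as follows. All function names are invented for this illustration, and step 5 accepts any iterable of lines (an open file works) so that the example runs without a file on disk.

```python
def binary_search(sorted_items, target, key=lambda item: item):
    """Steps 1-2: return the index of target in sorted_items, or -1."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        value = key(sorted_items[mid])
        if value < target:
            lo = mid + 1
        elif value > target:
            hi = mid - 1
        else:
            return mid
    return -1  # also reached immediately for an empty list


def search_unsorted(items, target):
    """Step 3: sort a copy, then search it (index refers to the sorted copy)."""
    return binary_search(sorted(items), target)


def lookup(pairs, wanted):
    """Step 4: pairs is a list of (key, data) tuples; return data or None."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    i = binary_search(ordered, wanted, key=lambda pair: pair[0])
    return ordered[i][1] if i != -1 else None


def lookup_in_lines(lines, wanted):
    """Step 5: turn 'name, area, population' lines into pairs for lookup()."""
    pairs = []
    for line in lines:
        if line.strip():
            name, rest = line.split(",", 1)
            pairs.append((name.strip(), rest.strip()))
    return lookup(pairs, wanted)


sample_lines = ["Albania, 28748.0, 2821977", "Afghanistan, 647500.0, 25500100"]
print(lookup_in_lines(sample_lines, "Albania"))
```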
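Note: this is obviously an assignment for learning how to implement an algorithm. If you really had to find the information in a file, using a dictionary would be the wrong approach. The right way would be to read each line until you find the country, making a single comparison per line over, on average, half of the entries in the file. No wasted storage, and no wasted time sorting or hashing.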
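
A minimal sketch of that line-by-line scan, assuming the same comma-separated layout as the question's file. `scan_for_country` is a hypothetical name, and `io.StringIO` stands in for an open file.

```python
import io


def scan_for_country(f, name):
    """Read f line by line and return [name, area, population] on first match."""
    for line in f:
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == name:
            return fields  # stop reading as soon as the country is found
    return None


sample = io.StringIO(
    "Afghanistan,    647500.0,   25500100\n"
    "Albania,    28748.0,    2821977\n"
    "Algeria,    2381740.0,  38700000\n"
)
print(scan_for_country(sample, "Albania"))
```

Nothing is stored beyond the current line, and the file does not need to be sorted.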
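If you store the data in a dictionary and use the country name as the key, you can do this in O(1) time.

I'm not familiar with that. How do I store the file in a dictionary and then use it?

I suspect it's for an assignment that requires a binary search…

Yes, it is for an assignment, but I asked the question because I don't know how to do a binary search on it.

I have the following script: def getCountry(key): with open("countries.txt") as f: country_list = filter(None, map(lambda x: x.split(","), f)) start = "%s" #index end = len("%s")-1 #index while start [...] key: end = mid - 1 else: start = mid + 1 #end [...]

[...] country_list … but it still needs some small work to handle your data.

OK, I'm just a little confused. What would this look like in a script that includes with open("countries.txt") as f: country_list = filter(None, map(lambda x: x.split(","), f))?

Your binary search is wrong. You need to swap these two lines: return bin_search(a_sorted_list[mid_pt:], target) and return bin_search(a_sorted_list[:mid_pt], target). As written, if the target is actually in the upper half you recurse on the lower half, and if the target is in the lower half you recurse on the upper half.

I'd +1 this in a heartbeat if you built the dict from the txt file :P

Thanks for the binary search. Could you explain everything about the if statement at the end? if __name__ == '__main__': a = get_countries("countries.csv") print(a) c = get_country_by_name(a, "Albania") print(c)

That is a test driver for the code. It calls the code and shows how it works. The documentation explains it.

OK, and for def getcountry(countries, name): what should I put for the "countries" variable? Sorry, I'm a real beginner.

get_country_by_name takes two arguments: (1) a list of countries and (2) the name of a country. Each country is itself a list made up of (1) the country's name, (2) the country's area, and (3) the country's population. This is exactly what get_countries is supposed to load from the file.

This may sound silly, but how do I enter a "list of countries" as the first argument?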
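
The call pattern described in the comments can be sketched as follows: the list returned by `get_countries` is simply passed as the first argument to `get_country_by_name`. The temporary file and its two sample rows are assumptions added so that the example runs anywhere. In real use the path would be the actual data file.

```python
import os
import tempfile


def get_countries(filename):
    # Same loader as in the answer: one list of fields per line.
    with open(filename) as f:
        return [line.strip().split(',') for line in f]


def get_country_by_name(countries, name):
    # Iterative binary search over rows sorted by name.
    lo, hi = 0, len(countries) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        test_name = countries[mid][0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return countries[mid]
    return None


# Write a small throwaway file so the example does not depend on local data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("Afghanistan,647500.0,25500100\nAlbania,28748.0,2821977\n")
    path = tmp.name

countries = get_countries(path)  # this list is the first argument
result = get_country_by_name(countries, "Albania")
print(result)
os.remove(path)
```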