Python: unable to find a URL in the output list of URLs
Tags: python, python-3.x, web-scraping, fuzzy-search

My code below gives a list of URLs. If I want one specific URL, how can I get it? My code is as follows:
import bs4, requests

# One index page per block of 30 hotels (offsets 0, 30, ..., 150)
index_pages = ('https://www.tripadvisor.in/Hotels-g60763-oa{}-New_York_City_New_York-Hotels.html#ACCOM_OVERVIEW'.format(i) for i in range(0, 180, 30))
urls = []
with requests.session() as s:
    for index in index_pages:
        r = s.get(index)
        soup = bs4.BeautifulSoup(r.text, 'lxml')
        # Collect the href of every hotel title link on this page
        url_list = [i.get('href') for i in soup.select('.property_title')]
        urls.append(url_list)
        print(url_list)
Now the output I get is a list of URLs. The output looks like this:
New_York_City_New_York.html', '/Hotel_Review-g60763-d93543-Reviews-Shelburne_NYC_an_Affinia_hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1485603-Reviews-Comfort_Inn_Times_Square_West-New_York_City_New_York.html', '/Hotel_Review-g60763-d93340-Reviews-Hotel_Elysee_by_Library_Hotel_Collection-New_York_City_New_York.html', '/Hotel_Review-g60763-d1641016-Reviews-The_Chatwal_A_Luxury_Collection_Hotel_New_York-New_York_City_New_York.html', '/Hotel_Review-g60763-d93585-Reviews-Lowell_Hotel-New_York_City_New_York.html']
D:\anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
['/Hotel_Review-g60763-d277882-Reviews-Hampton_Inn_Manhattan_Seaport_Financial_District-New_York_City_New_York.html', '/Hotel_Review-g60763-d3529145-Reviews-Holiday_Inn_Express_Manhattan_Times_Square_South-New_York_City_New_York.html', '/Hotel_Review-g60763-d208453-Reviews-Hilton_Times_Square-New_York_City_New_York.html', '/Hotel_Review-g60763-d249711-Reviews-The_Hotel_at_Times_Square-New_York_City_New_York.html', '/Hotel_Review-g60763-d1158753-Reviews-Kimpton_Ink48_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1186070-Reviews-Marriott_Vacation_Club_Pulse_New_York_City-New_York_City_New_York.html', '/Hotel_Review-g60763-d1938661-Reviews-Row_NYC_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d93345-Reviews-Skyline_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d217616-Reviews-Kimpton_Muse_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1888977-Reviews-The_Pearl_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d223021-Reviews-Club_Quarters_Hotel_Midtown-New_York_City_New_York.html', '/Hotel_Review-g60763-d611947-Reviews-New_York_Hilton_Midtown-New_York_City_New_York.html', '/Hotel_Review-g60763-d4274398-Reviews-Courtyard_New_York_Manhattan_Times_Square_West-New_York_City_New_York.html', '/Hotel_Review-g60763-d1456416-Reviews-The_Dominick_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d122014-Reviews-Gild_Hall_a_Thompson_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d2622936-Reviews-Wyndham_Garden_Chinatown-New_York_City_New_York.html', '/Hotel_Review-g60763-d1456560-Reviews-Kimpton_Hotel_Eventi-New_York_City_New_York.html', '/Hotel_Review-g60763-d249710-Reviews-Morningside_Inn-New_York_City_New_York.html', '/Hotel_Review-g60763-d2079052-Reviews-YOTEL_New_York-New_York_City_New_York.html', '/Hotel_Review-g60763-d224214-Reviews-The_Bryant_Park_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d1785018-Reviews-The_James_New_York_SoHo-New_York_City_New_York.html', 
'/Hotel_Review-g60763-d247814-Reviews-The_Gatsby_Hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d112039-Reviews-Hotel_Newton-New_York_City_New_York.html', '/Hotel_Review-g60763-d612263-Reviews-Hotel_Mela-New_York_City_New_York.html', '/Hotel_Review-g60763-d99392-Reviews-Hotel_Metro-New_York_City_New_York.html', '/Hotel_Review-g60763-d4446427-Reviews-Hotel_Boutique_At_Grand_Central-New_York_City_New_York.html', '/Hotel_Review-g60763-d1503474-Reviews-Distrikt_Hotel_New_York_City-New_York_City_New_York.html', '/Hotel_Review-g60763-d93467-Reviews-Gardens_NYC_an_Affinia_hotel-New_York_City_New_York.html', '/Hotel_Review-g60763-d93603-Reviews-The_Pierre_A_Taj_Hotel_New_York-New_York_City_New_York.html', '/Hotel_Review-g60763-d113311-Reviews-The_Peninsula_New_York-New_York_City_New_York.html']
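Now, if I want to find one specific URL in the list above, how do I do that?
For example: for Hilton Times Square, how do I find its URL in the list above?

To find an exact URL: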
def findExactUrl(urlList, searched):
    # Return the first URL equal to `searched`; returns None if there is no match
    for url in urlList:
        if url == searched:
            return url
In IDLE you can call it like this:
>>> findExactUrl(url_list, "http://maritonhotel.com/123")
## if such a URL is in your list
'http://maritonhotel.com/123'
## or, if no such URL is there, nothing is shown, just:
>>>
Or, called from a .py file:
myUrl = findExactUrl(url_list, "http://maritonhotel.com/123")
print(myUrl)
## prints http://maritonhotel.com/123 if that URL is in the list
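You can edit the function to return True instead, or to return an index i if you want to find the URL's position in the list.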
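The two variants mentioned here could look like the following sketch. The names contains_url and find_url_index are made up for illustration, and sample_urls is just a short excerpt of the output shown in the question.

```python
def contains_url(url_list, searched):
    """Return True if `searched` is exactly one of the URLs, else False."""
    return searched in url_list


def find_url_index(url_list, searched):
    """Return the position of `searched` in the list, or None if it is absent."""
    for i, url in enumerate(url_list):
        if url == searched:
            return i
    return None


# Short excerpt of the scraped output, used only as sample data
sample_urls = [
    "/Hotel_Review-g60763-d208453-Reviews-Hilton_Times_Square-New_York_City_New_York.html",
    "/Hotel_Review-g60763-d93345-Reviews-Skyline_Hotel-New_York_City_New_York.html",
]
target = sample_urls[1]

print(contains_url(sample_urls, target))
print(find_url_index(sample_urls, target))
```

Returning None for a missing URL keeps the "not found" case distinct from index 0, which a plain truthiness check would otherwise confuse.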
A fuzzier search:
def findOccurence(urlList, searched):
    # Collect every URL that contains `searched` as a substring
    foundUrls = []
    for url in urlList:
        if searched in url:
            foundUrls.append(url)
    return foundUrls
If you want to remove some substring from a string, just call the .replace() method.
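As a rough illustration of .replace(), here is one way to turn one of the URLs from the question into a readable hotel name. It assumes every URL ends with the same "-New_York_City_New_York.html" suffix and contains "-Reviews-" before the name, which holds for the output shown but is not guaranteed in general.

```python
url = "/Hotel_Review-g60763-d208453-Reviews-Hilton_Times_Square-New_York_City_New_York.html"

# Drop the fixed city suffix (assumed to be identical for every URL in the list)
name = url.replace("-New_York_City_New_York.html", "")

# Keep only what follows "-Reviews-"; the numeric ids before it differ per hotel
name = name.rpartition("-Reviews-")[2]

# Underscores stand in for spaces in the URL
name = name.replace("_", " ")

print(name)  # Hilton Times Square
```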
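If there is anything you don't understand, let me know. Also, seriously consider buying a Python book for beginners, or taking some online courses.

This is what I would do: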
keyword = "Hilton_Times_Square"
target_urls = [ e for e in url_list if keyword in e ]
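If the hotel name comes in as ordinary text (for example "hilton times square"), the same idea can be wrapped in a small helper. This is only a sketch: find_hotel_urls is a made-up name, the base address is taken from the index pages in the question, and it assumes the site writes spaces as underscores in its URLs.

```python
from urllib.parse import urljoin

BASE = "https://www.tripadvisor.in"  # host used by the index pages in the question


def find_hotel_urls(url_list, hotel_name, base=BASE):
    """Return absolute URLs whose path contains the hotel name, ignoring case."""
    # "hilton times square" -> "hilton_times_square"
    keyword = hotel_name.strip().replace(" ", "_").lower()
    return [urljoin(base, u) for u in url_list if keyword in u.lower()]


# Short excerpt of the scraped output, used only as sample data
sample_urls = [
    "/Hotel_Review-g60763-d208453-Reviews-Hilton_Times_Square-New_York_City_New_York.html",
    "/Hotel_Review-g60763-d611947-Reviews-New_York_Hilton_Midtown-New_York_City_New_York.html",
    "/Hotel_Review-g60763-d93345-Reviews-Skyline_Hotel-New_York_City_New_York.html",
]

for full_url in find_hotel_urls(sample_urls, "hilton times square"):
    print(full_url)
```

A shorter keyword such as "Hilton" matches both Hilton entries in the sample, so the more of the name you supply, the fewer results you have to pick from.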
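How do I print that URL? For example, suppose my list has 5 Marriott hotels, but I only need to print one specific hotel. How do I do that?
>>> a = findExactUrl(url_list, "") and then you just call >>> print(a)
@Shaon do you have any other questions?
Thanks a lot @Kajbo... I just want to add this URL to my output... so that it looks like this: ...
I don't understand. Can you elaborate?
How do I print it? When I use print(target_urls) it gives an output saying 'str' object is not callable.
To print it, you should use a loop: for u in target_urls: print(u)
This is the same as the "fuzzier search" part of Kajbo's answer.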
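Both answers rely on the keyword appearing verbatim in the URL. If the search text may contain a typo, the standard library's difflib can rank hotel names by similarity instead. The sketch below is not from either answer: hotel_name and fuzzy_find are hypothetical helpers, and the name extraction assumes the hotel name sits between "-Reviews-" and the next hyphen, so a name that itself contains a hyphen would be cut short.

```python
import difflib


def hotel_name(url):
    """Extract a lower-case, space-separated hotel name from a review URL."""
    slug = url.split("-Reviews-")[-1].split("-")[0]
    return slug.replace("_", " ").lower()


def fuzzy_find(url_list, query, n=3, cutoff=0.6):
    """Return up to `n` URLs whose hotel name is similar to `query`, best first."""
    by_name = {hotel_name(u): u for u in url_list}
    close = difflib.get_close_matches(query.lower(), list(by_name), n=n, cutoff=cutoff)
    return [by_name[name] for name in close]


# Short excerpt of the scraped output, used only as sample data
sample_urls = [
    "/Hotel_Review-g60763-d208453-Reviews-Hilton_Times_Square-New_York_City_New_York.html",
    "/Hotel_Review-g60763-d611947-Reviews-New_York_Hilton_Midtown-New_York_City_New_York.html",
    "/Hotel_Review-g60763-d93345-Reviews-Skyline_Hotel-New_York_City_New_York.html",
]

# "Time" instead of "Times": a substring check would miss this
matches = fuzzy_find(sample_urls, "Hilton Time Square")
for u in matches:
    print(u)
```

The cutoff of 0.6 is difflib's default; raising it returns fewer, closer matches.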