Python: create a list of links from the page https://www.stubhub.com/new-york-rangers-tickets/performer/2764/ that contain the text "New York Rangers"


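I am trying to use Python to create a list of all the links on a page that contain a specific string. For example, I want to find all the links on this page that contain "New York Rangers @".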


Thanks for all the help, and sorry if this is a dumb question, but I couldn't find the answer anywhere.

The data is embedded in the page inside a `<script>` tag. You can use this example to parse the data (using the `re` and `json` modules):
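A minimal sketch of that approach, not the answer's original code: it assumes the page assigns a JSON object to a JavaScript variable inside a `<script>` tag. The variable name `window.__INITIAL_STATE__`, the `events`/`name`/`url` layout and the sample markup are all hypothetical, so inspect the real page source and adjust the pattern and keys before relying on it.

```python
import json
import re

# Hypothetical page fragment: the variable name and the JSON layout
# ("events" -> "name"/"url") are assumptions for illustration only.
SAMPLE_HTML = """
<html><body>
<script>window.__INITIAL_STATE__ = {"events": [
  {"name": "New York Rangers at Detroit Red Wings", "url": "/event/1/"},
  {"name": "Boston Bruins at New York Rangers", "url": "/event/2/"},
  {"name": "New York Rangers at Winnipeg Jets", "url": "/event/3/"}
]};</script>
</body></html>
"""


def extract_embedded_json(html: str, var_name: str) -> dict:
    """Return the object literal assigned to var_name inside a <script> tag."""
    # Non-greedy match up to the first "};" that ends the assignment.
    pattern = re.compile(re.escape(var_name) + r"\s*=\s*(\{.*?\});", re.DOTALL)
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"{var_name} not found in page")
    return json.loads(match.group(1))


def links_containing(data: dict, needle: str) -> list:
    """Collect the URLs of events whose name contains needle."""
    return [event["url"] for event in data["events"] if needle in event["name"]]


state = extract_embedded_json(SAMPLE_HTML, "window.__INITIAL_STATE__")
print(links_containing(state, "New York Rangers at"))
```

The non-greedy regex stops at the first `};`, which is adequate for this sample but would break if a string inside the JSON contained that sequence.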


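First, you need to get the content of the web page you want to search for links. I strongly recommend using a simple Python HTTP library such as `requests`: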

    import requests

    response = requests.get("https://www.stubhub.com/new-york-rangers-tickets/performer/2764/")
For some reason, this particular URL requires a `User-Agent` header, so you should send one with the request:

    import requests

    url = 'https://www.stubhub.com/new-york-rangers-tickets/performer/2764/'
    user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:72.0) Gecko/20100101 Firefox/72.0'
    # send a browser-like User-Agent, which this URL requires
    response = requests.get(url, headers={'User-Agent': user_agent})
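If you would rather not add the `requests` dependency, the standard library's `urllib.request` can send the same `User-Agent` header. This is a swapped-in alternative, not part of the original answer; the helper names `build_request` and `fetch_html` are illustrative, and only `fetch_html` performs network I/O.

```python
from urllib.request import Request, urlopen

URL = "https://www.stubhub.com/new-york-rangers-tickets/performer/2764/"
USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:72.0) Gecko/20100101 Firefox/72.0"


def build_request(url: str, user_agent: str) -> Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download the page and decode it (real network access)."""
    with urlopen(build_request(url, user_agent), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


req = build_request(URL, USER_AGENT)
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```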
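Then you can start parsing the page content with BeautifulSoup. Use its `find_all` method and pass a compiled regular expression as the `text` argument to find all `a` tags that contain the specific text: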

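    import re
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(response.content, "html.parser")
    # keep only anchors whose text contains the phrase as whole words
    rangers_anchor_tags = soup.find_all("a", text=re.compile(r".*\bNew York Rangers at\b.*"))
    urls = [anchor["href"] for anchor in rangers_anchor_tags]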
`urls` is then the list of URLs whose anchor tags' inner text contains that string.
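`find_all("a", text=...)` compares the pattern against each tag's `.string`, which is `None` when the anchor holds more than one child node, so links whose text is split across nested elements can be missed. As a dependency-free alternative (the standard-library `html.parser` swapped in for BeautifulSoup), the following sketch matches against an anchor's concatenated text and resolves relative `href` values with `urljoin`. The class name `AnchorTextFilter` and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorTextFilter(HTMLParser):
    """Collect hrefs of <a> tags whose whole text, nested tags included, matches a pattern."""

    def __init__(self, pattern, base_url):
        super().__init__()
        self.pattern = pattern
        self.base_url = base_url
        self.urls = []
        self._href = None   # href of the <a> currently open, or None
        self._chunks = []   # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so text split across tags reads naturally.
            text = " ".join("".join(self._chunks).split())
            if self.pattern.search(text):
                # Relative hrefs are resolved against the page URL.
                self.urls.append(urljoin(self.base_url, self._href))
            self._href = None


PAGE_URL = "https://www.stubhub.com/new-york-rangers-tickets/performer/2764/"

# Hypothetical markup: the real page structure may differ.
SAMPLE_HTML = """
<a href="/event/1/"><div>New York Rangers at</div> <div>Detroit Red Wings</div></a>
<a href="/event/2/">Boston Bruins at New York Rangers</a>
<a href="https://www.stubhub.com/event/3/">New York Rangers at Winnipeg Jets</a>
<a href="/parking/">Parking</a>
"""

parser = AnchorTextFilter(re.compile(r"\bNew York Rangers at\b"), PAGE_URL)
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.urls)
```

With the real page you would feed it the downloaded HTML (for example `response.text`) instead of the sample string.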

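Using Selenium, to create a list of all the links, i.e. the `href` attributes, on the page that contain the text New York Rangers, you need to induce WebDriverWait for `visibility_of_all_elements_located()`, and you can use the following locator strategy: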

  • Using XPATH:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    # configuring the driver for optimum results
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    # Selenium 4 takes the chromedriver path through a Service object
    # (the Selenium 3 executable_path argument was deprecated and later removed)
    driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
    driver.get("https://www.stubhub.com/new-york-rangers-tickets/performer/2764/")
    
    # wait up to 20 s until all matching <a> elements are visible, then collect their href attributes
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[./div[contains(., 'New York Rangers')]]")))])
    driver.quit()
    
  • Console output:

    ['https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-1-31-2020/event/104217508/', 'https://www.stubhub.com/detroit-red-wings-tickets-detroit-red-wings-detroit-little-caesars-arena-2-1-2020/event/104215245/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-2-3-2020/event/104212773/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-2-5-2020/event/104215469/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-2-7-2020/event/104217518/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-2-9-2020/event/104214839/', 'https://www.stubhub.com/winnipeg-jets-tickets-winnipeg-bell-mts-place-2-11-2020/event/104212882/', 'https://www.stubhub.com/minnesota-wild-tickets-minnesota-wild-saint-paul-xcel-energy-center-2-13-2020/event/104216234/', 'https://www.stubhub.com/columbus-blue-jackets-tickets-columbus-blue-jackets-columbus-nationwide-arena-2-14-2020/event/104212942/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-2-16-2020/event/104217520/', 'https://www.stubhub.com/chicago-blackhawks-tickets-chicago-blackhawks-chicago-united-center-2-19-2020/event/104213910/', 'https://www.stubhub.com/carolina-hurricanes-tickets-carolina-hurricanes-raleigh-pnc-arena-2-21-2020/event/104212812/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-2-22-2020/event/104217524/', 'https://www.stubhub.com/new-york-islanders-tickets-new-york-islanders-uniondale-nycb-live-home-of-the-nassau-veterans-memorial-coliseum-2-25-2020/event/104354662/', 'https://www.stubhub.com/montreal-canadiens-tickets-montreal-bell-centre-2-27-2020/event/104215418/', 
    'https://www.stubhub.com/philadelphia-flyers-tickets-philadelphia-flyers-philadelphia-wells-fargo-center-philadelphia-2-28-2020/event/104212712/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-3-1-2020/event/104215027/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-3-3-2020/event/104217528/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-3-5-2020/event/104215030/', 'https://www.stubhub.com/new-york-rangers-tickets-new-york-rangers-new-york-madison-square-garden-3-7-2020/event/104215474/']
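To see what the XPath `//a[./div[contains(., 'New York Rangers')]]` selects: an `<a>` qualifies when one of its direct `<div>` children has a string value (all descendant text concatenated) containing the phrase, which is presumably why links for games in other arenas, such as the Detroit Red Wings event, also appear in the output above. The standard library's `xml.etree.ElementTree` supports only a limited XPath subset without `contains()`, so this hypothetical sketch emulates the predicate in plain Python on a made-up, well-formed fragment:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for the rendered page.
SAMPLE = """
<body>
  <a href="/event/1/"><div>New York Rangers</div><div>Madison Square Garden</div></a>
  <a href="/event/2/"><div>Detroit Red Wings</div><div>vs New <b>York Rangers</b></div></a>
  <a href="/event/3/"><span>New York Rangers</span></a>
  <a href="/event/4/">New York Rangers</a>
</body>
"""


def hrefs_with_div_containing(root, needle):
    """Emulate //a[./div[contains(., needle)]] with plain Python."""
    hrefs = []
    for anchor in root.iter("a"):            # //a : every <a> in the tree
        for div in anchor.findall("div"):    # ./div : direct <div> children only
            # "." in contains() is the string value: all descendant text joined
            if needle in "".join(div.itertext()):
                hrefs.append(anchor.get("href"))
                break                        # one matching <div> is enough
    return hrefs


root = ET.fromstring(SAMPLE.strip())
print(hrefs_with_div_containing(root, "New York Rangers"))
```

Event 3 (text in a `<span>`) and event 4 (text directly in the `<a>`) are skipped because neither has a `<div>` child, mirroring the XPath predicate.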
    

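You should add your attempts/ideas so far to the question :) Also, we need some clarification: what should contain the string? The text inside the `<a>` tag? The URL itself?
Sorry, first post, I'm still figuring out the protocol. It would be inside the tag.
No worries! You should click the "edit" button below the question to add these clarifications.
Welcome to Stack Overflow! What exactly is the problem? Which part are you struggling with?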