
Python: extracting URLs from a Mixcloud playlist


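I need some help extracting the URLs from the anchor (href) tags on a mixcloud.com user page. I know the page is generated with JavaScript, and I'm using Selenium to get around that. I've used a similar approach successfully for YouTube playlists, but I can't get it to work here. This is the page whose mix URLs I'm trying to extract: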

https://www.mixcloud.com/caimanblack/


<div class="AudioCard__DetailsContainer-sc-1ltw4p1-6 euvMwc">
<div class="AudioCardTitle__Container-sc-1kxsru9-1 hGblkL">
<div class="AudioCardPlayButton__PlayButtonContainer-sc-1iib1iv-0 diYcBm AudioCardTitle__PlayButton-sc-1kxsru9-2 dDAfgc" title="Play">
<div class="AudioCardPlayButton__PlayButtonIconContainer-sc-1iib1iv-3 izFLOx">
<svg width="24" height="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
<title>Icon / 24 / Play Solid</title>
<path fill="#1E2337" d="M20 10.67L7.9 4 6 4.9v14.42l1.9.68L20 13.33z" fill-rule="evenodd"></path></svg></div>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" class="AudioCardPlayButton__PlayButtonRings-sc-1iib1iv-2 iIGcCU">
<circle class="ring-listened" cx="50%" cy="50%" r="22" style="stroke-dashoffset: 34.5575px; stroke-dasharray: 0px, 138.23px; stroke: rgb(243, 178, 166);"></circle>
<circle class="ring-remaining" cx="50%" cy="50%" r="22" style="stroke-dashoffset: 172.788px; stroke-dasharray: 138.23px, 0px;"></circle></svg></div>
<div class="AudioCardTitle__DetailsContainer-sc-1kxsru9-3 cTqEgM">
<a class="AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__TitleLink-sc-1kxsru9-4 jKwuem" href="/caimanblack/93-94-dark-jungle-mix-5/">93-94 Dark Jungle Mix 5</a>
<div class="AudioCardTitle__OwnerText-sc-1kxsru9-5 gxeIb">by&nbsp;
<span class="hovercard-anchor AudioCardTitle__OwnerHovercard-sc-1kxsru9-7 cicNsQ">
<a class="AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__OwnerLink-sc-1kxsru9-6 YOGda" href="/caimanblack/">Caiman Black</a>
Can you try this?

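from selenium.webdriver.common.by import By

# find_elements_by_xpath was deprecated in Selenium 4 and later removed; use find_elements(By.XPATH, ...)
mixes = driver.find_elements(By.XPATH, "//a[contains(@class,'AudioCardTitle')]")
for mix in mixes:
    print(mix.get_attribute("href"))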
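To see what //a[contains(@class,'AudioCardTitle')] actually matches, here is a minimal offline sketch that uses only the standard library's html.parser on the two anchors from the question's markup. It mimics the XPath contains(@class, ...) test as a plain substring check; it is not Selenium and does not execute JavaScript. The names AnchorCollector, BASE_URL and SNIPPET are hypothetical. Note that the owner link is matched too, because its class list also contains AudioCardTitle.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mixcloud.com/caimanblack/"  # page the snippet came from

# The two anchors from the question's markup, trimmed.
SNIPPET = """
<a class="AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__TitleLink-sc-1kxsru9-4 jKwuem" href="/caimanblack/93-94-dark-jungle-mix-5/">93-94 Dark Jungle Mix 5</a>
<a class="AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__OwnerLink-sc-1kxsru9-6 YOGda" href="/caimanblack/">Caiman Black</a>
"""


class AnchorCollector(HTMLParser):
    """Collect hrefs of <a> tags whose class attribute contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # Same test as XPath contains(@class, needle): a plain substring match.
        if self.needle in (attrs.get("class") or ""):
            # Resolve relative hrefs to absolute URLs, as Selenium's get_attribute("href") does.
            self.hrefs.append(urljoin(BASE_URL, attrs.get("href") or ""))


collector = AnchorCollector("AudioCardTitle")
collector.feed(SNIPPET)
for href in collector.hrefs:
    print(href)
```

Passing a narrower needle such as "AudioCardTitle__TitleLink" would keep only the mix link.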

To get all the links, use WebDriverWait() to wait for visibility_of_all_elements_located() with the following CSS selector:

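from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://www.mixcloud.com/caimanblack/")  # driver: an existing WebDriver instance
AllLinks=[item.get_attribute("href") for item in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"a[class^='AudioCardTitle__PlainLink'][class$='jKwuem']")))]
print(AllLinks)
print("Total Links : {}".format(len(AllLinks)))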
Console output:

['https://www.mixcloud.com/caimanblack/93-94-dark-jungle-mix-5/', 'https://www.mixcloud.com/caimanblack/93-95-intelligent-db-mix-2/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-db-rollers-mix-2/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-db-rollers-mix-1/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-db-mix-1/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-mix-1/', 'https://www.mixcloud.com/caimanblack/93-95-intelligent-db-mix-1/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-4/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-3/', 'https://www.mixcloud.com/caimanblack/dub-7-king-tubby-others/', 'https://www.mixcloud.com/caimanblack/94-jungle-mix-2/', 'https://www.mixcloud.com/caimanblack/94-jungle-mix-1/', 'https://www.mixcloud.com/caimanblack/94-jungle-death/', 'https://www.mixcloud.com/caimanblack/94-94-jungle-mix-4/', 'https://www.mixcloud.com/caimanblack/93-94-jungle-mix-3/', 'https://www.mixcloud.com/caimanblack/96-99-dark-tech-db-3/', 'https://www.mixcloud.com/caimanblack/96-99-dark-tech-db-2/', 'https://www.mixcloud.com/caimanblack/96-99-dark-tech-db-1/', 'https://www.mixcloud.com/caimanblack/dub-6-king-tubby-the-upsetters-augustus-pablo-others/', 'https://www.mixcloud.com/caimanblack/dub-5-augustus-pablo-revolutionaries-aggrovators-others/', 'https://www.mixcloud.com/caimanblack/dub-4-king-tubby-the-upsetters-linval-thompson-others/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-2/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-1/', 'https://www.mixcloud.com/caimanblack/93-96-deep-jungle-mix-3/', 'https://www.mixcloud.com/caimanblack/93-96-deep-jungle-mix-2/', 'https://www.mixcloud.com/caimanblack/93-96-deep-jungle-mix-1/', 'https://www.mixcloud.com/caimanblack/96-99-jazz-funk-drum-bass-mix-1/', 'https://www.mixcloud.com/caimanblack/dub-90s-00s-mix-3/', 'https://www.mixcloud.com/caimanblack/dub-90s-00s-mix-2/', 
'https://www.mixcloud.com/caimanblack/dub-90s-00s-mix-1/']
Total Links : 30
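The selector a[class^='AudioCardTitle__PlainLink'][class$='jKwuem'] combines two CSS attribute selectors: ^= (value starts with) and $= (value ends with), both applied to the whole class attribute string rather than to individual class names. The sketch below reproduces that logic in plain Python against the two class strings from the question, to show why the mix title link matches and the owner link (which ends in YOGda) does not. The helper name css_prefix_suffix_match is hypothetical. Keep in mind that jKwuem appears to be a generated styled-components hash, so it may change when the site is redeployed.

```python
# Class attribute values copied from the question's markup.
TITLE_LINK = "AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__TitleLink-sc-1kxsru9-4 jKwuem"
OWNER_LINK = "AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__OwnerLink-sc-1kxsru9-6 YOGda"


def css_prefix_suffix_match(class_attr, prefix, suffix):
    """Emulate a[class^='prefix'][class$='suffix'] on one class attribute string."""
    # ^= and $= compare against the full attribute value, not single class names.
    return class_attr.startswith(prefix) and class_attr.endswith(suffix)


for name, value in [("title link", TITLE_LINK), ("owner link", OWNER_LINK)]:
    matched = css_prefix_suffix_match(value, "AudioCardTitle__PlainLink", "jKwuem")
    print(f"{name}: {matched}")  # title link: True, owner link: False
```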

You can also use the following XPath; the result is the same:

driver.get("https://www.mixcloud.com/caimanblack/")
AllLinks=[item.get_attribute("href") for item in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,"//a[starts-with(@class,'AudioCardTitle__PlainLink') and contains(@class,'jKwuem')]")))]
print(AllLinks)
print("Total Links : {}".format(len(AllLinks))) 

Now, if you want to iterate over the links, you can use this:

for item in AllLinks:
    print(item)
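The XPath version, starts-with(@class,'AudioCardTitle__PlainLink') and contains(@class,'jKwuem'), returns the same links as the CSS selector on this page, but the two tests are not identical: XPath contains() matches the substring anywhere in the class string, while CSS $= only matches at the end. A minimal plain-Python sketch of both filters on the class strings from the question (the function names are hypothetical):

```python
# Class attribute values copied from the question's markup.
CLASSES = {
    "title link": "AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__TitleLink-sc-1kxsru9-4 jKwuem",
    "owner link": "AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__OwnerLink-sc-1kxsru9-6 YOGda",
}


def xpath_filter(value):
    # starts-with(@class, 'AudioCardTitle__PlainLink') and contains(@class, 'jKwuem')
    return value.startswith("AudioCardTitle__PlainLink") and "jKwuem" in value


def css_filter(value):
    # a[class^='AudioCardTitle__PlainLink'][class$='jKwuem']
    return value.startswith("AudioCardTitle__PlainLink") and value.endswith("jKwuem")


for name, value in CLASSES.items():
    print(name, xpath_filter(value), css_filter(value))
```

On these two strings both filters agree. If a hypothetical extra class name were ever appended after jKwuem, the XPath filter would still match while the CSS one would not.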

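Thank you, this worked. So it just searches for any anchor tag whose class contains the word "AudioCardTitle" and extracts the URL, rather than searching for one specific tag? – Yes, in any class name: if it contains the string "AudioCardTitle", it ends up in your list ;) – Very good, thanks for the answer, it does work. I'll keep this information for reference when scraping other sites.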