Python web scraping: the text part of the HTML code is not displayed

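I ran into a problem while trying to scrape a website with the Selenium library in Python. I would like to get some information about the songs collected on this website:

However, when I try to extract the text from the corresponding HTML code, the process returns an empty list.

If I look at the HTML code in the browser (Chrome), I can see the text part, but when I look at the same code in Python, the text part does not appear.

Here is my code: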

from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://bandcamp.com/?g=all&s=top&p=0&gn=0&f=all&w=0")

name_song = browser.find_elements_by_css_selector("a.item-title")
name_artist = browser.find_elements_by_css_selector("a.item-artist")

genre = browser.find_elements_by_class_name("item-genre")
print(name_song, name_artist, genre)
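When I print these three variables I get the HTML code, but I cannot extract anything from it. How can I solve this? Thanks a lot for your help.

My goal is to have "Apocalypticists", "Kriegsmaschine", and "metal" each assigned to a different variable.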


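You just need to go into each element to get what you need. The code above returns three lists of Selenium element objects. Each object has attributes you can access, one of which is .text.

If I run the code above, name_song contains:

[<selenium.webdriver.remote.webelement.WebElement (session="83853054732fa0a5bfbc8a7e32a1e05b", element="0.4629143928625561-1")>,...

Then read .text from each element and index into the resulting list:

[i.text for i in name_song][0]
'Apocalypticists'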
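
To make the difference between an element object and its text concrete, here is a minimal sketch that runs without a browser. `StubElement` and `texts` are hypothetical names introduced only for illustration; a real Selenium `WebElement` exposes the same `.text` attribute, and the sample strings are the three values named in the question.

```python
class StubElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""

    def __init__(self, text):
        self.text = text


def texts(elements):
    """Return the visible text of every element in a list."""
    return [element.text for element in elements]


# Assumed sample data mirroring the three lists from the question.
name_song = [StubElement("Apocalypticists")]
name_artist = [StubElement("Kriegsmaschine")]
genre = [StubElement("metal")]

# Printing the list shows object representations, not the text.
print(name_song)

# Reading .text yields plain strings that can go into separate variables.
first_song = texts(name_song)[0]
first_artist = texts(name_artist)[0]
first_genre = texts(genre)[0]
print(first_song, first_artist, first_genre)
```

With real elements the same list comprehension applies unchanged, since it relies on nothing but the `.text` attribute.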

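You were so close. You just need to induce WebDriverWait for the desired elements to become visible, store the WebElements in three separate lists, and then iterate over them to print the text. You can use the following solution:

  • Code block: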

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Selenium 3 style setup; adjust the driver path to your machine.
    # Newer Selenium releases pass the driver path through a Service object instead.
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    browser = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    browser.get("https://bandcamp.com/?g=all&s=top&p=0&gn=0&f=all&w=0")
    # Wait up to 20 seconds for each group of elements to become visible.
    name_song = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.item-title")))
    name_artist = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"a.item-artist")))
    genre = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH,"//a[@class='item-artist']//following::span[1]")))
    # Walk the three lists in step and print one line per song.
    for song, artist, gen in zip(name_song, name_artist, genre):
        print("%s song is by %s and is of %s genre" % (song.text, artist.text, gen.text))
    
  • Console output:

    Apocalypticists song is by Kriegsmaschine and is of metal genre
    The Path song is by Carbon Based Lifeforms and is of ambient genre
    Christmas Time Is Here (N & S America Edition) song is by Khruangbin and is of funk genre
    Christmas Time Is Here (Excluding N & S America) song is by Khruangbin and is of funk genre
    Snailchan Adventure song is by Ujico*/Snail's House and is of electronic genre
    O God who avenges, shine forth. Rise up, Judge of the Earth; pay back to the proud what they deserve. song is by the body and is of metal genre
    T-Rex EP song is by Ben Prunty and is of soundtrack genre
    Woodland Womp (24bit 96kHz) song is by Kalya Scintilla and is of electronic genre
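
An explicit wait helps when a lookup runs before the page has finished rendering its items, which would explain an empty result. The idea behind it is plain polling, sketched below without a browser. This illustrates the concept only and is not Selenium's implementation; `wait_until`, `make_lookup`, and the timings are hypothetical.

```python
import time


def wait_until(lookup, timeout=2.0, interval=0.1):
    """Call lookup() repeatedly until it returns a non-empty result.

    Raises TimeoutError if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = lookup()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no elements found before the timeout")
        time.sleep(interval)


def make_lookup(empty_calls, final_result):
    """Build a fake lookup that returns [] for the first `empty_calls` calls."""
    calls = {"count": 0}

    def lookup():
        calls["count"] += 1
        return [] if calls["count"] <= empty_calls else final_result

    return lookup


# The first two lookups find nothing, the third one succeeds.
titles = wait_until(make_lookup(2, ["Apocalypticists", "The Path"]))
print(titles)
```

In the answer's code, `WebDriverWait(browser, 20).until(...)` plays the role of `wait_until`, with the expected condition acting as the lookup.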
    

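The element object itself does not give you the innerText value. You need to read element.text to get it.
browser.find_elements_by_class_name("item-genre")
returns 23 elements. The locator also has to be changed to get the proper 8 elements.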

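from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://bandcamp.com/?g=all&s=top&p=0&gn=0&f=all&w=0")

# Selenium 3 locator helpers; Selenium 4 spells these browser.find_elements(By.CSS_SELECTOR, ...)
name_song = browser.find_elements_by_css_selector("a.item-title")
name_artist = browser.find_elements_by_css_selector("a.item-artist")
# span.item-genre is the narrower locator for the genre labels
genre = browser.find_elements_by_css_selector("span.item-genre")

# zip walks the three lists in step, so the last row is printed as well
for song, artist, gen in zip(name_song, name_artist, genre):
    print(song.text)
    print(artist.text)
    print(gen.text)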
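
Because a loose locator can return more elements than there are songs (23 versus 8 in the case above), it can be worth checking that the three lists line up before pairing them by position. A minimal sketch, assuming the texts have already been extracted; `build_rows` is a hypothetical helper, and the sample values come from the console output quoted earlier.

```python
def build_rows(songs, artists, genres):
    """Pair three equally long lists of strings into one dict per song."""
    if not (len(songs) == len(artists) == len(genres)):
        raise ValueError(
            "list lengths differ: %d songs, %d artists, %d genres"
            % (len(songs), len(artists), len(genres))
        )
    return [
        {"song": song, "artist": artist, "genre": genre}
        for song, artist, genre in zip(songs, artists, genres)
    ]


rows = build_rows(
    ["Apocalypticists", "The Path"],
    ["Kriegsmaschine", "Carbon Based Lifeforms"],
    ["metal", "ambient"],
)
for row in rows:
    print("%(song)s / %(artist)s / %(genre)s" % row)
```

Raising on a mismatch is a design choice: a plain `zip` would silently stop at the shortest list, which can hide a locator that matches the wrong elements.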

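The code you posted only finds lists of elements, which are webdriver objects. Have you tried indexing into the list of elements and then adding .text after each element to get the text attribute? For example, [i.text for i in name_song]?

I tried doing that, but the problem is that I get an empty str, because it cannot find anything to extract from the HTML code.

I just ran the exact same code and got the output you want, see my answer.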