How to extract data from a dynamically updated web page

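I want to scrape reviews from the Sephora website. The reviews are updated dynamically.

On inspection, I found that each review sits in HTML like this:

<div class="css-eq4i08 " data-comp="Ellipsis Box">Honestly I never write 
reviews but this is a must if you have frizzy after even after straightening 
it! It smells fantastic and it works wonders definitely will be restocking once 
I’m done this one !!</div>

If I use find_element_by_class, it returns blank.

What is the best option?

I am trying to use an XPath with an attribute, but the code does not work.
Can someone please help me understand what the best solution is?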

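To scrape the reviews from the Sephora website, you have to induce WebDriverWait for the review elements to become visible. You can use the following solution:

  • Code Block: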

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    # Configure Chrome: maximized window, no infobars, no extensions
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    # Selenium 3 call style; Selenium 4 takes options=options and a Service object instead
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum-P439093?skuId=2122083&icid2=just%20arrived:p439093")
    # The reviews load lazily, so first scroll a section further down the page into view
    driver.execute_script("arguments[0].scrollIntoView(true);", WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='tabpanel0']/div//b[contains(., 'What Else You Need to Know')]"))))
    # Wait up to 20 seconds until every review box is visible
    reviews = WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@data-comp='GridCell Box']//div[@data-comp='Ellipsis Box']")))
    for review in reviews:
        # innerHTML holds the review text
        print(review.get_attribute("innerHTML"))
    
  • Console Output:

    Honestly I never write reviews but this is a must if you have frizzy after even after straightening it! It smells fantastic and it works wonders definitely will be restocking once I’m done this one !!
    I really like this product. I was looking for something to tame frizz and fly aways during the winter and this does the job. At first I was nervous it might give a greasy look but it makes my hair smooth and soft. Scent is actually a little subtle for me, but still nice.
    This oil-serum is perfect for the right level of hydration without the feel of oil residue. Great for all hair types and my new go-to product.
    I LOVE how weightless this oil feels in my hair.. takes away all of my flyaways without looking of feeling greasy.. the packaging is COOL (travel-friendly) and it smells wonderful!!
    I tried this when it first dropped on their website. I’ve been using it for about 3 weeks now. And I have to say its just OKAY. Nothing super special about it. I haven’t noticed super smooth hair that isn’t given with other products that cost less. It’s just like any other smoothing serum. I also can’t figure out what the smell is. It doesn’t really smell as pleasant as their other products.
    in love!! A tiny bit goes a long way. No more fly aways. No more frizz from touch or environment.
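
The loop above prints innerHTML, which is raw markup. If a review ever contains nested tags or HTML entities, the printed string will include them. One possible way to reduce such a string to plain text with only the standard library is sketched below. The helper name inner_html_to_text and the sample fragment are made up for illustration and do not come from the Sephora page; reading Selenium's review.text property instead is another option.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Treat a line break as whitespace so adjacent words do not fuse
        if tag == "br":
            self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces
        return " ".join("".join(self._chunks).split())


def inner_html_to_text(fragment):
    """Hypothetical helper: reduce an innerHTML string to plain text."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()


# Made-up fragment, used only to exercise the helper
sample = "It smells <b>fantastic</b> &amp; it works wonders<br>definitely restocking"
print(inner_html_to_text(sample))
```

In the answer's loop this would be called as inner_html_to_text(review.get_attribute("innerHTML")).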
    

Does anyone have a suggestion?

@data-comp() should be @data-comp ... no parentheses.

@JeffC It did not work after I made the change. Please guide me on how to access the reviews above. Can someone guide me with the path? Just using the class as the path does not work.

The problem here is that the reviews are loaded as you scroll down the page. Scroll the page down to where the reviews are, explicitly wait for the reviews to load, and then call find_element_by_xpath to get the review text. There is also a problem with the XPath in your question: //div[@id='ratings-reviews']//div[@data-comp='Ellipsis Box'] should do it. It's Ellipsis with 2 "l"s.

Thanks @DebanjanB. Now I need help with something else. If I open the inspector, go to the Network tab, and check which URL the page requests to load the reviews, can I send my own POST/GET request to that URL? Is that possible?

@HimanishBhattacharjee Can you raise a new question with your new requirement? StackOverflow contributors will be happy to help you out.
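
The attribute XPath discussed in the comments can be checked offline before handing it to Selenium. The sketch below uses Python's xml.etree.ElementTree, which supports only a small subset of XPath and is used here just so the example runs without a browser. The markup is a trimmed stand-in for the snippet in the question, and the wrapping div with id ratings-reviews is an assumption taken from the XPath in the comments, not something confirmed from the live page.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the review markup quoted in the question.
# The outer <div id="ratings-reviews"> is assumed from the XPath in the comments.
SNIPPET = """
<div id="ratings-reviews">
  <div class="css-eq4i08 " data-comp="Ellipsis Box">Honestly I never write reviews ...</div>
  <div class="css-eq4i08 " data-comp="Ellipsis Box">I really like this product.</div>
</div>
"""

root = ET.fromstring(SNIPPET)


def match(xpath):
    """Return the text of every element matched by an ElementTree XPath."""
    return [el.text for el in root.findall(xpath)]


# Correct spelling: the attribute value must match exactly, space included
correct = match(".//div[@data-comp='Ellipsis Box']")

# Misspelled value (one "l"): nothing matches, so the result is empty
misspelled = match(".//div[@data-comp='Elipsis Box']")

print(len(correct), len(misspelled))
```

With Selenium, the equivalent expression would be passed with By.XPATH, after scrolling to the reviews and waiting for them to load.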