
Python: Incomplete scrape when using Selenium


I'm trying to scrape the review section on Backcountry.com. The site loads more reviews dynamically, i.e. the URL doesn't change when you want to load more reviews. I'm using Selenium WebDriver to interact with the button that loads more reviews, and BeautifulSoup to scrape the reviews.

I am able to successfully interact with the Load More button and load all the available reviews. I am also able to scrape the initial reviews that appear before you press the Load More button.

To summarize: I can interact with the Load More button, and I can scrape the initial reviews that are available, but I cannot scrape all of the reviews that become available after everything has loaded.

I have tried changing the HTML tags to see if that makes a difference. I have tried increasing the sleep times in case the scraper didn't have enough time to finish its job.
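As an aside, fixed time.sleep calls like the ones below are fragile on dynamically loaded pages. Here is a minimal sketch (not part of the original question) of replacing the sleeps with Selenium's explicit waits; click_all_load_more is a hypothetical helper, and btn_xpath is the same Load More XPath used in the code that follows:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

btn_xpath = "//a[@class='btn js-load-more-btn btn-secondary pdp-wall__load-more-btn']"

def click_all_load_more(driver, timeout=10):
    wait = WebDriverWait(driver, timeout)
    while True:
        try:
            # Block until the button is clickable, or raise TimeoutException
            button = wait.until(EC.element_to_be_clickable((By.XPATH, btn_xpath)))
            button.click()
        except TimeoutException:
            break  # no clickable Load More button left: all reviews loaded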

# URL and request code for BeautifulSoup (imports used throughout)
import time
import requests
import pandas as pd
from bs4 import BeautifulSoup

url_filter_bc = 'https://www.backcountry.com/msr-miniworks-ex-ceramic-water-filter?skid=CAS0479-CE-ONSI&ti=U2VhcmNoIFJlc3VsdHM6bXNyOjE6MTE6bXNy'
res_filter_bc = requests.get(url_filter_bc, headers={'User-agent': 'notbot'})


# Function that scrapes the reviews

def scrape_bc(request, website):
    newlist = []
    soup = BeautifulSoup(request.content, 'lxml')
    newsoup = soup.find('div', {'id': 'the-wall'})
    reviews = newsoup.find('section', {'id': 'wall-content'})

    for row in reviews.find_all('section', {'class': 'upc-single user-content-review review'}):
        newdict = {}
        newdict['review']  = row.find('p', {'class': 'user-content__body description'}).text
        newdict['title']   = row.find('h3', {'class': 'user-content__title upc-title'}).text
        newdict['website'] = website

        newlist.append(newdict)

    df = pd.DataFrame(newlist)
    return df
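As a quick usage sketch (not in the original question), scrape_bc can be run against the initial request defined above; website is just a label that ends up in the third column:

initial_df = scrape_bc(res_filter_bc, 'backcountry.com')
print(initial_df.shape)  # only the reviews present before any Load More clicks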


# Function that uses Selenium, combined with the scraper function, to output a pandas DataFrame

def full_bc(url, website):
    driver = connect_to_page(url, headless=False)
    request = requests.get(url, headers = {'User-agent' : 'notbot'})
    time.sleep(5)
    full_df = pd.DataFrame()
    while True:
        try:
            loadMoreButton = driver.find_element_by_xpath("//a[@class='btn js-load-more-btn btn-secondary pdp-wall__load-more-btn']")
            time.sleep(2)
            loadMoreButton.click()
            time.sleep(2)
        except:
            print('Done Loading More')

#             full_json = driver.page_source
            temp_df = pd.DataFrame()
            temp_df = scrape_bc(request, website)

            full_df = pd.concat([full_df, temp_df], ignore_index = True)

            time.sleep(7)
            driver.quit()
            break

    return full_df
I expected a pandas DataFrame with 113 rows and three columns.
I got a pandas DataFrame with 18 rows and three columns.

Ok, you clicked loadMoreButton and loaded more reviews. But you keep feeding scrape_bc the same request content that you downloaded once, completely separately from Selenium.

Replace requests.get(...) with driver.page_source, and make sure you grab driver.page_source inside the loop, right before the scrape_bc(...) call:

request = driver.page_source   # rendered HTML as a string, after all clicks
temp_df = pd.DataFrame()
temp_df = scrape_bc(request, website)
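One caveat with that snippet: driver.page_source is a plain HTML string, not a requests response, so scrape_bc must then build the soup with BeautifulSoup(request, 'lxml') instead of request.content. With that one-line change, a minimal corrected sketch of full_bc (connect_to_page remains the question author's helper):

def full_bc(url, website):
    driver = connect_to_page(url, headless=False)  # question author's helper
    while True:
        try:
            loadMoreButton = driver.find_element_by_xpath(
                "//a[@class='btn js-load-more-btn btn-secondary pdp-wall__load-more-btn']")
            time.sleep(2)
            loadMoreButton.click()
            time.sleep(2)
        except:
            print('Done Loading More')
            break

    # Grab the rendered HTML only after every Load More click has finished
    request = driver.page_source
    driver.quit()
    return scrape_bc(request, website)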