How to fetch data from a website by clicking links one by one with Selenium in Python

python python-3.x selenium web-scraping web-crawler

I am trying to scrape data from a website, but I only want the first 1000 links: open them one by one and extract the data from each page.

I tried:

list_links = driver.find_elements_by_tag_name('a')

for i in list_links:
    print(i.get_attribute('href'))

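This also returns extra links that I do not want.

For example: https://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1,2,3,4,5,%3E5&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa,Residential-Plot&cityName=Mumbai

That page gives more than 50k links. How can I open only the first 1000 links, the ones under the property photos?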

Edit

I also tried:

driver.find_elements_by_xpath("//div[@class='.l-srp__results.flex__item']")
driver.find_element_by_css_selector('a').get_attribute('href')

for matches in driver:
    print('Liking')
    print (matches)
    #matches.click()
    time.sleep(5)

But I get the error: TypeError: 'WebDriver' object is not iterable

Why can't I use this line to get the links: driver.find_element_by_css_selector('a').get_attribute('href')

Edit 1

I tried to filter the links as follows, but got an error:

result = re.findall(r'https://www.magicbricks.com/propertyDetails/', my_list)
print(result)

Error: TypeError: expected string or bytes-like object

I also tried:

a = ['https://www.magicbricks.com/propertyDetails/']
output_names = [name for name in a if (name[:45] in my_list)]
print(output_names)

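This returns nothing.

All the links are in a list. Please advise.

Thanks in advance.

Selenium is not a good choice for web scraping. I would suggest JMeter instead; it is free and open source.

If you do want to use Selenium, the approach you are attempting (click, then scrape) is not a stable one. Instead, I suggest an approach like the one below. The example is written in Java, but you should get the idea.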

driver.get("https://www.yahoo.com");

Map<Integer, List<String>> map = driver.findElements(By.xpath("//*[@href]"))
                .stream()                             // process every element that has an href attribute
                .map(ele -> ele.getAttribute("href")) // get the value of href
                .map(String::trim)                    // trim whitespace
                .distinct()                           // drop duplicate links
                .collect(Collectors.groupingBy(LinkUtil::getResponseCode)); // group the links by response code

More information here.
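A rough Python counterpart of the Java snippet above, offered only as a sketch. It assumes Selenium 4, where `driver.find_elements(by, value)` is the lookup call and the string "xpath" is the value of `By.XPATH`. The function name `collect_unique_hrefs` is made up for illustration. The response-code grouping is left out, because the `LinkUtil` helper is not shown in the answer.

```python
def collect_unique_hrefs(driver):
    """Return trimmed, de-duplicated href values in page order.

    Assumption: `driver` is a Selenium 4 WebDriver (or any object with a
    compatible find_elements method); "xpath" is the value of By.XPATH.
    """
    seen = set()
    hrefs = []
    for element in driver.find_elements("xpath", "//*[@href]"):
        href = element.get_attribute("href")
        if not href:
            continue  # skip elements whose href is missing or empty
        href = href.strip()
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs
```

Because the result is a plain list of strings, it can be sliced or filtered before any page is opened.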

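I think you should collect into a list all elements whose tag name is a and whose href attribute is not null. Then iterate over the list and click the elements one by one. Create a list of type WebElement and store all the valid links in it. At this point you can apply further filters or conditions, for example that the link contains certain characters.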
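The filtering step might look like the following in Python. This is a minimal sketch, assuming the wanted pages all start with the `propertyDetails` prefix used in the question's Edit 1; the function name `select_detail_links` and the `limit` default are illustrative. It checks each href with `str.startswith`, which avoids the `TypeError` raised when a whole list is passed to `re.findall`.

```python
# Assumption: property detail pages share this prefix (taken from the question).
DETAIL_PREFIX = "https://www.magicbricks.com/propertyDetails/"


def select_detail_links(hrefs, prefix=DETAIL_PREFIX, limit=1000):
    """Keep hrefs that start with `prefix`, drop duplicates, stop at `limit`."""
    selected = []
    seen = set()
    for href in hrefs:
        if len(selected) >= limit:
            break  # enough links collected
        if not isinstance(href, str):
            continue  # get_attribute('href') may return None
        if href.startswith(prefix) and href not in seen:
            seen.add(href)
            selected.append(href)
    return selected


links = [
    "https://www.magicbricks.com/propertyDetails/2-BHK-1182-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-East-in-Mumbai&id=4d423336313032373731",
    "https://www.magicbricks.com/property-for-sale/residential-real-estate",
    None,
]
print(select_detail_links(links))  # only the propertyDetails link remains
```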

To store the WebElements in a list, you can use driver.findElements. This method returns a list of WebElement objects.
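In the Python bindings the counterpart is `driver.find_elements(...)`, which also returns a list. The loop has to run over that list, or over the URLs read from it, and not over the driver itself, which is what raises "'WebDriver' object is not iterable". A hedged sketch follows: the function `visit_links` and the `extract` callback are hypothetical names, and it opens each URL with `driver.get` instead of clicking, since element references go stale once the browser leaves the page.

```python
def visit_links(driver, urls, extract):
    """Open each URL in turn and collect whatever `extract(driver)` returns.

    Assumptions: `driver` is a Selenium WebDriver; `extract` is a
    caller-supplied function that reads the wanted data from the loaded page.
    """
    results = []
    for url in urls:
        driver.get(url)  # navigate directly instead of clicking the element
        results.append((url, extract(driver)))
    return results
```

With a real driver, a call such as `visit_links(driver, links[:1000], lambda d: d.title)` would return each URL paired with its page title.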

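Can you give us an example of a link you want? You need to refine your selector.

Please open this link: https://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1,2,3,4,5,%3E5&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa,Residential-Plot&cityName=Mumbai. There you will get more than 50,000 property details. Then click the first one, https://www.magicbricks.com/propertyDetails/2-BHK-1182-Sq-ft-Multistorey-Apartment-FOR-Sale-Kandivali-East-in-Mumbai&id=4d423336313032373731, and you will see data such as bedrooms, bathrooms, and so on.

How do I filter the links? I only need the valid ones. Please advise.

In that case, I want to give you the data as below.

What is a valid link, and what is an invalid one? Give an example.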