Python: hidden phone number can't be scraped


python selenium web-scraping


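After clicking the "llamar" button, I still haven't been able to extract the phone number. So far I have tried XPath locators in Selenium and also tried Beautiful Soup to extract the number, but unfortunately nothing has worked. With an XPath selector in Selenium I usually get an invalid selector error, and with BS4 I get AttributeError: 'NoneType' object has no attribute 'text'. I hope you can help me.

Here is the URL of the listing:

Here is the code I tried: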

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import UnexpectedAlertPresentException

url = 'https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/portada-alta-carlos-de-haya-carranque-386352344.htm'
path = r'C:\Users\WL-133\anaconda3\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe'
path1 = r'C:\Users\WL-133\anaconda3\Lib\site-packages\selenium\webdriver\firefox'
# driver = webdriver.Chrome(path)
options = Options()
driver = webdriver.Chrome(path)
driver.get(url)

a = []

mah_div = driver.page_source
soup = BeautifulSoup(mah_div, features='lxml')

cookie_button = '//*[@id="sui-TcfFirstLayerModal"]/div/div/footer/div/button[2]'
btn_press = driver.find_element_by_xpath(cookie_button)
btn_press.click()

llam_button = '//*[@id="ad-detail-contact"]/a[2]'
llam_press = driver.find_element_by_xpath(llam_button)
llam_press.click()
time.sleep(10)

for item in soup.find_all("div", {"class": "contenido"}):
    a.append(item.find("div", {"class": "plaincontenido"}).text)

print(a)
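The BS4 AttributeError means that find() returned None: no matching element existed in the HTML that was parsed. In the code above, driver.page_source is read before either button is clicked, so the soup is a snapshot of the page as it was before the number was requested. A minimal sketch of guarding the lookup so that a missing element yields None instead of crashing; text_of_child is a hypothetical helper name, not part of BeautifulSoup:

```python
from typing import Any, Optional


def text_of_child(parent: Any, name: str, attrs: dict) -> Optional[str]:
    """Return the text of parent.find(name, attrs), or None if nothing matched.

    Hypothetical helper for illustration. `parent` is expected to be a
    BeautifulSoup object or Tag; any object with a compatible find() works.
    """
    child = parent.find(name, attrs)
    # find() returns None when no element matches; calling .text on None
    # is exactly what raises "'NoneType' object has no attribute 'text'".
    if child is None:
        return None
    return child.text
```

In the loop above, text_of_child(item, "div", {"class": "plaincontenido"}) would return None for an item with no match rather than raising, which makes it easier to see that the element simply is not in the parsed HTML.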

The phone number is stored in JavaScript. You can extract it with the re module:

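import re
import requests
from bs4 import BeautifulSoup

url = "https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/portada-alta-carlos-de-haya-carranque-386352344.htm"
# Contact-details endpoint; {} is filled in with the ad id
phone_url = "https://www.milanuncios.com/datos-contacto/?usePhoneProxy=0&from=detail&includeEmail=false&id={}"

# The ad id is the run of digits right before ".htm" in the listing URL
ad_id = re.search(r"(\d+)\.htm", url).group(1)

html_text = requests.get(phone_url.format(ad_id)).text

soup = BeautifulSoup(html_text, "html.parser")
# The number is the argument of a getTrackingPhone(...) call in the inline JavaScript
phone = re.search(r"getTrackingPhone\((.*?)\)", html_text).group(1)

# .texto holds the contact name; print it together with the phone number
print(soup.select_one(".texto").get_text(strip=True), phone)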

Prints:

ana (Particular) 639....
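The two regular expressions in this answer can be checked offline. A minimal sketch: the helper names ad_id_from_url and phone_from_js are hypothetical, and sample_js is made-up markup, since the real contact page is not shown here. The quote-stripping step is a defensive addition in case the argument is a string literal; it is harmless when the number is unquoted.

```python
import re
from typing import Optional

LISTING_URL = ("https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/"
               "portada-alta-carlos-de-haya-carranque-386352344.htm")


def ad_id_from_url(url: str) -> str:
    # The ad id is the run of digits immediately before ".htm"
    match = re.search(r"(\d+)\.htm", url)
    if match is None:
        raise ValueError(f"no ad id found in {url!r}")
    return match.group(1)


def phone_from_js(html_text: str) -> Optional[str]:
    # Non-greedy: capture everything up to the first ")" after the call
    match = re.search(r"getTrackingPhone\((.*?)\)", html_text)
    if match is None:
        return None
    # Drop whitespace and surrounding quotes, if any
    return match.group(1).strip().strip("'\"")


# Hypothetical inline script; the real page's markup may differ
sample_js = "<a onclick=\"getTrackingPhone('600000000')\">llamar</a>"

print(ad_id_from_url(LISTING_URL))  # 386352344
print(phone_from_js(sample_js))     # 600000000
```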

With Selenium, you need to click the button and then switch to the iframe:

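from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = "https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/portada-alta-carlos-de-haya-carranque-386352344.htm"
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)  # the 10-second timeout is an arbitrary choice

# Wait until the phone ("llamar") button is clickable, then click it
tel_button = wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, ".def-btn.phone-btn")))
tel_button.click()
# The number is shown inside the iframe with id "ifrw", so switch into it first
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "ifrw")))
tel_number = wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, ".texto>.telefonos"))).text
# Return to the main document before locating anything outside the iframe
driver.switch_to.default_content()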

Note that I used much more stable locators than the XPaths in the question.

Use this:

soup.select_one("script[type='application/ld+json']:contains('Product')").get_text(strip=True)

to parse the relevant script tag, then dig out the value of its "description" key, which contains the phone number.
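A minimal, offline sketch of that approach. It swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it runs without extra packages, and instead of matching the text 'Product' it loads each application/ld+json block with json and keeps the object whose "@type" is "Product". JsonLdCollector, product_description and sample_html are hypothetical names, and the sample's description is made up; the real page's JSON-LD may be structured differently. If you keep the BeautifulSoup selector, note that recent versions of soupsieve (the CSS engine behind select_one) deprecate :contains() in favor of :-soup-contains().

```python
import json
from html.parser import HTMLParser
from typing import Any, List, Optional


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; concatenate them
        if self._in_jsonld:
            self.blocks[-1] += data


def product_description(html_text: str) -> Optional[str]:
    """Return the "description" of the first JSON-LD object typed Product."""
    collector = JsonLdCollector()
    collector.feed(html_text)
    for raw in collector.blocks:
        try:
            data: Any = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        # A JSON-LD block may hold a single object or a list of objects
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return item.get("description")
    return None


# Hypothetical page fragment; the real listing's JSON-LD may differ
sample_html = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "x"}</script>
<script type="application/ld+json">
{"@type": "Product", "name": "Piso", "description": "Contacto: 600 000 000"}
</script>
</head></html>
"""

print(product_description(sample_html))  # Contacto: 600 000 000
```

From the returned description you would still need to pick out the number itself, for example with a regular expression, since it is embedded in free text.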

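Incredible! It works great, thanks. I also just learned that you need to switch frames in situations like this, which will be very useful for future web apps.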