Python 3.x: I am trying to extract the text inside a span id, but get blank output using Python BeautifulSoup
I am trying to extract the text inside the span tag with an id, but I get a blank output screen. I also tried using the text of the parent div element, but could not extract it either. Can anyone help me? Below is my code:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.paperplatemakingmachines.com/')
soup = BeautifulSoup(r.text,'lxml')
mob = soup.find('span',{"id":"tollfree"})
print(mob.text)
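I expect the text inside the span to be the given mobile number.
The data is actually rendered dynamically by a script. What you need to do is parse the data out of the script: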
import requests
import re
from bs4 import BeautifulSoup
r = requests.get('https://www.paperplatemakingmachines.com/')
soup = BeautifulSoup(r.text,'lxml')
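# Assumes the first <script> tag on the page is the one that sets pns_no
script = soup.find('script')
# Non-greedy match stops at the first '";' after pns_no = "
mob = re.search(r'(?<=pns_no = ")(.*?)(?=";)', script.text).group()
print(mob)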
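The first script on a page is not necessarily the one that sets pns_no, and soup.find('script') only returns the first one. A minimal stdlib-only sketch that collects every inline script with html.parser and searches each one for a JavaScript string assignment could look like this. The helper names ScriptCollector and find_js_string and the sample markup (including the number) are hypothetical, and the pattern only handles double-quoted values.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append rather than assign
        if self._in_script:
            self.scripts[-1] += data


def find_js_string(html, var_name):
    """Return the value of var_name = "..." from any inline script, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    pattern = re.compile(r"\b" + re.escape(var_name) + r'\s*=\s*"([^"]*)"')
    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            return match.group(1)
    return None


# Hypothetical page: the first <script> is empty, the second sets pns_no
sample = """
<html><head>
<script src="analytics.js"></script>
<script>var pns_no = "18001234567";
document.getElementById("tollfree").innerHTML = pns_no;</script>
</head><body><span id="tollfree"></span></body></html>
"""

print(find_js_string(sample, "pns_no"))  # prints 18001234567
```

On the live page, the HTML would come from requests.get(...).text instead of the sample string.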
You have to use Selenium, because that text is not present in the initial response, or at least not unless you search the <script> tag:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'C:\chromedriver_win32\chromedriver.exe'))
url = 'https://www.paperplatemakingmachines.com/'
driver.get(url)
# It's better to use Selenium's WebDriverWait, but I'm still learning how to use that correctly
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'html.parser')
# quit() ends the whole browser session, not just the current window
driver.quit()
mob = soup.find('span', {"id": "tollfree"})
print(mob.text)
Another way, using a regular expression to find the number:
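import requests
import re
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.paperplatemakingmachines.com/')
soup = bs(r.content, 'lxml')
# Use a separate name for the pattern so the response object r is not shadowed
pattern = re.compile(r'var pns_no = "(\d+)"')
# string= replaces the older text= argument in newer BeautifulSoup versions
data = soup.find('script', string=pattern).text
number = pattern.findall(data)[0]
print('+91-' + number)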
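Since the regular expression alone is what locates the number, a sketch that skips HTML parsing altogether and searches the raw page text might look like this. It assumes the page still declares var pns_no = "<digits>" in an inline script and reuses the +91- prefix from the answer above; the function name extract_toll_free, the constant PNS_PATTERN and the sample markup are hypothetical. Unlike findall(...)[0], it returns None instead of raising when the variable is missing.

```python
import re

# Assumes the page declares the number as: var pns_no = "<digits>";
PNS_PATTERN = re.compile(r'var\s+pns_no\s*=\s*"(\d+)"')


def extract_toll_free(html, prefix="+91-"):
    """Return prefix + the pns_no digits found in raw HTML, or None if absent."""
    match = PNS_PATTERN.search(html)
    if match is None:
        return None
    return prefix + match.group(1)


# Hypothetical markup; the real page's number is not reproduced here
sample = '<script>var pns_no = "18001234567";</script><span id="tollfree"></span>'
print(extract_toll_free(sample))                  # prints +91-18001234567
print(extract_toll_free("<p>no script here</p>"))  # prints None
```

With the live page, this would be called as extract_toll_free(requests.get(url).text). The trade-off is that a plain text search could also match the pattern inside an HTML comment or attribute, which scoping the search to <script> tags avoids.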