Python BeautifulSoup cannot get text from a web page rendered by JavaScript


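I am trying to get the product name from a web page using Python, but it only returns an empty tag. I have also tried the requests library and parsing with lxml in BeautifulSoup. Please help me solve this problem. Thanks in advance :-)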

HTML on the website:

<div class="product-name">SWAN</div>
   <div class="product-price">
   <span class="final-price">₹10650</span>
</div>
<div class="specification">
   <div>Specifications</div>
   <table>
      <tr>
         <td>....</td>
      </tr>
      <tr>
         <td>....</td>
     </tr>
   </table>
</div>
Output:

<div class="product-name"></div>
<div class="specification">
<div>Specifications</div>
<table></table>
</div>


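The data you are looking for is actually loaded by JavaScript, so it is not present in the HTML that a plain HTTP request returns. You need a package such as selenium, which drives a real browser, to retrieve it.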
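A quick way to check this diagnosis is to compare the markup the browser shows with the markup your script received. The sketch below is only an illustration: the helper name `is_filled_in` is hypothetical, and the two HTML strings are copied from the question rather than fetched from the site.

```python
from bs4 import BeautifulSoup


def is_filled_in(html: str, css_class: str) -> bool:
    """Return True if the first <div> with css_class contains non-whitespace text."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", class_=css_class)
    return tag is not None and bool(tag.get_text(strip=True))


# Markup seen in the browser versus markup returned to the script (both from the question)
rendered = '<div class="product-name">SWAN</div>'
fetched = '<div class="product-name"></div>'

print(is_filled_in(rendered, "product-name"))  # True
print(is_filled_in(fetched, "product-name"))   # False
```

If the element exists but is empty in the fetched HTML, the content is most likely filled in by a script after the page loads.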

You can try the following approach:

Code:

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as FirefoxOptions

# Run Firefox headless so that no browser window opens
options = FirefoxOptions()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)

url = "http://opor.in/ProductDetail/Index?ProductId=212"
try:
    driver.get(url)
    # page_source is the browser's current DOM, including content added by JavaScript
    page = driver.page_source
finally:
    # Always close the browser, even if loading the page fails
    driver.quit()

html = bs(page, 'html.parser')

model_name = html.find('div', {'class': 'product-name'})
spec = html.find('div', {'class': 'specification'})
print(model_name)
print(spec)

Result:

<div class="product-name">WHALE 25 LPH</div>
<div class="specification">
<div>Specifications</div>
<table><tr><td class="specification-group" colspan="2"><div>General</div></td></tr><tr><td>Product Code</td><td>1601KFMB</td></tr><tr><td>MEMBRANE</td><td>MEMBRELLA -ALPHA- 80 GPD (2 NOS)</td></tr><tr><td>PUMP</td><td>KEMFLO 48 V</td></tr><tr><td class="specification-group" colspan="2"><div>Specifications</div></td></tr><tr><td>APPLICATION</td><td>SUITABLE FOR BRACKISH WATER</td></tr><tr><td>FILTER LIFE</td><td>APPROX 3000 LITRE / 6 MONTHS</td></tr><tr><td>FILTERS</td><td>SEDIMENT, PRECARBON, POST CARBON</td></tr><tr><td>FLOAT</td><td>MEMBRELLA</td></tr><tr><td>FR</td><td>MEMBRELLA /KFL</td></tr><tr><td>INLINE SET</td><td>MEMBRELLA</td></tr><tr><td>INPUT VOLTAGE</td><td>100-300 VOLT AC (50Hz)</td></tr><tr><td>INSTALLATION</td><td>COUNTER TOP</td></tr><tr><td>MAX.OPERATION TDS</td><td>4000 PPM</td></tr><tr><td>MEMBRANE TYPE</td><td>THIN FILM COMPOSITE</td></tr><tr><td>MIN.INLET PRESSURE / TEMP</td><td>0.3 kg / cm2, 10 °C</td></tr><tr><td>MODEL</td><td>WHALE 25</td></tr><tr><td>OPERATING VOLTAGE</td><td>48 VOLT (DC)</td></tr><tr><td>PRODUCT DIMENSION</td><td>21.1 (H) x 9.9 (W) x 16.7 (L)</td></tr><tr><td>PURIFICATION CAPACITY</td><td>25 LITRES PER HOUR</td></tr><tr><td>RECOVERY RATE</td><td>MORE THAN 30% AT 27°c ± 2°c</td></tr><tr><td>SMPS</td><td>MEMBRELLA / EQUALIANT</td></tr><tr><td>SOLENOID VALVE</td><td>MEMBRELLA / SLX</td></tr><tr><td>STORAGE CAPACITY</td><td>20 LITRES</td></tr><tr><td>TECHNOLOGY</td><td>REVERSE OSMOSIS SYSTEM</td></tr><tr><td>TOTAL POWER CONSUMPTION</td><td>50 W</td></tr><tr><td>TUBE 1/4</td><td>5 METERS</td></tr><tr><td>TUBE 3/8</td><td>2 METERS</td></tr><tr><td>WEIGHT</td><td>18 kg (Approx)</td></tr><tr><td>WARRENTY &amp;  SUPPORT</td><td>Since Whale  designs its purifiers and many of its parts  are a truly integrated system. Dealer only  can provide one-stop service ,guaranty and support for any service and maintenance, so most issues can be resolved in a single visit</td></tr></table>
</div>
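The result above is still markup. Since the goal is the text itself, a minimal sketch of pulling the product name and the name/value rows out of that markup could look like the following. The HTML string is a shortened copy of the result printed above, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the rendered markup from the result above
html = """
<div class="product-name">WHALE 25 LPH</div>
<div class="specification">
<div>Specifications</div>
<table>
<tr><td class="specification-group" colspan="2"><div>General</div></td></tr>
<tr><td>Product Code</td><td>1601KFMB</td></tr>
<tr><td>PUMP</td><td>KEMFLO 48 V</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns the tag's text without surrounding whitespace
name = soup.find("div", class_="product-name").get_text(strip=True)

specs = {}
for row in soup.find("div", class_="specification").find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:  # skip single-cell group headings such as "General"
        specs[cells[0]] = cells[1]

print(name)
print(specs)
```

Rows with a single cell are group headings, so they are skipped instead of being stored as specifications.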

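The data is loaded by JavaScript. However, if you can see the data that fills the DOM inside a script tag, you can take the value from that script tag, load it as JSON, and then read the keys you need. This avoids starting a browser.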

Code:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
import json

url = "http://opor.in/ProductDetail/Index?ProductId=212"
page = urlopen(url).read()
soup = bs(page, 'html.parser')

for script in soup.find_all('script'):
    text = script.string or ''  # external scripts have no inline content
    if 'productDetail' in text:
        # Cut out the object assigned to `var productDetail` and parse it as JSON
        data = text.split('var productDetail =')[-1].split('};')[0] + "}"
        datajson = json.loads(data.strip())
        print('Product Code :' + datajson['ProductCode'])
        for spec in datajson['ProductSpecification']:
            print(spec['SpecificationName'] + " : " + spec['SpecificationValue'])
        break  # the product data has been found; no need to scan further scripts

Output:

Product Code :1601KFMB
MEMBRANE : MEMBRELLA -ALPHA- 80 GPD (2 NOS)
PUMP : KEMFLO 48 V
APPLICATION : SUITABLE FOR BRACKISH WATER
FILTER LIFE : APPROX 3000 LITRE / 6 MONTHS
FILTERS : SEDIMENT, PRECARBON, POST CARBON
FLOAT : MEMBRELLA
FR : MEMBRELLA /KFL
INLINE SET : MEMBRELLA
INPUT VOLTAGE : 100-300 VOLT AC (50Hz)
INSTALLATION : COUNTER TOP
MAX.OPERATION TDS : 4000 PPM
MEMBRANE TYPE : THIN FILM COMPOSITE
MIN.INLET PRESSURE / TEMP : 0.3 kg / cm2, 10 °C
MODEL : WHALE 25
OPERATING VOLTAGE : 48 VOLT (DC)
PRODUCT DIMENSION : 21.1 (H) x 9.9 (W) x 16.7 (L)
PURIFICATION CAPACITY : 25 LITRES PER HOUR
RECOVERY RATE : MORE THAN 30% AT 27°c ± 2°c
SMPS : MEMBRELLA / EQUALIANT
SOLENOID VALVE : MEMBRELLA / SLX
STORAGE CAPACITY : 20 LITRES
TECHNOLOGY : REVERSE OSMOSIS SYSTEM
TOTAL POWER CONSUMPTION : 50 W
TUBE 1/4 : 5 METERS
TUBE 3/8 : 2 METERS
WEIGHT : 18 kg (Approx)
WARRENTY &  SUPPORT : Since Whale  designs its purifiers and many of its parts  are a truly integrated system. Dealer only  can provide one-stop service ,guaranty and support for any service and maintenance, so most issues can be resolved in a single visit
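Splitting on `};` works for this page, but it would cut the data short if any value happened to contain that character sequence. A more tolerant sketch, assuming the assigned object is valid JSON (as it must be for `json.loads` to succeed above), uses `json.JSONDecoder.raw_decode`, which parses one JSON value from a given position and ignores whatever follows. The helper name `extract_js_object` and the sample script text are made up for illustration.

```python
import json
import re


def extract_js_object(script_text: str, var_name: str) -> dict:
    """Parse the JSON object assigned to `var <var_name> = {...}` inside script_text."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        raise ValueError(f"no assignment to {var_name!r} found")
    # raw_decode parses a single JSON value starting at match.end()
    # and returns it together with the index where parsing stopped
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


# Made-up script content; one value deliberately contains "};"
sample = '''
var productDetail = {"ProductCode": "1601KFMB", "ProductSpecification": [
  {"SpecificationName": "NOTE", "SpecificationValue": "contains }; inside"}
]};
var other = 1;
'''

detail = extract_js_object(sample, "productDetail")
print(detail["ProductCode"])
print(detail["ProductSpecification"][0]["SpecificationValue"])
```

In the scraping loop above, the text of the matching script tag would be passed in place of `sample`.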