
Python: How can I scrape the price/tax history table on Zillow?


I'm trying to scrape the price/tax history table on Zillow, but the result is None. How can I get that table?
You can get the required table with the following code, which uses Selenium to expand the collapsed section before parsing the page:

from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.zillow.com/homes/recently_sold/Culver-City-CA/house,condo,apartment_duplex,townhouse_type/20432063_zpid/51617_rid/12m_days/globalrelevanceex_sort/34.048605,-118.340178,33.963223,-118.47785_rect/12_zm/")
time.sleep(3)
# Expand the collapsed "Price/Tax History" section before reading the page source
driver.find_element_by_class_name("collapsible-header").click()
soup = BeautifulSoup(driver.page_source, "lxml")

region = soup.find("div", {"id": "hdp-price-history"})
table = region.find("table", {"class": "zsg-table yui3-toggle-content-minimized"})
print(table)

The required table is generated dynamically, so you have to wait until it appears in the DOM — that is why the table cannot be found in the page source immediately after the click. The approach below uses requests and BeautifulSoup to get the data; no Selenium is needed (and it is much faster).
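If you do stay with Selenium, a fixed time.sleep(3) is fragile: it either wastes time or fires too early. The explicit-wait idea can be sketched with a small stdlib-only polling helper (Selenium ships WebDriverWait for exactly this; the helper below and the commented usage are illustrative, not part of the original answer):

```python
import time

def wait_for(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or None on timeout. This mirrors what
    Selenium's WebDriverWait does internally.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    return None

# Hypothetical usage with the scraper above (re-parse the page source each
# poll and stop as soon as the history container shows up):
#   table = wait_for(lambda: BeautifulSoup(driver.page_source, "lxml")
#                    .find("div", {"id": "hdp-price-history"}))
```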

This shows that the first 5 entries are:

from bs4 import BeautifulSoup
import requests
import re

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"}
r = requests.get("https://www.zillow.com/homes/recently_sold/Culver-City-CA/house,condo,apartment_duplex,townhouse_type/20432063_zpid/51617_rid/12m_days/globalrelevanceex_sort/34.048605,-118.340178,33.963223,-118.47785_rect/12_zm/", headers=headers)
# Find every AJAX endpoint embedded in the initial page source
urls = re.findall(re.escape('AjaxRender.htm?') + '(.*?)"', r.text)
# The fifth match (index 4) returns the Price/Tax History fragment
url = "https://www.zillow.com/AjaxRender.htm?{}".format(urls[4])
r = requests.get(url, headers=headers)
# Strip the escaping backslashes before parsing the returned HTML
soup = BeautifulSoup(r.text.replace('\\', ''), "html.parser")
data = []

for tr in soup.find_all('tr'):
    data.append([td.text for td in tr.find_all('td')])

for row in data[:5]:        # Show first 5 entries
    print(row)

The required HTML is not in the first GET; it is generated on demand when the Price/Tax History section is expanded, which triggers an AJAX request in the browser. The code searches the initial HTML for all such requests and issues them itself; the fourth such request (index 4) returns the required section. The returned HTML needs its \ escapes removed, and can then be passed to BeautifulSoup to be parsed as a table.
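The two string tricks in that explanation can be seen in isolation. Below is a minimal sketch on made-up sample strings (the `encparams=abc~def` value and the `<td>` fragment are hypothetical stand-ins, not real Zillow payloads):

```python
import re

# A tiny stand-in for the initial page source: AJAX endpoints are embedded
# as escaped strings inside inline JavaScript.
page = r'ajax:{"url":"\/AjaxRender.htm?encparams=abc~def","divId":"hdp-price-history"}'

# Same pattern as in the answer: grab everything after "AjaxRender.htm?"
# up to the closing double quote.
urls = re.findall(re.escape('AjaxRender.htm?') + '(.*?)"', page)
print(urls)  # ['encparams=abc~def']

# The fragment returned by such an endpoint arrives with escaped quotes and
# slashes; stripping the backslashes restores plain HTML a parser accepts.
fragment = r'<td class=\"zsg-table-cell\">Sold<\/td>'
clean = fragment.replace('\\', '')
print(clean)  # <td class="zsg-table-cell">Sold</td>
```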

You helped me a lot! Thank you so much! — Glad I could help! Don't forget to click the grey tick under the up/down voting buttons to mark an answer as the accepted solution.
Output (first 5 entries):

['06/16/17', 'Sold', '$940,000-0.9%', 'K. Miller, A. Masket', '']
['06/14/17', 'Price change', '$949,000-1.0%', '', '']
['05/08/17', 'Pending sale', '$959,000', '', '']
['04/17/17', 'Price change', '$959,000+1.1%', '', '']
['02/27/17', 'Pending sale', '$949,000', '', '']
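Note that the scraped rows mix the price and the percent change in one cell, e.g. "$940,000-0.9%". If you need them separately, a small parser can split them apart — an illustrative sketch, not part of the original answer:

```python
import re

def parse_price_cell(cell):
    """Split a Zillow price cell like '$940,000-0.9%' into (price, change).

    Returns (int_price, change_str_or_None), or None if the cell is not a
    price (e.g. an event name like 'Sold').
    """
    m = re.match(r'\$([\d,]+)([+-][\d.]+%)?$', cell)
    if not m:
        return None
    price = int(m.group(1).replace(',', ''))
    change = m.group(2)  # None when no percent change is shown
    return price, change

print(parse_price_cell('$940,000-0.9%'))  # (940000, '-0.9%')
print(parse_price_cell('$959,000'))       # (959000, None)
```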