Python: Can't figure out how to scrape a page with Selenium
I know how to get to the results page with Selenium, but I don't know how to actually scrape the results page. I also tried mechanize, but that didn't get me any further. This is where I am right now:
import re
import urllib2
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup, SoupStrainer
import datetime
import time
import smtplib
import atexit
import signal
import json
import os
import gspread
import sys
import gc
script_path = os.path.dirname(os.path.realpath(__file__))
driver = webdriver.PhantomJS(executable_path="/usr/bin/phantomjs", service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])

# launches headless browser, completes proper search in Casenet
def main():
    driver.get('https://www.courts.mo.gov/casenet/cases/nameSearch.do')
    if 'Service Unavailable' in driver.page_source:
        log('Casenet website seems to be down. Receiving "service unavailable"')
        driver.quit()
        gc.collect()
        return False
    court = Select(driver.find_element_by_id('courtId'))
    court.select_by_visible_text('All Participating Courts')
    case_enter = driver.find_element_by_id('inputVO.lastName')
    case_enter.send_keys('Wakefield & Associates')
    driver.find_element_by_id('findButton').click()
    time.sleep(1)
    number_of_pages = 204
    for i in range(number_of_pages):
        output_trs = []
        party = (driver.find_element_by_class_name('outerTable'))
        output_trs.append(party)
    print output_trs

main()
The eventual idea is to store the party, case number, and filing date as strings in a .csv file. When I print the output right now, I get:
selenium.webdriver.remote.webelement.WebElement (session="c4e7b9e0-7a3b-11e8-83f2-b9030062270d", element=":wdc:1530125781332")
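Any help is appreciated.

You are trying to print the web element object itself rather than its contents. One way to print the text content (mind the encoding):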
for content in output_trs:
    print content.text
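Going one step further toward the .csv goal in the question, here is a minimal sketch of pulling the cell text out of the table row by row and writing it out. It assumes the results sit in ordinary tr/td elements inside the 'outerTable' element, which I have not checked against the actual Casenet markup. The helper names extract_rows and write_rows are made up for this example.

```python
import csv


def extract_rows(table):
    """Return the text of every <td> in `table`, one list per <tr>.

    `table` is assumed to be a Selenium WebElement, e.g. the result of
    driver.find_element_by_class_name('outerTable') from the question.
    """
    rows = []
    for tr in table.find_elements_by_tag_name('tr'):
        cells = [td.text.strip() for td in tr.find_elements_by_tag_name('td')]
        # Skip rows that have no non-empty <td> cells (for example header
        # rows built only from <th>, or blank spacer rows).
        if any(cells):
            rows.append(cells)
    return rows


def write_rows(rows, fileobj):
    """Write the extracted rows to an already opened file object as CSV."""
    csv.writer(fileobj).writerows(rows)
```

To use it, open the output file and call write_rows(extract_rows(table), f). On Python 2 (as in the question) open the file with mode 'wb' and encode each cell first, e.g. cell.encode('utf-8'), since .text returns unicode. On Python 3 use open(path, 'w', newline=''). Which columns hold the party, case number, and filing date depends on the page layout, so check a few printed rows before picking indexes.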