Python scraping data to a CSV file with BeautifulSoup


As the title says, scraping data from the website with my BeautifulSoup scraper works fine, but when I try to write the data to a CSV file, it only saves the data for 1 result instead of the 500 results the scraper returns. Here is my code:

#from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import csv



#launch url
url = "https://www.canlii.org/en/#search/type=decision&jId=bc,ab,sk,mb,on,qc,nb,ns,pe,nl,yk,nt,nu&startDate=1990-01-01&endDate=1992-01-14&text=non-pecuniary%20award%20&resultIndex=1"

# create a new Chrome session
driver = webdriver.Chrome('C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\selenium\webdriver\common\chromedriver.exe')
driver.implicitly_wait(30)
driver.get(url)


#Selenium hands the page source to Beautiful Soup
soup=BeautifulSoup(driver.page_source, 'lxml')

csv_file = open('test.csv', 'w')

csv_writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
csv_writer.writerow(['Reference', 'case', 'link', 'province', 'keywords','snippets'])

#Scrape all
for scrape in soup.find_all('li', class_='result '):
    print(scrape.text)    
    
#Reference Index
    Reference = scrape.find('span', class_='reference')
    print(Reference.text)

#Case Name Index
    case = scrape.find('span', class_='name')
    print(case.text)
    
#Canlii Keywords Index
    keywords = scrape.find('div', class_='keywords')
    print(keywords.text)
    
#Province Index
    province = scrape.find('div', class_='context')
    print(province.text)
               
#snippet Index
    snippet = scrape.find('div', class_='snippet')
    print(snippet.text)
 
# Extracting URLs from the attribute href in the <a> tags.
    link = scrape.find('a', href=True)
    print(link)        
            
csv_writer.writerow([Reference.text, case.text,link.href, province.text, keywords.text, snippet.text])
csv_file.close()

Your csv_writer.writerow() call is outside the for loop. Try indenting it so it runs inside the loop and see if that works.
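For reference, here is a sketch of the corrected script with writerow() indented into the loop, using the same selectors as the question. Two hedged extras are folded in: a bs4 Tag has no .href attribute, so the link URL is read with link['href'], and the file is opened with newline='' and encoding='utf-8', which avoids blank rows on Windows and the kind of encoding error mentioned in the comments below. The plain webdriver.Chrome() call assumes chromedriver is on the PATH rather than at the hard-coded Windows path:

from bs4 import BeautifulSoup
from selenium import webdriver
import csv

url = "https://www.canlii.org/en/#search/type=decision&jId=bc,ab,sk,mb,on,qc,nb,ns,pe,nl,yk,nt,nu&startDate=1990-01-01&endDate=1992-01-14&text=non-pecuniary%20award%20&resultIndex=1"

# create a new Chrome session (assumes chromedriver is on the PATH)
driver = webdriver.Chrome()
driver.implicitly_wait(30)
driver.get(url)

# Selenium hands the page source to Beautiful Soup
soup = BeautifulSoup(driver.page_source, 'lxml')

# newline='' prevents blank rows on Windows; utf-8 avoids encoding errors
with open('test.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
    csv_writer.writerow(['Reference', 'case', 'link', 'province', 'keywords', 'snippets'])

    for scrape in soup.find_all('li', class_='result '):
        reference = scrape.find('span', class_='reference')
        case = scrape.find('span', class_='name')
        keywords = scrape.find('div', class_='keywords')
        province = scrape.find('div', class_='context')
        snippet = scrape.find('div', class_='snippet')
        link = scrape.find('a', href=True)

        # writerow is now INSIDE the loop, so one row is written per result,
        # and link['href'] reads the attribute (a Tag has no .href)
        csv_writer.writerow([reference.text, case.text, link['href'],
                             province.text, keywords.text, snippet.text])

driver.quit()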

I get this error when executing csv_writer.writerow([Reference.text, case.text, link.href, province.text, keywords.text, snippet.text]) (File "", line 37): ValueError: I/O operation on closed file.

OK, I fixed the csv writer problem; enrique is correct, it was outside the loop. Then an encoding error. But I have a new question: how do I add more content to an existing csv?

If I want to follow along with your tutorial, I run into a problem on a Linux machine: Traceback (most recent call last): File "/home/martin/.atom/python/examples/bs_canlii.py", line 10, in <module>: driver = webdriver.Chrome('C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\selenium\webdriver\common\chromedriver.exe') NameError: name 'webdriver' is not defined. I am running Atom on MX Linux; I guess you are on a Windows machine.
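On the two follow-up questions: the NameError appears because the posted snippet never imports webdriver (and the hard-coded C:\ path will not exist on Linux anyway), and adding rows to an existing csv only requires opening the file in append mode. A minimal sketch, assuming chromedriver is on the PATH and test.csv already contains its header row; the row values here are illustrative only:

from selenium import webdriver   # fixes NameError: name 'webdriver' is not defined
import csv

driver = webdriver.Chrome()      # no hard-coded Windows path; chromedriver must be on PATH

# mode 'a' opens the file for appending, so new rows land after the existing ones
with open('test.csv', 'a', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
    csv_writer.writerow(['1990canlii123', 'Some v. Case', 'https://example.org',
                         'ON', 'damages', 'example snippet'])  # hypothetical values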