Need help scraping information from multiple webpages and importing it into a CSV file in tabular form - Python

I have been working on scraping the infobox information from a set of Wikipedia pages. This is the code I have been using:

import requests
import csv
from bs4 import BeautifulSoup

URL = ['https://en.wikipedia.org/wiki/Workers_Credit_Union',
       'https://en.wikipedia.org/wiki/San_Diego_County_Credit_Union',
       'https://en.wikipedia.org/wiki/USA_Federal_Credit_Union',
       'https://en.wikipedia.org/wiki/Commonwealth_Credit_Union',
       'https://en.wikipedia.org/wiki/Center_for_Community_Self-Help',
       'https://en.wikipedia.org/wiki/ESL_Federal_Credit_Union',
       'https://en.wikipedia.org/wiki/State_Employees_Credit_Union',
       'https://en.wikipedia.org/wiki/United_Heritage_Credit_Union']

for url in URL:
    headers = []
    rows = []
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table', class_='infobox')
    credit_union_name = soup.find('h1', id="firstHeading")
    header_tags = table.find_all('th')
    headers = [header.text.strip() for header in header_tags]
    data_rows = table.find_all('tr')
    for row in data_rows:
        value = row.find_all('td')
        beautified_value = [dp.text.strip() for dp in value]
        if len(beautified_value) == 0:
            continue
        rows.append(beautified_value)
    rows.append("")
    rows.append([credit_union_name.text.strip()])
    rows.append([url])

    with open(r'credit_unions.csv', 'a+', newline="") as output:
        writer = csv.writer(output)
        writer.writerow(headers)
        writer.writerow(rows)

However, when I check the CSV file, the information is not displayed in tabular form. The scraped elements end up stored in nested lists rather than in a single flat list per row. I need the scraped information for each URL to be stored as a flat list and written to the CSV file in tabular form under the header list. Any help with this would be appreciated.
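For reference, here is a minimal sketch with made-up data of what is going on: csv.writer.writerow() expects one flat sequence of cell values, so passing a list of lists turns each inner list into a single stringified cell, whereas writerows() writes one row per inner list.

import csv
import io

# Made-up data, not from the scraper above
rows = [['Formerly', 'WCU'], ['Type', 'Credit union']]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(rows)   # one row; each inner list becomes one quoted cell
writer.writerows(rows)  # one row per inner list, as intended

print(buffer.getvalue())
# "['Formerly', 'WCU']","['Type', 'Credit union']"
# Formerly,WCU
# Type,Credit union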

The infoboxes have different structures and labels, so I think the best way to solve this is to use dicts and csv.DictWriter:

import requests
import csv
from bs4 import BeautifulSoup

URL = ['https://en.wikipedia.org/wiki/Workers_Credit_Union',
       'https://en.wikipedia.org/wiki/San_Diego_County_Credit_Union',
       'https://en.wikipedia.org/wiki/USA_Federal_Credit_Union',
       'https://en.wikipedia.org/wiki/Commonwealth_Credit_Union',
       'https://en.wikipedia.org/wiki/Center_for_Community_Self-Help',
       'https://en.wikipedia.org/wiki/ESL_Federal_Credit_Union',
       'https://en.wikipedia.org/wiki/State_Employees_Credit_Union',
       'https://en.wikipedia.org/wiki/United_Heritage_Credit_Union']

csv_headers = set()  # union of every infobox label seen across the pages
csv_rows = []        # one dict per credit union, keyed by those labels

for url in URL:
    csv_row = {}
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    credit_union_name = soup.find('h1', id="firstHeading")
    table = soup.find('table', class_='infobox')
    data_rows = table.find_all('tr')
    for data_row in data_rows:
        # Each infobox row pairs a <th> label with a <td> value;
        # rows missing either part are skipped below.
        label = data_row.find('th')
        value = data_row.find('td')
        if label is None or value is None:
            continue
        beautified_label = label.text.strip()
        beautified_value = value.text.strip()
        csv_row[beautified_label] = beautified_value
        csv_headers.add(beautified_label)
    csv_row["name"] = credit_union_name.text.strip()
    csv_row["url"] = url
    csv_rows.append(csv_row)

with open(r'credit_unions.csv', 'a+', newline="") as output:
    # Fixed columns first, then the union of all infobox labels;
    # DictWriter fills fields missing from a row with empty strings.
    headers = ["name", "url"]
    headers += sorted(csv_headers)
    writer = csv.DictWriter(output, fieldnames=headers)
    writer.writeheader()
    writer.writerows(csv_rows)
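
One caveat with the snippet above: opening the file in 'a+' mode appends a fresh header line and a new block of rows on every run, so 'w' may be the better choice when regenerating the file from scratch. As a quick sanity check (a minimal sketch assuming the file was produced by the code above), the result can be read back with csv.DictReader:

import csv

# Confirm each credit union ended up as one row keyed by the header names
with open('credit_unions.csv', newline='') as f:
    for record in csv.DictReader(f):
        print(record['name'], record['url'])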


Amazing! Thank you so much.