
PYTHON - BeautifulSoup: how to scrape an empty TD (table data) as an empty value instead of skipping it

Tags: python, csv, web-scraping, beautifulsoup, urllib

I want to scrape a webpage into a 4-column CSV file. Some of the table cells contain no data, and I want to write those as empty cell values instead of skipping them, which is what happens with .text. I also tried .string, but that gives me TypeError: can only concatenate str (not "NoneType") to str.

I would also like a dynamic lookup: if a <td> contains an <a> tag, append the tag's text; otherwise append the <td>'s own content; and if the <td> has no data, write an empty value (or the text "None"). You can see an HTML sample below.
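A minimal illustration (using made-up markup, not the target page) of why .string produces the NoneType error: .string is None both for an empty tag and for a tag with more than one child, and None cannot be concatenated to a str.

```python
from bs4 import BeautifulSoup

# .string is None for an empty tag ...
empty = BeautifulSoup('<td></td>', 'html.parser').td
print(empty.string)    # None

# ... and for a tag holding more than one child (here an <a> plus text)
nested = BeautifulSoup('<td><a href="#x">x</a>y</td>', 'html.parser').td
print(nested.string)   # None
```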

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.example.com'

# open the connection and grab the page
uClient = uReq(my_url)
page_soup = soup(uClient.read(), "lxml")
uClient.close()

# each table row of interest carries class="typetable"
containers = page_soup.find_all("tr", {"class": "typetable"})

out_filename = "output.csv"
headers = "Parameter,Type_Value,Cardinality,Description\n"

f = open(out_filename, "w")
f.write(headers)

for container in containers:
    # grab the four cells of the row by position
    cells = container.find_all('td')
    parameter = cells[0].text
    type_value = cells[1].text
    cardinality = cells[2].text
    description = cells[3].text

    print("parameter: " + parameter + "\n")
    print("type_value: " + type_value + "\n")
    print("cardinality: " + cardinality + "\n")
    print("description: " + description + "\n")

    f.write(f'{parameter},{type_value},{cardinality},"{description}"\n')

f.close()
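One fragility worth noting in the loop above: find_all('td')[3] raises IndexError on any row with fewer than four cells. A defensive sketch (markup is made up) pads the extracted list to a fixed width instead:

```python
from bs4 import BeautifulSoup

# A short row would make find_all('td')[3] raise IndexError;
# padding the extracted list avoids that.
row = BeautifulSoup('<tr class="typetable"><td>only one cell</td></tr>',
                    'html.parser').tr
cells = [td.get_text(strip=True) for td in row.find_all('td')]
cells += [''] * (4 - len(cells))   # pad missing cells with empty values
parameter, type_value, cardinality, description = cells
print(cells)   # ['only one cell', '', '', '']
```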
Here is an HTML sample:

<tr class="typetable">
  <td>Data 1&nbsp;</td>
  <td>Data 2&nbsp;</td>
  <td>&nbsp;</td>
  <td>Data 4&nbsp;</td>
</tr>
<tr class="typetable">
  <td>Data 10&nbsp;</td>
  <td>
     <a href="#2ndPage">2ndPage</a>"&nbsp;"
  </td>
  <td>Data 3&nbsp;</td>
  <td>&nbsp;</td>
</tr>

I have been testing and searching Stack Overflow for examples for weeks :( Please help. Thanks in advance!

You can use this script to extract the data from the table:

import csv
from bs4 import BeautifulSoup


txt = '''<tr class="typetable">
  <td>Data 1&nbsp;</td>
  <td>Data 2&nbsp;</td>
  <td>&nbsp;</td>
  <td>Data 4&nbsp;</td>
</tr>
<tr class="typetable">
  <td>Data 10&nbsp;</td>
  <td>
     <a href="#2ndPage">2ndPage</a>"&nbsp;"
  </td>
  <td>Data 3&nbsp;</td>
  <td>&nbsp;</td>
</tr>'''

soup = BeautifulSoup(txt, 'html.parser')

all_data = []
for row in soup.select('tr.typetable'):
    # prefer the <a> tag's text when one is present; get_text(strip=True)
    # also strips &nbsp;, so empty cells come through as '' instead of
    # being skipped
    tds = [td.a.get_text(strip=True) if td.a else td.get_text(strip=True) for td in row.select('td')]
    all_data.append(tds)


with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['Parameter','Type_Value','Cardinality','Description'])
    for row in all_data:
        writer.writerow(row)
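The question also mentions optionally writing the text "None" rather than an empty value. A small variation on the comprehension above (a sketch, using the same style of sample markup) does that with an "or" fallback:

```python
from bs4 import BeautifulSoup

txt = '<tr class="typetable"><td>Data 1&nbsp;</td><td>&nbsp;</td></tr>'
soup = BeautifulSoup(txt, 'html.parser')

for row in soup.select('tr.typetable'):
    # get_text(strip=True) reduces &nbsp;-only cells to '', and
    # "or 'None'" swaps that empty string for the placeholder text
    tds = [(td.a.get_text(strip=True) if td.a else td.get_text(strip=True)) or 'None'
           for td in row.select('td')]
    print(tds)   # ['Data 1', 'None']
```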
The resulting data.csv:

Parameter,Type_Value,Cardinality,Description
Data 1,Data 2,,Data 4
Data 10,2ndPage,Data 3,