How to fix scraped web table output to CSV with Python and bs4 - Python, Csv, Web Scraping, Beautifulsoup, Scrape - Fatal编程技术网



Please help me. I want to get two pieces of data from the td cells, "Barcode" and "Nama Produk", but the data I get is a mess. What should I fix?

import csv
import requests
from bs4 import BeautifulSoup


outfile = open("dataaa.csv","w",newline='')
writer = csv.writer(outfile)


page = 0
while page < 3 :
    url = "http://ciumi.com/cspos/barcode-ritel.php?page={:d}".format(page)
    response = requests.get(url)
    tree = BeautifulSoup(response.text, 'html.parser')
    page += 1
    table_tag = tree.select("table")[0]
    tab_data = [[item.text for item in row_data.select("tr")]
    for row_data in table_tag.select("td")]
    for data in tab_data:
        writer.writerow(data)
        print(table_tag)
        print(response, url, ' '.join(data))


import fileinput
seen = set() 
for line in fileinput.FileInput('dataaa.csv', inplace=1):
    if line in seen: continue

    seen.add(line)
    print (line)

What do I need to improve to get clean output?
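One likely source of the messy output: the list comprehension in the question iterates over td cells and then searches for tr rows inside each cell, which is the reverse of how HTML tables nest (rows contain cells). A minimal sketch of the intended nesting follows, assuming, as the question's code does, that the data lives in the first table on the page and that data rows use td cells. It also drops repeated rows in memory, so the separate fileinput pass over the CSV is not needed. The helper names parse_table_rows and dedupe_rows are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_table_rows(html):
    """Return one list of stripped cell texts per <tr> of the first <table>.

    Rows without <td> cells (e.g. a header row made of <th>) are skipped.
    Assumes the data is in the first table, as in the question's code.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table")
    if table is None:
        return []
    rows = []
    # Rows contain cells, so iterate over <tr> first, then the <td>s inside it.
    for tr in table.select("tr"):
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        if cells:
            rows.append(cells)
    return rows


def dedupe_rows(rows):
    """Drop repeated rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable; tuples can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

In the question's loop this would mean collecting parse_table_rows(response.text) for every page into one list and then calling writer.writerows(dedupe_rows(all_rows)) once.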

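You can simplify this with pandas. By the way, pandas.read_html can use BeautifulSoup under the hood to parse tables (by default it tries lxml first and falls back to BeautifulSoup with html5lib):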

import pandas as pd

frames = []
for page in range(1, 3):
    url = 'http://ciumi.com/cspos/barcode-ritel.php?page=%s' % page
    # read_html returns a list of DataFrames, one per <table>; keep the first
    frames.append(pd.read_html(url)[0])

# DataFrame.append was removed in pandas 2.0; concatenate once instead
results_df = pd.concat(frames, ignore_index=True)
results_df.columns = ['Barcode', 'Nama Produk']

results_df.to_csv('dataaa.csv', index=False)
Output:

print (results_df)
          Barcode                         Nama Produk
0   8992694242533             ZWITSAL SOAP 80G PACK 4
1   8992694247163         ZWITSAL SOAP 80G MILK&HONEY
2   8992694242502            ZWITSAL SOAP 80G CLASSIC
3   8992694245435  ZWITSAL SKIN GUARD LOT 100ML SPRAY
4   8992694246074               ZWITSAL SHP 600ML C&R
5   8992694242908             ZWITSAL SHP 50ML REBORN
6   8992694020025          ZWITSAL SHP 500ML REF AVKS
7   8992694246333           ZWITSAL SHP 500ML C&R REF
8   8992694246364              ZWITSAL SHP 300ML AVKS
9   8992694246319       ZWITSAL SHP 250ML REF CLEAN&R
10  8992694246357          ZWITSAL SHP 250ML REF AVKS
11  8992694242922            ZWITSAL SHP 200ML REBORN
12  8992694242915           ZWITSAL SHP 100ML CLASSIC
13  8992694246340              ZWITSAL SHP 100ML AVKS
14  8992694242601          ZWITSAL PWD 50G SOFTFLOWER
15  8992694244254               ZWITSAL PWD 50G FRESH
16  8992694242656         ZWITSAL PWD 500G SOFTFLORAL
17  8992694241055            ZWITSAL PWD 500G FRESH F
18  8992694244056        ZWITSAL PWD 300G SOFT FLORAL
19  8992694244513         ZWITSAL PWD 300G MILK&HONEY

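It looks like the pages start at 1, so my range loop starts from there. You can then use a requests Session object for efficiency, since it reuses the connection. If you choose your CSS selectors wisely, all the filtering can be done at that level, and you then only handle the required elements that are retrieved. You can also use the lighter csv import instead of the heavier pandas one.

This requires bs4 4.7.1+ to make use of the :has pseudo-selector.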
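The code listing for this answer did not survive on this page. A minimal sketch of the approach it describes follows, assuming (from the explanation below) that each barcode sits in a center element inside a td and the product name is in the next td. The selector td:has(center) + td is a reconstruction of that description, not necessarily the author's exact selector, and parse_page and scrape are hypothetical helper names. It needs bs4 4.7.1+ for :has.

```python
import requests
from bs4 import BeautifulSoup

# URL pattern taken from the question; pages appear to start at 1.
BASE_URL = "http://ciumi.com/cspos/barcode-ritel.php?page={}"


def parse_page(html):
    """Return [barcode, product name] pairs from one page of HTML.

    Assumed markup: <td><center>BARCODE</center></td><td>NAME</td>.
    """
    soup = BeautifulSoup(html, "html.parser")
    # First column: only the <center> elements.
    barcodes = [c.text.strip() for c in soup.select("center")]
    # Second column: the td immediately after a td that contains a <center>
    # (":has" requires bs4 4.7.1+, which bundles soupsieve).
    names = [td.text.strip() for td in soup.select("td:has(center) + td")]
    # Pair the two columns row by row; zip stops at the shorter list.
    return [list(pair) for pair in zip(barcodes, names)]


def scrape(pages):
    """Fetch each page number through one Session so the connection is reused."""
    results = []
    with requests.Session() as session:
        for page in pages:
            response = session.get(BASE_URL.format(page), timeout=30)
            response.raise_for_status()
            results.extend(parse_page(response.text))
    return results
```

For example, scrape(range(1, 3)) would request the same two pages as the pandas version. Because the two columns are selected independently and paired with zip, the sketch relies on every center element having exactly one matching name cell.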


Quick explanation:

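The following selects the first column by matching only the center elements:

soup.select('center')

It then selects the second column with a selector that takes the right-adjacent table cell next to a left table cell (td) that has a center child element.

The retrieved lists of tags have their .text extracted and stripped within list comprehensions; the two lists are then zipped, converted back to a list, and appended to the final list results, which is later looped over to write the CSV.

CSS selectors are kept minimal to allow for faster matching.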
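The final step, looping over results to write the CSV, needs only the standard library. A minimal sketch: the header labels are borrowed from the pandas answer's column names, and write_results is a hypothetical helper name. Opening the file with newline='' follows the csv module's documented recommendation for files handed to csv.writer.

```python
import csv


def write_results(results, path, header=("Barcode", "Nama Produk")):
    """Write a header row, then one CSV row per [barcode, name] pair."""
    # newline="" lets the csv module control line endings
    # (avoids blank lines between rows on Windows).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in results:
            writer.writerow(row)
```

csv.writer also quotes fields that contain commas, so product names such as "A, B" survive a round trip through csv.reader.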


