Python: I want to scrape multiple pages, but I get the results of only the last URL. Why?


Why does the output only reflect the last URL? Is there something wrong with my code?

import requests as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

#can i use while loop instead for?
for page in np.arange(1,15):
    url = uReq.get('https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9'.format(page)).text 

#have used for loop,but result is the last url
page_soup = soup(url,"html.parser")
info = page_soup.findAll("div",{"class: ","row detail_row"})

#Do all the url return output in one file?
filename = "wheel.csv"
file = open(filename,"w",encoding="utf-8")

You should check the indentation of the code that follows the for loop. Otherwise, the variable
url
is overwritten on every iteration of the loop, so only the last value is kept.
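The effect is easy to see with a minimal loop (the page names below are hypothetical, just to illustrate rebinding versus accumulating):

```python
# Rebinding a name inside a loop keeps only its last value.
# To keep every iteration's result, append to a list instead.
pages = []
for page in range(1, 15):
    pages.append(f"page{page}.html")  # hypothetical page names

# pages now holds all 14 entries, not just the last one
print(len(pages))
print(pages[0])
```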

import requests as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

for page in np.arange(1,15):
    url = uReq.get('https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9'.format(page)).text 

    # this should be done N times (where N is the range param)
    page_soup = soup(url,"html.parser")
    info = page_soup.findAll("div", {"class": "row detail_row"})

    # append the results to the csv file
    filename = "wheel.csv"
    file = open(filename,"a",encoding="utf-8")
    ...  # code for writing in the csv file
    file.close()
Then you will find everything in the file. Note that you should also close the file so that it is saved.

Try this:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import re
import requests

urls=['https://www.myanmarbusiness-directory.com/en/categories-index/car-wheels-tyres-tubes-dealers/page{}.html?city=%E1%80%99%E1%80%9B%E1%80%99%E1%80%B9%E1%80%B8%E1%80%80%E1%80%AF%E1%80%94%E1%80%B9%E1%80%B8%E1%81%BF%E1%80%99%E1%80%AD%E1%80%B3%E1%82%95%E1%80%94%E1%80%9A%E1%80%B9']

links = []
for url in urls:
    response = requests.get(url)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html_page = urlopen(req).read()
    soup = BeautifulSoup(html_page, features="html.parser")
    for link in soup.select_one('ol.list_products').findAll('a', attrs={'href': re.compile("^([a-zA-Z0-9\-])+$")}):
        links.append(link.get('href'))


filename = 'output.csv'

with open(filename, mode="w") as outfile:
    for s in links:
        outfile.write("%s\n" %s)
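Note that the answer above puts only a single URL in `urls`. To cover the same pages 1 through 14 as the question's loop, the list can be built with a comprehension from the paginated URL template (query string omitted here for brevity):

```python
# URL template from the question, with the page number as a placeholder
base = ('https://www.myanmarbusiness-directory.com/en/categories-index/'
        'car-wheels-tyres-tubes-dealers/page{}.html')

# one URL per page, matching the question's np.arange(1, 15)
urls = [base.format(page) for page in range(1, 15)]

print(len(urls))  # 14 URLs
print(urls[0])    # ends with page1.html
```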