In Python, I get only 1 result (the last one) per URL instead of 60 results per URL


What should the code look like once this is fixed?

Here is the Python code:

from bs4 import BeautifulSoup as soup
from concurrent.futures import ThreadPoolExecutor
import requests

page_url = "url.xml"
number_of_threads = 6
out_filename = "title.csv"
headers = "title,brand,category \n"

def extract_data_from_url_func(url):
    print(url)
    response = requests.get(url)
    page_soup = soup(response.text, "html.parser")

    containers = page_soup.findAll('div',{'class' : 'column column-block block-grid-large single-item'})

    for container in containers:

        title = container['data-name'].replace(",", "|")
        brand = container['data-brand-name']
        category = container['data-category-name'].replace(",", "|")

        output_list = [title,brand,category]
        output = ",".join(output_list)
        print(output)
        return output

with open("url.xml", "r") as fr:
    URLS = list(map(lambda x: x.strip(), fr.readlines()))

with ThreadPoolExecutor(max_workers=number_of_threads) as executor:
    results = executor.map( extract_data_from_url_func, URLS)
    responses = []
    for result in results:
        responses.append(result)


with open(out_filename, "w", encoding='utf-8-sig') as fw:
  fw.write(headers)
  for response in responses:
      fw.write(response + "\n")

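You need to accumulate the output across loop iterations instead of returning from inside the loop. You can format the output however you like. The code is as follows: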

from bs4 import BeautifulSoup as soup
from concurrent.futures import ThreadPoolExecutor
import requests

page_url = "url.xml"
number_of_threads = 6
out_filename = "title.csv"
headers = "title,brand,category \n"

def extract_data_from_url_func(url):
    print(url)
    response = requests.get(url)
    page_soup = soup(response.text, "html.parser")

    containers = page_soup.findAll('div',{'class' : 'column column-block block-grid-large single-item'})
    rows = []  # one CSV line per matched container
    for container in containers:

        title = container['data-name'].replace(",", "|")
        brand = container['data-brand-name']
        category = container['data-category-name'].replace(",", "|")

        output_list = [title,brand,category]
        rows.append(",".join(output_list))

    # Join with newlines so each product stays on its own CSV line,
    # and return only after the loop has visited every container.
    output = "\n".join(rows)
    print(output)
    return output

with open("url.xml", "r") as fr:
    URLS = list(map(lambda x: x.strip(), fr.readlines()))

with ThreadPoolExecutor(max_workers=number_of_threads) as executor:
    results = executor.map( extract_data_from_url_func, URLS)
    responses = []
    for result in results:
        responses.append(result)


with open(out_filename, "w", encoding='utf-8-sig') as fw:
  fw.write(headers)
  for response in responses:
      fw.write(response + "\n")
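An alternative to concatenating strings is to have each worker return a list of rows and flatten those lists when writing. The sketch below shows only that pattern and is not the original poster's code: `fake_extract` is a hypothetical stand-in for `extract_data_from_url_func` that returns made-up rows instead of fetching a page, the URLs are placeholders, and the output goes to an in-memory buffer rather than `title.csv`.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def fake_extract(url):
    # Hypothetical stand-in for the real scraper: returns one
    # [title, brand, category] row per product found at `url`.
    return [[f"{url}-item{i}", "brand", "cat, with comma"] for i in range(3)]


urls = ["u1", "u2"]  # placeholder URLs, assumed for illustration

with ThreadPoolExecutor(max_workers=2) as executor:
    # executor.map yields results in the same order as `urls`
    per_url_rows = list(executor.map(fake_extract, urls))

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "brand", "category"])
for rows in per_url_rows:
    writer.writerows(rows)  # one CSV line per product

lines = buffer.getvalue().splitlines()
print(len(lines))  # 7: one header line plus 3 rows for each of 2 URLs
```

Because `csv.writer` quotes any field that contains a comma, this approach would also make the `.replace(",", "|")` workaround unnecessary.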

print(output) and return output must be outside the for loop.

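Is it because your return is inside the for loop? That way you return only one result instead of accumulating them all and returning everything.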
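The difference is easy to reproduce without any network access. A minimal sketch, using hypothetical function names (`first_only`, `all_rows`) and made-up sample data that are not part of the original code:

```python
def first_only(items):
    # The return sits inside the loop, so the function exits on the
    # first iteration and the remaining items are never processed.
    for item in items:
        return item.upper()


def all_rows(items):
    # Accumulate inside the loop, return once after it finishes.
    rows = []
    for item in items:
        rows.append(item.upper())
    return rows


sample = ["a", "b", "c"]
print(first_only(sample))  # A
print(all_rows(sample))    # ['A', 'B', 'C']
```

Note that a return inside the loop hands back the row for the first matching container on each page, since the function stops at the first iteration.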