Python BeautifulSoup: looping through a group of variable URLs

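I'm trying to store some data scraped from a website. There are more than 100 URLs and they are all similar to each other, so I tried to use a %s placeholder in my code.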

Examples of my URLs:

…
…
…
…
and so on.

My Django + bs4 loop:

from django.core.management.base import BaseCommand
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from scraping.models import Job
import requests as req


header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

class Command(BaseCommand):
    def handle(self,  *args, **options):
        TAGS = ['economy', 'food', 'sports', 'usa', 'health']
        resp = req.get('https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS),headers=header)
        soup = BeautifulSoup(resp.text, 'lxml')

        for i in range(len(soup)):
            titles = soup.findAll("div", {"class": "StretchedBox Z(1)"})
            
        print (titles)

The error message is:

TypeError: not all arguments converted during string formatting

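I've been playing around with the loop, but I'm very new to this and can't work out how to loop over the URLs. What am I missing?

Could someone more knowledgeable point me in the right direction? Many thanks.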

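It seems you want to insert each value in TAGS individually and make a request with each one. So you need to loop over TAGS and submit one request per tag. I expect you want something like this: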

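# inside handle(), reusing the question's req, header and BeautifulSoup imports
TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    resp = req.get(f'https://www.yahoo.com/lifestyle/tagged/{tag}', headers=header)
    soup = BeautifulSoup(resp.text, 'lxml')
    # process the page, e.g. with the question's findAll() call:
    titles = soup.findAll("div", {"class": "StretchedBox Z(1)"})
    print(titles)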
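If the per-tag loop is needed in more than one place, it can be pulled into a small function. This is a minimal sketch, not the answerer's code: the helper name fetch_tag_pages and the injected fetch callable are hypothetical. It builds each URL with str.format and stores every page under its own tag, so nothing is overwritten between iterations, and passing fetch in lets the loop run without network access.

```python
from typing import Callable, Dict, Iterable

# URL pattern from the question; {tag} is filled in once per iteration.
URL_TEMPLATE = "https://www.yahoo.com/lifestyle/tagged/{tag}"


def fetch_tag_pages(tags: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Request one page per tag and return {tag: page_text}.

    `fetch` (hypothetical) is any callable that takes a URL and returns the
    page text. Injecting it instead of calling requests directly keeps the
    loop testable offline.
    """
    pages: Dict[str, str] = {}
    for tag in tags:
        url = URL_TEMPLATE.format(tag=tag)
        pages[tag] = fetch(url)  # exactly one request per tag
    return pages


# Hypothetical wiring inside the question's handle(), reusing its `req`
# (requests) alias and `header` dict:
#
#     pages = fetch_tag_pages(TAGS, lambda url: req.get(url, headers=header).text)
#     for tag, html in pages.items():
#         soup = BeautifulSoup(html, 'lxml')
```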

You can loop over the tags and send a request for each one:

import requests

header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    # one request per tag; print the size of each returned page
    resp = requests.get(f"https://www.yahoo.com/lifestyle/tagged/{tag}", headers=header)
    print(len(resp.text))

#341723
#442712
#447413
#368508
#445326
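The printed lengths only show that each tag's request returned a page of a different size. To keep what each page contains, store the results per tag instead of reassigning one variable (the question's loop overwrites titles on every pass). This sketch swaps BeautifulSoup for the standard library's html.parser.HTMLParser purely so it runs without third-party packages; MatchingDivCollector, count_matches_per_tag and the injected fetch callable are hypothetical names. With bs4 you would keep using soup.find_all("div", {"class": "StretchedBox Z(1)"}) instead of the parser class.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional

# Class string copied from the question's findAll() call.
TARGET_CLASS = "StretchedBox Z(1)"
URL_TEMPLATE = "https://www.yahoo.com/lifestyle/tagged/{tag}"


class MatchingDivCollector(HTMLParser):
    """Record the attributes of every <div> whose class attribute is exactly TARGET_CLASS.

    Stdlib stand-in for BeautifulSoup's find_all(); it keeps attribute
    dicts, not parsed elements.
    """

    def __init__(self) -> None:
        super().__init__()
        self.matches: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div" and attributes.get("class") == TARGET_CLASS:
            self.matches.append(attributes)


def count_matches_per_tag(tags: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, int]:
    """Fetch each tag page once and record how many matching divs it has.

    Results go into a dict keyed by tag, so later iterations never
    overwrite earlier ones.
    """
    counts: Dict[str, int] = {}
    for tag in tags:
        parser = MatchingDivCollector()
        parser.feed(fetch(URL_TEMPLATE.format(tag=tag)))
        parser.close()
        counts[tag] = len(parser.matches)
    return counts
```

With real pages, fetch could be something like lambda url: requests.get(url, headers=header).text.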

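What do you expect 'https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS) to give you if TAGS is a list of strings? Presumably you want to insert each value in TAGS separately and make a request with each one. But that line isn't inside a loop, so how do you expect it to make more than one request?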
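The comment's point can be checked in plain Python without any network access. This sketch reuses the question's TAGS and URL pattern; TEMPLATE and the other variable names are only illustrative. With a list on the right-hand side, % treats the whole list as a single value and builds one malformed URL; the TypeError quoted in the question is what you get when the values arrive as a tuple, which % unpacks into separate arguments.

```python
TAGS = ['economy', 'food', 'sports', 'usa', 'health']
TEMPLATE = 'https://www.yahoo.com/lifestyle/tagged/%s'

# (TAGS) is just TAGS: parentheses alone do not make a tuple.
# A list counts as ONE value, so this builds a single URL that
# contains the list's repr instead of five URLs.
one_url = TEMPLATE % (TAGS)

# A tuple IS unpacked into separate arguments: five values for a single
# %s placeholder raises TypeError.
try:
    TEMPLATE % tuple(TAGS)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Formatting once per tag, inside a loop, gives one URL per request.
urls = [TEMPLATE % tag for tag in TAGS]

print(one_url)
print(error_message)  # not all arguments converted during string formatting
for url in urls:
    print(url)
```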

Note for the future, in case anyone needs something like this: Steve's code also works well.