Python 3.x: How to use +=

python-3.x for-loop

There is a problem with the for loop I wrote: I can't get it to go back to the first for statement:

def output(query,page,max_page):
    """
    Parameters:
        query: a string
        max_page: maximum pages to be crawled per day, integer

    Returns:
    List of news dictionaries in a list: [[{...},{...}..],[{...},]]
    """
    news_dicts_all = []
    news_dicts = []
    # best to concatenate urls here
    date_range = get_dates()
    for date in get_dates():
        s_date = date.replace(".","")
        while page < max_page:
            url = "https://search.naver.com/search.naver?where=news&query=" + query + "&sort=0&ds=" + date + "&de=" + date + "&nso=so%3Ar%2Cp%3Afrom" + s_date + "to" + s_date + "%2Ca%3A&start=" + str(page)
            header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
            req = requests.get(url,headers=header)
            cont = req.content
            soup = BeautifulSoup(cont, 'html.parser')
            for urls in soup.select("._sp_each_url"):
                try:
                    if urls["href"].startswith("https://news.naver.com"):
                        news_detail = get_news(urls["href"])
                        adict = dict()
                        adict["title"] = news_detail[0]
                        adict["date"] = news_detail[1]
                        adict["company"] = news_detail[3]
                        adict["text"] = news_detail[2]
                        news_dicts.append(adict)
                except Exception as e:
                    continue
            page += 10
        news_dicts_all.append(news_dicts)
    return news_dicts_all

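I ran the code, and the page += part does seem to send execution back to the while part. But once page reaches max_page, execution does not go back to the for date in get_dates() part.
Basically, I want the code to return to for date in get_dates() after reaching max_page, but I don't know how to make that happen.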
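
You never reset page. By the time the for loop moves on to the next date, page >= max_page already holds, so the while condition page < max_page is false and the while loop is skipped entirely.
You need to do something like renaming the page parameter to start_page and then setting page = start_page at the start of the for loop body.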
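
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the asker's full crawler: the names crawl, fetch_page, and step are hypothetical, and fetch_page stands in for the requests/BeautifulSoup part so the example runs without network access. It also starts a fresh list for each date, which the docstring's return shape seems to call for; that is an assumption on my part rather than something the answer states.

```python
def crawl(dates, start_page, max_page, fetch_page, step=10):
    """Collect results per date, restarting pagination for every date.

    fetch_page is a hypothetical callable (date, page) -> list of items;
    in the original code this would be the request + parsing logic.
    """
    results_all = []
    for date in dates:
        page = start_page   # reset here, otherwise the while loop is skipped
        results = []        # assumption: one separate list per date
        while page < max_page:
            results.extend(fetch_page(date, page))
            page += step
        results_all.append(results)
    return results_all


def fake_fetch(date, page):
    # Stand-in for the real scraping step, used only for this demo.
    return [f"{date}-p{page}"]


if __name__ == "__main__":
    for per_date in crawl(["2019.01.01", "2019.01.02"], 1, 21, fake_fetch):
        print(per_date)
```

With the reset in place, every date goes through the same page range. Without it, only the first date would produce any results.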