Python 3.x: iterating over Financial Times pages while web scraping


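I'm using BeautifulSoup to scrape news headlines from the Financial Times website. The site's URLs end in page=1, page=2, and so on, so I want to scrape the news titles from every page. What I have so far is: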

import subprocess

news_titles=[]

for page in range(5):
    url="https://www.ft.com/world?page=".format(page)
    result=requests.get(url)
    reshult=result.content
    soup=BeautifulSoup(reshult, "lxml")


for title in soup.findAll("div",{"class":"o-teaser__heading"}):
    titles=title.find(text=True)
    news_titles.append(titles)

with open('hug_file.txt', 'w') as f:
    for item in news_titles:
        f.write("%s\n" % item)

However, I only get the titles from the first page. Could someone help me fix the code?

Use the following code:

import requests
from bs4 import BeautifulSoup
news_titles=[]

for page in range(1,6):
    url="https://www.ft.com/world?page={}".format(page)
    result=requests.get(url)
    reshult=result.content
    soup=BeautifulSoup(reshult, "lxml")
    for title in soup.findAll("div",{"class":"o-teaser__heading"}):
        titles=title.find(text=True)
        news_titles.append(titles)

print(news_titles)
Output:

['US warns Boris Johnson that UK secrets are at risk', 'Algeria’s powerful army chief Ahmed Gaid Salah dies', 'Philippines seeks to relaunch nuclear power ambitions', 'Saudi Arabia sentences five to death for Khashoggi murder', 'Thousands flee renewed offensive by Syrian regime', 'Voters turn on India’s ruling party over Hindu-first agenda', 'Australia’s bushfires have exposed leaders’ failings', 'Japan is wondering if the Olympics are really worth it', 'What is India’s citizenship law and why has it stirred such anger?', '‘Afghanistan Papers’ shed light on Biden', 'The case for public research spending', 'Help fight the illegal wildlife trade', 'Ukrainegate: a guide to the US impeachment inquiry', 'FT’s foreign affairs podcast with Gideon Rachman', 'China’s global spending spree will collapse, says top US official', 'US warns Boris Johnson that UK secrets are at risk', 'Bank of Canada deputy governor leads race for top job', 'Saudi Arabia sentences five to death for Khashoggi murder', 'Johnson pledges to stand up for Christians', 'Turkish court defies European ruling over activist', 'Squaring the Brexit circle', 'Year in a word: Greenland', 'Will the lights go out on Sark this Christmas?', 'Best of our weekday letters 2019', 'Bank of Canada deputy governor leads race for top job', 'Italy seeks to end shoppers’ reliance on cash', 'Why 2019 was not as bad as you think', 'UK companies risk being uninsured for data losses', 'Packing T-shirts? 
There’s a Uniqlo robot for that', 'Can the new UK government end homelessness?', 'Cuadrilla pushes for progress on UK fracking', 'UK ministers under fire for vague audit reform pledge', 'Citigroup set to post record revenues in Hong Kong', 'China companies push US listings as appetite wanes', 'Johnson pledges to stand up for Christians', 'Football bodies under pressure over racist chants', 'FirstFT: 2019 in review', 'The spirit of endeavour has not dimmed in 2019', 'Fears for Vimto sales after UAE and Saudi Arabia impose sugar tax', 'Smaller banks turn to currency derivatives for short term liquidity', 'China banks: still standing', 'Big Ben should remain silent on Brexit Day', 'Pound drops as post-election glow evaporates', 'UK’s military seeks new place in world after Brexit', 'Thousands flee renewed offensive by Syrian regime', 'China banks: still standing', 'Big Ben should remain silent on Brexit Day', 'The case for public research spending', 'Pound drops as post-election glow evaporates', 'Voters turn on India’s ruling party over Hindu-first agenda', 'UK’s military seeks new place in world after Brexit', 'Boris Johnson faces a battle to save the union', '2019: the year of street protest', 'Croatia president reaches run-off in re-election bid', 'Trade Secrets: a year in charts', 'Japan is wondering if the Olympics are really worth it', 'Spain’s businesses worried by prospect of radical left', 'What I want for Xmas — more empathy', 'Trump exposed', 'Machine learning: the big risks and how to manage them', 'Xi turns peacemaker amid dispute between Tokyo and Seoul', 'Australia’s bushfires have exposed leaders’ failings', '‘Afghanistan Papers’ shed light on Biden', 'Can social pacts spur inflation?', 'Further reading', 'The European (In)stability Mechanism', 'Ethiopia seizes crown as fastest-growing country in the 2010s', 'China’s new foreign investment law is a missed opportunity', 'Latam renewable energy investment hits record high', 'Help fight the illegal 
wildlife trade', 'Ukrainegate: a guide to the US impeachment inquiry', 'FT’s foreign affairs podcast with Gideon Rachman', 'The big market moments of 2019', 'FT poll: Christine Lagarde expected to change ECB inflation target', 'Bermuda’s status as insurance safe harbour under threat', 'Plunder of the Commons, by Guy Standing', 'China’s global spending spree will collapse, says top US official', 'George Mitchell, transformer of the energy market', 'China’s might damps criticism of Uighur crackdown', 'How 2019’s mammoth bond rally buoyed entire eurozone', 'Rolls-Royce cuts apprentice and graduate schemes by almost 30%', 'Hong Kong protests loom large over Taiwan election', 'Battered chipmakers look forward to a better 2020', 'What is India’s citizenship law and why has it stirred such anger?', 'Southern manufacturing outpaces north and Midlands', 'Productivity growth of 0.3% is ‘statistic of decade’', 'Heathrow shows detailed costings for third runway', 'UK visa numbers to be raised in science research push', 'Why China’s AI companies are struggling to evolve beyond surveillance', 'UK corporate pension transfer market set for record year', 'FirstFT: Today’s top stories\xa0', 'Bond wobble shrinks global pile of negative yields', 'Corporate Japan posts record number of M&A deals', 'Carney leaves a BoE more in tune with the modern world', 'Interim candidate lined up to succeed Bailey at FCA', 'US threats to Afghan war probe ‘troubling’, says ICC', 'Modi castigates protesters as death toll rises', 'Help fight the illegal wildlife trade', 'Ukrainegate: a guide to the US impeachment inquiry', 'FT’s foreign affairs podcast with Gideon Rachman', 'Pentagon wants open-source 5G plan to take on Huawei', 'Year in a word: Be water', 'New battle over Scottish independence has begun', 'Macron vows to forgo presidential pension amid strikes', 'Tesco pulls Christmas cards over China forced labour claim', 'Austerity, not the populists, destroyed Europe’s centre ground', 'US envoy 
defends Nord Stream 2 sanctions as ‘pro-European’', 'France loosens ties with west African currency', 'Brazilian consumers feel festive cheer as economy returns to life', 'Global food supply chains caught in honey trap', 'America’s competitiveness problem', 'How asset managers turned into business agitators', 'Fear of Russian attack hangs over Germany’s Chechens', 'India is at risk of sliding into a second Emergency', 'Pantomimes: Dame Fortune', 'UK election: how the Tories ‘got it done’', 'Political life shows managers need a stable team', 'Xinjiang security crackdown sparks Han Chinese exodus', 'Britain’s homeless crisis can be solved — here’s how', 'Political nous helps Bailey win race to head Bank of England', 'BuzzFeed’s international business losses quadruple', 'Death toll rises as anti-Modi protests intensify', 'Scars of Romania’s revolution still to heal', 'Thousands of new homes to be built on England’s floodplains', 'Mexico plans crackdown on private electricity market', 'Help fight the illegal wildlife trade', 'Ukrainegate: a guide to the US impeachment inquiry', 'FT’s foreign affairs podcast with Gideon Rachman', 'What lies ahead for Boris Johnson’s government?', 'Extinction/Chinese medicine: diseconomies of scales', 'Treasury rakes in £9m in Lifetime Isa penalties', 'India’s youth voice anger at Modi’s citizenship law', 'The economy is king in Trump’s re-election bid', 'Ukraine and Russia sign deal on gas supply to Europe', 'N Ireland talks on resuming Stormont put on pause', 'Johnson’s bill victory breaks Brexit gridlock', 'Argentina delays payments on $9bn in debt', 'Wall Street adds to record run', 'Thousands face lower tax bills after ‘loan charge’ ruling', 'Facebook bans pro-Trump media outlet over fake accounts', 'FT Asia-Pacific Innovative Lawyers 2020 open for submissions', 'Johnson clears path for Brexit with draft bill victory', 'Trump tries to spin impeachment into re-election gold', 'Corporate year in review: deals, drama, spies and 
successes', 'Violence escalates in Libya following Turkey pledge', 'New Bank of England governor offers stability', 'UK seeks extradition of US diplomat’s wife over teen death', 'CC Land and Meyer Bergman invest in £1.25bn London property', 'Puigdemont prepares to take seat as MEP after court win', 'Defence minister says UK military faces shortfall', 'So what happened to the Boris bounce?', 'Andrew Bailey wins race to be Bank of England governor', 'Colombia approves tax reforms despite protests', 'Help fight the illegal wildlife trade', 'Ukrainegate: a guide to the US impeachment inquiry', 'FT’s foreign affairs podcast with Gideon Rachman']
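The output above contains repeated headlines (for example, 'US warns Boris Johnson that UK secrets are at risk' appears more than once). If that is unwanted, one possible way to drop repeats while keeping the original order is sketched below. The helper name unique_in_order and the sample list are made up for illustration; they are not part of the answer.

```python
def unique_in_order(titles):
    """Return titles with duplicates removed, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(titles))


# A tiny sample taken from the output above, with one repeated headline.
sample = [
    "US warns Boris Johnson that UK secrets are at risk",
    "China banks: still standing",
    "US warns Boris Johnson that UK secrets are at risk",
]

print(unique_in_order(sample))
```

In the answer's script this would be applied to news_titles after the loop finishes.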

The first mistake is that the title for loop needs to go inside the page for loop.

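Second, .format(page) does not append the page number to the URL, because the string contains no {} placeholder for it to fill. Simply build the URL by concatenating the base URL with the string version of the page number.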

Third, set the range to (1,6), because page= and page=1 are the same page.
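The second and third points can be checked without any network access. The sketch below only builds the URL strings; BASE_URL is a name introduced here for illustration, while the URL itself comes from the question.

```python
# Hypothetical constant name; the URL is the one used in the question.
BASE_URL = "https://www.ft.com/world?page="

# No {} placeholder: str.format() has nothing to fill in, so the
# page number is silently dropped and every URL is identical.
broken = [BASE_URL.format(page) for page in range(5)]

# With a placeholder, or with plain concatenation, the number ends up in the URL.
with_placeholder = [(BASE_URL + "{}").format(page) for page in range(1, 6)]
concatenated = [BASE_URL + str(page) for page in range(1, 6)]

print(broken[0])            # URL with nothing after "page="
print(with_placeholder[0])  # URL ending in "page=1"
print(len(set(broken)))     # number of distinct URLs the original loop requested
```

Either corrected form yields the same five URLs, so the choice between them is a matter of taste.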

I have modified your code below. It should work:

import requests
from bs4 import BeautifulSoup

news_titles=[]

for page in range(1,6):
    url="https://www.ft.com/world?page=" + str(page)
    result=requests.get(url)
    reshult=result.content
    soup=BeautifulSoup(reshult, "lxml")
    for title in soup.findAll("div",{"class":"o-teaser__heading"}):
        titles=title.find(text=True)
        news_titles.append(titles)

for item in news_titles:
    print (item)
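To see why the position of the inner loop matters, here is a small sketch that swaps the real requests and BeautifulSoup calls for a stub. FAKE_PAGES and both function names are hypothetical stand-ins, so the example runs offline.

```python
# Hypothetical stand-in for five downloaded and parsed pages, two titles each.
FAKE_PAGES = {
    page: ["title {}-{}".format(page, n) for n in (1, 2)]
    for page in range(1, 6)
}


def titles_loop_outside():
    """Mimics the question: the title loop runs once, after the page loop."""
    collected = []
    for page in range(1, 6):
        parsed = FAKE_PAGES[page]  # overwritten on every iteration
    for title in parsed:           # only sees the last page that was parsed
        collected.append(title)
    return collected


def titles_loop_nested():
    """Mimics the fix: the title loop runs once per page."""
    collected = []
    for page in range(1, 6):
        parsed = FAKE_PAGES[page]
        for title in parsed:
            collected.append(title)
    return collected


print(len(titles_loop_outside()))  # 2
print(len(titles_loop_nested()))   # 10
```

In the question the two problems combine: every request went to the same URL, and only the last parsed page was read, which would explain why only first-page titles showed up.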

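range(5) runs from 0 to 4, so I'm not sure how this doesn't raise an error on the first iteration... Instead, you can use for page in range(1,6) to go from 1 to 5. The line url="https://www.ft.com/world?page=".format(page) should be: url="https://www.ft.com/world?page={0}".format(page). Then you need to put the second for loop inside the first for loop, because it has to run for every page.

Ah, that makes sense. Thanks for your help, and merry Christmas!!!

No problem. Merry Christmas to you too. If only Santa could bring me an upvote or an accepted answer for Christmas :P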