Web crawler in Python that displays URL titles

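So here's the deal: I have a web crawler that I've set up to parse a URL entered by the user. So far that part works, and I can print out the source of the page at that URL. Now I need to finish it: I need to display the titles of all the URLs contained in the page. For example, if the user wants to parse nytimes.com, the bot should display the titles of all the links on the page that point to other URLs, such as "Best Thanksgiving Recipes" and so on. Here is my code: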

import urllib2

website = raw_input('Enter the website url: ')

getwebsite = urllib2.urlopen(website)
readwebsite = getwebsite.read()
print readwebsite
You can extract all the links using BeautifulSoup:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.nytimes.com")

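# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r.content, "html.parser")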
print([a.get('href') for a in soup.find_all("a") ])
['http://www.nytimes.com/content/help/site/ie8-support.html', '#top-news', '#site-index-navigation', 'http://international.nytimes.com', 'http://cn.nytimes.com', 'http://www.nytimes.com/pages/todayspaper/index.html', 'http://www.nytimes.com/video', 'http://www.nytimes.com/weather', 'http://www.nytimes.com/pages/world/index.html', 'http://www.nytimes.com/pages/national/index.html', 'http://www.nytimes.com/pages/politics/index.html', 'http://www.nytimes.com/pages/nyregion/index.html', 'http://www.nytimes.com/pages/business/index.html', 'http://www.nytimes.com/pages/business/international/index.html', 'http://www.nytimes.com/pages/opinion/index.html', 'http://www.nytimes.com/pages/opinion/international/index.html', 'http://www.nytimes.com/pages/technology/index.html', 'http://www.nytimes.com/pages/science/index.html', 'http://www.nytimes.com/pages/health/index.html', 'http://www.nytimes.com/pages/sports/index.html', 'http://www.nytimes.com/pages/sports/international/index.html', 'http://www.nytimes.com/pages/arts/index.html', 'http://www.nytimes.com/pages/arts/international/index.html', 'http://www.nytimes.com/pages/fashion/index.html', 'http://www.nytimes.com/pages/style/international/index.html', 'http://www.nytimes.com/pages/dining/index.html', 'http://www.nytimes.com/pages/dining/international/index.html', 'http://www.nytimes.com/pages/garden/index.html', 'http://www.nytimes.com/pages/travel/index.html', 'http://www.nytimes.com/pages/magazine/index.html', 'http://www.nytimes.com/pages/realestate/index.html', 'http://www.nytimes.com', 'http://international.nytimes.com/?iht', 'http://www.nytimes.com/pages/world/index.html', 'http://www.nytimes.com/pages/national/index.html', 'http://www.nytimes.com/pages/politics/index.html', 'http://www.nytimes.com/pages/nyregion/index.html', 'http://www.nytimes.com/pages/business/index.html', 'http://www.nytimes.com/pages/business/international/index.html', 'http://www.nytimes.com/pages/opinion/index.html', 
'http://www.nytimes.com/pages/opinion/international/index.html', 'http://www.nytimes.com/pages/technology/index.html', 'http://www.nytimes.com/pages/science/index.html', 'http://www.nytimes.com/pages/health/index.html', 'http://www.nytimes.com/pages/sports/index.html', 'http://www.nytimes.com/pages/sports/international/index.html', 'http://www.nytimes.com/pages/arts/index.html', 'http://www.nytimes.com/pages/arts/international/index.html', 'http://www.nytimes.com/pages/fashion/index.html', 'http://www.nytimes.com/pages/style/international/index.html', 'http://www.nytimes.com/pages/dining/index.html', 'http://www.nytimes.com/pages/dining/international/index.html', 'http://www.nytimes.com/pages/garden/index.html', 'http://www.nytimes.com/pages/travel/index.html', 'http://www.nytimes.com/pages/magazine/index.html', 'http://www.nytimes.com/pages/realestate/index.html', 'http://www.nytimes.com/pages/obituaries/index.html', 'http://www.nytimes.com/video/', 'http://www.nytimes.com/upshot/', None, '', 'http://www.nytimes.com/pages/world/africa/index.html', 'http://www.nytimes.com/pages/world/americas/index.html', 'http://www.nytimes.com/pages/world/asia/index.html', 'http://www.nytimes.com/pages/world/europe/index.html', 'http://www.nytimes.com/pages/world/middleeast/index.html', 'http://atwar.blogs.nytimes.com/', 'http://india.blogs.nytimes.com/', 'http://sinosphere.blogs.nytimes.com/', 'http://www.nytimes.com/pages/education/index.html', 'http://www.nytimes.com/politics/first-draft/', 'http://elections.nytimes.com/', 'http://cityroom.blogs.nytimes.com/', 'http://artsbeat.blogs.nytimes.com/', 'http://www.nytimes.com/events/', 'http://dealbook.nytimes.com/', 'http://www.nytimes.com/pages/business/economy/index.html', 'http://www.nytimes.com/pages/business/energy-environment/index.html', 'http://markets.on.nytimes.com/', 'http://www.nytimes.com/pages/business/media/index.html', 'http://www.nytimes.com/pages/business/smallbusiness/index.html', 
'http://www.nytimes.com/pages/your-money/index.html', 'http://dealbook.nytimes.com/', 'http://www.nytimes.com/pages/business/economy/index.html', 'http://www.nytimes.com/pages/business/energy-environment/index.html', 'http://markets.on.nytimes.com/', 'http://www.nytimes.com/pages/business/media/index.html', 'http://www.nytimes.com/pages/business/smallbusiness/index.html', 'http://www.nytimes.com/pages/your-money/index.html', 'http://www.nytimes.com/pages/opinion/index.html#columnists', 'http://www.nytimes.com/pages/opinion/index.html#editorials', 'http://www.nytimes.com/pages/opinion/index.html#contributing', 'http://www.nytimes.com/pages/opinion/index.html#op-ed', 'http://www.nytimes.com/pages/opinion/index.html#opinionator', 'http://www.nytimes.com/pages/opinion/index.html#letters', 'http://www.nytimes.com/pages/opinion/index.html#sundayreview', 'http://www.nytimes.com/pages/opinion/index.html#takingNote', 'http://www.nytimes.com/pages/opinion/index.html#roomfordebate', 'http://publiceditor.blogs.nytimes.com/', 'http://wordplay.blogs.nytimes.com/cartoons/', 'http://www.nytimes.com/pages/opinion/international/index.html#columnistsGlobal', 'http://www.nytimes.com/pages/opinion/international/index.html#editorialsGlobal', 'http://www.nytimes.com/pages/opinion/international/index.html#contributing', 'http://www.nytimes.com/pages/opinion/international/index.html#op-edGlobal', 'http://www.nytimes.com/pages/opinion/index.html#opinionator', 'http://www.nytimes.com/pages/opinion/international/index.html#letters', 'http://www.nytimes.com/pages/opinion/index.html#sundayreview', 'http://www.nytimes.com/pages/opinion/international/index.html#takingNote', 'http://www.nytimes.com/pages/opinion/international/index.html#roomfordebate', 'http://publiceditor.blogs.nytimes.com/', 'http://wordplay.blogs.nytimes.com/cartoons/', 'http://bits.blogs.nytimes.com/', 'http://www.nytimes.com/pages/technology/personaltech/index.html', 'http://www.nytimes.com/pages/science/earth/index.html', 
'http://www.nytimes.com/pages/science/space/index.html', 'http://well.blogs.nytimes.com/', 'http://www.nytimes.com/health/guides/index.html', 'http://www.nytimes.com/pages/health/nutrition/index.html', 'http://www.nytimes.com/pages/health/policy/index.html', 'http://newoldage.blogs.nytimes.com/', 'http://www.nytimes.com/pages/health/views/index.html', 'http://www.nytimes.com/pages/sports/baseball/index.html', 'http://www.nytimes.com/pages/sports/ncaabasketball/index.html', 'http://www.nytimes.com/pages/sports/basketball/index.html', 'http://www.nytimes.com/pages/sports/ncaafootball/index.html', 'http://www.nytimes.com/pages/sports/football/index.html', 'http://www.nytimes.com/pages/sports/golf/index.html', 'http://www.nytimes.com/pages/sports/hockey/index.html', 'http://www.nytimes.com/pages/sports/soccer/index.html', 'http://www.nytimes.com/pages/sports/tennis/index.html', 'http://www.nytimes.com/pages/sports/baseball/index.html', 'http://www.nytimes.com/pages/sports/ncaabasketball/index.html', 'http://www.nytimes.com/pages/sports/basketball/index.html', 'http://www.nytimes.com/pages/sports/ncaafootball/index.html', 'http://www.nytimes.com/pages/sports/football/index.html', 'http://www.nytimes.com/pages/sports/golf/index.html', 'http://www.nytimes.com/pages/sports/hockey/index.html', 'http://www.nytimes.com/pages/sports/soccer/index.html', 'http://www.nytimes.com/pages/sports/tennis/index.html', 'http://www.nytimes.com/pages/arts/design/index.html', 'http://artsbeat.blogs.nytimes.com/', 'http://www.nytimes.com/pages/books/index.html', 'http://www.nytimes.com/pages/arts/dance/index.html', 'http://www.nytimes.com/pages/movies/index.html', 'http://www.nytimes.com/pages/arts/music/index.html', 'http://www.nytimes.com/events/', 'http://www.nytimes.com/pages/arts/television/index.html', 'http://www.nytimes.com/pages/theater/index.html', 'http://www.nytimes.com/pages/arts/video-games/index.html', 'http://www.nytimes.com/pages/arts/design/index.html', 
'http://artsbeat.blogs.nytimes.com/', 'http://www.nytimes.com/pages/books/index.html', 'http://www.nytimes.com/pages/arts/dance/index.html', 'http://www.nytimes.com/pages/movies/index.html', 'http://www.nytimes.com/pages/arts/music/index.html', 'http://www.nytimes.com/events/', 'http://www.nytimes.com/pages/arts/television/index.html', 'http://www.nytimes.com/pages/theater/index.html', 'http://www.nytimes.com/pages/arts/video-games/index.html', 'http://www.nytimes.com/pages/t-magazine/index.html', 'http://parenting.blogs.nytimes.com/', 'http://runway.blogs.nytimes.com/', 'http://www.nytimes.com/pages/fashion/weddings/index.html', 'http://www.nytimes.com/pages/t-magazine/index.html', 'http://parenting.blogs.nytimes.com/', 'http://runway.blogs.nytimes.com/', 'http://www.nytimes.com/pages/fashion/weddings/index.html', 'http://cooking.nytimes.com', 'http://www.nytimes.com/restaurants/search/', 'http://cooking.nytimes.com', 'http://www.nytimes.com/restaurants/search/', 'http://www.nytimes.com/pages/realestate/commercial/index.html', 'http://www.nytimes.com/pages/great-homes-and-destinations/index.html', 'http://realestate.nytimes.com/my/saved_listings.aspx', 'http://www.nytimes.com/video/us-politics', 'http://www.nytimes.com/video/world', 'http://www.nytimes.com/video/n-y-region', 'http://www.nytimes.com/video/opinion', 'http://www.nytimes.com/video/times-documentaries', 'http://www.nytimes.com/video/business', 'http://www.nytimes.com/video/technology', 'http://www.nytimes.com/video/arts', 'http://www.nytimes.com/video/style', 'http://www.nytimes.com/video/health', 'http://www.nytimes.com/video/dining-and-wine', 'http://www.nytimes.com/video/travel', 'http://www.nytimes.com/video/sports', 'http://www.nytimes.com/video/real-estate', 'http://www.nytimes.com/video/science', 'http://www.nytimes.com/crosswords/', 'http://www.nytimes.com/times-insider', 'http://www.nytimes.com/pages/todayspaper/index.html', 'http://www.nytimes.com/pages/automobiles/index.html', 
'http://www.nytimes.com/pages/corrections/index.html', 'http://www.nytimes.com/pages/multimedia/index.html', 'http://lens.blogs.nytimes.com/', 'http://www.nytimes.com/ref/classifieds/', 'http://www.nytimes.com/marketing/tools-and-services/', 'http://jobmarket.nytimes.com/pages/jobs/index.html', 'http://www.nytimes.com/pages/topics/', 'http://www.nytimes.com/interactive/blogs/directory.html', 'http://www.nytstore.com/?&t=qry542&utm_source=nytimes&utm_medium=HPB&utm_content=hp_browsetree&utm_campaign=NYT-HP&module=SectionsNav&action=click&region=TopBar&version=BrowseTree&contentCollection=NYT%20Store&contentPlacement=2&pgtype=Homepage', 'http://www.nytimes.com/times-journeys/?utm_source=nytimes&utm_medium=HPLink&utm_content=hp_browsetree&utm_campaign=NYT-HP&module=SectionsNav&action=click&region=TopBar&version=BrowseTree&contentCollection=Times%20Journeys&contentPlacement=2&pgtype=Homepage', 'http://www.nytimes.com/seeallnav', 'http://www.nytimes.com/membercenter', '', 'http://www.nytimes.com/pages/opinion/index.html#columnists/charlesMBlow', 'http://www.nytimes.com/pages/opinion/index.html#columnists/davidBrooks', 'http://www.nytimes.com/pages/opinion/index.html#columnists/frankBruni', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rogerCohen', 'http://www.nytimes.com/pages/opinion/index.html#columnists/gailCollins', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rossDouthat', 'http://www.nytimes.com/pages/opinion/index.html#columnists/maureenDowd', 'http://www.nytimes.com/pages/opinion/index.html#columnists/thomasLFriedman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/nicholasDKristof', 'http://www.nytimes.com/pages/opinion/index.html#columnists/paulKrugman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/joeNocera', 'http://www.nytimes.com/pages/opinion/index.html#columnists/charlesMBlow', 'http://www.nytimes.com/pages/opinion/index.html#columnists/davidBrooks', 
'http://www.nytimes.com/pages/opinion/index.html#columnists/frankBruni', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rogerCohen', 'http://www.nytimes.com/pages/opinion/index.html#columnists/gailCollins', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rossDouthat', 'http://www.nytimes.com/pages/opinion/index.html#columnists/maureenDowd', 'http://www.nytimes.com/pages/opinion/index.html#columnists/thomasLFriedman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/nicholasDKristof', 'http://www.nytimes.com/pages/opinion/index.html#columnists/paulKrugman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/joeNocera', 'http://www.nytimes.com/2014/11/28/business/drug-maker-gave-large-payments-to-doctors-with-troubled-track-records.html', 'http://www.nytimes.com/upshot', 'http://www.nytimes.com/2014/11/28/upshot/under-pressure-from-uber-taxi-medallion-prices-are-plummeting.html', 'http://www.nytimes.com/2014/11/28/upshot/under-pressure-from-uber-taxi-medallion-prices-are-plummeting.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/27/us/without-passing-a-single-law-obama-crafts-bold-enviornmental-policy.html', 'http://www.nytimes.com/2014/11/27/us/ferguson-experts-weigh-darren-wilsons-decisions-leading-to-fatal-shooting-of-michael-brown.html', 'http://www.nytimes.com/2014/11/27/us/ferguson-experts-weigh-darren-wilsons-decisions-leading-to-fatal-shooting-of-michael-brown.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/28/us/ferguson-protests-michael-brown-darren-wilson.html', 'http://www.nytimes.com/2014/11/27/us/after-disputed-verdict-reckoning-for-ferguson.html', 'http://www.nytimes.com/2014/11/28/arts/international/p-d-james-mystery-novelist-known-as-queen-of-crime-dies-at-94.html', 'http://www.nytimes.com/2014/11/28/arts/international/p-d-james-mystery-novelist-known-as-queen-of-crime-dies-at-94.html', 
'http://www.nytimes.com/2014/11/28/world/middleeast/iran-nuclear-talks-extension.html', 'http://www.nytimes.com/2014/11/26/world/middleeast/iran-nuclear-talks-extension.html', 'http://www.nytimes.com/2014/11/30/magazine/the-militarys-rough-justice-on-sexual-assault.html', 'http://www.nytimes.com/2014/11/30/magazine/the-militarys-rough-justice-on-sexual-assault.html', 'http://www.nytimes.com/2014/11/30/magazine/the-militarys-rough-justice-on-sexual-assault.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/28/sports/ncaafootball/mit-is-10-0-and-finding-success-in-ncaa-division-iii-playoffs.html', 'http://www.nytimes.com/2014/11/28/sports/ncaafootball/mit-is-10-0-and-finding-success-in-ncaa-division-iii-playoffs.html', 'http://www.nytimes.com/2014/11/28/sports/ncaafootball/mit-is-10-0-and-finding-success-in-ncaa-division-iii-playoffs.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/28/business/international/opec-leaves-oil-production-quotas-.......]
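The raw list above contains None, empty strings, fragment-only links such as '#top-news', and many repeats. As a rough sketch of one way to tidy it (assuming Python 3; the helper name clean_links and the short raw list are made up for illustration, with entries copied from the output above), the standard library's urljoin and urldefrag can resolve each href against the page URL and drop the fragment before de-duplicating:

```python
from urllib.parse import urljoin, urldefrag


def clean_links(hrefs, base_url):
    """Resolve hrefs against base_url, strip fragments, and drop
    empty values and duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for href in hrefs:
        if not href:  # skips both None and ''
            continue
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# A few entries taken from the output above (hypothetical short sample).
raw = [
    'http://www.nytimes.com/content/help/site/ie8-support.html',
    '#top-news',
    None,
    '',
    'http://www.nytimes.com/video',
    'http://www.nytimes.com/video',
]
print(clean_links(raw, 'http://www.nytimes.com'))
```

Note that a fragment-only link resolves to the page itself, so the page URL ends up in the cleaned list once; filter it out separately if that is not wanted.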
Using your own code:

import urllib2
from bs4 import BeautifulSoup

website = raw_input('Enter the website url: ')
get_website = urllib2.urlopen(website)
read_website = get_website.read()
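# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(read_website, "html.parser")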
print([a.get('href') for a in soup.find_all("a") ])
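The snippets above print only the href values, while the question asks for the titles of the links, meaning the visible text such as "Best Thanksgiving Recipes". Here is a minimal sketch of collecting that text together with each URL. It assumes Python 3 and deliberately swaps in the standard library's html.parser in place of BeautifulSoup so that it runs with no third-party packages. The names LinkTitleParser and link_titles and the sample HTML are hypothetical, not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTitleParser(HTMLParser):
    """Collect (link text, absolute URL) pairs from <a href=...> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # finished (text, url) pairs
        self._href = None  # URL of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors without a usable href
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the collected link text.
            text = " ".join("".join(self._text).split())
            if text:  # skip links with no visible text
                self.links.append((text, self._href))
            self._href = None


def link_titles(html, base_url):
    """Return a list of (link text, absolute URL) pairs found in html."""
    parser = LinkTitleParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for a downloaded page.
sample = '''
<a href="/pages/dining/index.html">Best Thanksgiving <b>Recipes</b></a>
<a href="#top-news"></a>
<a>no href</a>
<a href="http://cooking.nytimes.com">Cooking</a>
'''
for text, url in link_titles(sample, "http://www.nytimes.com"):
    print(text, "->", url)
```

To run it against a live page in Python 3, the HTML could be fetched with urllib.request.urlopen(url).read() and decoded to a string before being passed to link_titles. With BeautifulSoup, the equivalent of the text collection is a.get_text() on each tag returned by soup.find_all("a").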