Python: extracting CSS from href links (Python, BeautifulSoup)


This is code that extracts all the href links from a website, given the site's URL:

import re
import urllib.request

from bs4 import BeautifulSoup

# Fetch the page and parse it.
html_page = urllib.request.urlopen("http://kteq.in/services")
soup = BeautifulSoup(html_page, "html.parser")

# Print the href of every <a> tag, with absolute http(s) URLs stripped out.
for link in soup.find_all("a"):
    href = link.get("href")
    if href is None:
        continue
    result = re.sub(r"http\S+", "", href)
    print(result)

When I run the code above, it extracts the site's href links. I get the following output:

  index
  index
  #
  solutions#internet-of-things
  solutions#online-billing-and-payment-solutions
  solutions#customer-relationship-management
  solutions#enterprise-mobility
  solutions#enterprise-content-management
  solutions#artificial-intelligence
  solutions#b2b-and-b2c-web-portals
  solutions#robotics
  solutions#augement-reality-virtual-reality
  solutions#azure
  solutions#omnichannel-commerce
  solutions#document-management
  solutions#enterprise-extranets-and-intranets
  solutions#business-intelligence
  solutions#enterprise-resource-planning
  services
  clients
  contact
  #
  #
  #

  #
  #
  #
  #
  #contactform
  #
  #
  #
  #
  #
  #
  #
  #
  # 
  #
  #
  #
  #
  #
  #
  index
  services
  #
  contact
  #
  iOSDevelopmentServices
  AndroidAppDevelopment
  WindowsAppDevelopment
  HybridSoftwareSolutions
  CloudServices
  HTML5Development
  iPadAppDevelopment
  services
  services
  services
  services
  services
  services
  contact
  contact
  contact
  contact
  contact

  #
  #
  #
  #

Now I need to extract the CSS from these href links. For example, I need to extract the CSS from the 'index' href link that appears in my output. Please advise.

You can loop over all the href links you collected and pull the CSS links out of each of those pages:

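import re
import urllib.request

from bs4 import BeautifulSoup

base_link = 'http://kteq.in/'
hrefs = ['index']  # the href values collected earlier
css_links = []     # created once, so results from every page are kept

for href in hrefs:
    url = base_link + href
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, 'html.parser')
    # Stylesheets are referenced from <link> tags.
    for link in soup.find_all('link'):
        # Fall back to '' when a <link> has no href, so re.search does not raise.
        # The '.' before 'css' is unescaped, so it matches any character.
        css_links.append(re.search(r"[A-Za-z0-9:/.-]+.css", link.get('href') or ''))

# Skip the <link> tags that did not match, print the rest.
for match in css_links:
    if match is None:
        continue
    print(match[0])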
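The href list printed in the question contains many repeats, bare '#' anchors, and '#fragment' suffixes, so it may be worth cleaning it up before using it as `hrefs`. Here is a minimal sketch of one way to do that. The helper name `unique_pages` and the sample list are my own illustrations, not part of the original code:

```python
from urllib.parse import urldefrag


def unique_pages(hrefs):
    """Return page paths without '#fragment' parts, de-duplicated, in first-seen order."""
    seen = set()
    pages = []
    for href in hrefs:
        # urldefrag splits 'solutions#azure' into url='solutions', fragment='azure'.
        page = urldefrag(href.strip()).url
        # Skip entries that were only a fragment ('#') or empty, and skip repeats.
        if page and page not in seen:
            seen.add(page)
            pages.append(page)
    return pages


# A small sample taken from the output in the question.
raw_hrefs = ['index', 'index', '#', 'solutions#internet-of-things',
             'solutions#azure', 'services', '', '#contactform', 'contact']
print(unique_pages(raw_hrefs))
```

With that in place, each page is requested only once instead of once per repeated link.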

Browsing the index page, I got the following CSS links.

Output:
bootstrap/bootstrap.min.css

//fonts.googleapis.com/css
cards/card.css
GalleryStyle/set1.css
css/custom.css
PageTransitions/css/component.css
PageTransitions/css/animations.css

carousel/1.5.5/slick.min.css
css/scrollpage.css
css/changingtext.css
css/color-slider.css
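Most of these links are relative to the page, and one is protocol-relative (it starts with `//`), so they need to be resolved against the page URL before the stylesheets themselves can be downloaded. Below is a rough sketch using `urllib.parse.urljoin` from the standard library. The function names `absolute_css_urls` and `fetch_css` are hypothetical, and I am assuming the stylesheets decode as UTF-8:

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def absolute_css_urls(page_url, css_links):
    """Resolve each stylesheet link against the page it was found on."""
    return [urljoin(page_url, link) for link in css_links]


def fetch_css(css_url):
    """Download one stylesheet and return its text (makes a network request)."""
    with urlopen(css_url) as response:
        # Assumption: the stylesheet is UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


page_url = "http://kteq.in/index"
css_links = ["bootstrap/bootstrap.min.css",
             "//fonts.googleapis.com/css",
             "css/custom.css"]
for css_url in absolute_css_urls(page_url, css_links):
    print(css_url)
```

Calling `fetch_css` on any of the resolved URLs should then return the CSS text for that file, provided the server responds.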

Thank you very much for the suggestion. It was very helpful.