(Python) Trying to isolate some data from a website
Basically, the script downloads images from the random and toplist pages of wallbase.cc. Essentially, it looks for a seven-digit string that identifies each image, plugs that id into a URL, and downloads the image. The only problem I seem to have is isolating the seven-digit string. What I want to do is search the page source for it.
If you only want to use the standard library, you can use a regular expression:

import re

pattern = re.compile(r'<div id="thumb(.{7})"')
...
for data_id in re.findall(pattern, the_page):
    pass  # do something with data_id
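To see what that pattern captures, here is a quick self-contained check. The HTML snippet is a made-up sample of the thumbnail markup the answer assumes (the real page isn't shown in the question); since the id is described as seven digits, `\d{7}` would be a tighter match than `.{7}`:

```python
import re

# Hypothetical sample of the thumbnail markup this answer assumes
the_page = '<div id="thumb1750539" class="thumbnail">...</div>'

pattern = re.compile(r'<div id="thumb(.{7})"')
ids = re.findall(pattern, the_page)
print(ids)  # ['1750539']
```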
You might want to use a web-scraping library like BeautifulSoup; see, for example, this question about web scraping in Python:

import re
import urllib2
from BeautifulSoup import BeautifulSoup

# download and parse HTML
url = 'http://wallbase.cc/toplist'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)

# find the links we want
links = soup('a', href=re.compile(r'^http://wallbase.cc/wallpaper/\d+$'))
for l in links:
    href = l.get('href')
    print href                 # u'http://wallbase.cc/wallpaper/1750539'
    print href.split('/')[-1]  # u'1750539'
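The same href pattern and id extraction can be exercised without the network. A minimal sketch, assuming a hypothetical list of hrefs like the ones BeautifulSoup would return from the toplist page:

```python
import re

# Hypothetical hrefs standing in for what soup('a', href=...) would return
hrefs = [
    'http://wallbase.cc/wallpaper/1750539',
    'http://wallbase.cc/wallpaper/1234567',
    'http://wallbase.cc/about',  # non-wallpaper link, should be skipped
]

# Capture the numeric id directly instead of splitting the URL afterwards
wallpaper = re.compile(r'^http://wallbase.cc/wallpaper/(\d+)$')
ids = [m.group(1) for m in map(wallpaper.match, hrefs) if m]
print(ids)  # ['1750539', '1234567']
```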
I can't help but link to this: