
(Python) Trying to isolate some data from a website


Basically, the script downloads images from wallbase.cc's random and toplist pages. Essentially, it looks for a seven-digit string that uniquely identifies each image. It feeds that id into a URL and downloads the image. My only problem seems to be isolating the seven-digit string.

What I'm trying to do is search
If you only want to use the standard library, you can use a regular expression:

import re

pattern = re.compile(r'<div id="thumb(.{7})"')

...

for data_id in re.findall(pattern, the_page):
    pass  # do something with data_id
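As a quick sanity check, here is how that pattern behaves against a small inline HTML fragment. The `<div id="thumb…">` markup and the sample ids are assumptions based on the answer above, not wallbase.cc's actual markup:

```python
import re

# Pattern from the answer above: capture the 7 characters after "thumb"
pattern = re.compile(r'<div id="thumb(.{7})"')

# Hypothetical HTML fragment mimicking the markup the answer assumes
the_page = (
    '<div id="thumb1750539" class="thumb">...</div>'
    '<div id="thumb1234567" class="thumb">...</div>'
)

ids = re.findall(pattern, the_page)
print(ids)  # ['1750539', '1234567']
```

Note that `.{7}` matches any seven characters, not just digits; if the ids are guaranteed numeric, `(\d{7})` would be a tighter pattern.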


You may want to use a web scraping library like BeautifulSoup; see, for example, this discussion of web scraping in Python:

import re
import urllib2
from BeautifulSoup import BeautifulSoup

# download and parse HTML
url = 'http://wallbase.cc/toplist'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)

# find the links we want
links = soup('a', href=re.compile(r'^http://wallbase\.cc/wallpaper/\d+$'))
for l in links:
    href = l.get('href')
    print href                # u'http://wallbase.cc/wallpaper/1750539'
    print href.split('/')[-1] # u'1750539'
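If BeautifulSoup is not available, the same link extraction can be sketched with the standard library's `html.parser` (Python 3 here; the wallpaper-URL pattern is taken from the answer above, and `sample_html` is a made-up fragment standing in for the real toplist page):

```python
import re
from html.parser import HTMLParser

# URL pattern from the BeautifulSoup answer above
WALLPAPER_RE = re.compile(r'^http://wallbase\.cc/wallpaper/(\d+)$')

class WallpaperLinkParser(HTMLParser):
    """Collect wallpaper ids from <a href="..."> tags matching the pattern."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        m = WALLPAPER_RE.match(href)
        if m:
            self.ids.append(m.group(1))

# Made-up HTML fragment for demonstration
sample_html = '<a href="http://wallbase.cc/wallpaper/1750539">thumb</a>'
parser = WallpaperLinkParser()
parser.feed(sample_html)
print(parser.ids)  # ['1750539']
```

Unlike a bare regex over the whole page, this only inspects actual `href` attributes of `<a>` tags, which is more robust to markup changes.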


I can't help linking to this: