Python function call problem
I have this Python code, but when I run it, it only prints the first target. Here is my code:
    # S is the page's HTML as a string
    def get_next_target(S):
        start_link = S.find('<a href=')
        start_quote = S.find('"', start_link)
        end_quote = S.find('"', start_quote + 1)
        url = S[start_quote + 1:end_quote]
        print(url)
        return url, end_quote

    get_next_target(S)
I think you should use BeautifulSoup to extract information from HTML/XML:
    In [1]: from bs4 import BeautifulSoup

    In [2]: html = '''<susuds><a href="www.target1.com"/><ahsahsh><saudahsd><a href=
       ...: "www.target2.com"/><p>sa</h1><a href="www.target3.com"/>'''

    In [3]: soup = BeautifulSoup(html, 'lxml')

    In [4]: for a in soup.find_all('a'):
       ...:     print(a['href'])
       ...:
    www.target1.com
    www.target2.com
    www.target3.com
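If installing BeautifulSoup or lxml is not an option, a similar result may be possible with the standard library's html.parser module. The following is only a sketch under that assumption: LinkCollector is a hypothetical class name, and the sample string is the one from this thread written on a single line.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


html = ('<susuds><a href="www.target1.com"/><ahsahsh><saudahsd>'
        '<a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>')

collector = LinkCollector()
collector.feed(html)
collector.close()
for link in collector.links:
    print(link)
```

Unlike plain string searching, a parser looks at the attribute name, so an <a> tag with other attributes before href should still be picked up.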
If you want to work through the logic yourself without using any special modules, the following code will do it:
    import re

    S = '<susuds><a href="www.target1.com"/><ahsahsh><saudahsd><a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>'
    abc = []

    def get_next_target(S):
        # Start offset of every '<a href=' occurrence in S
        search_index = [i.start() for i in re.finditer('<a href=', S)]
        for j in range(len(search_index)):
            if j == len(search_index) - 1:
                # Last link: its chunk runs to the end of the string
                A = S[search_index[j]:len(S)]
            else:
                # Otherwise the chunk ends where the next link starts
                A = S[search_index[j]:search_index[j + 1]]
            # The URL sits between the first and last double quote of the chunk
            search_start_index = A.find('"')
            search_end_index = A.rfind('"')
            start_final = search_index[j] + search_start_index + 1
            start_end = search_index[j] + search_end_index
            final_result = S[start_final:start_end]
            abc.append(final_result)
        print(abc)

    get_next_target(S)
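Since re is already imported, the index arithmetic can probably be avoided by letting the pattern capture the quoted value directly. This is a minimal sketch, assuming every link is written exactly as <a href="..."> with double quotes, as in the sample string; extract_hrefs is a hypothetical helper name, not part of the answer above.

```python
import re


def extract_hrefs(page):
    # The group ([^"]*) captures everything between the double quotes
    # that follow '<a href='; findall returns one string per match.
    return re.findall(r'<a href="([^"]*)"', page)


S = ('<susuds><a href="www.target1.com"/><ahsahsh><saudahsd>'
     '<a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>')
print(extract_hrefs(S))
```

A pattern this strict misses links that use single quotes or put another attribute before href, so it suits small, predictable inputs rather than arbitrary pages.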
S.find only finds the first match. What have you tried so far to solve the problem?
But I return end_quote, so it should keep moving through the page. P.S. I have only just started learning Python.
You only called the function once, not three times. Even if you called it three times, it would still find the first link every time. And yes, you do return end_quote, but you never do anything with it. You only call the function.
I know. How can I fix it?
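One way to act on the comments above is to pass the returned end_quote back in as the position where the next search starts, and to keep calling the function until nothing is found. The sketch below assumes that approach; the start parameter and the get_all_targets helper are additions for illustration and are not in the original question.

```python
def get_next_target(page, start=0):
    """Return (url, end_quote) for the first link at or after `start`.

    Returns (None, -1) when no further complete link is found.
    """
    start_link = page.find('<a href=', start)
    if start_link == -1:
        return None, -1
    start_quote = page.find('"', start_link)
    if start_quote == -1:
        return None, -1
    end_quote = page.find('"', start_quote + 1)
    if end_quote == -1:
        return None, -1
    return page[start_quote + 1:end_quote], end_quote


def get_all_targets(page):
    """Collect every link by resuming the search after each match."""
    urls = []
    position = 0
    while True:
        url, end_quote = get_next_target(page, position)
        if url is None:
            break
        urls.append(url)
        position = end_quote + 1  # continue just past the closing quote
    return urls


S = ('<susuds><a href="www.target1.com"/><ahsahsh><saudahsd>'
     '<a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>')
for url in get_all_targets(S):
    print(url)
```

The key difference from the code in the question is the second argument to find: without it, every call starts from the beginning of the string and returns the first link again.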