Python 2.7: How do I split a URL and remove the unwanted part of the string?
I have the following code:
import urllib
from bs4 import BeautifulSoup

f = open('log1.txt', 'w')
url = 'http://www.brothersoft.com/tamil-font-513607.html'
pageUrl = urllib.urlopen(url)
soup = BeautifulSoup(pageUrl)
for a in soup.select("div.class1.coLeft a[href]"):
    try:
        suburl = ('http://www.brothersoft.com'+a['href']).encode('utf-8','replace')
        f.write('http://www.brothersoft.com'+a['href']+'\n')
    except:
        print 'cannot read'
        f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')
        pass
    content = urllib.urlopen(suburl)
    soup = BeautifulSoup(content)
    for a in soup.select("div.Sever1.coLeft a[href]"):
        try:
            suburl2 = ('http://www.brothersoft.com'+a['href']).encode('utf-8','replace')
            f.write('http://www.brothersoft.com'+a['href']+'\n')
        except:
            print 'cannot read'
            f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')
            pass
        content = urllib.urlopen(suburl2)
        soup = BeautifulSoup(content)
        try:
            suburl3 = soup.find('body')['onload'][10:-2]
            print suburl3.replace("&" + url.split('&')[-1],"")
            #f.write (soup.find('body')['onload'][10:-2]+'\n')
        except:
            print 'cannot read'
            f.write(soup.find('body')['onload'][10:-2]+'\n')
            pass
f.close()
I want the output to look like this:
Try this:
url = "http://www.brothersoft.com/d.php?soft_id=159403&url=http%3A%2F%2Ffiles.brothersoft.com%2Fmp3_audio%2Fmidi_tools%2FSynthFontSetup.exe&name=SynthFont"
print url.replace("&" + url.split('&')[-1],"")
Output:
http://www.brothersoft.com/d.php?soft_id=159403&url=http%3A%2F%2Ffiles.brothersoft.com%2Fmp3_audio%2Fmidi_tools%2FSynthFontSetup.exe
Your code (with changes):
Output:
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe
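The `replace`/`split` one-liner only works when the unwanted parameter comes last. As an alternative, here is a minimal sketch that drops a parameter by its key wherever it appears in the query string. The helper name `drop_param` is hypothetical (it is not part of the original answer), and the import fallback is only there so the sketch runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2.7


def drop_param(link, key):
    """Return link with every 'key=...' query parameter removed.

    Hypothetical helper: the query string is split on '&' and filtered by
    parameter name, so the remaining parameters keep their original encoding.
    """
    parts = urlsplit(link)
    kept = [p for p in parts.query.split('&') if p.split('=', 1)[0] != key]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       '&'.join(kept), parts.fragment))


link = ("http://www.brothersoft.com/d.php?soft_id=513607"
        "&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe"
        "&name=Tamil%20Font")
print(drop_param(link, 'name'))
```

Because the query is never decoded and re-encoded, the percent-escaped `url` parameter is passed through untouched.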
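Is this what you want?
I edited my question. I tried your code, but nothing changed.
Your code outputs URL strings that are completely different from what you want. Change it and show me what you need.
Yes, this is what I want. Where did you change it? I can't see it.
You have to change url to suburl3 twice (before and after replace). Is that what you meant in your last attempt?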
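To illustrate the last comment, here is a small sketch of why mixing `url` and `suburl3` changes nothing. The `suburl3` value below is a sample copied from the output above; in the real script it would come from the page's `onload` attribute. The names `unchanged` and `trimmed` are only for illustration.

```python
# Page URL from the question; note that it contains no '&'.
url = 'http://www.brothersoft.com/tamil-font-513607.html'

# Sample download link copied from the output above (assumed value).
suburl3 = ("http://www.brothersoft.com/d.php?soft_id=513607"
           "&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe"
           "&name=Tamil%20Font")

# Mixed variables: url.split('&')[-1] is the whole page URL, so the
# search string never occurs in suburl3 and nothing is replaced.
unchanged = suburl3.replace("&" + url.split('&')[-1], "")

# Same variable on both sides: the trailing '&name=...' segment is removed.
trimmed = suburl3.replace("&" + suburl3.split('&')[-1], "")

print(unchanged == suburl3)
print(trimmed)
```

The first `print` confirms that the mixed version leaves the link as it was, which matches the "nothing changed" report in the comments.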