Python 2.7: How do I split a URL and remove an unwanted string from it?

I have the following code:

import urllib
from bs4 import BeautifulSoup

f = open('log1.txt', 'w')

url ='http://www.brothersoft.com/tamil-font-513607.html'
pageUrl = urllib.urlopen(url)
soup = BeautifulSoup(pageUrl)

for a in soup.select("div.class1.coLeft a[href]"):
    try:
        suburl = ('http://www.brothersoft.com'+a['href']).encode('utf-8','replace')
        f.write ('http://www.brothersoft.com'+a['href']+'\n')
    except:
        print 'cannot read'
        f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')

        pass

    content = urllib.urlopen(suburl)
    soup = BeautifulSoup(content)
    for a in soup.select("div.Sever1.coLeft a[href]"):
        try:
            suburl2 = ('http://www.brothersoft.com'+a['href']).encode('utf-8','replace')
            f.write ('http://www.brothersoft.com'+a['href']+'\n')
        except:
            print 'cannot read'
            f.write('cannot read:'+'http://www.brothersoft.com'+a['href']+'\n')

            pass

        content = urllib.urlopen(suburl2)
        soup = BeautifulSoup(content)
        try:
            suburl3 = soup.find('body')['onload'][10:-2]
            print suburl3.replace("&" + url.split('&')[-1],"")
            #f.write (soup.find('body')['onload'][10:-2]+'\n')
        except:
            print 'cannot read'
            f.write(soup.find('body')['onload'][10:-2]+'\n')

            pass
f.close()

I would like the output to look like this:

Try this:

url = "http://www.brothersoft.com/d.php?soft_id=159403&url=http%3A%2F%2Ffiles.brothersoft.com%2Fmp3_audio%2Fmidi_tools%2FSynthFontSetup.exe&name=SynthFont"
print url.replace("&" + url.split('&')[-1],"")
Output:

http://www.brothersoft.com/d.php?soft_id=159403&url=http%3A%2F%2Ffiles.brothersoft.com%2Fmp3_audio%2Fmidi_tools%2FSynthFontSetup.exe
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe

Your code (with changes):

Output:

http://www.brothersoft.com/d.php?soft_id=159403&url=http%3A%2F%2Ffiles.brothersoft.com%2Fmp3_audio%2Fmidi_tools%2FSynthFontSetup.exe
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe&name=Tamil%20Font
http://www.brothersoft.com/d.php?soft_id=513607&url=http%3A%2F%2Fusfiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe

Is this what you want?
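
The replace/split one-liner above removes whatever parameter happens to come last, and str.replace removes every occurrence of that text. As a hedged alternative, here is a minimal sketch that drops a parameter by its key instead. The helper name drop_param is hypothetical and not part of the original answer; the sample URL is the one used in the answer.

```python
def drop_param(link, key):
    """Return link without any 'key=value' pair in its query string."""
    base, sep, query = link.partition('?')
    # Keep every pair whose name differs from the one being dropped.
    kept = [pair for pair in query.split('&') if pair.split('=', 1)[0] != key]
    return base + sep + '&'.join(kept)


sample = ("http://www.brothersoft.com/d.php?soft_id=159403"
          "&url=http%3A%2F%2Ffiles.brothersoft.com%2Fmp3_audio%2Fmidi_tools%2FSynthFontSetup.exe"
          "&name=SynthFont")
result = drop_param(sample, 'name')
print(result)
```

Because the query string is only split and rejoined, the percent-encoded url value is passed through untouched.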

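I edited my question. I tried your code, but nothing changed.
The URL strings your code prints are completely different from the ones you want. Update it and show me what you need.
Yes, that is what I want. Where did you make the change? I can't see it.
You have to change url to suburl3 twice (before and after replace).
In your last attempt, did you mean: ?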
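
The point made in the comments can be illustrated with a short sketch. In the question's loop the suffix is computed from url (the start page), which contains no '&', so the search string never occurs in suburl3 and replace leaves it unchanged. The names strip_last_param and page_url are hypothetical, and the suburl3 value is copied from the output listing above rather than fetched from the site.

```python
def strip_last_param(link):
    """Remove the final '&key=value' pair from link, if there is one."""
    if '&' not in link:
        return link
    return link.replace('&' + link.split('&')[-1], '')


page_url = 'http://www.brothersoft.com/tamil-font-513607.html'
suburl3 = ('http://www.brothersoft.com/d.php?soft_id=513607'
           '&url=http%3A%2F%2Ffiles.brothersoft.com%2Fphotograph_graphics%2Ffont_tools%2Fkeyman.exe'
           '&name=Tamil%20Font')

# As in the question: the suffix comes from the page URL, so nothing matches.
unchanged = suburl3.replace('&' + page_url.split('&')[-1], '')

# As suggested in the comments: derive the suffix from suburl3 itself.
cleaned = strip_last_param(suburl3)
print(cleaned)
```

Inside the loop, this corresponds to using suburl3 in both places on the print line where url appeared.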