Splitting a URL in Python 2.x

python python-2.7

I parsed a link out of some HTML code, and it looks like this:

"http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

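What I want to do is extract the second part of the link, starting from the second occurrence of http. So in the example above, I want to extract: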

"http://truelink.com/football/abcde.html?"

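I have considered splitting the URL into segments, but I am not sure that the structure of the first part will stay the same over time.

Is it possible to identify the second occurrence of "http" and then take everything from there to the end of the string?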

link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

link[link.rfind("http://"):]

"http://truelink.com/football/abcde.html?"

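This is how I would do it. rfind finds the last occurrence of http:// and returns its index. In your example, that is clearly the real, original URL. You can then extract the substring that runs from that index to the end of the string.

So if you have some string myStr, you extract a substring in Python with an array-like expression: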
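
One caveat with this approach: rfind returns -1 when the marker is absent, and link[-1:] then quietly yields just the last character. A minimal sketch of a guarded version follows. The helper name extract_last_url and its marker parameter are hypothetical and not part of the original answer.

```python
def extract_last_url(link, marker="http://"):
    # Hypothetical helper: return the substring starting at the LAST
    # occurrence of marker, or None when marker does not occur at all.
    index = link.rfind(marker)
    if index == -1:
        # Without this check, link[-1:] would return the last character.
        return None
    return link[index:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(extract_last_url(link))
```

Returning None is just one possible choice; raising an exception or returning the input unchanged may fit other callers better.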

myStr[0]    # returns the first character
myStr[0:5]  # returns the first 5 characters, i.e. indices 0 <= i < 5
myStr[5:]   # returns all characters from index 5 to the end of the string
myStr[:5]   # same as myStr[0:5]
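
The question literally asks for the second occurrence of the marker rather than the last one. The two differ when a link nests more than two URLs. As a rough sketch, slicing can be combined with the start argument of str.find to get exactly the second occurrence. The function name from_second_occurrence is an assumption made for illustration.

```python
def from_second_occurrence(link, marker="http://"):
    # Hypothetical helper: slice from the SECOND occurrence of marker.
    first = link.find(marker)
    if first == -1:
        return None  # marker never appears
    # Resume the search just past the first match.
    second = link.find(marker, first + len(marker))
    if second == -1:
        return None  # marker appears only once
    return link[second:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(from_second_occurrence(link))
```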

I would do it like this:

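addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
http_part = 'http://'
# Splitting on the prefix removes it from every piece; the first piece is
# empty because addr itself starts with the prefix.
pieces = addr.split(http_part)
res = []
for piece in pieces:  # avoid naming the loop variable "str" (shadows the builtin)
    if piece:  # skip empty pieces
        res.append(http_part + piece)  # put the prefix back
print(res)  # parenthesized form works on Python 2.7 and 3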
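
The split above only recognizes http://. If the embedded link might use https://, one possible variation is to locate every scheme prefix with re.finditer and slice between the match positions. This is a sketch under that assumption, not the answerer's method, and the name split_embedded_urls is hypothetical.

```python
import re


def split_embedded_urls(addr):
    # Hypothetical helper: cut addr at every "http://" or "https://".
    # Any text before the first scheme prefix is dropped.
    starts = [m.start() for m in re.finditer(r'https?://', addr)]
    if not starts:
        return []
    # Each piece ends where the next one starts; the last ends at len(addr).
    ends = starts[1:] + [len(addr)]
    return [addr[s:e] for s, e in zip(starts, ends)]


addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(split_embedded_urls(addr))
```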

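What if the URL is http://advert.com/go/2/12345/0/http://truelink.com/football/http?

That should be your real answer.

Ah yes, I see. Edited.

@ascenator Also, if http:// is not found in the string, you will get some interesting results.

That is true, but which URL does not include http://? Especially when the asker says that, for some reason, he has advert URLs that contain the real URL.

Just out of curiosity: how did you end up with a string like this?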
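
The URL in the first comment ends in the literal text http?, which shows why the choice of marker matters. The short sketch below compares a bare "http" marker with the full "http://" marker on that URL. The variable names are made up for illustration.

```python
url = "http://advert.com/go/2/12345/0/http://truelink.com/football/http?"

# A bare "http" marker matches the trailing "http?" in the path.
with_bare_marker = url[url.rfind("http"):]

# The full "http://" marker skips it and finds the embedded URL.
with_full_marker = url[url.rfind("http://"):]

print(with_bare_marker)
print(with_full_marker)
```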