Python 2.7: as a Python beginner, I don't understand why I get an infinite loop
This code always ends up in an infinite loop when I run it:
pos1 = 0
pos2 = 0
url_string = '''<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'''
i = int(len(url_string))
#print i # debug
while i > 0:
    pos1 = int(url_string.find('>'))
    #print pos1 # debug
    pos2 = int(url_string.find('<', pos1))
    #print pos2 # debug
    url_string = url_string[pos2:]
    #print url_string # debug
    print int(len(url_string)) # debug
    i = int(len(url_string))
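When find() does not find the character it returns -1, so url_string[pos2:] becomes url_string[-1:], which is a slice consisting of the last character of url_string. From that point on Python keeps looping without ever finding '<', and, as @user2357112 pointed out, you never get past the end of the string.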
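The two behaviours described above can be checked in isolation with built-in string operations only. This is a small sketch; the names s, missing, and tail are made up for the illustration.

```python
s = '</p>'

# str.find returns -1 when the substring is absent from the start index onwards.
missing = s.find('<', 3)

# A slice that starts at -1 keeps the last character instead of emptying the string.
tail = s[missing:]

print(missing)  # -1
print(tail)     # >
```

Because the string never becomes empty, a condition such as `i > 0` on its length stays true forever.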
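There are several solutions, but a simple one (given that I don't know what you are trying to achieve) is to include pos1 and pos2 in the loop condition: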
while i > 0 and pos1 >= 0 and pos2 >= 0:
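A variant of the same idea, sketched under the assumption that the goal is simply to step from tag to tag: check the result of find() before slicing, and stop as soon as it reports -1. The function name walk_tags and the returned list of remaining lengths are illustrative choices, not part of the original code. The sketch is written to run on both Python 2.7 and Python 3.

```python
def walk_tags(html):
    """Trim html at each '<' that follows a '>' and record the remaining lengths."""
    remaining_lengths = []
    while html:
        pos1 = html.find('>')
        if pos1 == -1:
            break  # no closing bracket left
        pos2 = html.find('<', pos1)
        if pos2 == -1:
            break  # no further tag: stop instead of slicing with -1
        html = html[pos2:]
        remaining_lengths.append(len(html))
    return remaining_lengths


url_string = '<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'
print(walk_tags(url_string))  # [45, 40, 14, 10, 4]
```

Breaking before the slice means the string is never reduced to its last character, so the loop ends after the final tag.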
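With that condition, the loop stops as soon as either of the characters you are looking for is not found. It is easier, though, to split the string and count the characters like this: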
map(len, url_string.split('<')) # This equals [0, 14, 4, 25, 3, 5, 3]
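This works when splitting on a single character.
Edit
As was pointed out, the requirement is to extract only the text that is not part of a tag. In that case a one-liner does it: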
''.join( map(lambda x: x.split('>')[-1] , url_string.split('<')) )
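For reference, here is the same extraction written with a generator expression and wrapped in a function, so the result can be inspected. The name text_outside_tags is hypothetical, and the approach assumes that '<' and '>' only ever appear as tag delimiters.

```python
def text_outside_tags(html):
    """Join the text that follows the last '>' in each piece obtained by splitting on '<'."""
    return ''.join(piece.split('>')[-1] for piece in html.split('<'))


url_string = '<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'
print(text_outside_tags(url_string))  # Daily News This is the daily news.end
```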
It looks like you are trying to parse HTML to get data out of elements (for example, you want the data inside the h1 tag, such as "Daily News"). If that is the case, I would suggest using another library called BeautifulSoup4 for this.
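To illustrate what a real parser buys you without requiring a third-party install, here is a minimal sketch that uses the standard library's HTMLParser instead of BeautifulSoup4. It is a swapped-in technique, not the BeautifulSoup4 API, and the class name TextCollector is made up for the example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TextCollector(HTMLParser):
    """Collect the text found between tags (hypothetical helper class)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside a tag.
        self.chunks.append(data)


url_string = '<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'
collector = TextCollector()
collector.feed(url_string)
collector.close()
print(collector.chunks)  # ['Daily News ', 'This is the daily news.', 'end']
```

The parser keeps track of where tags begin and end, so no manual index arithmetic with find() is needed.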
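That said, because I am not sure exactly what this program is supposed to do, I broke your code down so that you can see more easily what happens to the variables (for now, with the while loop removed). This lets you see exactly what your code does, without the infinite loop.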
# Setup Variables
pos1 = 0
pos2 = 0
url_string = '''<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'''
i = int(len(url_string)) # the url_string length is 60 characters
print "Setting up Variables with string at ", i, " characters"
print "String is: ", url_string
"""string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is
wholly contained in s[start:end]. Return -1 on failure. Defaults for start and
end and interpretation of negative values is the same as for slices.
Source: http://docs.python.org/2/library/string.html
"""
print "Running through program first time"
pos1 = int(url_string.find('>'))
# This finds the first occurrence of '>', which is at position 3
pos2 = int(url_string.find('<', pos1))
# This finds the first occurrence of '<' after position 3 ('>'),
# which is at position 15
print "Pos1 is at:", pos1, " and pos2 is at:", pos2
url_string = url_string[pos2:] # trimming string down?
print "The string is now: ", url_string
# </h1><p>This is the daily news.</p><p>end</p>
print "The string length is now: ", int(len(url_string)) # string length now 45
i = int(len(url_string)) # updating the length var to the new length
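The walkthrough above covers the first pass only. A rough sketch of the later passes follows: it runs the same loop body a capped number of times, so it cannot hang, and records pos1, pos2, and the remaining string after each pass. The function name trace_loop and the max_steps cap are assumptions added for the illustration.

```python
def trace_loop(html, max_steps=8):
    """Run the original loop body at most max_steps times and record each state."""
    states = []
    for _ in range(max_steps):
        if len(html) == 0:
            break
        pos1 = html.find('>')
        pos2 = html.find('<', pos1)
        html = html[pos2:]
        states.append((pos1, pos2, html))
    return states


url_string = '<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'
for pos1, pos2, rest in trace_loop(url_string):
    print('pos1={0:2d} pos2={1:2d} rest={2!r}'.format(pos1, pos2, rest))
```

With this input, the sixth pass is the first where pos2 is -1. From then on the remaining string is the single character '>' and every further pass repeats the same state, which is the infinite loop.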
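What is your debugging output? It must be a big hint: print url_string.
Note: the int casts are unnecessary, and HTML code is not a "url".
For a multi-character delimiter, you need to change the last line to lens = lens + len(sep) * np.arange(len(lens)), where sep is the delimiter you split at.
Doesn't this also include all the tags? I believe the idea is to output everything that is not a tag.
These are the positions of the tags. I think I misunderstood the question. Give me a moment; I will look at the code again...
In that case, a one-liner does it: ''.join(map(lambda x: x.split('>')[-1], url_string.split('<')))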
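import numpy as np
# Cumulative lengths of the pieces between '<' characters
lens = np.cumsum([len(piece) for piece in url_string.split('<')])
# Add one per preceding '<' to get each tag's start position (the last entry is the string length)
lens = lens + np.arange(len(lens))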
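As a NumPy-free alternative, the positions can also be found directly with repeated calls to str.find, which handles multi-character delimiters without any offset arithmetic. This is only a sketch; the function name separator_positions is hypothetical. Unlike the cumulative-sum version, it does not append the string length as a final entry.

```python
def separator_positions(text, sep):
    """Return the index of every non-overlapping occurrence of sep in text."""
    if not sep:
        raise ValueError('sep must not be empty')
    positions = []
    start = text.find(sep)
    while start != -1:
        positions.append(start)
        # Resume after the match so the search always moves forward.
        start = text.find(sep, start + len(sep))
    return positions


url_string = '<h1>Daily News </h1><p>This is the daily news.</p><p>end</p>'
print(separator_positions(url_string, '<'))     # [0, 15, 20, 46, 50, 56]
print(separator_positions(url_string, '</p>'))  # [46, 56]
```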