Python: removing escape sequences from parsed HTML
Tags: python, html, escaping, mechanize

I am using Python's mechanize module to submit a simple query to a website, then splitting up the returned elements to get the data I need. But I can't seem to handle the returned escape sequences correctly. Here is my code:
def stripEscape(string): #credit goes to sarnold
    delete = ""
    i=1
    while (i<0x20):
        delete += chr(i)
        i += 1
    t = string.translate(None, delete)
    return t

def getHTML(metID):
    br = mechanize.Browser()
    response = br.open("http://urlgoeshere.com")
    br.form = list(br.forms())[0]
    br["PROMPT12"] = metID
    response = br.submit()
    htmlText = response.read()
    parseHTML(htmlText)

def parseHTML(htmlText):
    htmlText.index('table')
    arr = re.split(r'(</?\w{2}>)',htmlText) # everything after background tag
    logFile = open('Log.txt','wb')
    for ele in arr:
        ele = stripEscape(ele)
        if ele == '':
            arr.remove(ele)
    for ele in arr:
        logFile.write("ele: "+ele+'\n')
        if re.match('/table', ele):
            logFile.write("END OF TABLE FOUND")
            logFile.write("\nele: "+ele+'\n')
            break
        # other element filters
The stripped value of ele in the first for loop is never saved back into the array:
for ele in arr:
    ele = stripEscape(ele)
    if ele == '':
        arr.remove(ele)
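This part of the code only changes the loop variable ele, not arr. The contents of arr stay the same, so none of the escape sequences are removed. You can test this by printing arr after that loop.

What you need to do is save the stripped elements into a new array and use that array in the next loop: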
newArray = []  # holds the stripped elements; must exist before the loop appends to it
for ele in arr:
    if ele != "":
        newArray.append(stripEscape(ele))
for ele in newArray:
    logFile.write("ele: "+ele+'\n')
    if re.match('/table', ele):
        logFile.write("END OF TABLE FOUND")
        logFile.write("\nele: "+ele+'\n')
        break
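
To see the difference in isolation, here is a minimal sketch of the same idea. It assumes Python 3, where str.translate takes a mapping instead of the two-argument form used in the question; strip_escape, CONTROL_CHARS and the sample list are hypothetical names and data made up for illustration, not part of the original code.

```python
# Hypothetical Python 3 counterpart of stripEscape from the question:
# map control characters 0x01-0x1F to None so translate() deletes them.
CONTROL_CHARS = {code: None for code in range(1, 0x20)}

def strip_escape(text):
    return text.translate(CONTROL_CHARS)

# Made-up sample data standing in for the split HTML fragments.
arr = ["<td>", "\r\n", "value\t1", "", "</td>"]

# Rebinding the loop variable only changes the name ele;
# the strings stored in arr are left untouched.
for ele in arr:
    ele = strip_escape(ele)
unchanged = list(arr)

# Building a new list keeps the cleaned strings.
# Filtering after stripping also drops elements that become empty.
new_array = [cleaned for cleaned in (strip_escape(e) for e in arr) if cleaned]

print(unchanged)
print(new_array)
```

One design choice here differs from the loop above: the empty-string check runs after stripping, so an element made up only of control characters (such as "\r\n") is dropped as well. Building a new list also avoids calling arr.remove() while iterating over arr, which can cause the loop to skip elements.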
Definitely a facepalm moment. I should have seen that.