Substring given a line number and offset in Python 3


html python-3.x


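I am trying to parse an HTML page with the html.parser library in Python 3. The method HTMLParser.getpos() returns the line number and offset of the last parsed tag.

For example, I know that the "string" I want starts at line 10, offset 5 and ends at line 30, offset 10. How can I get the substring from line 10, offset 5 to line 30, offset 10?

Thanks.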

html = 'this holds the entire html code'
MyParser.feed(html) #now the parser works its magic
start = (10,5) #this is returned from HTMLParser.getpos(), 10 is the line number and 5 is the offset of that line
end = (30,10) #same here
#I want to do something like this (I know this is invalid python code)
substring = html.substring(start,end) #return the html code as a string from line 10 offset 5 to line 30 offset 10

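A better explanation:

I am trying to get a substring from a string.

I know that in Python 3 this is called a slice: string[a:b]. So if I want the substring "jonny" from the string "Hello jonny smith", I would do: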

substring='Hello jonny smith'[6:11]

The problem is that getpos() returns a tuple (line number, offset within that line), so I can't do:

substring=multy_line_string[line number:offset]

Assuming you are interested in HTML parsing, try lxml.
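
If you would rather stay with the standard library parser, one possible approach is to convert each (line, offset) tuple into an absolute index and then use an ordinary slice. The following is only a sketch: the helper names pos_to_index and slice_by_pos are made up for illustration, and it assumes that line numbers are 1-based, offsets are 0-based, and lines are separated by "\n", which is how HTMLParser.getpos() appears to count positions.

```python
def pos_to_index(text, pos):
    """Convert a (line, offset) tuple into an absolute index into text.

    Assumption: line is 1-based, offset is 0-based, and lines end
    with "\n" (hypothetical helper, not part of html.parser).
    """
    lineno, offset = pos
    index = 0
    # Skip past (lineno - 1) newline characters to reach the start of the line.
    for _ in range(lineno - 1):
        index = text.index("\n", index) + 1  # raises ValueError if the line does not exist
    return index + offset


def slice_by_pos(text, start, end):
    """Return the part of text between two (line, offset) positions."""
    return text[pos_to_index(text, start):pos_to_index(text, end)]


text = "line one\nline two\nline three\n"
print(repr(slice_by_pos(text, (1, 5), (3, 4))))  # 'one\nline two\nline'
```

With the values from the question, the call would look like slice_by_pos(html, (10, 5), (30, 10)), where html is the same string that was passed to feed().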

You need to show us more. I don't understand the problem you are trying to solve.

Agreed with 索恩; please provide more details.