Python 2.7: "looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup"
I changed my Python 2.7 routine to accept a file path as a parameter, so that I don't have to duplicate the code by hard-coding several file paths inside the method. When I call the method, I get the following error:
looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
'"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)
My method is:
def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r').read()
    soup = BeautifulSoup(filename, "html.parser")
    th = soup.find_all('th')
    td = soup.find_all('td')
    headers = [header.get_text(strip=True) for header in soup.find_all("th")]
    rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
            for row in soup.find_all("tr")[1:-1]]
    print(rows)
    return rows
I call the method like this:
rows_part1 = report.extract_data_from_report3(r"E:\test_runners\selenium_regression_test_5_1_1\TestReport\SeleniumTestReport_part1.html")
print "part1 = "
print rows_part1
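How can I pass the filename as a parameter?
You are passing the filename string itself to BeautifulSoup, which parses it as markup. Pass the actual contents of the file, which you have already read, instead: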
html_report_part1 = open(filename,'r').read()
soup = BeautifulSoup(html_report_part1, "html.parser")
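To see why the original call found no cells, here is a minimal sketch. It assumes only that `bs4` is installed; the inline table is made up for illustration, and the warning is silenced purely to keep the output clean. A string argument is always parsed as markup and never opened as a file:

```python
import warnings
from bs4 import BeautifulSoup

# A string is treated as the document itself, so a file name
# becomes a tiny "document" that contains no tags at all.
path_like = "SeleniumTestReport_part1.html"
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the "looks like a filename" warning
    soup_from_name = BeautifulSoup(path_like, "html.parser")
cells_from_name = soup_from_name.find_all("td")

# The same parser finds the cell once it receives real markup.
markup = u"<table><tr><td>pass</td></tr></table>"  # illustrative markup only
soup_from_markup = BeautifulSoup(markup, "html.parser")
cells_from_markup = soup_from_markup.find_all("td")

print(len(cells_from_name), len(cells_from_markup))
```

The message in the question is a warning rather than an exception, so parsing still runs, just on the wrong input.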
If you would rather pass a file handle, don't call read; just pass open(filename), that is, the file object itself:
def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r')
    soup = BeautifulSoup(html_report_part1, "html.parser")
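Or:
def extract_data_from_report3(filename):
    soup = BeautifulSoup(open(filename), "html.parser")
After calling read as suggested you can pass html_report_part1, but you don't need to; BeautifulSoup can take a file object.
Does BeautifulSoup dispose of the file, or should it be put in a with block?
@Mephy, once you leave the function the file will almost certainly be closed: when no references to the file object remain, it is closed after reading (the figure this comment referred to is not reproduced here). Using a with block is harmless, but not really needed.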
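If you prefer not to rely on the file being closed when the last reference goes away, the with block raised in the comments can be sketched as follows. This is only one possible shape: the function name `extract_data_from_report` and the UTF-8 encoding are assumptions, not part of the original code, and `io.open` is used so the same code runs on Python 2.7 and Python 3.

```python
import io
from bs4 import BeautifulSoup


def extract_data_from_report(filename):
    """Parse an HTML report table into a list of {header: cell} dicts."""
    # The with block closes the file as soon as parsing is done,
    # even if BeautifulSoup raises an exception.
    # encoding="utf-8" is an assumption about the report files.
    with io.open(filename, "r", encoding="utf-8") as handle:
        soup = BeautifulSoup(handle, "html.parser")

    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    # [1:-1] skips the first (header) row and the last row, as in the question.
    return [
        dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
        for row in soup.find_all("tr")[1:-1]
    ]
```

The caller still passes a plain path string, exactly as in the question; only the function body decides how the file is opened and closed.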