Python 2.7: "looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup"


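I changed my Python 2.7 routine to accept the file path as a parameter, so that I don't have to duplicate the code by hard-coding several file paths inside the method.

When I call my method, I get the following error: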

looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
  '"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)
My method is:

def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r').read()
    soup = BeautifulSoup(filename, "html.parser")
    th = soup.find_all('th')
    td = soup.find_all('td')

    headers = [header.get_text(strip=True) for header in soup.find_all("th")]
    rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
        for row in soup.find_all("tr")[1:-1]]
    print(rows)
    return rows
The method is called as follows:

rows_part1 =  report.extract_data_from_report3(r"E:\test_runners\selenium_regression_test_5_1_1\TestReport\SeleniumTestReport_part1.html")
print "part1 = "
print rows_part1

How can I pass the filename as a parameter?

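You should pass the actual contents of the file, which you have already read, to BeautifulSoup. Your code passes filename (the path string) instead, which is what triggers the message: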

html_report_part1 = open(filename,'r').read()
soup = BeautifulSoup(html_report_part1, "html.parser")
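
For reference, here is a minimal sketch of the whole routine with that fix applied. The name extract_rows and the UTF-8 encoding are assumptions made for illustration, not part of the original code; the slice that drops the first and last table rows is copied from the question.

```python
import io

from bs4 import BeautifulSoup


def extract_rows(filename):
    """Parse an HTML report and return one dict per data row."""
    # Read the file contents first; BeautifulSoup needs markup, not a path.
    # Assumption: the report is UTF-8 encoded.
    with io.open(filename, "r", encoding="utf-8") as handle:
        markup = handle.read()

    soup = BeautifulSoup(markup, "html.parser")

    # Column names come from the <th> cells.
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]

    # As in the question, skip the first and the last <tr>.
    rows = []
    for tr in soup.find_all("tr")[1:-1]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(dict(zip(headers, cells)))
    return rows
```

io.open is used so that the same code accepts an encoding argument on both Python 2.7 and Python 3.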

If you would rather pass a file handle, don't call read; just pass open(filename), that is, the file handle itself:

def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r')
    soup = BeautifulSoup( html_report_part1, "html.parser")
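Or:

def extract_data_from_report3(filename):
    soup = BeautifulSoup(open(filename), "html.parser")

After calling read as suggested you can pass html_report_part1, but you don't need to; BeautifulSoup accepts a file object.

Does BeautifulSoup dispose of the file, or should it be placed in a with block?

@Mephy, once you leave the function the file will almost certainly be closed. In CPython, a file object with no remaining references is closed after it has been read; you can see it in the figure below. Using a with block does no harm, but it isn't really needed.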
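
If you prefer not to rely on the interpreter to close the file, the with-block variant mentioned in the comments could look roughly like this. The helper name load_soup is hypothetical, and it returns the handle only so that its state can be inspected afterwards.

```python
from bs4 import BeautifulSoup


def load_soup(filename):
    """Return (soup, handle) so the caller can check the handle state."""
    # Binary mode hands raw bytes to Beautiful Soup, which then works out
    # the encoding itself.
    with open(filename, "rb") as handle:
        # The constructor reads the whole file here, before the block exits.
        soup = BeautifulSoup(handle, "html.parser")
    return soup, handle
```

After calling soup, handle = load_soup(path), handle.closed is True, because the with block closes the file as soon as parsing is done, regardless of how many references to the handle remain.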