Iterate over and use the HTML files in a directory - Python
Tags: html, python-3.x, screen-scraping

I need to iterate over the .html files in a given directory and scrape data from them. This is my code so far. How can I access the script inside each file?
import os

directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        print(os.path.join(directory, filename))
    else:
        continue
(System: Mac, Python 3.x)

You can do the following:
import os
from bs4 import BeautifulSoup

directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            # parse the html as you wish
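
The question asks specifically about reaching the script inside each file. With the soup object above, `soup.find_all('script')` returns the script tags. As a dependency-free alternative, here is a minimal sketch that uses the standard library's `html.parser` instead of BeautifulSoup. The names `ScriptCollector` and `scripts_in_directory` are hypothetical helpers introduced for illustration, and the sketch assumes the files are UTF-8 encoded.

```python
from html.parser import HTMLParser
from pathlib import Path


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element, in document order
        self._in_script = False  # True while the parser is inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current entry.
        if self._in_script:
            self.scripts[-1] += data


def scripts_in_directory(directory):
    """Map each .html file name in `directory` to its list of script bodies."""
    results = {}
    for path in sorted(Path(directory).glob("*.html")):
        parser = ScriptCollector()
        # Assumption: the files are UTF-8 encoded.
        parser.feed(path.read_text(encoding="utf-8"))
        parser.close()
        results[path.name] = parser.scripts
    return results
```

Calling `scripts_in_directory('/Users/xxxxx/Documents/sample/')` would return a dictionary keyed by file name. A script tag that only has a `src` attribute shows up as an empty string, since it has no inline body.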