Python: web scraping the H1 tag from multiple URLs using a text file

python python-3.x web-scraping

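I have been trying to write a Python 3 program that scrapes printer web pages (for example, an HP LaserJet). The scraper should visit about 20 different URLs, one per printer page, and grab the H1 tag, which is where the printer model is stored.

I am new to Python. I would like to keep the URLs in a txt file and use a for loop so that each URL is used as a variable.

The code I have so far only works for one URL, and I don't know how to phrase what I'm looking for in order to find out how to use a text file with each line as a variable.

Here is the URL text file, for example: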

http://192.168.1.10
http://192.168.1.11
http://192.168.1.12
...etc one url per line

My Python 3 code looks like this:

import requests
from bs4 import BeautifulSoup

page = requests.get('http://192.168.1.10/')
soup = BeautifulSoup(page.text, 'html.parser')
page = soup.find(class_='mastheadTitle')

pagehp = page.find_all('h1')

for page in pagehp:
    print(page.prettify())

The line in question is this one:

page = requests.get('http://192.168.1.10/')

How can I change it to read from my urls.txt and turn it into a loop, so that the URL on each line is used as that string?

You can use Python's built-in open() function for this. First, set things up:

import requests
from bs4 import BeautifulSoup

url_file = "url_file.txt"  # The URLs should be written one per line in url_file.txt

The next step is to read the URLs from url_file.txt and loop over them.
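The reading and looping step might look roughly like the sketch below. It is not the answerer's exact code. The helper names (read_urls, extract_h1_texts, scrape) and the 5-second timeout are assumptions made for illustration. It also prints the H1 text rather than the prettified tag, and skips pages that have no element with the mastheadTitle class.

```python
import requests
from bs4 import BeautifulSoup

URL_FILE = "url_file.txt"  # one URL per line, as in the question


def read_urls(path):
    """Return the non-empty lines of the file at `path`, stripped of whitespace."""
    with open(path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def extract_h1_texts(html):
    """Return the text of every <h1> inside the element with class 'mastheadTitle'."""
    soup = BeautifulSoup(html, "html.parser")
    masthead = soup.find(class_="mastheadTitle")
    if masthead is None:  # this page has no such element
        return []
    return [h1.get_text(strip=True) for h1 in masthead.find_all("h1")]


def scrape(urls, timeout=5):
    """Print the H1 text(s) found at each URL; report failures and move on."""
    for url in urls:
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()
        except requests.RequestException as error:
            print(f"{url}: request failed ({error})")
            continue
        for title in extract_h1_texts(response.text):
            print(f"{url}: {title}")


if __name__ == "__main__":
    scrape(read_urls(URL_FILE))
```

Keeping the parsing in its own function means it can be tried on a saved HTML string without touching the network, which may help when one of the printers is offline.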

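Thanks. Could you explain the logic behind it? I'm trying to wrap my head around what the code is doing. Or is there a really simple article on Python I could read? I'll keep digging. Thanks for your time.

OK. First we use the built-in open() function to open the .txt file: the first argument is the file path, and the second argument is the mode, "r", which means we want to read the file. Before we start scraping, we split the URLs into a list so that they can be iterated over, and then we run a for loop over each URL.

Thanks. That's what I needed to know: split it into a list.

OK, you're welcome. If you found it helpful, don't forget to mark the answer as accepted!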
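For anyone else following the explanation in the comments, the three steps (open in "r" mode, split into a list, loop) can be tried on their own without any scraping. This is only a sketch. The sample file is written to a temporary directory so the snippet runs by itself, and splitlines() is one assumed way of doing the split.

```python
import os
import tempfile

# Create a small sample file so the example runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "url_file.txt")
with open(sample_path, "w") as handle:
    handle.write("http://192.168.1.10\nhttp://192.168.1.11\nhttp://192.168.1.12\n")

# "r" opens the file for reading; read() returns its whole content as one string.
with open(sample_path, "r") as handle:
    content = handle.read()

# splitlines() turns that string into a list with one URL per element.
urls = content.splitlines()

# A for loop then visits each URL in turn; here it only prints it.
for url in urls:
    print("would fetch:", url)
```

In the real script, the print inside the loop would be replaced by the requests.get call on url.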