Python TypeError: 'int' object is not iterable with the BeautifulSoup library
There is a site called dnsdumpster that provides all the subdomains of a domain. I am trying to automate this process and print out a list of subdomains. Each individual subdomain sits inside a 'td' HTML tag. I tried to loop over all of these tags and print out the subdomains, but I get the error TypeError: 'int' object is not iterable.
import requests
import re
from bs4 import BeautifulSoup

headers = {
    'Host' : 'dnsdumpster.com',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language' : 'en-US,en;q=0.5',
    'Accept-Encoding' : 'gzip, deflate',
    'DNT' : '1',
    'Upgrade-Insecure-Requests' : '1',
    'Referer' : 'https://dnsdumpster.com/',
    'Connection' : 'close'
}
proxies = {
    'http' : 'http://127.0.0.1:8080'
}
domain = 'google.com'
with requests.Session() as s:
    url = 'https://dnsdumpster.com'
    response = s.get(url, headers=headers, proxies=proxies)
    response.encoding = 'utf-8' # Optional: requests infers this internally
    soup1 = BeautifulSoup(response.text, 'html.parser')
    input = soup1.find_all('input')
    csrfmiddlewaretoken_raw = str(input[0])
    csrfmiddlewaretoken = csrfmiddlewaretoken_raw[55:119]
    data = {
        'csrfmiddlewaretoken' : csrfmiddlewaretoken,
        'targetip' : domain
    }
    send_data = s.post(url, data=data, proxies=proxies, headers=headers)
    print(send_data.status_code)
    soup2 = BeautifulSoup(send_data.text, 'html.parser')
    td = soup2.find_all('td')
    for i in len(td):
        item = str(td[i])
        subdomain = item[21:37]
        print(subdomain)
The error looks like this:
Traceback (most recent call last):
  File "dns_dumpster_4.py", line 39, in <module>
    for i in len(td):
TypeError: 'int' object is not iterable
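The failing construct can be reproduced in isolation: len() returns an int, and a for loop needs an iterable. Here td is a plain list used as a stand-in for the ResultSet returned by find_all:

```python
td = ['<td>a</td>', '<td>b</td>']  # stand-in for soup2.find_all('td')

try:
    for i in len(td):  # len(td) is the int 2, not an iterable
        print(i)
except TypeError as e:
    print(e)  # -> 'int' object is not iterable
```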
Once the above error is resolved, I also need help with a second problem:
How can I use a regex to get the individual subdomain out of this 'td' tag? The content of the tag is very long and messy, and I only need the subdomain. I would be grateful if someone could help me extract just the subdomain names. Below is my attempt to capture the subdomains without using a regex:
import requests
from bs4 import BeautifulSoup

headers = {
    'Host' : 'dnsdumpster.com',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language' : 'en-US,en;q=0.5',
    'Accept-Encoding' : 'gzip, deflate',
    'DNT' : '1',
    'Upgrade-Insecure-Requests' : '1',
    'Referer' : 'https://dnsdumpster.com/',
    'Connection' : 'close'
}
proxies = {
    'http' : 'http://127.0.0.1:8080'
}
domain = 'google.com'
with requests.Session() as s:
    url = 'https://dnsdumpster.com'
    response = s.get(url, headers=headers, proxies=proxies)
    response.encoding = 'utf-8' # Optional: requests infers this internally
    soup1 = BeautifulSoup(response.text, 'html.parser')
    input = soup1.find_all('input')
    csrfmiddlewaretoken_raw = str(input[0])
    csrfmiddlewaretoken = csrfmiddlewaretoken_raw[55:119]
    data = {
        'csrfmiddlewaretoken' : csrfmiddlewaretoken,
        'targetip' : domain
    }
    send_data = s.post(url, data=data, proxies=proxies, headers=headers)
    print(send_data.status_code)
    soup2 = BeautifulSoup(send_data.text, 'html.parser')
    td = soup2.find_all('td', {'class': 'col-md-3'})
    # for dom in range(0, len(td), 2):
    #     print(td[dom].get_text(strip=True, separator='\n'))
    mysubdomain = []
    for dom in range(len(td)):
        # print(td[dom].get_text(strip=True, separator='\n'))
        if '.' in td[dom].get_text(strip=True):
            x = td[dom].get_text(strip=True, separator=',').split(',')
            mysubdomain.append(x)
            # print(x)
            # y = td[dom].get_text(strip=True, separator=',').split(',')[1]
            # mysubdomain.append(td[dom].get_text(strip=True, separator=','))
    print(mysubdomain)
    # print(td)
    # for i in range(len(td)):
    #     item = str(td[i])
    #     print('\n', item, '\n')
    #     subdomain = item[21:37]
    #     print(subdomain)

from functools import reduce
flat_list_of_mysubdomain = reduce(lambda x, y: x + y, mysubdomain)
print(flat_list_of_mysubdomain)
I hope it helps you.

len() gets the length; its return value is an integer, and you cannot iterate over an integer. Get rid of the len():
for i in td:
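A minimal sketch of that fix, again using a plain list as a stand-in for the ResultSet that find_all returns:

```python
# Stand-in for td = soup2.find_all('td'); find_all returns an iterable ResultSet
td = ['<td>mail.google.com</td>', '<td>www.google.com</td>']

# Iterate over the elements themselves instead of over len(td)
for cell in td:
    print(cell)
```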
You cannot iterate over an integer. Use for i in range(0, len(td)):, which iterates from 0 up to the length of the object td. In other words, you need to write it as for i in range(len(td)):.
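For the regex half of the question, a dotted-hostname pattern over the td markup is one option. This is only a sketch: the sample HTML below is an assumption about what the dnsdumpster cells contain, and the pattern will also match any other dotted names in the markup, so you may need to filter the results to your target domain:

```python
import re

# Assumed sample of the td markup returned by dnsdumpster
html = ('<td class="col-md-3">mail.google.com<br></td>'
        '<td class="col-md-3">www.google.com<br></td>')

# Match dotted hostnames such as mail.google.com
hostname = re.compile(r'\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b')
subdomains = hostname.findall(html)
print(subdomains)  # -> ['mail.google.com', 'www.google.com']
```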