Python TypeError: 'int' object is not iterable when using the BeautifulSoup library

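There is a site called dnsdumpster that lists all the subdomains of a domain. I am trying to automate this process and print out a list of subdomains. Each individual subdomain sits inside a "td" HTML tag. I am trying to loop over all of these tags and print the subdomains, but I get an error.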

import requests
import re
from bs4 import BeautifulSoup

headers = {
    'Host' : 'dnsdumpster.com',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language' : 'en-US,en;q=0.5',
    'Accept-Encoding' : 'gzip, deflate',
    'DNT' : '1',
    'Upgrade-Insecure-Requests' : '1',
    'Referer' : 'https://dnsdumpster.com/',
    'Connection' : 'close'
}

proxies = {
    'http' : 'http://127.0.0.1:8080'
}

domain = 'google.com'

with requests.Session() as s:
    url = 'https://dnsdumpster.com'
    response = s.get(url, headers=headers, proxies=proxies)
    response.encoding = 'utf-8' # Optional: requests infers this internally
    soup1 = BeautifulSoup(response.text, 'html.parser')
    input = soup1.find_all('input')
    csrfmiddlewaretoken_raw = str(input[0])
    csrfmiddlewaretoken = csrfmiddlewaretoken_raw[55:119]
    data = {
        'csrfmiddlewaretoken' : csrfmiddlewaretoken,
        'targetip' : domain
    }
    send_data = s.post(url, data=data, proxies=proxies, headers=headers)
    print(send_data.status_code)
    soup2 = BeautifulSoup(send_data.text, 'html.parser')
    td = soup2.find_all('td')
    for i in len(td):
        item = str(td[i])
        subdomain = item[21:37]
        print(subdomain)
The error is as follows:

Traceback (most recent call last):
  File "dns_dumpster_4.py", line 39, in <module>
    for i in len(td):
TypeError: 'int' object is not iterable

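Once the error above is resolved, I also need help with a second problem:
How can I use a regular expression to pull a single subdomain out of each "td" tag? The content of the tag is very long and messy, and I only need the subdomain. I would appreciate any help with a simple way to get just the subdomain names.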

I tried capturing the subdomain names without using a regex:

import requests
from bs4 import BeautifulSoup

headers = {
    'Host' : 'dnsdumpster.com',
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language' : 'en-US,en;q=0.5',
    'Accept-Encoding' : 'gzip, deflate',
    'DNT' : '1',
    'Upgrade-Insecure-Requests' : '1',
    'Referer' : 'https://dnsdumpster.com/',
    'Connection' : 'close'
}

proxies = {
    'http' : 'http://127.0.0.1:8080'
}

domain = 'google.com'

with requests.Session() as s:
    url = 'https://dnsdumpster.com'
    response = s.get(url, headers=headers, proxies=proxies)
    response.encoding = 'utf-8' # Optional: requests infers this internally
    soup1 = BeautifulSoup(response.text, 'html.parser')
    input = soup1.find_all('input')
    csrfmiddlewaretoken_raw = str(input[0])
    csrfmiddlewaretoken = csrfmiddlewaretoken_raw[55:119]
    data = {
        'csrfmiddlewaretoken' : csrfmiddlewaretoken,
        'targetip' : domain
    }
    send_data = s.post(url, data=data, proxies=proxies, headers=headers)
    print(send_data.status_code)
    soup2 = BeautifulSoup(send_data.text, 'html.parser')
    td = soup2.find_all('td', {'class': 'col-md-3'})
    # for dom in range(0, len(td),2):
    #     print(td[dom].get_text(strip=True, separator='\n'))

    mysubdomain = []
    for dom in range( len(td)):
        # print(td[dom].get_text(strip=True, separator='\n'))
        if '.' in td[dom].get_text(strip=True):
            x = td[dom].get_text(strip=True, separator=',').split(',')
            mysubdomain.append(x)
            # print(x)
            # y = td[dom].get_text(strip=True, separator=',').split(',')[1]
           
            # mysubdomain.append(td[dom].get_text(strip=True, separator=','))
    print(mysubdomain)
    # print(td)

    # for i in range(len(td)):
        # item = str(td[i])
        # print('\n', item, '\n')
        # subdomain = item[21:37]
        # print(subdomain)
from functools import reduce
flat_list_of_mysubdomain = reduce(lambda x, y: x + y, mysubdomain)
print(flat_list_of_mysubdomain)
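
A note on the flattening step: reduce() without an initial value raises a TypeError when mysubdomain is empty, which can happen if no "td" cell matches. A minimal sketch of an alternative using itertools.chain.from_iterable follows; the sample data is made up and only stands in for the scraped lists.

```python
from itertools import chain

# Made-up stand-in for mysubdomain: one inner list per matching <td> cell.
mysubdomain = [['a.example.com', '192.0.2.1'], ['b.example.com']]

# chain.from_iterable concatenates the inner lists lazily; list() materialises the result.
flat_list_of_mysubdomain = list(chain.from_iterable(mysubdomain))
print(flat_list_of_mysubdomain)

# Unlike reduce() without an initial value, an empty input simply gives an empty list.
empty_result = list(chain.from_iterable([]))
```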

I hope this helps.
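
For the regex part of the question, here is a rough sketch using only the standard library. The sample cell, the HOSTNAME_RE pattern, and the extract_hostnames helper are all assumptions made for illustration; the real dnsdumpster markup may look different, and the pattern may need adjusting.

```python
import re

# Hypothetical sample of one <td> cell; the real markup may differ.
sample_td = '<td class="col-md-3">mail.example.com<br/><a href="#">lookup</a></td>'

# Assumed pattern: dot-separated labels of letters, digits and hyphens,
# ending in an alphabetic top-level label of at least two characters.
HOSTNAME_RE = re.compile(
    r'\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b',
    re.IGNORECASE,
)

def extract_hostnames(markup):
    """Return every hostname-like string found in the text between tags."""
    # Replace tags with spaces first so attribute values are never matched.
    text = re.sub(r'<[^>]+>', ' ', markup)
    return HOSTNAME_RE.findall(text)

print(extract_hostnames(sample_td))
```

In the scraper itself this would be applied to str(cell) for each cell returned by find_all('td'). Calling get_text() on the cell, as in the answer above, avoids the tag-stripping step entirely.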

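len() returns the length as an integer, and an integer cannot be iterated over. Drop len() and loop over the list directly; the loop variable is then the tag itself, so use str(i) rather than str(td[i]):
for i in td:

You cannot iterate over an integer. To iterate from 0 up to the length of the object td, use:
for i in range(0, len(td)):

You need to write it as:
for i in range(len(td)):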
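
The suggestions above can be compared side by side in a small self-contained sketch. The td list here is a made-up stand-in for the result of soup2.find_all('td'), so nothing is fetched from the network.

```python
# Made-up stand-in for the list returned by soup2.find_all('td').
td = ['<td>a.example.com</td>', '<td>b.example.com</td>']

# Reproduce the error: len(td) is the int 2, and an int cannot be looped over.
try:
    for i in len(td):
        pass
except TypeError as exc:
    error_message = str(exc)

# Fix 1: loop over the indices 0 .. len(td) - 1.
by_index = [str(td[i]) for i in range(len(td))]

# Fix 2: loop over the items directly; the loop variable is the item, not an index.
by_item = [str(cell) for cell in td]

# Fix 3: enumerate() yields both the index and the item when both are needed.
by_enumerate = [(i, str(cell)) for i, cell in enumerate(td)]

print(error_message)
print(by_index == by_item)
```

Looping over the items directly is usually the most readable choice when the index itself is not needed.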