Python: removing specific items from the soup
How can I remove these with Python and Beautiful Soup? The other td items need to be kept.
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
This is the current output:
Device Type
IP Address
Device Name
Notes
AudioCodes Gateway
172.31.31.2
FXO
Device Type
IP Address
Device Name
Notes
IC Server
172.31.56.151
IND056GIC151
NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151
IC Server
172.31.56.152
IND056GIC152
NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152
Media Server
IND1106HMS07
IND1106HMS07
Media Server
IND1106HMS07
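IND1106HMS07
Typically, when people ask how to "remove" something using bs4, they are really just asking how to not include it in a find operation.
You want to exclude the extra whitespace (i.e. tags with tag.text == '') and those four "column header" tags. You can do the latter via CSS selectors, but the former requires explicit filtering. So doing both at once is easiest and, in my opinion, more declarative: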
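from bs4 import BeautifulSoup

soup = BeautifulSoup(that_long_html_you_gave)
# Text of the four column-header cells to skip
blacklist = {'Device Type','IP Address','Device Name','Notes'}
table = soup.body # to match your variable name. I think.
# Keep only tags whose text is non-empty and not a column header
table.find_all(lambda tag: tag.text and tag.text not in blacklist)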
Out[45]:
[<td align="left" width="150">AudioCodes Gateway</td>,
<td align="left" width="115">172.31.31.2</td>,
<td align="left" width="215">FXO</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.151</td>,
<td align="left" width="100">IND056GIC151</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.152</td>,
<td align="left" width="100">IND056GIC152</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>]
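For anyone who wants to try the idea outside an interactive session, here is a minimal self-contained sketch. The sample HTML is an assumption modelled on the question's markup, and the helper names (keep_cell, data_cells, data_cells_css) are hypothetical. Unlike the one-liner above, the predicate also checks tag.name, so enclosing tags such as tr or table are not returned when the cells sit inside a real table. The second helper shows the CSS-selector route for the header cells, assuming they are the only td tags carrying a style attribute; empty cells still need the explicit filter.

```python
from bs4 import BeautifulSoup

# Sample rows modelled on the question's markup (an assumption; the real page is larger).
HTML = """
<table>
<tr>
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
</tr>
<tr>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
</tr>
</table>
"""

# Header labels taken from the question.
BLACKLIST = {"Device Type", "IP Address", "Device Name", "Notes"}


def keep_cell(tag):
    """Return True for <td> tags that hold real data (non-empty, not a header label)."""
    text = tag.get_text(strip=True)
    return tag.name == "td" and bool(text) and text not in BLACKLIST


def data_cells(html):
    """Collect the text of the wanted cells using a predicate passed to find_all."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all(keep_cell)]


def data_cells_css(html):
    """Same result via a CSS selector: skip styled (header) cells, then drop empty ones."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.select("td:not([style])")
    return [td.get_text(strip=True) for td in cells if td.get_text(strip=True)]


print(data_cells(HTML))
print(data_cells_css(HTML))
```

Both helpers leave the parsed tree untouched; they only choose which cells to report, which is usually all that "removing" needs to mean here.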
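What do you mean by "remove"?
Remove them so that they do not show up in the final output.
It is not clear to me how you are choosing which tags to remove and which to keep. What does your code do so far?
Do you want to remove Device Type, IP Address, Device Name, Notes, and the extra whitespace?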
from ntlm import HTTPNtlmAuthHandler
from bs4 import BeautifulSoup
import requests, os, bleach, urllib2, cookielib
os.system('clear')
user = 'user'
password = "pass"
url = "url"
cookies = cookielib.CookieJar()
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))
pagedata=opener.open(url)
soup=BeautifulSoup(pagedata)
def myfunction(b):
    table = b.find('ul', {'class': 'dfwp-column dfwp-list'})
    for a in table.findAll('a'):
        [a.decompose() for a in table("a")]
    for tr in table.findAll('tr'):
        for td in tr.findAll('td'):
            print td
myfunction(soup)
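If the unwanted cells really do have to disappear from the tree, in the same way the code above already calls decompose() on the a tags, a rough Python 3 sketch could look like the following. The function name strip_unwanted_cells and the sample markup are assumptions made for illustration, and the sample skips the NTLM login entirely.

```python
from bs4 import BeautifulSoup

# Header labels taken from the question.
HEADER_LABELS = {"Device Type", "IP Address", "Device Name", "Notes"}


def strip_unwanted_cells(table):
    """Delete empty and column-header <td> tags from `table` in place."""
    # find_all returns a list-like snapshot, so decomposing inside the loop is safe.
    for td in table.find_all("td"):
        text = td.get_text(strip=True)
        if not text or text in HEADER_LABELS:
            td.decompose()
    return table


# Tiny stand-in for the real page (an assumption).
SAMPLE = (
    '<table><tr>'
    '<td style="background:#aaccff" width="150">Device Type</td>'
    '<td width="50"></td>'
    '<td align="left" width="150">IC Server</td>'
    '<td align="left" width="115">172.31.56.151</td>'
    '</tr></table>'
)

soup = BeautifulSoup(SAMPLE, "html.parser")
cleaned = strip_unwanted_cells(soup.table)
remaining = [td.get_text() for td in cleaned.find_all("td")]
print(remaining)
```

Because decompose() mutates the soup, this variant only makes sense when the cleaned tree is reused afterwards; for simply printing the data cells, filtering in the find call is less invasive.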