Reading a specific class from a web page with Python


I have a script that uses HTMLParser to read data from a web page:

from urllib.request import urlopen
from html.parser import HTMLParser


class get_HTML_Info(HTMLParser):
    # Called for every run of text between tags; print it as-is
    def handle_data(self, data):
        print(data)


with urlopen('http://www.bulldoghax.com/secret/spinner') as adib:
    # urlopen returns bytes; HTMLParser.feed() needs a str
    htmlsource = adib.read().decode('utf-8')

parser = get_HTML_Info()
parser.feed(htmlsource)

I end up with two pieces of data printed in the terminal, like this:

bulldoghax

8530330882

I only want to extract the number and store it in Python as a string.

Simple, here:

n = "".join(filter(str.isdigit, data))

It keeps only the characters of the string that are digits, then joins them back into a single string.
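To keep the number instead of printing it, the same filter can run inside handle_data. A minimal sketch, assuming the number arrives as its own text node; the class name NumberGrabber and the sample HTML are hypothetical stand-ins for the real page:

```python
from html.parser import HTMLParser


class NumberGrabber(HTMLParser):
    """Hypothetical parser that stores the first all-digit text node it sees."""

    def __init__(self):
        super().__init__()
        self.number = None  # stays None until a numeric text node is found

    def handle_data(self, data):
        # Same idea as the one-liner: keep only the digit characters
        digits = "".join(filter(str.isdigit, data))
        # Accept the chunk only if it was a bare number (ignoring whitespace)
        if self.number is None and digits and digits == data.strip():
            self.number = digits


# Hypothetical markup standing in for the real page
sample = '<h1>bulldoghax</h1>\n<div class="number">8530330882</div>'
parser = NumberGrabber()
parser.feed(sample)
print(parser.number)  # 8530330882
```

Requiring the digits to make up the whole (stripped) text node avoids picking up stray digits from other text such as headings.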


Use Beautiful Soup to scrape the data:

pip install beautifulsoup4

from urllib.request import urlopen

from bs4 import BeautifulSoup

with urlopen('http://www.bulldoghax.com/secret/spinner') as adib:
    htmlsource = adib.read()

# Name the parser explicitly ("lxml" also works if it is installed)
soup = BeautifulSoup(htmlsource, 'html.parser')
# Print the text of every <div class="number"> element
for each_div in soup.find_all('div', {'class': 'number'}):
    print(each_div.text)
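Without Beautiful Soup, the same class-based lookup can be approximated with the standard-library HTMLParser that the question already uses. A rough sketch, assuming the number sits directly inside a `<div class="number">` with no nested divs; ClassTextParser and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Hypothetical helper: collect text inside <tag class="...">, no nesting."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False   # True while between the matching open/close tags
        self.results = []     # one string per matching element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class may list several names
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.inside = True
            self.results.append("")

    def handle_endtag(self, tag):
        # Assumes no nested tags of the same name inside the match
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.results[-1] += data


sample = '<div class="title">bulldoghax</div><div class="number">\n8530330882\n</div>'
p = ClassTextParser("div", "number")
p.feed(sample)
print([text.strip() for text in p.results])  # ['8530330882']
```

Calling .strip() on each result removes the surrounding newlines, so only the number is left.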


Here, this should work:

import re
import subprocess

# Fetch the spinner page (-s hides curl's progress meter) and keep only
# the line that contains 'number'
status, output = subprocess.getstatusoutput(
    "curl -s http://www.bulldoghax.com/secret/spinner | grep 'number'")
print(output)
print("------------------------------------------------------")

# Remove every non-digit character, leaving just the number
cmd = re.sub(r"\D", "", output)

# Send the number as the 'timelock' cookie and fetch the codes page
status, output = subprocess.getstatusoutput(
    "curl -s --cookie 'timelock=" + cmd + "' http://www.bulldoghax.com/secret/codes")
print(output)

# Print the response split on the word "code"
for i in output.split("code"):
    print("*****" + i)

It reads the number from the /secret/spinner page and sends it as the cookie (timelock) to /secret/codes. It then prints the list of codes you need to get the flag.
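The same two requests can also be made without shelling out to curl. A minimal sketch using urllib.request, assuming the number is the only run of digits on the line containing 'number' and that /secret/codes reads a cookie named timelock as described above; the helper names extract_number, build_codes_request and fetch_codes are hypothetical:

```python
import re
from urllib.request import Request, urlopen

SPINNER_URL = "http://www.bulldoghax.com/secret/spinner"
CODES_URL = "http://www.bulldoghax.com/secret/codes"


def extract_number(html):
    """Return the digits from the first line mentioning 'number' (like grep + re.sub)."""
    for line in html.splitlines():
        if "number" in line:
            return re.sub(r"\D", "", line)
    return None


def build_codes_request(number):
    """Build a GET request for the codes page carrying the timelock cookie."""
    return Request(CODES_URL, headers={"Cookie": "timelock=" + number})


def fetch_codes():
    """Network calls: fetch the spinner page, then replay the number as a cookie."""
    with urlopen(SPINNER_URL) as resp:
        number = extract_number(resp.read().decode("utf-8", errors="replace"))
    if number is None:
        raise ValueError("no line containing 'number' found on the spinner page")
    with urlopen(build_codes_request(number)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling print(fetch_codes()) performs both requests; the parsing and request-building steps are kept separate so they can be checked without network access.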


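Thanks, now it only shows the number. Is there any way I can remove the "\n" newline characters? I just want the output to be that number.

Thanks, great! I just had to change

soup = BeautifulSoup(htmlsource)

to

soup = BeautifulSoup(htmlsource, "lxml")

because it gave me an error the first time I tried it.

@himanshu_dua Can you help me write code that sends a cookie value to this site?

http://www.bulldoghax.com/secret/codes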
