Python: BeautifulSoup can't find an existing href in a file

I have an HTML file that looks like this:

<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&amp;gsid=3_5bce9b871484d3af90c89f37">next_page</a>
&nbsp;<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" />&nbsp;1/3
</div>
</form>
Working code:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 on Python 2
import re

with open("html.txt", "r") as f:
    response = f.read()

print response
soup = BeautifulSoup(response)

# This works; the "?" must be escaped so that it matches a literal question mark.
delete_urls = soup.findAll('a', href=re.compile(r'follow\?page'))
print delete_urls

# total_urls_num = re.findall(r'\d+/\d+', response)
total_urls_num = soup.find('input', type='submit')
print total_urls_num
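The remark about escaping ? can be checked with the standard re module alone. The following is a minimal sketch in Python 3, not part of the original post; the href value is the one from the HTML above with &amp; already decoded, and the variable names are made up for illustration.

```python
import re

# href taken from the question's HTML (entity already decoded)
href = "/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37"

# Unescaped: "?" is a quantifier that makes the preceding "w" optional.
unescaped = re.compile("follow?page")
# Escaped: "\?" matches a literal question mark.
escaped = re.compile(r"follow\?page")

print(bool(unescaped.search(href)))           # False
print(bool(escaped.search(href)))             # True
print(bool(unescaped.search("/followpage")))  # True
```

The unescaped pattern only matches strings such as "followpage" or "follopage", which is why it finds nothing in this file.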

I think the problem is that the text you are searching for is not an attribute of a tag; it comes after the tag. You can access it with .next:

In [144]: soup.find("input", type="submit")
Out[144]: <input type="submit" value="jump" />

In [145]: soup.find("input", type="submit").next
Out[145]: u'&nbsp;1/3\n'
Or, more simply:

In [153]: soup.findAll("input", type="submit", text=re.compile("\d+/\d+"))
Out[153]: [u'&nbsp;1/3\n']
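To see why the counter cannot be reached through an attribute filter, here is a small sketch that uses Python 3's standard html.parser instead of BeautifulSoup (a substitution made only so the example has no third-party dependency; the answer itself relies on BeautifulSoup's .next). The class name TextAfterSubmit and its fields are hypothetical, and the markup is a trimmed copy of the HTML from the question.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question.
HTML = """<div>
<input name="mp" type="hidden" value="3" />
<input type="submit" value="jump" />&nbsp;1/3
</div>"""


class TextAfterSubmit(HTMLParser):
    """Hypothetical helper: keep the text node right after the submit input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.attrs_of_submit = None
        self.text_after = None
        self._just_saw_submit = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from inside the tag.
        attrs = dict(attrs)
        self._just_saw_submit = tag == "input" and attrs.get("type") == "submit"
        if self._just_saw_submit:
            self.attrs_of_submit = attrs

    def handle_data(self, data):
        # Text between tags arrives here, separately from any tag's attributes.
        if self._just_saw_submit and self.text_after is None:
            self.text_after = data
        self._just_saw_submit = False


parser = TextAfterSubmit()
parser.feed(HTML)
parser.close()

print(parser.attrs_of_submit)   # only type and value, no page counter
print(repr(parser.text_after))  # the counter lives in this text node
```

The submit tag carries only its type and value attributes; the "1/3" counter is a separate text node that follows it, which is what .next returns in the answer above.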
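Read the BeautifulSoup documentation. The attribute to match on is type, not style, so you should use type instead of style: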
>>> temp = soup.find('input',type='submit').next
>>> temp
u'&nbsp;1/3\n'
>>> re.findall('\d+/\d+', temp)
[u'1/3']
>>> re.findall('\d+/\d+', temp)[0]
u'1/3'
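If the current and total page numbers are needed as integers, the regex step can be wrapped in a small function. This is a sketch in Python 3 using only the standard library; parse_page_counter is a hypothetical name, and the input string is the value of temp shown above.

```python
import re
from html import unescape


def parse_page_counter(text):
    """Return (current, total) parsed from text such as '&nbsp;1/3', or None."""
    # Decode entities such as &nbsp; first, then look for "digits/digits".
    match = re.search(r"(\d+)/(\d+)", unescape(text))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_page_counter("&nbsp;1/3\n"))  # (1, 3)
print(parse_page_counter("no counter"))   # None
```

Returning None when nothing matches avoids the IndexError that indexing an empty re.findall result with [0] would raise.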

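But when I change the pattern to (.*\d/\d*) it still doesn't work; it returns None.

How about soup.find('input',value='jump').next?

You mean total_urls_num = soup.find('input',value='jump').next? That raises AttributeError: 'NoneType' object has no attribute 'next'.

That's odd; soup.find('input',value='jump') really should work. Are you sure your soup contains the HTML you posted? Can you successfully extract any tag from your input?

@young001 How about soup.find(r"\d\/\d")?

Your answer is exactly what I wanted, thx DSM. I should read more of the BeautifulSoup documentation.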
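For reference, the original call that did not work, because it filters on style rather than type:
total_urls_num = soup.find('input',style='submit')   # does not work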