Python urllib2 or requests POST method


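I understand in general how to make a POST request using urllib2 (encoding the data, etc.), but the problem is that all the online tutorials use completely useless made-up example URLs to show how to do it (someserver.com, coolsite.org, etc.), so I can't see the specific HTML that corresponds to the example code they use. Even python.org's own documentation is no help in this regard.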

I need to make a POST request to this URL:

https://patentscope.wipo.int/search/en/search.jsf

The relevant part of the code is as follows (I think):

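I get a successful response (200), but the data is just the data from the original page again. So I can't tell whether I am posting to the form correctly and need to do something else to get it to return the search results page, or whether I am still posting the wrong data.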


Yes, I realize this uses requests instead of urllib2, but all I want to do is get the data.

This is not the most straightforward POST request. If you look in the developer tools or Firebug, you can see the form data from a successful browser POST:

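All of that is pretty straightforward, except that some of the keys contain an embedded colon (:), which may be a bit confusing: simpleSearchSearchForm:commandSimpleFPSearch is the key and Search is the value.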
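To illustrate the point about colons in keys, here is a minimal sketch (Python 3, standard library only) of how such a key ends up in a form-encoded body. The two keys are taken from the payload in this answer; the rest of the payload is omitted for brevity.

```python
from urllib.parse import urlencode

# Two of the form fields from the browser's POST; the colon is simply
# part of the key name and gets percent-encoded like any other character.
form = {
    "simpleSearchSearchForm:fpSearch": "automata",
    "simpleSearchSearchForm:commandSimpleFPSearch": "Search",
}

body = urlencode(form)
print(body)
# simpleSearchSearchForm%3AfpSearch=automata&simpleSearchSearchForm%3AcommandSimpleFPSearch=Search
```

requests does the same encoding for you when you pass a dict as data=, so the keys can be written with a literal colon.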

The only thing you cannot hard-code is javax.faces.ViewState. We need to make a request to the site first and then parse out that value, which we can do with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = "https://patentscope.wipo.int/search/en/search.jsf"

data = {"simpleSearchSearchForm": "simpleSearchSearchForm",
        "simpleSearchSearchForm:j_idt341": "EN_ALLTXT",
        "simpleSearchSearchForm:fpSearch": "automata",
        "simpleSearchSearchForm:commandSimpleFPSearch": "Search",
        "simpleSearchSearchForm:j_idt406": "workaround"}
head = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}

with requests.Session() as s:
    # Get the cookies and the source to parse the Viewstate token
    init = s.get(url)
    soup = BeautifulSoup(init.text, "lxml")
    val = soup.select_one("#j_id1:javax.faces.ViewState:0")["value"]
    # update post data dict
    data["javax.faces.ViewState"] = val
    r = s.post(url, data=data, headers=head)
    print(r.text)

If we run the code above:

In [13]: import requests

In [14]: from bs4 import BeautifulSoup

In [15]: url = "https://patentscope.wipo.int/search/en/search.jsf"

In [16]: data = {"simpleSearchSearchForm": "simpleSearchSearchForm",
   ....:         "simpleSearchSearchForm:j_idt341": "EN_ALLTXT",
   ....:         "simpleSearchSearchForm:fpSearch": "automata",
   ....:         "simpleSearchSearchForm:commandSimpleFPSearch": "Search",
   ....:         "simpleSearchSearchForm:j_idt406": "workaround"}

In [17]: head = {
   ....:     "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}

In [18]: with requests.Session() as s:
   ....:         init = s.get(url)
   ....:         soup = BeautifulSoup(init.text, "lxml")
   ....:         val = soup.select_one("#j_id1:javax.faces.ViewState:0")["value"]
   ....:         data["javax.faces.ViewState"] = val
   ....:         r = s.post(url, data=data, headers=head)
   ....:         print("\n".join([s.text.strip() for s in BeautifulSoup(r.text,"lxml").select("span.trans-section")]))
   ....:     

Fuzzy genetic learning automata classifier
Fuzzy genetic learning automata classifier
FINITE AUTOMATA MANAGER
CELLULAR AUTOMATA MUSIC GENERATOR
CELLULAR AUTOMATA MUSIC GENERATOR
ANALOG LOGIC AUTOMATA
Incremental automata verification
Cellular automata music generator
Analog logic automata
Symbolic finite automata
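
You will see that it matches the web page. If you want to scrape sites, you need to get familiar with developer tools, Firebug and the like: watch how the requests are made, then try to mimic them. To open Firebug, right-click on the page and select Inspect Element, click the Network tab, and submit your request. Then select the request from the list and open whichever tab you want information on, for example the parameters of our POST request.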
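One way to make the "observe, then mimic" step concrete is to copy the raw form body from the network tab and compare its keys with what your script sends. The sketch below is only an illustration: the captured string is a hypothetical, abbreviated stand-in for what the browser shows, and script_form is a deliberately incomplete payload.

```python
from urllib.parse import parse_qsl

# Hypothetical, abbreviated form body as copied from the browser's
# network tab ("view source" of the form data).
captured = (
    "simpleSearchSearchForm=simpleSearchSearchForm"
    "&simpleSearchSearchForm%3AfpSearch=automata"
    "&simpleSearchSearchForm%3AcommandSimpleFPSearch=Search"
)

# Decode it back into key/value pairs.
browser_form = dict(parse_qsl(captured))

# The payload a first attempt at a script might send.
script_form = {"simpleSearchSearchForm:fpSearch": "automata"}

# Keys the browser sent that the script does not.
missing = sorted(set(browser_form) - set(script_form))
print(missing)
# ['simpleSearchSearchForm', 'simpleSearchSearchForm:commandSimpleFPSearch']
```

Any key that shows up in missing is a candidate for why the server answers with the plain search page instead of results.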


You may also find this useful for learning how to approach posting to a website.
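A note on the ViewState lookup in the script above: the selector "#j_id1:javax.faces.ViewState:0" contains colons and dots, which CSS syntax normally reads as pseudo-class and class markers, so newer BeautifulSoup releases may not parse it as intended. Looking the field up by its name attribute avoids that; with BeautifulSoup this would presumably be soup.find("input", {"name": "javax.faces.ViewState"})["value"]. The sketch below shows the same idea using only the standard library, on hypothetical markup that stands in for the real page.

```python
from html.parser import HTMLParser


class ViewStateParser(HTMLParser):
    """Remember the value of the first input named javax.faces.ViewState."""

    def __init__(self):
        super().__init__()
        self.view_state = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (self.view_state is None and tag == "input"
                and attributes.get("name") == "javax.faces.ViewState"):
            self.view_state = attributes.get("value")


def extract_view_state(html):
    """Return the ViewState token found in html, or None if there is none."""
    parser = ViewStateParser()
    parser.feed(html)
    return parser.view_state


# Hypothetical markup; the real page is much larger.
sample_html = """
<form id="simpleSearchSearchForm">
  <input type="hidden" name="javax.faces.ViewState"
         id="j_id1:javax.faces.ViewState:0" value="token-123" />
</form>
"""

token = extract_view_state(sample_html)
print(token)  # token-123
```

Returning None when the field is absent lets the caller fail early with a clear message instead of posting without the token.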

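The search is sent as form data, not as part of the URL. The key for the query text is simpleSearchSearchForm:fpSearch. Note that you can see this with the developer tools in your browser when you submit a search manually.

OK. What does that mean? I mean: what should I do to post the data and retrieve the page I need? I read the page, but it did not contain the information I need.

Then head over to Google and find a place that does contain that information, such as the tutorials you have already read.

This was very useful. Thank you very much!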
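Since the question originally asked about urllib2, here is a minimal sketch of how the same form data can be attached to a request using the standard library. It uses Python 3 names (urllib.parse.urlencode, urllib.request.Request); in Python 2.7 the equivalents are urllib.urlencode and urllib2.Request. The helper build_post is a hypothetical name, the payload is deliberately partial, and nothing is sent over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "https://patentscope.wipo.int/search/en/search.jsf"

# Partial payload for illustration; a real search also needs the other
# form fields and the javax.faces.ViewState token.
form = {"simpleSearchSearchForm:fpSearch": "automata"}


def build_post(url, form, user_agent="Mozilla/5.0"):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(form).encode("ascii")
    # Supplying data makes this a POST instead of a GET.
    return Request(url, data=body, headers={"User-Agent": user_agent})


req = build_post(url, form)
print(req.get_method())  # POST
print(req.data)          # b'simpleSearchSearchForm%3AfpSearch=automata'
```

Actually sending it would go through urllib.request.urlopen or an opener with cookie handling, which is the part requests.Session takes care of automatically.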

The code from the question (the attempt that returns only the original page):

import requests

headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'name':'simpleSearchSearchForm:fpSearch','value':'2014084003'}
link    = 'https://patentscope.wipo.int/search/en/search.jsf'
session = requests.Session()
resp    = session.get(link,headers=headers)
cookies = requests.utils.cookiejar_from_dict(requests.utils.dict_from_cookiejar(session.cookies))
resp    = session.post(link,headers=headers,data=payload,cookies =cookies)

r = session.get(link)

f = open('htmltext.txt','w')

f.write(r.content)

f.close()