Python: Can I fill in web forms with Scrapy?


python web-scraping scrapy


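Right now I use iMacros to extract data from the web and to fill in and submit forms.

But iMacros is an expensive tool. I need a free library, and I have read about Scrapy for data mining. Programming with it is a bit more complex, but money is the most important factor.

My question is whether I can use Scrapy to fill in HTML forms and submit them to a web page. I don't want to use JavaScript; I only want to use Python scripts.

I searched for form submission but didn't find any information about it.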

mechanize is a Python library that lets you automate interactions with websites. It supports filling in HTML forms.

Use the scrapy.http.FormRequest class.

FormRequest extends the base Request class with functionality for dealing with HTML forms.

The following program (which uses mechanize, not Scrapy) shows how to fill in a form:

# Python 3 version: needs mechanize, beautifulsoup4 and html2text
import html2text
import mechanize
from bs4 import BeautifulSoup

# Browser
br = mechanize.Browser()

# Cookie jar, so the login session is kept between requests
cj = mechanize.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but does not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# The site we will navigate into, handling its session.
# Note: Gmail's login page has changed since this example was written, so the
# form and field names below will most likely no longer match.
br.open('http://gmail.com')

# Select the first (index zero) form
br.select_form(nr=0)

# User credentials
br.form['Email'] = 'user'
br.form['Passwd'] = 'password'
# Login
br.submit()

# Filter all links to mail messages in the inbox
all_msg_links = [l for l in br.links(url_regex=r'\?v=c&th=')]
# Select the first 3 messages
for msg_link in all_msg_links[0:3]:
    print(msg_link)
    # Open each message
    br.follow_link(msg_link)
    html = br.response().read()
    soup = BeautifulSoup(html, 'html.parser')
    # Filter html to only show the message content
    msg = str(soup.find_all('div', attrs={'class': 'msg'})[0])
    # Show raw message content
    print(msg)
    # Convert html to text, easier to read but can fail if you have
    # international characters
    # print(html2text.html2text(msg))
    print()
    # Go back to the Inbox
    br.follow_link(text='Inbox')

# Logout
br.follow_link(text='Sign out')
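For comparison, here is roughly what select_form() followed by submit() boils down to at the HTTP level, sketched with only the Python 3 standard library (no mechanize, no Scrapy, no JavaScript). It is a simplified illustration, not how either library is implemented: it only reads the <input> elements of the first form and ignores <select> and <textarea>. The helper names FirstFormParser and build_form_request are hypothetical. The request is built but not sent; passing it to urllib.request.urlopen() would submit it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FirstFormParser(HTMLParser):
    """Collects action, method and named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            # Buttons are not submitted; unchecked boxes/radios are skipped.
            if kind in ("submit", "button", "reset", "image"):
                return
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_form_request(page_url, html, overrides):
    """Return an unsent urllib Request that submits the page's first form."""
    parser = FirstFormParser()
    parser.feed(html)
    data = {**parser.fields, **overrides}  # keep defaults, apply our values
    url = urljoin(page_url, parser.action)
    encoded = urlencode(data)
    if parser.method == "POST":
        return Request(url, data=encoded.encode("ascii"), method="POST")
    return Request(url + "?" + encoded, method="GET")
```

Hidden fields (for example session tokens) are carried over automatically, which is the main reason to parse the form instead of hard-coding the POST body.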

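Scrapy is a web scraper, not a web request library.

I got this link from another question five minutes ago. Try it:

Thanks, I'll take a look. The web form uses the POST method, and I also need to upload files.

That doesn't matter. Still not possible with Scrapy :P All the other libraries aren't meant for scraping, though :)

iMacros isn't expensive at all. The Firefox add-on is good enough for most people; you just have to write some JavaScript.

@Macroscript iMacros is great, and its scripting interface (API) is powerful and easy to use, but the Enterprise edition (the only edition that supports the scripting interface, which I really need) costs $995. I bought a license last year, but it expires in a few months and I'm looking for an alternative. I need Python 3 support. I'm trying Scrapy, as Surfer suggested.

Or try Selenium (with the HtmlUnit driver to run headless). I don't think this is related to Scrapy, though.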