How do I create a Python script that grabs text from one site and reposts it to another site?

python scripting

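I would like to create a Python script that grabs the digits of Pi from this site: and reposts them to this site:

I am not spamming or playing a prank. It is an inside joke with the creator and webmaster, a belated Pi Day celebration, if you will.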

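You can use the urllib2 module, which ships with any Python 2 distribution (in Python 3 the same functionality lives in urllib.request). It lets you open a URL much as you would open a file on the filesystem. So you can use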

pi_million_file = urllib2.urlopen("http://www.piday.org/million.php")

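and then parse the resulting file, which will be the HTML code of the web page you see in your browser.

Then you should use the correct URL of your own website to post the Pi digits.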
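As a rough Python 3 counterpart of the fetch step above, a sketch might look like this. The helper name fetch_text and the 30-second timeout are illustrative assumptions, not part of the original answer.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=30):
    """Download a URL and return its body as text (hypothetical helper)."""
    # urlopen returns a file-like response object, much like open() does for files.
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (performs a real network request):
# html = fetch_text("http://www.piday.org/million.php")
```

The returned string is the raw HTML, so it still has to be parsed before the digits can be reposted.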

Import urllib2 and BeautifulSoup

import urllib2
from BeautifulSoup import BeautifulSoup

Specify the URL and fetch it with urllib2

url = 'http://www.piday.org/million.php'
response = urllib2.urlopen(url)

Then parse the page markup into a soup object, which you can query by the tag that marks the data you want to extract

soup = BeautifulSoup(response)

pi = soup.findAll('TAG')

where "TAG" is the relevant tag that identifies where Pi is located

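Specify what to output

# findAll returns a list, so take the first match and convert it to a string
out = '<html><body>' + str(pi[0]) + '</body></html>'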

Then write out to a file such as "file.html" and serve it with a web server
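The write-and-serve step is not spelled out above, so here is a minimal Python 3 sketch of one way to do it with the standard library. The function names, the port 8000, and the choice of http.server are all assumptions made for illustration.

```python
import functools
import html
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path


def build_page(digits):
    """Wrap the digits in a minimal HTML document (hypothetical helper)."""
    return "<html><body><p>" + html.escape(digits) + "</p></body></html>"


def write_page(digits, path="file.html"):
    """Write the generated page to disk and return its path."""
    target = Path(path)
    target.write_text(build_page(digits), encoding="utf-8")
    return target


def make_server(directory=".", port=8000):
    """Create (but do not start) a local server for the given directory."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    return HTTPServer(("127.0.0.1", port), handler)


# Example usage (blocks until interrupted):
# write_page("3.14159")
# make_server(".").serve_forever()
```

This only serves the file locally. Publishing it on a real site would go through whatever that site's own server or upload mechanism is.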

If you would rather not use BeautifulSoup, you can use re and urllib, though it is not as "beautiful" as BeautifulSoup.
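For the re route, a small Python 3 sketch could look like the following. It assumes the digits sit inside the first <p> element of the page. The function name is made up for this example, and regular expressions are fragile against arbitrary HTML.

```python
import re


def extract_digits_with_re(page_html):
    """Return the text of the first <p> element with tags and whitespace removed."""
    match = re.search(r"<p[^>]*>(.*?)</p>", page_html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    # Drop nested tags such as <br />, then strip all whitespace and newlines.
    inner = re.sub(r"<[^>]+>", "", match.group(1))
    return re.sub(r"\s+", "", inner)
```

This works for a simple page with a single paragraph, but a real parser is the safer choice once the markup gets more complicated.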

When you submit a post, it is done through a POST request sent to the server. Look at the code on your site:

<form action="enter.php" method="post">
  <textarea name="post">Enter text here</textarea> 
</form>
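To make the link between this form and the request concrete, here is a hedged Python 3 sketch that builds (but does not send) the matching POST request. The field name post and the enter.php action come from the form above. The helper name and the example host are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(action_url, text):
    """Build a POST request that mimics submitting the form's textarea."""
    # The textarea is named "post", so that is the key the server expects.
    body = urlencode({"post": text}).encode("ascii")
    return Request(
        action_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example usage (example.com is a placeholder host; urlopen would send the request):
# request = build_post_request("http://example.com/enter.php", "3.14159")
```

Passing the resulting object to urllib.request.urlopen would submit it, just as pressing the form's submit button would in a browser.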

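Looking at that source code, you can see that the page is just a <p> tag descending directly from the <body> tag (the site has no <!DOCTYPE>, but I will include one):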

However, if you are using this for malicious purposes, be aware that the server logs all IP addresses.

Perhaps this would be a better fit elsewhere? Either way, what have you tried so far?

 http://www.piday.org/includes/pi_to_1million_digits_v2.html

<!DOCTYPE html>

<html>
  <head>
    ...
  </head>

  <body>
    <p>3.1415926535897932384...</p>
  </body>
</html>
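Given a page shaped like this, pulling out the digits amounts to collecting the text inside the single <p> element. The sketch below does that in Python 3 with the standard library's html.parser, as a stand-in for BeautifulSoup. The class and function names are invented for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags are currently open
        self.chunks = []    # text fragments seen inside the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_tag_text(page_html, tag="p"):
    """Return the text inside `tag`, with all whitespace removed."""
    parser = TagTextExtractor(tag)
    parser.feed(page_html)
    parser.close()
    return "".join("".join(parser.chunks).split())
```

Nested elements such as <br /> are skipped automatically because only text nodes are collected.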

import urllib, httplib
from BeautifulSoup import BeautifulSoup

# Downloads and parses the webpage with Pi
page = urllib.urlopen('http://www.piday.org/includes/pi_to_1million_digits_v2.html')
soup = BeautifulSoup(page)

# Extracts the Pi. There's only one <p> tag, so just select the first one
pi_list = soup.findAll('p')[0].contents
pi = ''.join(str(s).replace('\n', '') for s in pi_list).replace('<br />', '')

# Creates the POST request's body. Still bad object naming on the creator's part...
parameters = urllib.urlencode({'post':      pi, 
                               'name':      'spammer',
                               'post_type': 'confession',
                               'school':    'all'})

# Crafts the POST request's header.
headers = {'Content-type': 'application/x-www-form-urlencoded',
           'Accept':       'text/plain'}

# Creates the connection to the website
connection = httplib.HTTPConnection('freelove-forum.com:80')
connection.request('POST', '/enter.php', parameters, headers)

# Sends it out and gets the response
response = connection.getresponse()
print response.status, response.reason

# Finishes the connections
data = response.read()
connection.close()
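The script above targets Python 2 (httplib, the print statement, BeautifulSoup 3). A rough Python 3 sketch of just the posting half, using http.client, might look like this. The helper names and the timeout are assumptions. The field names in the usage comment are copied from the script above.

```python
import http.client
from urllib.parse import urlencode


def build_form_post(fields):
    """Return the URL-encoded body and headers for a form POST."""
    body = urlencode(fields)
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "text/plain",
    }
    return body, headers


def send_form_post(host, path, fields, timeout=30):
    """Send the form POST and return (status, reason, response body)."""
    body, headers = build_form_post(fields)
    connection = http.client.HTTPConnection(host, timeout=timeout)
    try:
        connection.request("POST", path, body, headers)
        response = connection.getresponse()
        return response.status, response.reason, response.read()
    finally:
        # Always release the socket, even if the request fails.
        connection.close()


# Example usage (performs a real network request):
# status, reason, data = send_form_post(
#     "freelove-forum.com", "/enter.php",
#     {"post": "3.14159", "name": "spammer",
#      "post_type": "confession", "school": "all"})
```

Splitting the body construction from the sending makes the encoding step easy to check without touching the network.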