Python: creating a timed loop in a Discord bot script to reload a web page (web scraper bot)

python time discord discord.py code-structure

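I'm currently designing a Discord bot that scrapes a web page which is constantly updated with patches related to the PBE server. I have the bot running successfully through Heroku. The problem I'm running into is that I want to create an automatic (timed loop) refresh that reloads the website I request. As it stands, the script loads only one instance of the website, and if that website changes or updates, none of my content updates, because I'm still using the "old" request of the website.

Is there a way to wrap the code in functions so that I can create a timed loop, or do I only need to create a loop around my website request? What would that look like? Thanks!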

from bs4 import BeautifulSoup
from urllib.request import urlopen
from discord.ext import commands
import discord

# what I want the commands to start with
bot = commands.Bot(command_prefix='!')

# instantiating discord client
token = "************************************"
client = discord.Client()

# begin the scraping of passed in web page
URL = "*********************************"
page = urlopen(URL)
soup = BeautifulSoup(page, 'html.parser')
pbe_titles = soup.find_all('h1', attrs={'class': 'news-title'})  # using soup to find all header tags with the news-title
                                                                 # class and storing them in pbe_titles
linksAndTitles = []
counter = 0

# finding tags that start with 'a' as in a href and appending those titles/links
for tag in pbe_titles:
    for anchor in tag.find_all('a'):
        linksAndTitles.append(tag.text.strip())
        linksAndTitles.append(anchor['href'])

# counts number of lines stored inside linksAndTitles list
for i in linksAndTitles:
    counter = counter + 1
print(counter)

# separates list by line so that it looks nice when printing
allPatches = '\n'.join(str(line) for line in linksAndTitles[:counter])
# stores the first two lines in list which is the current pbe patch title and link
currPatch = '\n'.join(str(line) for line in linksAndTitles[:2])


# command that allows user to type in exactly what patch they want to see information for based off date
@bot.command(name='patch')
async def pbe_patch(ctx, *, arg):
    if any(item.startswith(arg) for item in linksAndTitles):
        await ctx.send(arg + " exists!")
    else:
        await ctx.send('The date you entered: ' + '"' + arg + '"' + ' does not have a patch associated with it or that patch expired.')


# command that displays the current, most up to date, patch
@bot.command(name='current')
async def current_patch(ctx):
    response = currPatch
    await ctx.send(response)


bot.run(token)

I have played around with `while True:` loops, but whenever I nest anything inside the loop, I can't access that code anywhere else.

discord.py has a special decorator, `tasks.loop`, for running some code periodically

from discord.ext import tasks

@tasks.loop(seconds=5.0)
async def scrape():
    # ... your scraping code ...
    pass  # placeholder so the skeleton is valid Python


# ... your commands ...


scrape.start()
bot.run(token)

It will repeat the `scrape` function every 5 seconds.

Documentation: `discord.ext.tasks`
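To picture what a timed loop like this does, here is a minimal standard-library sketch. It is not how discord.py implements `tasks.loop`; it only illustrates the idea of a job that repeats on a timer while other work keeps being served, which is what a blocking `while True:` prevents. The names `run_every`, `demo` and `handle_command` are made up for this illustration.

```python
import asyncio


async def run_every(seconds, job, repeats):
    """Await `job()` and then sleep `seconds`, `repeats` times in total."""
    for _ in range(repeats):
        await job()
        # `asyncio.sleep` hands control back to the event loop,
        # so other coroutines can run while this one waits.
        await asyncio.sleep(seconds)


async def demo():
    log = []

    async def scrape():
        # Stand-in for the real scraping code.
        log.append("scrape")

    async def handle_command():
        # Stand-in for a bot command that arrives while the loop is running.
        await asyncio.sleep(0.015)
        log.append("command")

    # Both coroutines share one event loop and take turns.
    await asyncio.gather(run_every(0.01, scrape, 3), handle_command())
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is that the "command" still gets handled although the periodic job has not finished all of its repeats.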

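On Linux, I would eventually use the standard service `cron` to run a script periodically. That script could scrape the data and save it in a file or a database, and the Discord bot could read from that file or database. But `cron` checks its tasks once every minute, so it can't run a task more often than that.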
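A rough sketch of that file-based hand-off, assuming JSON as the storage format: a scraper script run by `cron` would call `save_items`, and the bot would call `load_items` inside its commands. The function names, the JSON format and the paths in the comment are assumptions made for this example, not something the setup above prescribes.

```python
import json
import os
import tempfile

# A crontab entry for the scraper script could look like this
# (hypothetical paths, runs once per minute):
#   * * * * * /usr/bin/python3 /path/to/scraper.py


def save_items(path, items):
    """Write scraped items as JSON without exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first ...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(items, f)
    # ... then swap it into place in a single step.
    os.replace(tmp_path, path)


def load_items(path):
    """Return the saved items, or an empty list if the scraper has not run yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```

Writing to a temporary file and renaming it means the bot reads either the previous complete file or the new complete file, never a partial one.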

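EDIT:

Minimal working code

I use the page http://books.toscrape.com/, which was created for learning scraping.

I changed a few elements. There is no need to create `client` when you already have `bot`, because `bot` is a special kind of `client`.

I keep the title and link as a dictionary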

            {
                'title': tag.text.strip(),
                'link': url + anchor['href'],
            }

so that later it is easier to create text like this

title: A Light in the ...
link: http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html
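As a small illustration, text in that shape can be built from one such dictionary with an f-string. The helper name `format_item` is made up for this example.

```python
def format_item(item):
    """Turn one {'title': ..., 'link': ...} dictionary into two lines of text."""
    return f"title: {item['title']}\nlink: {item['link']}"


example = {
    "title": "A Light in the ...",
    "link": "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
}
print(format_item(example))
```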

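Using `while True:` is a completely wrong idea. discord.py has the `tasks` object, which you can use to run a function on a timer every few minutes in order to repeat a task. Eventually, I would use the system service `cron` to periodically run a separate script that gets the data from the page and saves it in a local file, and the Discord bot would read from that file.

So that would be the ideal convention for creating this bot, yes? Reading from a file?

BTW: if you create `bot = commands.Bot(...)`, then you don't need `client = discord.Client()`. `bot` is a special type of `client`.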

Shorter: `counter = len(linksAndTitles)`

You can keep the title and link together as a single element, i.e. as a list, `linksAndTitles.append([tag.text.strip(), anchor['href']])`, or as a dictionary, `linksAndTitles.append({"title": tag.text.strip(), "link": anchor['href']})`.
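To make the difference concrete, here is a short sketch comparing the flat list from the question with the two suggestions. The `scraped` pairs are made-up sample data standing in for `tag.text.strip()` and `anchor['href']`.

```python
# Hypothetical sample data: (title, href) pairs as the scraper would find them.
scraped = [
    ("Patch A", "/patch-a"),
    ("Patch B", "/patch-b"),
]

# Flat list, as in the question: every patch occupies two slots,
# so titles sit at even indexes and links at odd indexes.
flat = []
for title, href in scraped:
    flat.append(title)
    flat.append(href)

# One element per patch, as a list ...
as_lists = [[title, href] for title, href in scraped]

# ... or as a dictionary, which names the two fields.
as_dicts = [{"title": title, "link": href} for title, href in scraped]

# With one element per patch, the length is the number of patches.
counter = len(as_dicts)
```

With one element per patch, a slice such as `as_dicts[:1]` is the newest patch as a whole, instead of having to remember that `flat[:2]` means "title plus link".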

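So if my scraping code is in a function, can I call the function after `scrape()`?

If the code is in `scrape()`, then `@tasks.loop(seconds=5.0)` will run it automatically every 5 seconds. But you can still use `scrape()` to run it manually.

So I created a function named `scrape_and_store` that contains all of the code above: it fetches the URL, parses it with BS, stores that information in a list and returns the list. But when I put that function in the loop, it still doesn't update. The bot runs fine with no errors; it just doesn't seem to keep re-fetching the web page. (I'm testing it on my own HTML and updating the HTML to see whether the commands pick up any changes.)

You don't put it in a loop. Use `@tasks.loop(seconds=5.0)` and it should run automatically every 5 seconds. Using `return` is useless, because when `tasks.loop` runs the function, it doesn't know what to do with the returned value. You have to assign the result to a `global` variable.

I forgot to add `scrape.start()`. It is in the code now. You can also see `.start()` in the documentation.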

import os
import discord
from discord.ext import commands, tasks
from bs4 import BeautifulSoup
from urllib.request import urlopen

# default values at start (before `scrape` assigns new values),
# because some command may try to use these variables before `scrape` has created them
links_and_titles = []   # PEP8: `lower_case_names`
counter = 0
items = []

bot = commands.Bot(command_prefix='!')

@tasks.loop(seconds=5)
async def scrape():
    global links_and_titles
    global counter
    global items

    url = "http://books.toscrape.com/"
    page = urlopen(url)
    soup = BeautifulSoup(page, 'html.parser')
    #pbe_titles = soup.find_all('h1', attrs={'class': 'news-title'})  
    pbe_titles = soup.find_all('h3')  

    # remove previous content
    links_and_titles = []

    for tag in pbe_titles:
        for anchor in tag.find_all('a'):
            links_and_titles.append({
                'title': tag.text.strip(),
                'link': url + anchor['href'],
            })

    counter = len(links_and_titles)
    print('counter:', counter)
    items = [f"title: {x['title']}\nlink: {x['link']}" for x in links_and_titles]

@bot.command(name='patch')
async def pbe_patch(ctx, *, arg=None):
    if arg is None:
        await ctx.send('Use: !patch date')
    elif any(item['title'].startswith(arg) for item in links_and_titles):        
        await ctx.send(arg + " exists!")
    else:
        await ctx.send(f'The date you entered: "{arg}" does not have a patch associated with it or that patch expired.')

@bot.command(name='current')
async def current_patch(ctx, *, number=1):
    if items:
        responses = items[:number]
        text = '\n----\n'.join(responses)
        await ctx.send(text)
    else:
        await ctx.send('no patches')

scrape.start()

token = os.getenv('DISCORD_TOKEN')
bot.run(token)
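One thing to keep in mind with the code above: `urlopen` is a blocking call, so while the page is downloading inside `async def scrape()`, the event loop cannot do anything else. A possible refinement, sketched here with the standard library only and assuming Python 3.9 or newer for `asyncio.to_thread`, is to run the download in a worker thread. The names `read_url` and `fetch_page` are made up for this example.

```python
import asyncio
from urllib.request import urlopen


def read_url(url):
    """Blocking download, the same kind of call the code above makes."""
    with urlopen(url) as response:
        return response.read()


async def fetch_page(url, fetch=read_url):
    """Run the blocking `fetch` in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fetch, url)
```

Inside `scrape()`, the line `page = urlopen(url)` could then become `page = await fetch_page(url)`, and the returned bytes can be passed to `BeautifulSoup` the same way. The `fetch` parameter exists only so that the function can be tried out with a fake download.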