
Python: Scrape a forum for the title of each post

Tags: python, web-scraping

I'm new to both web scraping and Python. I want to search a forum URL for the title of each post, and whenever a new post is created with one of the titles below, I want to receive an email with a link to that post.

By searching for the div
structItem-title
I get the 23 posts on page 1. But when I want to print the text of each post, all I get back is the type rather than the text itself, from

print(type(first_result.text))
print(type(first_result))

Titles to search for

    # Jeti_DS_16 = soup.find_all(text="Jeti DS 16")
    # Jeti_DS_16_v2 = soup.find_all(text="Jeti DS 16 2")
    # Jeti_DC_16 = soup.find_all(text="Jeti DC 16")
    # Jeti_DC_16_v2 = soup.find_all(text="Jeti DC 16 2")
Code

from requests import get
from bs4 import BeautifulSoup
import re
import smtplib
import time
import lxml


URL = 'https://www.rc-network.de/forums/biete-rc-elektronik-zubeh%C3%B6r.135/'

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}


def checkForSearchItem():

    response = get(URL)
    # print(response.text[:500])

    # page = requests.get(URL, headers=headers)
    # page = requests.get(URL, headers=headers).text
    # page = requests.get(URL).text
    # page = requests.get(URL)

    soup = BeautifulSoup(response.content, "lxml")
    # soup = BeautifulSoup(page.content, "html.parser")
    # soup = BeautifulSoup(page.text, "html.parser")

    search_for_class = soup.find_all(
        'div', class_='structItem-title')
    # search_for_main = soup.find_all(
    #     'div', class_="structItemContainer-group js-threadList")
    # Jeti_DS_16 = soup.find_all(text="Jeti DS 16")
    # Jeti_DS_16_v2 = soup.find_all(text="Jeti DS 16 2")
    # Jeti_DC_16 = soup.find_all(text="Jeti DC 16")
    # Jeti_DC_16_v2 = soup.find_all(text="Jeti DC 16 2")

    # if(Jeti_DC_16, Jeti_DC_16_v2, Jeti_DS_16, Jeti_DS_16_v2):
    #     send_mail()

    # print('Die Nummer {0} {1} {2} {3} wurden gezogen'.format(
    #     Jeti_DC_16, Jeti_DC_16_v2, Jeti_DS_16, Jeti_DS_16_v2))

    print(type(search_for_class))
    print(len(search_for_class))

    first_result = search_for_class[0]
    # print(type(first_result.h3))
    # print(type(first_result.div.a.text))
    # print(type(first_result.a.text))
    # print(type(first_result.p.text))
    # print(type(first_result.name.text))
    # print(type(first_result.title))
    print(type(first_result))
    print(type(first_result.text))
    # print(soup.div)


# def send_mail():

#     server_ssl = smtplib.SMTP_SSL('smtp.gmail.com', 465)
#     server_ssl.ehlo()
#     # server.starttls()
#     # server.ehlo()

#     server_ssl.login('Secure@gmail.com', 'SecurePassword')

#     subject = 'Es gibt ein neuer Post im RC-Network auf deine gespeicherte Anfragen. Sieh in dir an{Link to Post}'
#     body = 'Sieh es dir an Link: https://www.rc-network.de/forums/biete-rc-elektronik-zubeh%C3%B6r.135/'

#     msg = f"Subject: {subject}\n\n{body}"
#     emails = ["Secure@gmx.de"]

#     server_ssl.sendmail(
#         'Secure@gmail.com',
#         emails,
#         msg
#     )
#     print('e-Mail wurde versendet!')

#     server_ssl.quit()


while True:
    checkForSearchItem()
    time.sleep(600)
    # time.sleep(86400)
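As a side note, the headers dictionary is defined above but never passed to get(), so the request goes out with the default requests User-Agent. If the forum ever rejects that, the commented-out variant from the code above can be used instead (a minimal sketch):

response = get(URL, headers=headers)  # send the custom User-Agent defined above
soup = BeautifulSoup(response.content, "lxml")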

When you want to print the text, you don't need type(). That only shows you the type of the variable (int, str, ...). Without type() the text prints just fine. That means, in the print statement, instead of:

print(type(first_result.text))

write:

print(first_result.text)
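For example, with the first result from the code in the question (a minimal illustration; the comments show the kind of output to expect):

first_result = search_for_class[0]
print(type(first_result.text))  # <class 'str'> -- only the type of the value
print(first_result.text)        # the actual title text of the post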
I hope that was your problem and this helps. When you need the URI of a post, you have to grab the a tag inside the post div and extract the URI from it, like this:

def checkForSearchItem():
    response = get(URL)
    soup = BeautifulSoup(response.content, "lxml")
    posts = soup.find_all('div', class_='structItem-title')
    for post in posts:
        a_tag = post.find_all('a')[0]  # the a tag inside the div
        link = a_tag.get('href')  # the href inside the a tag
        url = f'https://www.rc-network.de{link}'  # full URI, because "link" looks like /threads/sensoren-von-graupner.11835933/
        print(post.text)
        print(url)

Silly me, thank you. Could you help me with one more thing: if a post has one of those titles, how do I get its link? I edited the post with the URI function.
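Putting the two parts together, a minimal sketch of that idea could look like the following. The SEARCH_TERMS list is an assumption taken from the commented-out find_all() lines in the question, the class name and forum URL come from the code above, and the send_mail() call refers to the (commented-out) function in the question:

from requests import get
from bs4 import BeautifulSoup

URL = 'https://www.rc-network.de/forums/biete-rc-elektronik-zubeh%C3%B6r.135/'
# assumed search terms, taken from the commented-out find_all() lines in the question
SEARCH_TERMS = ['Jeti DS 16', 'Jeti DS 16 2', 'Jeti DC 16', 'Jeti DC 16 2']


def check_for_search_item():
    response = get(URL)
    soup = BeautifulSoup(response.content, 'lxml')
    for post in soup.find_all('div', class_='structItem-title'):
        title = post.text.strip()
        a_tag = post.find_all('a')[0]  # first a tag inside the title div
        url = f"https://www.rc-network.de{a_tag.get('href')}"  # href is relative, e.g. /threads/...
        if any(term.lower() in title.lower() for term in SEARCH_TERMS):
            print(f'Match: {title} -> {url}')
            # send_mail()  # hypothetical: call the (commented-out) send_mail() from the question here


check_for_search_item()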