Python: ask the user for input and parse a website with BeautifulSoup


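As an exercise, I am supposed to use BeautifulSoup4 to get course information from my school's website. I have been at this for several days and my code still does not work.

The first thing I ask the user to do is enter a course catalog abbreviation. For example, ICS is the abbreviation for Information and Computer Sciences. Beautiful Soup 4 should then list all the courses along with the number of students enrolled.

I was able to get the input part working, but I still get errors or the program just stops.

Question: is there a way for Beautiful Soup to accept user input, so that when the user enters ICS, the output is a list of all courses related to ICS?

Here is the code from my attempt: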

from bs4 import BeautifulSoup
import requests
import re

#get input for course
course = input('Enter the course:')
#Here is the page link
BASE_AVAILABILITY_URL = f"https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s={course}"


#get request and response
page_response = requests.get(BASE_AVAILABILITY_URL)
#getting Beautiful Soup to gather the html content
page_content = BeautifulSoup(page_response.content, 'html.parser')
#getting course information
main = page_content.find_all(class_='parent clearfix')
main_p = "".join(str (x) for x in main)
#get the course anchor tags
main_q = BeautifulSoup(main_p, "html.parser")
courses = main.find('a', href = True)
#get each course name
#empty dictionary for course list
courses_list = []
for a in courses:
    courses_list.append(a.text)
    search = input('Enter the course title:')
for course in courses_list:
    if re.search(search, course, re.IGNORECASE):
        print(course)

This is the original code that was provided in the Jupyter notebook:

import requests, bs4

BASE_AVAILABILITY_URL = f"https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s={course}"
#get input for course
course = input('Enter the course:')



def scrape_availability(text):
    soup = bs4.BeautifulSoup(text)
    r = requests.get(str(BASE_AVAILABILITY_URL)  + str(course))
    rows = soup.select('.listOfClasses tr')

    for row in rows[1:]:
        columns = row.select('td')
        class_name = columns[2].contents[0]
        if len(class_name) > 1 and class_name != b'\xa0':
            print(class_name)
            print(columns[4].contents[0])
            print(columns[7].contents[0])
            print(columns[8].contents[0])

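Oddly, if the user saves the HTML file, uploads it to the Jupyter notebook, and then opens the file for reading, the courses are displayed. For this task, however, the user cannot save the file; the output has to come directly from the course that was entered.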

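The problem with your code is that page_content.find_all(class_='parent clearfix') returns an empty list []. So that is the first thing you need to change. Looking at the HTML, you need to look for the table markup instead, that is, the tr and td elements that the provided code selects.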
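To make the empty-list point concrete, here is a minimal sketch run against a made-up HTML snippet. The sample markup and course rows are assumptions for illustration, not the real page; only the listOfClasses class name is taken from the code in this thread. It shows that find_all with a class that does not occur returns an empty list without raising an error, while selecting the table rows finds them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the
# 'listOfClasses' class name comes from the code in this thread.
SAMPLE_HTML = """
<table class="listOfClasses">
  <tr><th>Course</th><th>Title</th></tr>
  <tr><td>ICS 101</td><td>Sample title A</td></tr>
  <tr><td>ICS 111</td><td>Sample title B</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No element carries this class, so the result is empty and no exception is raised.
missing = soup.find_all(class_="parent clearfix")
print(len(missing))

# Selecting the table rows instead returns one Tag per <tr>.
rows = soup.select(".listOfClasses tr")
for row in rows[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in row.select("td")]
    print(cells)
```

Because an empty result is silent, any loop over it simply never runs, which looks like the program "just stopping".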

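Working from the original code you were given, you only need to modify a few things so that the logic runs in the right order.

I will point out what I changed: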

import requests, bs4

BASE_AVAILABILITY_URL = f"https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s={course}"
#get input for course
course = input('Enter the course:')



def scrape_availability(text):
    soup = bs4.BeautifulSoup(text)   #<-- need to get the html text before creating a bs4 object. So I move the request (line below) before this, and also adjusted the parameter for this function.
                                     # the rest of the code is fine
    r = requests.get(str(BASE_AVAILABILITY_URL)  + str(course))
    rows = soup.select('.listOfClasses tr')

    for row in rows[1:]:
        columns = row.select('td')
        class_name = columns[2].contents[0]
        if len(class_name) > 1 and class_name != b'\xa0':
            print(class_name)
            print(columns[4].contents[0])
            print(columns[7].contents[0])
            print(columns[8].contents[0])

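"I still get errors or the program just stops": a specific error tells you what failed and why it failed. Do not give up when you see an error message. Read and understand the message, and use it to understand and correct the problem. So what is the error? Where exactly does it occur? What are the runtime values when it occurs?

@David At first it was a "not defined" error, which made me realize that the order of the code was wrong. In an earlier attempt I opened the URL before asking for the input, when it should have been the other way around. The problem now is that the input box does show up, but after I type the course abbreviation, ICS in this case, nothing happens. This happens specifically after I enter the course.

@David I know there are tutorials on Beautiful Soup, but none of them cover what I need to do. Every tutorial I have seen so far has url = 'http://whatever it is', which I understand to be fixed, meaning I cannot have user input. So Beautiful Soup will parse the URL, depending on what needs to be scraped or parsed.

Now is the time to do some debugging. "Nothing happens" is not really possible. Without a debugger, one thing you can do is add output lines throughout the code. You can print various runtime values, lines that indicate which code blocks have been reached, and so on. Code does not just stop. It either exits with an exception or runs to its logical completion. For example, every line after the user input creates a value. What do those values end up being?

@chitown88 Earlier this semester I was taught input in Python, which was test = input('Type hi'), and when I ran that in the Jupyter notebook it worked. Now I need to carry the input into the website's URL, and for that I need course = input('Enter the course letters:'). Normally, without a URL, the code would just show a box asking the user for the course letters, but now the course letters have to be placed into the URL, so that the URL is visited and the courses are printed.

It works! I was stuck on this for a while. I guess order matters in the code, because the many times I tried this the output was nothing. Now I just need to add labels for the number of enrolled students and the available seats. Thanks for your help! :)

OK, I hope that makes sense. Essentially, you were trying to parse the HTML (soup = bs4.BeautifulSoup(r.text, 'html.parser')) before actually getting the HTML (r = requests.get(url)). It is a bit like trying to drive a car before you have started the engine.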
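The debugging advice in the comments above (print each value created after the user input) can be sketched as follows. This is only an illustration: the helper names build_availability_url and fetch_with_trace are hypothetical, and the strip/quote step and the 10-second timeout are assumptions added here, not part of the original code.

```python
import requests
from urllib.parse import quote

BASE_AVAILABILITY_URL = "https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s="


def build_availability_url(course):
    """Append the user-supplied subject code to the base URL (hypothetical helper)."""
    # strip() drops accidental spaces; quote() escapes characters that are not URL-safe.
    return BASE_AVAILABILITY_URL + quote(course.strip())


def fetch_with_trace(course):
    """Fetch the page and print every intermediate value along the way."""
    url = build_availability_url(course)
    print("requesting:", url)
    r = requests.get(url, timeout=10)  # timeout value is an arbitrary assumption
    print("status code:", r.status_code)
    print("characters of HTML received:", len(r.text))
    return r.text
```

Calling fetch_with_trace(input('Enter the course:')) shows at a glance whether the URL contains the typed abbreviation and whether any HTML came back, which narrows down where "nothing happens" actually occurs.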

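import requests, bs4

BASE_AVAILABILITY_URL = "https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s="
# get input for course first, so the value exists before the URL is built
course = input('Enter the course:')

# build the full URL only after the user has typed the course abbreviation
url = BASE_AVAILABILITY_URL + course

def scrape_availability(url):
    # fetch the page first, then hand its HTML to Beautiful Soup
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, 'html.parser')
    rows = soup.select('.listOfClasses tr')

    # rows[0] is skipped (presumably the header row)
    for row in rows[1:]:
        columns = row.select('td')
        class_name = columns[2].contents[0]
        # r.text is a str, so compare against the str '\xa0' rather than bytes
        if len(class_name) > 1 and class_name != '\xa0':
            print(class_name)
            print(columns[4].contents[0])
            print(columns[7].contents[0])
            print(columns[8].contents[0])


scrape_availability(url)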
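As a possible follow-up, the parsing step can be separated from the download so that the same function works on HTML fetched with requests and on HTML read from a saved file. The sketch below is an assumption-laden variant, not the answer's method: parse_availability is a hypothetical name, the column indexes 2, 4, 7 and 8 are copied from the code above without knowing what each column means, and the rule for skipping short or blank rows is a guess about how spacer rows look.

```python
import bs4


def parse_availability(html):
    """Return one tuple of cell texts per data row of the .listOfClasses table."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select(".listOfClasses tr")[1:]:  # skip the first (header) row
        # get_text(strip=True) also removes non-breaking spaces (\xa0).
        cells = [td.get_text(strip=True) for td in row.select("td")]
        # Skip rows that cannot be indexed up to column 8 or whose name cell is blank.
        if len(cells) < 9 or not cells[2]:
            continue
        results.append((cells[2], cells[4], cells[7], cells[8]))
    return results
```

With this split, parse_availability(requests.get(url).text) handles the live page, and returning tuples instead of printing leaves room to add labels to each value later.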