Why aren't the li's showing up in the Python requests response?

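I have a homework project on web scraping in which I'm supposed to collect all of the event information from my school's website for a given month. I'm using Python with Requests and Beautiful Soup. I've written some code that fetches a URL and tries to get all of the li's from the page that holds the event information. However, when I grab the li contents, I notice that I'm not receiving all of them. I've been thinking this is due to the "overflow: hidden" style on the ul, but then why am I able to get the first few li's?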

from bs4 import BeautifulSoup
import requests

url = 'https://apps.iu.edu/ccl-prd/events/view?date=06012016&type=day&pubCalId=GRP1322'
r = requests.get(url)
bsObj = BeautifulSoup(r.text, "html.parser")

# Collect every <a> tag that has an href attribute
eventURLs = bsObj.find_all("a", href=True)
print(len(eventURLs))

# Print a numbered list of the hrefs
for count, link in enumerate(eventURLs, 1):
    print(str(count) + '. ' + link['href'])
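I'm printing the URLs because I plan to follow the href link inside each event to get the full description and the other metadata provided. However, I'm not getting all of the events listed. I only get the first 5. The event links in my output are numbers 19 to 23. The page has 10 events in total, though.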

Output:

1. https://www.indiana.edu/
2. #advancedSearch
3. /ccl-prd/events/view?type=week&date=06012016&pubCalId=GRP1322
4. /ccl-prd/events/view?type=month&date=06012016&pubCalId=GRP1322
5. /ccl-prd/events/view?type=day&date=06222016&pubCalId=GRP1322
6. /ccl-prd/events/view?pubCalId=GRP1432&type=day&date=06012016
7. /ccl-prd/events/view?pubCalId=GRP1445&type=day&date=06012016
8. /ccl-prd/events/view?pubCalId=GRP1436&type=day&date=06012016
9. /ccl-prd/events/view?pubCalId=GRP1438&type=day&date=06012016
10. /ccl-prd/events/view?pubCalId=GRP1440&type=day&date=06012016
11. /ccl-prd/events/view?pubCalId=GRP1443&type=day&date=06012016
12. /ccl-prd/events/view?pubCalId=GRP1434&type=day&date=06012016
13. /ccl-prd/events/view?pubCalId=GRP1447&type=day&date=06012016
14. /ccl-prd/events/view?pubCalId=GRP1450&type=day&date=06012016
15. http://newsinfo.iu.edu/
16. http://www.indiana.edu/~iuvis/
17. /ccl-prd/events/view?type=day&date=06012016&iub=BL011&pubCalId=GRP1322
18. /ccl-prd/events/view?type=day&date=06012016&iub=BL153&pubCalId=GRP1322
19. /ccl-prd/events/view/13147231?viewParams=%26type%3dday%26date%3d06012016&theDate=06222016&referrer=listView&pubCalId=GRP1322
20. /ccl-prd/events/view/13163329?viewParams=%26type%3dday%26date%3d06012016&referrer=listView&pubCalId=GRP1322
21. /ccl-prd/events/view/13163465?viewParams=%26type%3dday%26date%3d06012016&theDate=06222016&referrer=listView&pubCalId=GRP1322
22. /ccl-prd/events/view/13110443?viewParams=%26type%3dday%26date%3d06012016&theDate=06222016&referrer=listView&pubCalId=GRP1322
23. /ccl-prd/events/view/11744967?viewParams=%26type%3dday%26date%3d06012016&theDate=06222016&referrer=listView&pubCalId=GRP1322
24. http://www.iu.edu/copyright/index.shtml
25. http://www.iu.edu/
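TLDR: When I use Python Requests and BeautifulSoup, I don't get all of the links from the li's on the page. Why am I not getting the links? Is there a better way to approach this?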


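Edit to give the answer: the links I needed were all created with JavaScript, and since Requests and Beautiful Soup don't run JavaScript, I moved to Selenium with PhantomJS. However, the answer below shows how to get the JavaScript-created information by passing parameters to Python Requests, which is a perfectly good approach.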

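I looked at the page's source, and in the plain HTML there are 25 <a> elements with an href, which matches the 25 links in your output. The rest are presumably generated by JavaScript, which Beautiful Soup does not run. From its documentation: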

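Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.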

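You'll need a JavaScript engine to actually generate those elements, or you'll need to find out where this page pulls its event list from and go there for the data.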
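A minimal sketch of why a static parser never sees those links. It uses the standard library's html.parser as a stand-in for Beautiful Soup, and the sample markup, the /static/one and /events/view/1 paths, and the LinkCollector class are all made up for illustration. The script is read as plain text and never executed, so only links present in the raw HTML are found:

```python
from html.parser import HTMLParser

# Hypothetical page: one link is in the markup, one is added by JavaScript.
STATIC_HTML = """
<html><body>
  <a href="/static/one">static link</a>
  <ul id="mainEvents"></ul>
  <script>
    var link = document.createElement("a");
    link.href = "/events/view/1";
    document.getElementById("mainEvents").appendChild(link);
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> start tag found in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


collector = LinkCollector()
collector.feed(STATIC_HTML)
static_links = collector.hrefs
print(static_links)  # only the link that exists in the raw markup
```

The link the script would add in a browser never shows up, which is the same situation as the missing event links in the question.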

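You could try using a real browser, which will even let you search the DOM much as BeautifulSoup does, so you wouldn't need BeautifulSoup at all. But if you're set on using BeautifulSoup, you can use Selenium to drive a browser so that the elements get generated by JavaScript (the browser does that automatically), and then have Selenium hand you the resulting source with something like the line below (in my experience driver.page_source only gets you what requests gives you, though that may depend on the driver):

html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")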
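There are also headless browsers ("headless" means it has no GUI, so you never see it and it doesn't need a display). You can use one if you prefer, or if your script needs to run on a machine without a monitor (I know Firefox won't start at all if no display is attached). I believe there's a way to use BeautifulSoup with these browsers too, if you really want to.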

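If you decide to go the route of finding out where this page pulls its event data from, you may not need anything beyond requests: if the JavaScript is just fetching a JSON file, requests has a response.json() function that converts the whole thing into a Python dict, which you can then simply search through.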
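A rough sketch of what searching that dict could look like. The payload here is a trimmed sample modeled on the event data shown in the other answer, json.loads stands in for response.json() so the snippet runs without a network call, and find_events is a hypothetical helper rather than anything from requests:

```python
import json

# Stand-in for the body of an HTTP response. With requests you would call
# response.json() instead of json.loads(payload).
payload = '''
{"events": [{"events": [
  {"id": "13139699", "startDate": "12:00am",
   "summary": "Summer 2016: Withdrawal with Grade of W or F for First Six Week classes"},
  {"id": "13164381", "startDate": "12:00pm",
   "summary": "PAPF and G&M Summer Research Workshop"}
]}]}
'''
data = json.loads(payload)


def find_events(data, keyword):
    """Return the events whose summary contains keyword (case-insensitive)."""
    matches = []
    for group in data.get("events", []):
        for event in group.get("events", []):
            if keyword.lower() in event.get("summary", "").lower():
                matches.append(event)
    return matches


workshops = find_events(data, "workshop")
print([event["id"] for event in workshops])
```

Because the result is ordinary dicts and lists, no HTML parsing is involved at all.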

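If you're using an HTML parser (e.g., BeautifulSoup, Selenium), you should find the element on the page that contains all of these elements, and then narrow the search for the links to it by calling .find_all("a", href=True) (for BeautifulSoup) or .find_elements_by_css_selector("a[href]") (for Selenium) on that element object (yes, you can do that, and it's awesome!).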
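To illustrate the idea of scoping the search to one container, here is a standard-library sketch. It is not the BeautifulSoup or Selenium call itself: ScopedLinkCollector is a hypothetical class, and the markup is an invented, simplified version of what the rendered page might contain (the mainEvents id comes from the comments below, and the hrefs are shortened):

```python
from html.parser import HTMLParser

# Invented markup: navigation links outside the list, event links inside it.
RENDERED_HTML = """
<div id="nav"><a href="/ccl-prd/events/view?type=week">Week</a></div>
<ul id="mainEvents">
  <li><a href="/ccl-prd/events/view/13147231">Exhibit</a></li>
  <li><a href="/ccl-prd/events/view/13163329">Another event</a>
      <ul><li><a href="/ccl-prd/events/view/13163465">Nested</a></li></ul></li>
</ul>
<a href="http://www.iu.edu/">IU</a>
"""


class ScopedLinkCollector(HTMLParser):
    """Collect a[href] values that occur inside <ul id=container_id>."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # open <ul> tags at or below the container
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and (self.depth or attrs.get("id") == self.container_id):
            self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href") is not None:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul" and self.depth:
            self.depth -= 1


collector = ScopedLinkCollector("mainEvents")
collector.feed(RENDERED_HTML)
event_links = collector.hrefs
print(event_links)
```

The navigation and footer links are skipped because they sit outside the container, which is what calling find_all on the container element achieves in BeautifulSoup.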


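I'm not sure of the exact criteria for your assignment, so I don't know whether any of these options conflict with them. But I hope I've at least pointed you in the right direction.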

Some of the links are generated with JS, but you can get the data for all ten events in JSON format using requests:

import requests

# Query parameters sent with the request
params = {"pageNum": "1",
          "date": "06012016",
          "type": "day",
          "isSearch": "false",
          "pubCalId": "GRP1322"}

r = requests.get("https://apps.iu.edu/ccl-prd/events/view/page", params=params)

# Each entry is a dict describing one event
for ev in r.json()["events"][0]["events"]:
    print(ev)
Which gives you:

{u'groupEvent': True, u'allDay': True, u'description': u'\n\tOnline processing is not available. Drop forms should be obtained from the student's school. Completed forms must be submitted for processing at Student Central on Union.\n\n\tDates and times are subject to change without notice. See the Official Calendar for more details.\n', u'startDate': u'12:00am', u'calendarName': None, u'recurDateUtc': None, u'imageId': None, u'privateAndViewing': False, u'imageEventId': None, u'going': False, u'location': u'', u'imageCampus': u'BL', u'summary': u'Summer 2016: Withdrawal with Grade of W or F for First Six Week classes', u'recurs': False, u'id': u'13139699'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\tFor freshman Theodore Dreiser in 1889, Indiana University served as fertile ground for his future literary endeavors, but to him “the life of the town, the character of its people, the professors and the students, and the mechanism, politics, and social interests of the University body proper” were far more influential. For generatio', u'startDate': u'8:00am', u'calendarName': None, u'recurDateUtc': 1464796800000, u'imageId': 125740, u'privateAndViewing': False, u'imageEventId': 13147231, u'going': False, u'location': u'', u'imageCampus': u'BL', u'summary': u'Exhibit: Student Reform Movements at IU', u'recurs': True, u'id': u'13147231'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\tJoin us for Traditional Arts Indiana's traveling Bicentennial exhibit, Indiana Folk Arts: 200 Years of Tradition and Innovation. Before the exhibit begins its travels across Indiana, the MMWC will present it to the IU Bloomington campus and local communities. The exhibit will be on display through July 29, 2016.\r\n', u'startDate': u'9:00am', u'calendarName': None, u'recurDateUtc': 1464800400000, u'imageId': 129351, u'privateAndViewing': False, u'imageEventId': 13163465, u'going': False, u'location': u'Mathers Museum of World Cultures, 416 N. Indiana Ave, Bloomington, IN', u'imageCampus': u'BL', u'summary': u'EXHIBIT: "Indiana Folk Arts: 200 Years of Tradition and Innovation"', u'recurs': True, u'id': u'13163465'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\tIn 1913, Joseph Dixon visited the Tuscarora Nation, the smallest of the Haudenosaunee (Iroquois) communities, located in western New York. Dixon photographed six individuals during his visit, and those images became part of the Wanamaker Collection of Native American photographs, now housed at the Mathers Museum of World Cultures. While reviewin', u'startDate': u'9:00am', u'calendarName': None, u'recurDateUtc': 1464800400000, u'imageId': 115080, u'privateAndViewing': False, u'imageEventId': 13110443, u'going': False, u'location': u'Mathers Museum of World Cultures, 416 N Indiana Ave, Bloomington, IN 47408', u'imageCampus': u'BL', u'summary': u'EXHIBIT: "Stirring the Pot: Bringing the Wanamakers Home"', u'recurs': True, u'id': u'13110443'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\t"Cherokee Craft, 1973," at the Mathers Museum of World Cultures, presents a snapshot of craft production among the Eastern Band Cherokee at a key moment in both an ongoing Appalachian craft revival and the specific cultural and economic life of the Cherokee people in western North Carolina. The exhibition showcases basketry in three di', u'startDate': u'9:00am', u'calendarName': None, u'recurDateUtc': 1464800400000, u'imageId': 96460, u'privateAndViewing': False, u'imageEventId': 11744967, u'going': False, u'location': u'Mathers Museum of World Cultures, 416 N. Indiana Ave, Bloomington, IN', u'imageCampus': u'BL', u'summary': u'EXHIBIT:  "Cherokee Craft, 1973"', u'recurs': True, u'id': u'11744967'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\t"MONSTERS!" are extraordinary or unnatural beings that challenge the predictable fabric of everyday life. This exhibition looks at monsters from around the world, discovering who they are and what purposes they serve in various cultures, as different images of monstrousness emerge from the dark recesses of human imagination. The exhibi', u'startDate': u'9:00am', u'calendarName': None, u'recurDateUtc': 1464800400000, u'imageId': 109380, u'privateAndViewing': False, u'imageEventId': 13088883, u'going': False, u'location': u'Mathers Museum of World Cultures, 416 N. Indiana Ave, Bloomington, IN', u'imageCampus': u'BL', u'summary': u'EXHIBIT: "MONSTERS!\'', u'recurs': True, u'id': u'13088883'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\t"Tools of Travel" features objects that people in different times and places have used to transport themselves and their belongings, exploring the technology of travel (wagon, saddle, sled, and canoe) and how it is powered (horse, camel, dog, and human). The exhibit opens March 22,2016 and will be open through December 17, 2017.\r\n', u'startDate': u'9:00am', u'calendarName': None, u'recurDateUtc': 1464800400000, u'imageId': 129348, u'privateAndViewing': False, u'imageEventId': 13146383, u'going': False, u'location': u'Mathers Museum of World Cultures, 416 N. Indiana Ave, Bloomington, IN', u'imageCampus': u'BL', u'summary': u'EXHIBIT: "Tools of Travel"', u'recurs': True, u'id': u'13146383'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\t"Thoughts, Things, and Theories...What Is Culture?"  at the Mathers Museum of World Cultures, examines the nature of culture through the exploration of cultural traditions surrounding life stages and universal needs.\r\n\r\n\t \r\n\r\n\tFree visitor parking is available by the Indiana Avenue lobby entrance. Metered parking is available', u'startDate': u'9:00am', u'calendarName': None, u'recurDateUtc': 1464800400000, u'imageId': 76320, u'privateAndViewing': False, u'imageEventId': 10124630, u'going': False, u'location': u'Mathers Museum of World Cultures, 416 N. Indiana Ave., Bloomington, IN', u'imageCampus': u'BL', u'summary': u'EXHIBIT: "Thoughts, Things, and Theories...What Is Culture?"', u'recurs': True, u'id': u'10124630'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\tNew Acquisitions: African American Art\r\n\r\n\tA group of local community, university, and business leaders, headed by Donald Griffin, Jr., broker/owner of Griffin Realty, has formed a coalition to help the IU Art Museum build its collection of works by African American artists. These first acquisitions of what is hoped will become an annual endeavo', u'startDate': u'10:00am', u'calendarName': None, u'recurDateUtc': 1464804000000, u'imageId': None, u'privateAndViewing': False, u'imageEventId': None, u'going': False, u'location': u'Art Museum', u'imageCampus': u'BL', u'summary': u'New in the Galleries', u'recurs': True, u'id': u'13164911'}
{u'groupEvent': True, u'allDay': False, u'description': u'\r\n\tDavid Konisky\r\n\t\r\n\tExtreme Weather Exposure and Support for Climate Change Adaptation\r\n', u'startDate': u'12:00pm', u'calendarName': None, u'recurDateUtc': None, u'imageId': None, u'privateAndViewing': False, u'imageEventId': None, u'going': False, u'location': u'', u'imageCampus': u'BL', u'summary': u'PAPF and G&M Summer Research Workshop', u'recurs': False, u'id': u'13164381'}
Most of the information that pops up when you click "more" or the summary title is included in the JSON.

To get the start time and summary:

for ev in r.json()["events"][0]["events"]:
    print(ev["startDate"])
    print(ev["summary"])
Which gives you:

12:00am
Summer 2016: Withdrawal with Grade of W or F for First Six Week classes
8:00am
Exhibit: Student Reform Movements at IU
9:00am
EXHIBIT: "Indiana Folk Arts: 200 Years of Tradition and Innovation"
9:00am
EXHIBIT: "Stirring the Pot: Bringing the Wanamakers Home"
9:00am
EXHIBIT:  "Cherokee Craft, 1973"
9:00am
EXHIBIT: "MONSTERS!'
9:00am
EXHIBIT: "Tools of Travel"
9:00am
EXHIBIT: "Thoughts, Things, and Theories...What Is Culture?"
10:00am
New in the Galleries
12:00pm
PAPF and G&M Summer Research Workshop
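Since the question was about following each event's link, one possible next step is to build a detail URL from each id. This is only a sketch under an assumption: the /ccl-prd/events/view/<id>?pubCalId=... pattern is inferred from links 19 to 23 in the question's output, and it has not been checked that it works for every event or without the other query parameters. The detail_url helper is hypothetical:

```python
from urllib.parse import urljoin

BASE_URL = "https://apps.iu.edu/ccl-prd/events/view/page"

# Two events taken from the JSON output above, trimmed to the fields used here.
events = [
    {"id": "13147231", "summary": "Exhibit: Student Reform Movements at IU"},
    {"id": "13164381", "summary": "PAPF and G&M Summer Research Workshop"},
]


def detail_url(event_id, pub_cal_id="GRP1322"):
    """Build an absolute event URL (assumed pattern, see the note above)."""
    relative = "/ccl-prd/events/view/{}?pubCalId={}".format(event_id, pub_cal_id)
    # urljoin resolves the root-relative path against the site's host
    return urljoin(BASE_URL, relative)


urls = [detail_url(ev["id"]) for ev in events]
for u in urls:
    print(u)
```

In practice the list of events would come from r.json()["events"][0]["events"] rather than being typed in by hand.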

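My first guess was that they're in an iframe, but they're not. So there are other options: 1. they're generated with a script, 2. there's a problem in your code that I'm not seeing.

Have you checked the page's source? How do the links appear in the code?

I've looked at the source. They're all there. However, the ul element they're in has a style of "overflow: hidden". I don't know whether that's a factor, since I do get some of the links. I've also posted the link in the description.

If you inspect the page, you'll see that some of the links are generated by JavaScript; to scrape them you have to use scrapy or phantom.

So the links are generated by JavaScript? Why am I able to get some of them, then, but not all?

The criteria are pretty loose. I'd like to stay away from a browser by using Python Requests, and I think Requests is even better than Selenium with a headless browser (I could be wrong). On the page the events are in a ul with the id "mainEvents", but when I get the response to a request for the given URL, this ul doesn't show up. That's why I decided to try grabbing all of the links and see whether I could sort through them later. That's when I noticed I wasn't getting all of the necessary links. So you're saying the links I'm missing are generated by JavaScript, and Requests can't run it? I think I'll have to use Selenium for this, because I read that "Requests is an HTTP library, it cannot run JavaScript." So I'll go with Selenium and PhantomJS for this.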