Python website list parser only retrieves 20 items; how can I make the site load more?

python parsing

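There is a website with a list of 50 companies, and I am trying to parse that list and export it to a csv file. With the code below I only get 20 of them, because the page loads the rest as you scroll down. Is there a way to simulate scrolling down, or to make the page load completely?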

from lxml import html
import requests

def schindler(max): # create a list of the companies
    page = requests.get('http://beta.fortune.com/worlds-most-admired-companies/list/')
    tempContainer = html.fromstring(page.content)
    names = []
    position = 1

    while position <= max:
        names.extend(tempContainer.xpath('//*[@id="pageContent"]/div[2]/div/div/div[1]/div[1]/ul/li['+str(position)+']/a/span[2]/text()'))
        position = position + 1

    return names

It looks like you can get the data as JSON. The 20 in the URL appears to be the starting rank, and the 30 the number of items.

Sample code:

import requests

url = "http://fortune.com/api/v2/list/1918408/expand/item/ordering/asc/20/30"
resp = requests.get(url)
for entry in resp.json()['list-items']:
    print(entry['rank'], entry['name'])
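
Building on the sample above, here is a rough sketch of how the whole list might be collected page by page and written to a csv file, which is what the question was after. It relies on the answer's guess that the last two URL segments are the starting rank and the item count, and on the `list-items`, `rank` and `name` keys from the sample. The helper names (`build_page_url`, `fetch_json`, `collect_items`, `write_csv`), the page size and the zero-based starting offset are all assumptions made for illustration, not a documented API.

```python
import csv

# Base URL copied from the answer; 1918408 is the id of that particular list.
BASE_URL = "http://fortune.com/api/v2/list/1918408/expand/item/ordering/asc"


def build_page_url(start, count):
    """Build a page URL, assuming the last two segments are start rank and item count."""
    return f"{BASE_URL}/{start}/{count}"


def fetch_json(url):
    """Download one page of the list as a dict (needs the third-party requests package)."""
    import requests  # imported here so the other helpers work without it installed

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()


def collect_items(total, page_size=25, fetch=fetch_json):
    """Request pages until `total` items are collected or a page comes back empty."""
    items = []
    start = 0  # assumption: offsets start at 0; adjust if the first item is missing
    while len(items) < total:
        count = min(page_size, total - len(items))
        page = fetch(build_page_url(start, count)).get("list-items", [])
        if not page:
            break  # nothing more to load
        items.extend(page)
        start += len(page)
    return items[:total]


def write_csv(items, out):
    """Write the rank and name columns to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["rank", "name"])
    for entry in items:
        writer.writerow([entry["rank"], entry["name"]])
```

To try it, open a file with `open("companies.csv", "w", newline="")` and pass it to `write_csv(collect_items(50), f)`. Passing a different `fetch` function makes it possible to check the paging logic without touching the network.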

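I'd say this is a duplicate, but I know that going from "reading static web pages" to "interacting with web pages" is an unfortunately big step.

Thanks! I'll check it out. I tried looking for an existing answer but found nothing useful, probably because I don't have the technical vocabulary to state my question properly.

The three relevant words are: … Did you leave out …? (By the way, I like your galdolf reference.)

Awesome. I need to format entry['name'] to remove the number at the end.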
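
For the last comment, one possible way to drop a number at the end of `entry['name']` is a small regular expression. This is only a sketch: the thread does not show what the names actually look like, so the assumed format (a company name followed by digits, with or without a space) and the helper name `clean_name` are guesses.

```python
import re

# Matches optional whitespace, then digits, at the very end of the string.
_TRAILING_NUMBER = re.compile(r"\s*\d+\s*$")


def clean_name(name):
    """Return the name with any trailing run of digits removed."""
    return _TRAILING_NUMBER.sub("", name)
```

Digits elsewhere in the name are left alone, but a company whose real name ends in a number would be shortened too, so the pattern may need tightening once the actual data is known.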

print(schindler(50))