Python: using difflib and ignoring content differences in certain parts of a page (BeautifulSoup)

Tags: python, html, web-scraping, beautifulsoup, difflib

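The following code works well for extracting content-only changes to:

This page is essentially a document, so I am only interested in detecting differences in the part of the page above the footer and below the top menu. I had assumed that the footer or menu on a page like this would rarely change, but re-running the diff a few days later showed minor changes: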

- Potential Entitlement
- Social Security Statement
- American Indians and Alaska Natives
+ American Indians/Alaska Natives
- Asian Americans and Pacific Islanders
+ Asian Americans/Pacific Islanders
- Self-employed
+ Self-Employed
- Awards
+ Digital Government Strategy
+ Open Government
- Podcasts
- Webinars
- Digital Government Strategy
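
The code that produced this listing is not shown here, so the following is only a guess at its shape: a minimal sketch, assuming the output came from difflib.ndiff with unchanged lines filtered out. The helper name changed_lines and the sample menu entries are hypothetical.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Return only the removed/added lines of an ndiff, as '- x' / '+ y'."""
    diff = difflib.ndiff(old_lines, new_lines)
    # ndiff prefixes: '- ' removed, '+ ' added, '  ' unchanged, '? ' hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical menu text taken from two snapshots of the same page.
old_menu = ["Home", "Self-employed", "Awards", "Contact"]
new_menu = ["Home", "Self-Employed", "Open Government", "Contact"]

for line in changed_lines(old_menu, new_menu):
    print(line)
```

Because the whole page text is compared, a capitalization change in a menu entry is reported in exactly the same way as a change in the document body.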
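Given that I have already gone the BeautifulSoup route of parsing the entire page (rather than, say, using lxml to parse only parts of it), am I limited here? Do I need to go back and split the page into sections (or isolate just the //div[@class='grid'] section) before running difflib?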
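
One possible approach, sketched under assumptions: a page that has already been parsed in full with BeautifulSoup can still be narrowed to a single element with find(), so only that element's text is passed to difflib. The helper name section_lines and both HTML snapshots are hypothetical, and the assumption is that the document body lives in a div whose class is grid.

```python
import difflib

from bs4 import BeautifulSoup


def section_lines(html, css_class="grid"):
    """Return the stripped text fragments inside the first matching div."""
    soup = BeautifulSoup(html, "html.parser")
    # class_="grid" matches any div whose class list contains "grid",
    # which is slightly looser than the XPath //div[@class='grid'].
    section = soup.find("div", class_=css_class)
    if section is None:
        return []
    # stripped_strings yields each text node with outer whitespace removed.
    return list(section.stripped_strings)


# Hypothetical snapshots: menu, footer, and one content line all change.
old_html = """
<ul class="menu"><li>Self-employed</li><li>Awards</li></ul>
<div class="grid"><p>Retirement age is 66.</p><p>Apply online.</p></div>
<div class="footer"><a>Podcasts</a></div>
"""
new_html = """
<ul class="menu"><li>Self-Employed</li><li>Open Government</li></ul>
<div class="grid"><p>Retirement age is 67.</p><p>Apply online.</p></div>
<div class="footer"><a>Webinars</a></div>
"""

diff = difflib.ndiff(section_lines(old_html), section_lines(new_html))
changes = [line for line in diff if line.startswith(("- ", "+ "))]
print("\n".join(changes))
```

With this arrangement the menu and footer edits never reach difflib, so only the change inside the grid div is reported. An alternative with the same effect would be to remove the menu and footer elements from the parsed tree with decompose() and diff what remains.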
