Python: using difflib and BeautifulSoup while ignoring content differences in certain parts of a page
The following code does a good job of extracting content-only changes to: This page is essentially a document, so I am only interested in detecting differences in the part above the page footer and below the top menu. I had assumed that the footer and menus on a page like this would rarely change, but re-running the diff a few days later showed subtle changes:
- Potential Entitlement
- Social Security Statement
- American Indians and Alaska Natives
+ American Indians/Alaska Natives
- Asian Americans and Pacific Islanders
+ Asian Americans/Pacific Islanders
- Self-employed
+ Self-Employed
- Awards
+ Digital Government Strategy
+ Open Government
- Podcasts
- Webinars
- Digital Government Strategy
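Given that I have already gone down the BeautifulSoup route of parsing the whole page (rather than, say, parsing only parts of it with lxml), am I boxed in here? Before running difflib, do I need to go back and split the page into sections (or just take the //div[@class='grid'] sections)?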
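One way to approach this, as a minimal sketch rather than a definitive answer: BeautifulSoup can search within the already parsed tree, so the diff can be limited to the content regions without switching to lxml. The sketch below assumes the content lives in div elements carrying the class grid (taken from the XPath in the question). The function names content_lines and content_diff and the two sample HTML snapshots are made up for illustration. Note that class_="grid" also matches elements with additional classes, which is slightly looser than @class='grid'.

```python
import difflib

from bs4 import BeautifulSoup


def content_lines(html, css_class="grid"):
    """Return the non-empty text lines found inside <div class="grid"> regions."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for region in soup.find_all("div", class_=css_class):
        # stripped_strings yields each text node with surrounding whitespace removed
        lines.extend(region.stripped_strings)
    return lines


def content_diff(old_html, new_html, css_class="grid"):
    """Diff only the selected regions of two page snapshots."""
    old = content_lines(old_html, css_class)
    new = content_lines(new_html, css_class)
    # Keep added/removed lines only; drop unchanged lines and ndiff's "? " hints
    return [d for d in difflib.ndiff(old, new) if d.startswith(("+ ", "- "))]


# Hypothetical snapshots: the menu and footer change, and so does one content line
OLD = """
<ul class="menu"><li>Self-employed</li><li>Awards</li></ul>
<div class="grid"><p>Section 1</p><p>Old rule text</p></div>
<div class="footer"><a>Podcasts</a></div>
"""
NEW = """
<ul class="menu"><li>Self-Employed</li></ul>
<div class="grid"><p>Section 1</p><p>New rule text</p></div>
<div class="footer"><a>Open Government</a></div>
"""

changes = content_diff(OLD, NEW)
print("\n".join(changes))
```

Because the menu and footer sit outside the selected div elements, their changes never reach difflib. If the content is not wrapped in a single identifiable container, an alternative is to call decompose() on the menu and footer tags before extracting text, assuming those tags can be located reliably.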