Python: how do I get the HTML text of posts on LinkedIn's Posts & Activity page after logging in?
I'm learning web scraping and I'm trying to collect the number of likes on LinkedIn posts. After logging in, I navigate to the Posts & Activity page, but I can't get the HTML text of the posts to extract data from it.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests
# Creation of a new instance of Google Chrome
browser = webdriver.Chrome('PATH/chromedriver.exe')
browser.get('https://www.linkedin.com/login')
browser.find_element_by_id('username').send_keys('****')
browser.find_element_by_id('password').send_keys('****' + Keys.RETURN)
#Go to Posts and Activity --> Posts
browser.get('https://www.linkedin.com/in/LINKEDIN_PROFILE_NAME/detail/recent-activity/shares/')
URL = 'https://www.linkedin.com/in/LINKEDIN_PROFILE_NAME/detail/recent-activity/shares/'
source = requests.get(URL).text
soup = BeautifulSoup(source, 'lxml')
print(soup)
Here is the output I get:
<html><head>
<script type="text/javascript">
window.onload = function() {
// Parse the tracking code from cookies.
var trk = "bf";
var trkInfo = "bf";
var cookies = document.cookie.split("; ");
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
trk = cookies[i].substring(8);
}
else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
trkInfo = cookies[i].substring(8);
}
}
if (window.location.protocol == "http:") {
// If "sl" cookie is set, redirect to https.
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
return;
}
}
}
// Get the new domain. For international domains such as
// fr.linkedin.com, we convert it to www.linkedin.com
var domain = "www.linkedin.com";
if (domain != location.host) {
var subdomainIndex = location.host.indexOf(".linkedin");
if (subdomainIndex != -1) {
domain = "www" + location.host.substring(subdomainIndex);
}
}
window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +
"&originalReferer=" + document.referrer.substr(0, 200) +
"&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
</head></html>
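Please help. What am I doing wrong?

Don't give BS4 the source returned by requests; use browser.page_source instead. requests.get(URL) opens a separate HTTP session that shares none of the cookies of the logged-in Selenium browser, so LinkedIn treats it as an anonymous visitor and returns the authwall redirect script shown above.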
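A minimal sketch of that suggestion, assuming `browser` is the logged-in Selenium driver created in the question's code. The helper names `looks_like_authwall` and `soup_from_browser` are made up for illustration, and the authwall check is only a heuristic based on the `/authwall?` URL visible in the output above. The sketch uses Python's built-in `html.parser`; `'lxml'`, as in the question, should work as well if it is installed.

```python
from bs4 import BeautifulSoup


def looks_like_authwall(html: str) -> bool:
    """Heuristic (assumption): the redirect stub from the question builds
    a URL containing '/authwall?', so treat that substring as a marker."""
    return "/authwall?" in html


def soup_from_browser(browser, parser: str = "html.parser") -> BeautifulSoup:
    """Parse the DOM currently loaded in the Selenium browser.

    `browser` only needs a `page_source` attribute, which Selenium's
    WebDriver provides. No second HTTP request is made, so the login
    session of the browser is the one that produced the HTML.
    """
    html = browser.page_source
    if looks_like_authwall(html):
        raise RuntimeError(
            "Received the authwall redirect stub; the browser session "
            "does not appear to be logged in."
        )
    return BeautifulSoup(html, parser)
```

In the question's script, this would replace the `requests.get(URL)` line: after `browser.get(...)` has loaded the activity page, call `soup = soup_from_browser(browser)`. Because the posts are probably rendered by JavaScript, it may be necessary to wait until they appear before reading `page_source`.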