Scraping a table with Beautiful Soup in Python
Tags: python, python-3.x, web-scraping, beautifulsoup, pycharm
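How can I use find_all in a for loop to access the <td> tags inside each <tr>? The <tr> rows seem to be independent of one another, and each carries one of the alternating classes 'even' and 'odd'. I can only pass two arguments to find_all, i.e. find_all('tr', class_='odd') or the same call with 'even'.
Also, how can I access only the 1st, 3rd, 4th and 6th <td> in each <tr>? These tags have no id or class.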
You can find the parent tag first, then collect the rows inside it:
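from bs4 import BeautifulSoup
import requests
# Download the raw page bytes
src_code = requests.get('https://bschool.careers360.com/colleges/ranking/2018').content
# The html5lib parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(src_code, features="html5lib")
# Find the parent <div> first, then collect every <tr> inside it
trs = soup.find(name="div", id="related-results").find_all(name="tr")
trs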
trs is what you want:
[<tr><th>College Name</th><th>Rank</th><th>Overall Score</th><th>Rating</th><th>Ownership</th><th>Intake Exams</th><th></th></tr>,
<tr class="odd"><td><a href="https://www.careers360.com/university/indian-institute-of-management-ahmedabad">Indian Institute of Management Ahmedabad</a><br/></td><td><span class="serialNum circlerate Government"></span><span class="rankStyle">1</span></td><td><span class="overall_scoredata">427.92</span></td><td>AAAAA<div class="rankInfo"> <strong>2017 Rating: </strong> AAAAA</div></td><td><div class="ownership_name">Government</div><div class="rating_review rankInfo"><strong>User Rating: </strong>4.7 / 5</div></td><td><div class="showMoreCheck"> <input type="checkbox"/><div class="ranked_best_branch intakeExam"><div class="intakeExam ng-binding"><span class="best_branch plusMinus">CAT</span><ul><li>GMAT</li></ul></div></div></div></td><td><div class="rank-apply-button btnBlockInfo"><div class="flagging" id="divid-7057"><div class="flag-link flag-default-link"><a class="buttonDefault follow iframe-popup-button" href="/user/register?destination=colleges/ranking/2018&nid=7057&flag=bookmarks&click_location=follow_button&popup=iframe">Follow</a></div></div><div class="client_url"></div></div><div class="college-compare-checkbox combine-rating-block smallclListing"> <label> <input class="tmCheckbox" name="college_ranking" type="checkbox" value="7057"/><span></span> <i>Compare</i> </label></div></td></tr>,
<tr class="even"><td><a href="https://www.careers360.com/university/indian-institute-of-management-bangalore">Indian Institute of Management Bangalore</a><br/></td><td><span class="serialNum circlerate Government"></span><span class="rankStyle">2</span></td><td><span class="overall_scoredata">408.32</span></td><td>AAAAA<div class="rankInfo"> <strong>2017 Rating: </strong> AAAAA</div></td><td><div class="ownership_name">Government</div><div class="rating_review rankInfo"><strong>User Rating: </strong>4.1 / 5</div></td><td><div class="showMoreCheck"> <input type="checkbox"/><div class="ranked_best_branch intakeExam"><div class="intakeExam ng-binding"><span class="best_branch plusMinus">CAT</span><ul><li>GMAT</li></ul></div></div></div></td><td><div class="rank-apply-button btnBlockInfo"><div class="flagging" id="divid-6872"><div class="flag-link flag-default-link"><a class="buttonDefault follow iframe-popup-button" href="/user/register?destination=colleges/ranking/2018&nid=6872&flag=bookmarks&click_location=follow_button&popup=iframe">Follow</a></div></div><div class="client_url"></div></div><div class="college-compare-checkbox combine-rating-block smallclListing"> <label> <input class="tmCheckbox" name="college_ranking" type="checkbox" value="6872"/><span></span> <i>Compare</i> </label></div></td></tr>,
<tr class="odd"><td><a href="https://www.careers360.com/university/indian-institute-of-management-calcutta">Indian Institute of Management Calcutta</a><br/></td><td><span class="serialNum circlerate Government"></span><span class="rankStyle">3</span></td><td><span class="overall_scoredata">375.18</span></td><td>AAAAA<div class="rankInfo"> <strong>2017 Rating: </strong> AAAAA</div></td><td><div class="ownership_name">Government</div><div class="rating_review rankInfo"><strong>User Rating: </strong>4.9 / 5</div></td><td><div class="showMoreCheck"> <input type="checkbox"/><div class="ranked_best_branch intakeExam"><div class="intakeExam ng-binding"><span class="best_branch plusMinus">GMAT</span><ul><li>CAT</li></ul></div></div></div></td><td><div class="rank-apply-button btnBlockInfo"><div class="flagging" id="divid-6933"><div class="flag-link flag-default-link"><a class="buttonDefault follow iframe-popup-button" href="/user/register?destination=colleges/ranking/2018&nid=6933&flag=bookmarks&click_location=follow_button&popup=iframe">Follow</a></div></div><div class="client_url"></div></div><div class="college-compare-checkbox combine-rating-block smallclListing"> <label> <input class="tmCheckbox" name="college_ranking" type="checkbox" value="6933"/><span></span> <i>Compare</i> </label></div></td></tr>,
......
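To pick only the 1st, 3rd, 4th and 6th cell of each row, one option is to index into the list of <td> tags in each <tr>. The following is a minimal sketch, not a tested scraper for the live page: SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the table shown above, and pick_cells and WANTED are names made up for illustration. It uses the built-in html.parser so it runs without html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the ranking table; the real page has richer cells.
SAMPLE_HTML = """
<div id="related-results">
<table>
<tr><th>College Name</th><th>Rank</th><th>Overall Score</th><th>Rating</th><th>Ownership</th><th>Intake Exams</th><th></th></tr>
<tr class="odd"><td><a href="#">Indian Institute of Management Ahmedabad</a></td><td><span class="rankStyle">1</span></td><td>427.92</td><td>AAAAA</td><td>Government</td><td>CAT</td><td>Follow</td></tr>
<tr class="even"><td><a href="#">Indian Institute of Management Bangalore</a></td><td><span class="rankStyle">2</span></td><td>408.32</td><td>AAAAA</td><td>Government</td><td>CAT</td><td>Follow</td></tr>
</table>
</div>
"""

# Zero-based positions of the 1st, 3rd, 4th and 6th cells
WANTED = (0, 2, 3, 5)


def pick_cells(html, wanted=WANTED):
    """Return, for each data row, the text of the cells at the wanted positions."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="related-results")
    rows = []
    for tr in container.find_all("tr"):
        # recursive=False keeps only the row's own cells, not cells of nested tables
        cells = tr.find_all("td", recursive=False)
        if len(cells) <= max(wanted):
            continue  # header row (only <th> cells) or a row that is too short
        rows.append([cells[i].get_text(" ", strip=True) for i in wanted])
    return rows


rows = pick_cells(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because the cells have no id or class, position is the only handle here. If the site reorders its columns, the indices in WANTED would need to change.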
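find_all("tr", class_=['odd', 'even'])
This gets every tr tag that has either class, then takes the first td tag in each row, the a tag inside it, and that tag's text.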
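from bs4 import BeautifulSoup
import requests
# Download the page as text
src_code = requests.get('https://bschool.careers360.com/colleges/ranking/2018').text
soup = BeautifulSoup(src_code, features="html.parser")
# A list passed to class_ matches rows that carry either class
alltr = soup.find_all("tr", class_=['odd', 'even'])
for x in alltr:
    # First <td> in the row, then its <a> tag: the college name
    print(x.td.a.text)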
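The class_ list matching can be tried on a small inline document without fetching the page. This is a sketch only: SAMPLE_HTML and the college names in it are made up for illustration. It also shows a CSS selector as an alternative, and guards against rows whose first cell has no link, where x.td.a would be None and .text would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows; the last data row deliberately has no <a> tag.
SAMPLE_HTML = """
<table>
<tr><th>College Name</th></tr>
<tr class="odd"><td><a href="#">College A</a></td></tr>
<tr class="even"><td><a href="#">College B</a></td></tr>
<tr class="odd"><td>No link here</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A list passed to class_ matches tags carrying any of the listed classes,
# so the header row (no class) is skipped.
by_class = soup.find_all("tr", class_=["odd", "even"])

# Alternative: a CSS selector group that matches the same rows.
by_selector = soup.select("tr.odd, tr.even")

names = []
for tr in by_class:
    link = tr.select_one("td a")  # None when the row has no link
    if link is not None:
        names.append(link.get_text(strip=True))

print(names)
```

Either form avoids running two separate searches for 'odd' and 'even' and merging the results by hand.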