Python loops over regex results in BeautifulSoup
When I run the code below, I get three lists printed one below the other. I want them to be horizontal and separated by commas (like the output of the last print(list) statement, where the data is comma-separated). I have tried rearranging the for loops and got all sorts of combinations, but none like what I described. Please help.
import bs4 as bs
import urllib.request
import re
sauce = urllib.request.urlopen('http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3').read()
soup = bs.BeautifulSoup(sauce,'lxml')
regexQ = re.compile('.*Date1 Qty.*')
regexC = re.compile('.*Footnote.*')
regexV = re.compile('.*Date1 Val.*')
for countryPart in soup.findAll("a",{"href":regexC}):
    Country = countryPart.text.strip()
    print(Country)
for DatePart in soup.findAll("td",{"headers":regexQ}):
    Quantity = DatePart.text.strip()
    print(Quantity)
for ValPart in soup.findAll("td",{"headers": regexV}):
    Value = ValPart.text.strip()
    print(Value)
list = [Country,Quantity,Value]
print(list)
Try collecting your countries and the other results into a single list of lists, then try this:
for mylist in lists:
    print(*mylist, end=", ")
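A minimal self-contained sketch of the same idea, assuming the three scraped columns are already in separate Python lists (the sample values are copied from the output shown in the answers; the variable names are illustrative). It is a variation on the answer rather than its exact code: zip() folds the columns into one row per country, and print's sep argument puts the fields of each row on one line, separated by commas.

```python
# Sample data: the first three rows of the scraped table, stored column
# by column, the way the question's three separate loops collect them.
countries = ["World", "Equatorial Guinea", "Trinidad and Tobago"]
quantities = ["282,911,404", "146,027,530", "136,883,464"]
values = ["67,284,637", "40,493,766", "26,790,695"]

# Fold the three columns into one list of rows (one row per country).
lists = [list(row) for row in zip(countries, quantities, values)]

# Print each row horizontally, with its fields separated by ", ".
for mylist in lists:
    print(*mylist, sep=", ")
```

Because the numbers themselves contain commas, a separator such as " | " or a tab may be easier to read.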
Have a look.
Also, when using regular expressions with BeautifulSoup, you don't need .* to match arbitrary characters.
Use this to get what you want:
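BeautifulSoup tests a compiled pattern against an attribute value with the pattern's search() method, which can match anywhere in the string, so a leading and trailing .* add nothing. The standard-library sketch below shows the difference with plain re; the attribute value "row3 Date1 Qty" is made up for illustration and is not taken from the Statistics Canada page.

```python
import re

# Made-up attribute value, for illustration only.
value = "row3 Date1 Qty"

plain = re.compile("Date1 Qty")
wildcards = re.compile(".*Date1 Qty.*")

# search() looks for the pattern anywhere in the string,
# so the extra ".*" on either side changes nothing.
print(bool(plain.search(value)), bool(wildcards.search(value)))  # True True

# Only an anchored match(), which starts at position 0, needs the leading ".*".
print(bool(plain.match(value)), bool(wildcards.match(value)))  # False True
```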
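regexQ = re.compile('Date1 Qty')
regexC = re.compile('Footnote')
regexV = re.compile('Date1 Val')
# One list per column, built from the `soup` object created in the question
country = [x.text.strip() for x in soup.find_all("a", {"href": regexC})]
quantity = [x.text.strip() for x in soup.find_all("td", {"headers": regexQ})]
value = [x.text.strip() for x in soup.find_all("td", {"headers": regexV})]
# zip() pairs the i-th country, quantity and value into one row.
# Note: if the question's `list = [...]` line ran first, the built-in list
# is shadowed and list(x) raises TypeError.
total_list = [list(x) for x in zip(country, quantity, value)]
for item in total_list:
    print(item)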
Output:
['World', '282,911,404', '67,284,637']
['Equatorial Guinea', '146,027,530', '40,493,766']
['Trinidad and Tobago', '136,883,464', '26,790,695']
['Japan', '410', '176']
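One caveat with this approach: zip() stops at the shortest input, so if one of the three find_all() calls returns a different number of elements than the others, rows are silently dropped or fields end up paired with the wrong country. The sketch below uses hypothetical columns with one missing quantity to show this; itertools.zip_longest, a swapped-in alternative, pads missing cells instead, and a length check makes the mismatch visible.

```python
from itertools import zip_longest

# Hypothetical columns: one quantity is missing, e.g. because a cell
# did not match the regex.
country = ["World", "Equatorial Guinea", "Trinidad and Tobago"]
quantity = ["282,911,404", "146,027,530"]
value = ["67,284,637", "40,493,766", "26,790,695"]

# zip() stops at the shortest list, so the last country disappears.
rows = [list(x) for x in zip(country, quantity, value)]
print(len(rows))  # 2

# zip_longest() keeps every row and fills gaps with a placeholder.
padded = [list(x) for x in zip_longest(country, quantity, value, fillvalue="")]
print(padded[-1])  # ['Trinidad and Tobago', '', '26,790,695']

# A simple sanity check before zipping.
if not (len(country) == len(quantity) == len(value)):
    print("column lengths differ:", len(country), len(quantity), len(value))
```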
You can do this without regex. Try the approach below to get the same result; I used a list comprehension.
Using urllib:
from urllib.request import urlopen
from bs4 import BeautifulSoup
res = urlopen("http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3")
soup = BeautifulSoup(res.read(),"lxml")
for items in soup.find_all(class_="ResultRow"):
    data = [item.get_text(" ",strip=True) for item in items.find_all(["th","td"])[1:4]]
    print(data)
Using requests:
import requests
from bs4 import BeautifulSoup
res = requests.get("http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3")
soup = BeautifulSoup(res.text,"lxml")
for items in soup.find_all(class_="ResultRow"):
    data = [item.get_text(" ",strip=True) for item in items.find_all(["th","td"])[1:4]]
    print(data)
Output:
['World', '282,911,404', '67,284,637']
['Equatorial Guinea', '146,027,530', '40,493,766']
['Trinidad and Tobago', '136,883,464', '26,790,695']
['Japan', '410', '176']
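The row-by-row idea can also be sketched without BeautifulSoup, using only the standard library's html.parser.HTMLParser. This is a swapped-in technique, not the answer's code, and the inline HTML is a hypothetical, simplified stand-in for the Statistics Canada table: it assumes each data row is a tr element with class ResultRow whose second to fourth cells hold country, quantity and value, mirroring the [1:4] slice above. Because each row is read as a unit, the three fields cannot drift out of step the way separately collected lists can.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup; the real page's structure may differ.
HTML = """
<table>
  <tr class="ResultRow"><td>1</td><th><a href="#Footnote1">World</a></th>
      <td headers="Date1 Qty">282,911,404</td><td headers="Date1 Val">67,284,637</td></tr>
  <tr class="ResultRow"><td>2</td><th><a href="#Footnote2">Japan</a></th>
      <td headers="Date1 Qty">410</td><td headers="Date1 Val">176</td></tr>
</table>
"""


class ResultRowParser(HTMLParser):
    """Collects the text of every th/td cell inside <tr class="ResultRow"> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows: one list of cell texts per row
        self._row = None  # cells of the row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "ResultRow" in classes:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse whitespace, roughly like
            # BeautifulSoup's get_text(" ", strip=True).
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


parser = ResultRowParser()
parser.feed(HTML)
for cells in parser.rows:
    print(cells[1:4])  # same slice as the answer: country, quantity, value
```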