Python: removing \r\n and whitespace
How can I remove all the blank lines from the printed text using BS and Python? I'm still a beginner, and I think what I'm describing might be called whitespace.
Tags: python, python-2.7, beautifulsoup, kodi
Current output:
02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks
- Channel 60
02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield
- Channel 04
03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy
- Channel 05
Expected output:
02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks - Channel 60
02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield - Channel 04
03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy - Channel 05
Code:
import urllib, urllib2, re, HTMLParser, os
from bs4 import BeautifulSoup
import os
pg_source = ''
req = urllib2.Request('http://rushmore.tv/schedule')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36')
try:
    response = urllib2.urlopen(req)
    pg_source = response.read().decode('utf-8', 'ignore')
    response.close()
except:
    pass
content = []
soup = BeautifulSoup(pg_source)
content = BeautifulSoup(soup.find('ul', { 'id' : 'myUL' }).prettify())
print (content.text)
With a little extra work, you can build the output like this:
Code:
Test code:
Results:
A very simple way to get the same result with less code is to use the requests module. Here is the code:
import requests
from bs4 import BeautifulSoup
html = requests.get('http://rushmore.tv/schedule').text
soup = BeautifulSoup(html,'lxml')
ul = soup.find('ul', { 'id' : 'myUL' })
for content in ul.find_all('li'):
    print(content.text)
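Try this. It works fine for me. Use the strip() function on the string.
That is very kind of you, and this would be perfect, but Kodi doesn't use lxml (this script will be used in Kodi). I was reading about ElementTree. Do you know whether I could use it instead of lxml? Thanks.
OK, so I switched to html.parser. I'm now facing a Kodi issue that I need to solve.
This works for me, but is there an alternative to the print approach? When I print, I don't use the print function. I'm using xbmc.gui, and to print with it I have to supply a string.
print is just a function that accepts a string. You can do whatever you want with that string.
But the print call on this string includes "('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2])))". How can I change it to a simpler "print(string)"? I won't be using the print function, because I'll be using this script on Kodi and need to supply a simple string. Sorry for my ignorance, I'm still new to Python. Thanks.
Everything between the () of the print function produces a string. You can simply assign it to a variable. For example: mystring="\n".joi….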
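The last comment suggests assigning the joined string to a variable instead of printing it. A minimal sketch of that idea follows. It assumes the text alternates between an event line and a channel line, as in the question. The names build_schedule and sample are hypothetical, and the sample text stands in for content.text so that nothing is fetched from the network. It uses splitlines() rather than split('\n') so that \r\n line endings are handled as well.

```python
def build_schedule(raw_text):
    """Return the schedule as one string, one 'event - channel' line per pair."""
    # splitlines() splits on \n and \r\n alike, so no stray \r is left behind.
    lines = [l.strip() for l in raw_text.splitlines() if l.strip()]
    # Even indexes are events, odd indexes are channels (assumed layout).
    return '\n'.join(' '.join(pair) for pair in zip(lines[::2], lines[1::2]))


# Hypothetical sample standing in for content.text.
sample = "03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy\r\n\r\n  - Channel 05\r\n"

mystring = build_schedule(sample)
# mystring is a plain string; pass it to whichever call displays text.
print(mystring)
```

Because the function returns the string instead of printing it, the same value can be handed to any display call that expects a string.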
import urllib, urllib2, re, HTMLParser, os
from bs4 import BeautifulSoup
import os
pg_source = ''
req = urllib2.Request('http://rushmore.tv/schedule')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36')
try:
    response = urllib2.urlopen(req)
    pg_source = response.read().decode('utf-8', 'ignore')
    response.close()
except:
    pass
content = []
soup = BeautifulSoup(pg_source)
content = BeautifulSoup(soup.find('ul', {'id': 'myUL'}).prettify())
text = [l.strip() for l in content.text.split('\n') if l.strip()]
print('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2])))
21:00 - 23:00 NCAAB: Pepperdine vs Saint Mary's - Channel 03
21:30 - 00:00 AFL: Gold Coast vs. Geelong - Channel 47
22:00 - 00:00 A-League: Western Sydney Wanderers vs Perth Glory - BT Sport 1
22:45 - 03:00 Ski Classic: Mora - Channel 93
23:00 - 00:30 Freestyle Skiing WC: Ski Cross - Channel 106
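The two lines that do the real work in the test code are the list comprehension and the zip/join expression. The sketch below walks through them step by step on a small hard-coded sample, so it runs without fetching the page. The variable raw is an assumption standing in for content.text, and the layout (event line followed by channel line) is taken from the output shown in the question.

```python
# Sample text standing in for content.text; the real page is not fetched here.
raw = (
    "\n   02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks\n"
    "\n\n   - Channel 60\n"
    "\n   02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield\n"
    "\n\n   - Channel 04\n"
)

# Step 1: strip each line and drop the ones that end up empty.
text = [l.strip() for l in raw.split('\n') if l.strip()]

# Step 2: even indexes hold the event, odd indexes hold the channel,
# so zipping the two slices pairs each event with its channel.
pairs = list(zip(text[::2], text[1::2]))

# Step 3: join each pair with a space, then join the pairs with newlines.
result = '\n'.join(' '.join(p) for p in pairs)
print(result)
```

Note that zip stops at the shorter slice, so a trailing event line without a channel line would be dropped silently.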