Python: removing \r\n and whitespace


python python-2.7


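How can I remove all the blank lines from the printed text using BS (BeautifulSoup) and Python? I'm still new to this; I think what I'm describing might be called whitespace.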

Current output:

02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks

 - Channel 60







02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield

 - Channel 04







03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy

 - Channel 05

Desired output:

02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks - Channel 60
02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield - Channel 04 
03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy - Channel 05

Code:

import urllib2
from bs4 import BeautifulSoup

pg_source = ''
req = urllib2.Request('http://rushmore.tv/schedule')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36')

try:
    response = urllib2.urlopen(req)
    pg_source = response.read().decode('utf-8' , 'ignore')
    response.close()
except:
    pass

content = []
soup = BeautifulSoup(pg_source)
content = BeautifulSoup(soup.find('ul', { 'id' : 'myUL' }).prettify())

print (content.text)
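A minimal sketch of the blank-line cleanup on its own, assuming the scraped text is shaped like the current output above, including Windows-style \r\n line endings. str.splitlines() treats \n, \r\n and \r alike as line boundaries, and strip() removes the leftover indentation. The function name remove_blank_lines and the raw sample are hypothetical.

```python
def remove_blank_lines(raw):
    """Return the non-empty lines of raw, stripped of surrounding whitespace."""
    # splitlines() splits on "\r\n", "\n" and "\r", so no stray carriage
    # returns survive; lines that are empty after strip() are dropped.
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Hypothetical sample shaped like the "current output" above.
raw = (
    "02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks\r\n"
    "\r\n"
    " - Channel 60\r\n"
    "\r\n"
    "   \r\n"
    "02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield\r\n"
    "\r\n"
    " - Channel 04\r\n"
)

lines = remove_blank_lines(raw)
for line in lines:
    print(line)
```

This only removes the blank lines; each channel is still on its own line, which is why the answers pair the lines back up afterwards.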

With a little bit of work, you can build output like this (the code and its result appear after the comments below):

A very simple way to get the same result with less code is to use the requests module.

Here is the code:

import requests
from bs4 import BeautifulSoup

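html = requests.get('http://rushmore.tv/schedule').text

# 'lxml' must be installed; 'html.parser' is the built-in alternative.
soup = BeautifulSoup(html, 'lxml')

ul = soup.find('ul', {'id': 'myUL'})

# Print the text of each <li> inside <ul id="myUL">.
for content in ul.find_all('li'):
    print(content.text)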

Try this. It works fine for me.
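Since Kodi doesn't use lxml (as the comments point out), here is a sketch of the same extraction using only the standard library's HTMLParser, in case bs4 is unavailable too. This swaps BeautifulSoup for a hand-written parser. It assumes, hypothetically, that each show is one <li> with a closing tag inside <ul id="myUL">, with no nested lists; the real markup of rushmore.tv/schedule may differ. The class name ScheduleParser is hypothetical. Whitespace inside each item, \r\n included, is collapsed with ' '.join(text.split()).

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class ScheduleParser(HTMLParser):
    """Collect the text of each <li> inside <ul id="myUL">.

    Assumes the list is not nested and every <li> has a closing tag.
    """

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_list = False  # True while inside <ul id="myUL">
        self.in_item = False  # True while inside one of its <li> elements
        self.current = []     # raw text fragments of the current <li>
        self.items = []       # finished, whitespace-normalised <li> texts

    def handle_starttag(self, tag, attrs):
        if tag == 'ul' and dict(attrs).get('id') == 'myUL':
            self.in_list = True
        elif tag == 'li' and self.in_list:
            self.in_item = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == 'li' and self.in_item:
            # split() with no argument splits on any run of spaces, tabs,
            # \r and \n, so this collapses all of them into single spaces.
            self.items.append(' '.join(''.join(self.current).split()))
            self.in_item = False
        elif tag == 'ul' and self.in_list:
            self.in_list = False

    def handle_data(self, data):
        if self.in_item:
            self.current.append(data)


# Hypothetical markup; the real page structure may differ.
html = '''
<ul id="myUL">
  <li>02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks
      <span> - Channel 60</span></li>
  <li>02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield
      <span> - Channel 04</span></li>
</ul>
<ul id="other"><li>ignored</li></ul>
'''

parser = ScheduleParser()
parser.feed(html)
parser.close()
print('\n'.join(parser.items))
```

Because each <li> is already one line after normalisation, no pairing step is needed here, unlike the prettify() approach.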

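Use the strip() function on the string.

That's very kind of you, and this would be perfect, but Kodi doesn't use lxml (this script will be used in Kodi). I'm reading about ElementTree. Do you know whether I can use it instead of lxml? Thanks.

OK, so I switched to html.parser. Now I'm facing a Kodi issue that I need to solve.

This works for me, but is there an alternative to printing? I don't use the print function to show the output; I'm using xbmc.gui, and to display something with it I have to pass it a string.

print is just a function that takes a string. You can do whatever you want with that string.

But what gets printed here is ('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2]))). How can I turn that into something simpler, like print(string)? I won't be using the print function, because this script will run in Kodi and needs to provide a plain string. Sorry for my ignorance, I'm still new to Python. Thanks.

Everything between the () of the print function produces a string. You can simply assign it to a variable, for example:

mystring = "\n".joi…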
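Following that suggestion, here is a sketch that wraps the expression in a function which returns the string instead of printing it. The helper name format_schedule and the sample text list are hypothetical; the list stands in for the stripped, non-empty lines from the answer. The xbmc.gui call that would display the string is not shown in the thread, so it is left out.

```python
def format_schedule(lines):
    """Join alternating show/channel lines into one string, one show per line.

    Assumes lines alternates show, channel, show, channel, ...
    """
    pairs = zip(lines[::2], lines[1::2])
    return '\n'.join(' '.join(pair) for pair in pairs)


text = [
    '02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks',
    '- Channel 60',
    '02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield',
    '- Channel 04',
]

# mystring is an ordinary str, so it can be passed to whatever Kodi
# function displays text, instead of being printed.
mystring = format_schedule(text)
print(mystring)
```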

Code:

import urllib2
from bs4 import BeautifulSoup

pg_source = ''
req = urllib2.Request('http://rushmore.tv/schedule')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36')

try:
    response = urllib2.urlopen(req)
    pg_source = response.read().decode('utf-8', 'ignore')
    response.close()
except:
    pass

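# prettify() puts every tag and text node on its own indented line,
# which is where the blank lines in the question come from.
soup = BeautifulSoup(pg_source, 'html.parser')
content = BeautifulSoup(soup.find('ul', {'id': 'myUL'}).prettify(), 'html.parser')

# Keep only non-empty lines, stripped of surrounding whitespace and \r.
text = [l.strip() for l in content.text.split('\n') if l.strip()]
# The remaining lines alternate show / channel; join each pair with a space.
print('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2])))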

Result:

21:00 - 23:00 NCAAB:    Pepperdine vs Saint Mary's - Channel 03
21:30 - 00:00 AFL: Gold Coast vs. Geelong - Channel 47
22:00 - 00:00 A-League: Western Sydney Wanderers vs Perth Glory - BT Sport 1
22:45 - 03:00 Ski Classic: Mora - Channel 93
23:00 - 00:30 Freestyle Skiing WC: Ski Cross - Channel 106
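To see what the last line of that code does, here is a small sketch on made-up data: text[::2] takes every other item starting at index 0, text[1::2] every other item starting at index 1, and zip pairs them. zip stops at the shorter input, so a show without a channel line is silently dropped, and one missing line anywhere shifts every later pair. The approach relies on strict show/channel alternation.

```python
text = ['show A', '- Channel 1', 'show B', '- Channel 2', 'show C']

evens = text[::2]   # indexes 0, 2, 4, ...: the show lines
odds = text[1::2]   # indexes 1, 3, 5, ...: the channel lines
pairs = list(zip(evens, odds))  # stops at the shorter list

print(evens)
print(odds)
print(pairs)

# Cheap sanity check: an odd count means at least one line is unpaired.
if len(text) % 2 != 0:
    print('odd number of lines: at least one show or channel is unpaired')
```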
