Python's .replace leaves empty lines

python string scrapy

I am using scrapy to extract data from a website. The raw output looks like this:

{eps: 25}
{eps:[]}
{eps:[]}
{eps:[]}
{eps: 50}
{eps:[]}
{eps:[]}
{eps:[]}


I don't know why the empty entries show up, but I can strip them out. The problem is that when I use .replace, the result looks like this:

25



50



# Code comment to show extra spaces.


I have tried .split, .sub, and .strip, but none of them worked. I don't know what else to try.

Update: adding the source code.

# coding: utf-8
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import CsvItemExporter
import re
import csv
import urlparse
from stockscrape.items import EPSItem

class epsScrape(BaseSpider):
        name = "eps"
        allowed_domains = ["investors.com"]
        ifile = open('test.txt', "r")
        reader = csv.reader(ifile)
        start_urls = []
        for row in ifile:
                url = row.replace("\n","")
                if url == "symbol":
                        continue
                else:
                        start_urls.append("http://research.investors.com/quotes/nyse-" + url + ".htm")
        ifile.close()

        def parse(self, response):
                f = open("eps.txt", "a+")
                sel = HtmlXPathSelector(response)
                sites = sel.select("//tbody/tr")
                items = []
                for site in sites:
                        item = EPSItem()
                        item['eps']  = site.select("td[contains(@class, 'rating')]/span/text()").extract()
                        strItem = str(item)
                        newItem = strItem.replace(" ","").replace("'","").replace("{eps:[","").replace("]}","").replace("u","").replace("\\r\\n",'').replace('$
                        f.write("%s\n" % newItem)
                f.close()

test.txt contains one stock symbol per line, like this:

MSFT
A
H

and so on.

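Your empty lines contain a newline character; replace \n as well.

If you find that you end up removing every newline that way, split on newlines instead and drop all empty string values: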

outputstring = '\n'.join([line for line in inputstring.splitlines() if line.strip()])

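This removes all empty lines and rejoins the remaining non-empty lines with fresh newlines.

If, instead, you are producing the output line by line, by printing or by writing to a file, simply don't print or write anything when the line is empty: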
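As a minimal sketch of the one-liner above, the following wraps it in a helper and runs it on a sample string. The function name drop_blank_lines and the sample text are assumptions made for illustration; they do not come from the question's code.

```python
def drop_blank_lines(text):
    """Return text with empty and whitespace-only lines removed."""
    # splitlines() splits on \n as well as \r\n, so Windows line endings
    # scraped from a page are handled too.
    return '\n'.join(line for line in text.splitlines() if line.strip())


# Hypothetical sample imitating the cleaned-up scraper output.
sample = "25\n\n\n\n50\n\n\n\n"
cleaned = drop_blank_lines(sample)
print(cleaned)
```

Note that the joined result has no trailing newline; add one yourself if the file format needs it.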

newItem = newItem.replace(.., ..)
if newItem.strip():
    print newItem
    f.write('{}\n'.format(newItem))

The if statement tests that the line contains more than just whitespace.
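The blank lines appear because f.write("%s\n" % newItem) still writes the "\n" when newItem is empty, so every empty item becomes an empty line in the file. The sketch below illustrates the guard under some assumptions: it is written for Python 3, the helper name write_non_empty and the sample values are hypothetical, and an in-memory io.StringIO buffer stands in for the eps.txt file.

```python
import io


def write_non_empty(values, out):
    """Write each non-blank value on its own line; return how many were written."""
    written = 0
    for value in values:
        # An empty or whitespace-only string is falsy after strip(),
        # so nothing (not even the newline) is written for it.
        if value.strip():
            out.write('{}\n'.format(value))
            written += 1
    return written


# Hypothetical values as they might look after the .replace chain.
cleaned_items = ["25", "", "", "", "50", "", " ", ""]
buffer = io.StringIO()  # stands in for open("eps.txt", "a+")
count = write_non_empty(cleaned_items, buffer)
print(buffer.getvalue())
```

Skipping the write entirely, rather than trying to delete the newline afterwards, is what keeps the output free of empty lines.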


Unless I'm misunderstanding something, .replace('\n', '') does nothing.

@Resin: How are you producing the output? As one big string, or by printing line by line?

@Resin: Without seeing your exact code, I'm afraid I can't say much more.

I am printing line by line. If I write to the file with f.write(newItem), the result is 2550... and so on. If I use f.write("%s\n" % newItem), I get the output shown above. I'll clean up my code and post it.

@Resin: Then you can use the if newItem.strip(): test.