How to replace commas with tabs in a TSV file in Python

python pandas

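In the dataframe below, I am trying to replace the commas in the curv_typ,maturity,bonds,geo\time column header, and in the strings beneath it, with tabs, so that I can create new columns from it.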

 curv_typ,maturity,bonds,geo\time  2015M06D16   2015M06D15   2015M06D11   \
0                 PYC_RT,Y1,GBAAA,EA        -0.24        -0.24        -0.24   
1               PYC_RT,Y1,GBA_AAA,EA        -0.02        -0.03        -0.10   
2                PYC_RT,Y10,GBAAA,EA         0.94         0.92         0.99   
3              PYC_RT,Y10,GBA_AAA,EA         1.67         1.70         1.60   
4                PYC_RT,Y11,GBAAA,EA         1.03         1.01         1.09

The code is shown below, but it does not get rid of the commas, and that is where I am stuck.

import os
import urllib2
import gzip
import StringIO
import pandas as pd

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

outFilePath = filename.split('/')[1][:-3] #As in the code above, just put here for reference
csvout = 'C:\Users\Sidney\ECB.tsv'
outfile = open(csvout, "w")
with open(outFilePath, "rb") as f:
    for line in f.read():
        line.replace(",", "\t")
        outfile.write(line)
outfile.close()

df = pd.DataFrame.from_csv("ECB.tsv", sep="\t", index_col=False)

Thanks

Split the column name to generate the new column names, then call the vectorized str.split method with the parameter expand=True:

In [26]:
cols = 'curv_typ,maturity,bonds,geo\\time'.split(',')
df[cols] = df['curv_typ,maturity,bonds,geo\\time'].str.split(',', expand=True)
df

Out[26]:
  curv_typ,maturity,bonds,geo\time  2015M06D16  2015M06D15  2015M06D11  \
0               PYC_RT,Y1,GBAAA,EA       -0.24       -0.24       -0.24   
1             PYC_RT,Y1,GBA_AAA,EA       -0.02       -0.03       -0.10   
2              PYC_RT,Y10,GBAAA,EA        0.94        0.92        0.99   
3            PYC_RT,Y10,GBA_AAA,EA        1.67        1.70        1.60   
4              PYC_RT,Y11,GBAAA,EA        1.03        1.01        1.09   

  curv_typ maturity    bonds geo\time  
0   PYC_RT       Y1    GBAAA       EA  
1   PYC_RT       Y1  GBA_AAA       EA  
2   PYC_RT      Y10    GBAAA       EA  
3   PYC_RT      Y10  GBA_AAA       EA  
4   PYC_RT      Y11    GBAAA       EA
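To make the step above easy to reproduce without downloading the Eurostat file, here is a minimal self-contained sketch. It assumes Python 3 and a pandas version recent enough to support both expand=True and drop(columns=...). The three-row frame is a trimmed copy of the data in the question, and dropping the combined column at the end is an optional extra that the answer itself does not do.

```python
import pandas as pd

# Small stand-in for the Eurostat table; values are copied from the question.
key = 'curv_typ,maturity,bonds,geo\\time'
df = pd.DataFrame({
    key: ['PYC_RT,Y1,GBAAA,EA', 'PYC_RT,Y1,GBA_AAA,EA', 'PYC_RT,Y10,GBAAA,EA'],
    '2015M06D16': [-0.24, -0.02, 0.94],
})

# The header itself holds the new column names, separated by commas.
cols = key.split(',')

# expand=True makes str.split return a DataFrame with one column per piece.
df[cols] = df[key].str.split(',', expand=True)

# Optional (not part of the answer): drop the combined column after splitting.
df = df.drop(columns=key)
```

After this runs, df holds the date column followed by the four new key columns.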

Edit

For pandas version 0.16.0 and older, which do not have the expand parameter, use the following line instead:

df[cols] = df['curv_typ,maturity,bonds,geo\\time'].str.split(',').apply(pd.Series)

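I had the same problem, with data downloaded from Eurostat that has the same structure. I tried @EdChum's solution, but I could not get it to work in one go, so I needed a few further steps: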

vc.head() # The original DataFrame
Out[150]: 
  expend,unit,geo\time 2015    2014   2013   2012   2011    2010    2009   \
0       INV,MIO_EUR,AT  109    106.0   86.0  155.0  124.0   130.0   140.0   
1       INV,MIO_EUR,BE  722    664.0  925.0  522.0  590.0   476.0  1018.0   
2       INV,MIO_EUR,BG   16      1.0    2.0   65.0   11.0     5.0     6.0   
3       INV,MIO_EUR,CH  640   1237.0  609.0  662.0  640.0  1555.0   718.0   
4       INV,MIO_EUR,CZ   13     14.0   24.0   17.0  193.0    37.0    61.0   


cols = 'expend,unit,geo\\time'.split(',') # Getting the columns (backslash escaped so that \t is not read as a tab)

clean = vc.iloc[:,0].str.split(',').apply(pd.Series) # Creating a clean version
clean = clean.rename(columns = lambda x: cols[x]) # Adding the column names to the clean version

vc = pd.concat([clean, vc.iloc[:,1:]], axis = 1) # Concatenating the two tables

vc.head()
Out[155]: 
  expend     unit geo\time 2015    2014   2013   2012   2011    2010    2009   \
0    INV  MIO_EUR       AT  109    106.0   86.0  155.0  124.0   130.0   140.0   
1    INV  MIO_EUR       BE  722    664.0  925.0  522.0  590.0   476.0  1018.0   
2    INV  MIO_EUR       BG   16      1.0    2.0   65.0   11.0     5.0     6.0   
3    INV  MIO_EUR       CH  640   1237.0  609.0  662.0  640.0  1555.0   718.0   
4    INV  MIO_EUR       CZ   13     14.0   24.0   17.0  193.0    37.0    61.0
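The steps above can also be folded into one small helper that works for any table of this shape. This is only a sketch under a few assumptions: split_key_column is a hypothetical name, the combined key is assumed to be the first column, and the pandas version is assumed to support expand=True and drop(columns=...). The sample frame is a two-row excerpt of the data shown above.

```python
import pandas as pd

def split_key_column(df, sep=','):
    """Split the first column on `sep`, naming the pieces after its header.

    Hypothetical helper: assumes every row has as many pieces as the header.
    """
    key = df.columns[0]                      # e.g. 'expend,unit,geo\\time'
    names = key.split(sep)                   # new column names from the header
    parts = df[key].str.split(sep, expand=True)
    parts.columns = names                    # raises ValueError on a length mismatch
    # Key columns first, then the remaining (data) columns.
    return pd.concat([parts, df.drop(columns=key)], axis=1)

# Usage with a two-row excerpt of the table above.
vc = pd.DataFrame({
    'expend,unit,geo\\time': ['INV,MIO_EUR,AT', 'INV,MIO_EUR,BE'],
    '2015': [109, 722],
    '2014': [106.0, 664.0],
})
vc = split_key_column(vc)
```

Taking the names from df.columns[0] rather than from a typed-out string avoids any mismatch between the literal and the real header, such as an unescaped backslash.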

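Checked this solution, thanks. However, when I add

cols = 'curv_typ,maturity,bonds,geo\\time'.split(',')
df[cols] = df[cols].str.split(',')

after the code above, I seem to get an error. Any idea why that might be?

Sorry, you have to split and add the new columns after loading the df. At which point does the error occur?

The error occurs on the following line:

df[cols] = df['curv_typ,maturity,bonds,geo\\time'].str.split(',')

Try

df[cols] = df[df.columns[0]].str.split(',')

Can you try upgrading your pandas version?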
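As a side note on the file-level approach attempted in the question: the loop there iterates over f.read(), which yields single characters rather than lines, and the result of line.replace(...) is discarded, because str.replace returns a new string instead of modifying the original. A corrected version might look like the sketch below. It is written for Python 3 (the question uses Python 2), commas_to_tabs is a hypothetical name, UTF-8 encoding is an assumption, and it replaces every comma in the file, which is only safe if commas never appear inside the data values.

```python
def commas_to_tabs(src_path, dst_path):
    """Copy src_path to dst_path, replacing every comma with a tab."""
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        for line in src:  # iterate over lines, not characters
            # str.replace returns a new string, so write its result.
            dst.write(line.replace(',', '\t'))
```

The rewritten file can then be loaded with pd.read_csv(dst_path, sep='\t'), which gives each key component its own column without any further splitting.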