Python: appending calculated values from one DataFrame to another


python pandas


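I have 2 CSV files that I read in and create 2 DataFrames from.

I take the second DataFrame and use it to do some calculations whose results I append to the first DataFrame. However, nothing I append to the first DataFrame seems to actually show up.

What do I need to do to fix this?

Here is the code I'm using: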

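import pandas as pd

m = pd.read_csv('DailyHistoricData.csv', header=None, index_col=0)
f = pd.read_csv('ImportFMP.csv', header=None)

# .iloc replaces the removed .ix; with header=None and the default index,
# labels and positions coincide, so the lookups are unchanged.
for index in range(len(f)):
    a0 = f.iloc[index, 0]
    a1 = f.iloc[index, 1]
    a2 = f.iloc[index, 7]
    a3 = f.iloc[index, 8] - f.iloc[index, 9]
    a4 = 100 * f.iloc[index, 2]
    a5 = f.iloc[index, 10] - f.iloc[index, 11]
    a6 = f.iloc[index, 3]
    a7 = f.iloc[index, 4]
    a8 = f.iloc[index, 5]
    a9 = f.iloc[index, 6]
    # No effect on m: DataFrame.append returns a new DataFrame instead of
    # modifying m in place, and that return value is discarded here.
    m.append([a0, a1, a2, a3, a4, a5, a6, a7, a8, a9])

print(m.tail(3))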
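The loop above leaves m unchanged because DataFrame.append never modifies the frame it is called on: it returns a new DataFrame, and the loop throws that return value away. (append was deprecated in pandas 1.4 and removed in 2.0; `pd.concat`, its replacement, behaves the same way.) A minimal sketch with two tiny made-up frames, `base` and `extra`, which are illustrative only:

```python
import pandas as pd

# Hypothetical small frames standing in for m and the new rows built from f.
base = pd.DataFrame({"x": [1, 2], "y": [10, 20]})
extra = pd.DataFrame({"x": [3], "y": [30]})

# pd.concat returns a NEW DataFrame; calling it without keeping the result
# leaves base untouched, which is exactly what happens to m in the loop.
pd.concat([base, extra])
print(len(base))  # 2

# Reassigning the result is what actually "appends".
base = pd.concat([base, extra], ignore_index=True)
print(len(base))  # 3
```

Since every call copies the data, collecting all new rows first and concatenating once is also much cheaper than growing the frame one row at a time.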

The info for f is:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 720 entries, 0 to 719
Data columns (total 12 columns):
0     720 non-null int64
1     720 non-null int64
2     720 non-null float64
3     720 non-null int64
4     720 non-null int64
5     720 non-null int64
6     720 non-null int64
7     720 non-null int64
8     720 non-null int64
9     720 non-null int64
10    720 non-null float64
11    720 non-null int64
dtypes: float64(2), int64(10)

The info for m is:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11544 entries, 1 to 11544
Data columns (total 19 columns):
1     11544 non-null int64
2     11544 non-null float64
3     11544 non-null int64
4     11544 non-null float64
5     11544 non-null float64
6     11544 non-null float64
7     11544 non-null float64
8     11544 non-null float64
9     11544 non-null int64
10    11544 non-null float64
11    11544 non-null float64
12    11544 non-null float64
13    11544 non-null int64
14    11544 non-null float64
15    11544 non-null float64
16    11544 non-null float64
17    11544 non-null int64
18    11544 non-null float64
19    11544 non-null int64
dtypes: float64(13), int64(6)

You should vectorize it:

In [68]: a3 = f.iloc[:, 8] - f.iloc[:, 9]

In [69]: a4 = 100 * f.iloc[:, 2]

In [70]: a5 = f.iloc[:, 10] - f.iloc[:, 11]

In [71]: toappend = pd.concat([a3, a4, a5], axis=1).rename(columns=dict(zip(range(3), list('abc'))))

In [72]: toappend.tail()
Out[72]:
        a    b    c
715 -1147  100 -247
716 -1022   89 -200
717  1491  109  328
718   712   87  194
719  -335   97  -84

[5 rows x 3 columns]

In [73]: res = pd.concat([m, f.iloc[:, [0, 3, 4, 5, 1, 6, 7]].join(toappend)])

In [74]: res.tail()[['a', 'b', 'c']]
Out[74]:
        a    b    c
715 -1147  100 -247
716 -1022   89 -200
717  1491  109  328
718   712   87  194
719  -335   97  -84

[5 rows x 3 columns]
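DataFrame.append was removed in pandas 2.0, so on current versions the concatenation has to go through `pd.concat`. The sketch below applies the same vectorized idea to all ten values a0..a9 from the question's loop, not only the three computed ones. It rests on assumptions the question does not state: the ten values go, in order, into m's first ten columns (labels 1 to 10), and the new rows continue m's integer index. `build_rows` is a hypothetical helper, and `f` and `m` here are tiny made-up stand-ins for the CSV data.

```python
import numpy as np
import pandas as pd


def build_rows(f, target_labels):
    """Vectorized equivalent of the question's loop (hypothetical helper).

    Each entry is a whole column, so there is no per-row Python loop.
    Assumption: a0..a9 map, in order, onto target_labels.
    """
    computed = [
        f.iloc[:, 0],                   # a0
        f.iloc[:, 1],                   # a1
        f.iloc[:, 7],                   # a2
        f.iloc[:, 8] - f.iloc[:, 9],    # a3
        100 * f.iloc[:, 2],             # a4
        f.iloc[:, 10] - f.iloc[:, 11],  # a5
        f.iloc[:, 3],                   # a6
        f.iloc[:, 4],                   # a7
        f.iloc[:, 5],                   # a8
        f.iloc[:, 6],                   # a9
    ]
    return pd.DataFrame(dict(zip(target_labels, computed)))


# Tiny made-up stand-ins for the CSV data (header=None gives integer labels).
f = pd.DataFrame(np.arange(24).reshape(2, 12))
m = pd.DataFrame(np.zeros((3, 19)), index=[1, 2, 3], columns=range(1, 20))

new = build_rows(f, list(m.columns[:10]))

# Continue m's integer index instead of reusing f's 0-based index.
start = int(m.index.max()) + 1
new.index = range(start, start + len(new))

# Columns 11..19 are absent from `new`, so they become NaN in the new rows.
m = pd.concat([m, new])
print(m.shape)  # (5, 19)
```

If m's index carries no meaning, passing `ignore_index=True` to `pd.concat` is a simpler alternative to setting the index by hand.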

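In general, if you find yourself writing a loop whose body does element-wise arithmetic, you can most likely operate on the whole vector/Series object instead and take advantage of NumPy's speed.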
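To illustrate that principle, a sketch comparing the question's per-row style with the column-wise form on a random made-up frame shaped like f. Both produce the same numbers, but the vectorized version performs the subtraction as one NumPy operation instead of 720 Python-level `iloc` lookups.

```python
import numpy as np
import pandas as pd

# Made-up 720 x 12 integer frame shaped like f (integer column labels).
rng = np.random.default_rng(0)
f = pd.DataFrame(rng.integers(0, 1000, size=(720, 12)))

# Row-by-row loop, as in the question: one Python iteration per row.
looped = [f.iloc[i, 8] - f.iloc[i, 9] for i in range(len(f))]

# Vectorized: a single subtraction over whole columns, executed by NumPy.
vectorized = f.iloc[:, 8] - f.iloc[:, 9]

print((vectorized.to_numpy() == np.array(looped)).all())  # True
```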

Is it possible to use concat instead?

m=m.concat([a0, a1, a2, a3, a4, a5, a6, a7, a8, a9], ignore_index=True)
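`concat` is a top-level pandas function rather than a DataFrame method, and it expects a list of DataFrame or Series objects, not a list of scalars. A minimal sketch of appending one row of computed values that way; the frame and the values are made up for illustration:

```python
import pandas as pd

m = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})  # made-up stand-in for m
values = [5, 6.0]                                  # one new row of computed values

# Wrap the values in a one-row DataFrame whose labels match m's columns,
# then call the module-level pd.concat and keep its return value.
row = pd.DataFrame([values], columns=m.columns)
m = pd.concat([m, row], ignore_index=True)
print(m)
```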

I get an error when I try this: AttributeError: 'DataFrame' object has no attribute 'concat'

concat is a top-level function (pd.concat), not a DataFrame method, and neither it nor append operates in place. Also, you should vectorize this by operating on all the columns of interest at once. Your current algorithm will be slow even for small frames with 10k-100k elements.

Philip, can you give me some advice?

Thanks for your answer. I get this error when I try the above: ValueError: columns overlap but no suffix specified: Int64Index([0, 1, 3, 4, 5, 6, 7], dtype='int64'). The info for f and m is the same as shown in the question.

That error message means you need to rename one or more of the three new columns, because one of their names is in [0, 1, 7, 3, 4, 5, 6].
          1       2    3   4     5   6   7   8   9   10  11  12  13  14  15  \
0                                                                            
1  19650302  507.99   70  56  1.77   0   0   0   0   0   0   0   0   0   0   
2  19650303  507.35   46  73  1.07   0   0   0   0   0   0   0   0   0   0   
3  19650304  505.94 -104  96 -0.39   0   0   0   0   0   0   0   0   0   0   
4  19650305  504.76 -200  66  0.14   0   0   0   0   0   0   0   0   0   0   
5  19650308  504.86  160  89  0.90   0   0   0   0   0   0   0   0   0   0   

    16  17  18  19  
0                  
1   0   0   0   1  
2   0   0   0   2  
3   0   0   0   3  
4   0   0   0   4  
5   0   0   0   5  
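The "columns overlap but no suffix specified" ValueError from the comments comes from DataFrame.join, which refuses to produce two columns with the same label unless it is given `lsuffix`/`rsuffix`. A small reproduction with made-up frames, followed by the rename fix suggested in the comments:

```python
import pandas as pd

# Made-up frames: column label 0 exists in both.
left = pd.DataFrame({0: [1, 2], 1: [3, 4]})
right = pd.DataFrame({0: [10, 20], 2: [30, 40]})

raised = False
try:
    left.join(right)
except ValueError as err:
    raised = True
    print("ValueError:", err)  # the same kind of overlap error as in the comments

# Renaming the incoming columns to labels that cannot clash makes the join work.
fixed = left.join(right.rename(columns={0: "a", 2: "b"}))
print(fixed.columns.tolist())  # [0, 1, 'a', 'b']
```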
