Python pandas: add a column to the end of a MultiIndex DataFrame
Can someone help me add a column to a MultiIndex DataFrame? I have the following MultiIndex DataFrame:
                  price
sym  i_date
MSFT 2017-04-04  100.78
     2017-04-05  100.03
     2017-04-06  100.76
     2017-04-07  100.76
AAPL 2017-04-04  144.77
     2017-04-05  144.02
     2017-04-06  143.66
     2017-04-07  143.66
I want to add a column after the price column that holds the natural log of the price:
                  price  ln price
sym  i_date
MSFT 2017-04-04  100.78  <ln (100.78)>
     2017-04-05  100.03  <ln (100.03)>
     2017-04-06  100.76  <ln (100.76)>
     2017-04-07  100.76  <ln (100.76)>
AAPL 2017-04-04  144.77  <ln (144.77)>
     2017-04-05  144.02  <ln (144.02)>
     2017-04-06  143.66  <ln (143.66)>
     2017-04-07  143.66  <ln (143.66)>
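Simply assign the lnPrice column. Because it is computed from Price, it aligns on the same MultiIndex and is added as a new column at the end. No for loop, or even a .loc or .ix call, is needed. To demonstrate, the code below runs a pivot_table that reproduces your MultiIndex structure: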
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
# MOCK DATA OF 5 U.S. FREIGHT RAILROADS' NORMAL DISTRIBUTION POSITIVE PRICES FOR 10 DAYS
df = pd.DataFrame({'Company': 10*['UNP', 'BNI', 'CSX', 'NSC', 'KSU'],
                   'Date': [datetime(2017, 4, 15) - timedelta(days=i)
                            for i in range(10) for j in range(5)],
                   'Price': abs(np.random.randn(50))})
print(df.head(15))
#     Company       Date     Price
# 0       UNP 2017-04-15  0.229032
# 1       BNI 2017-04-15  0.706309
# 2       CSX 2017-04-15  0.461901
# 3       NSC 2017-04-15  0.710630
# 4       KSU 2017-04-15  0.059535
# 5       UNP 2017-04-14  1.809960
# 6       BNI 2017-04-14  0.842595
# 7       CSX 2017-04-14  1.068346
# 8       NSC 2017-04-14  0.159422
# 9       KSU 2017-04-14  1.537328
# 10      UNP 2017-04-13  0.043753
# 11      BNI 2017-04-13  0.231418
# 12      CSX 2017-04-13  0.739565
# 13      NSC 2017-04-13  1.917282
# 14      KSU 2017-04-13  0.677055
# Pass the aggregation by name; recent pandas versions warn when given the builtin sum
pvtdf = pd.pivot_table(df, index=['Company', 'Date'], values=['Price'], aggfunc='sum')
print(pvtdf.head(15))
#                        Price
# Company Date
# BNI     2017-04-06  1.422330
#         2017-04-07  0.871719
#         2017-04-08  0.955532
#         2017-04-09  0.990747
#         2017-04-10  0.944047
#         2017-04-11  0.069089
#         2017-04-12  0.707484
#         2017-04-13  1.368786
#         2017-04-14  0.034902
#         2017-04-15  0.462375
# CSX     2017-04-06  0.676962
#         2017-04-07  1.528759
#         2017-04-08  0.038463
#         2017-04-09  0.387486
#         2017-04-10  0.652780
pvtdf['lnPrice'] = np.log(pvtdf['Price'])
print(pvtdf.head(15))
#                        Price    lnPrice
# Company Date
# BNI     2017-04-06  1.422330   0.352297
#         2017-04-07  0.871719  -0.137288
#         2017-04-08  0.955532  -0.045487
#         2017-04-09  0.990747  -0.009296
#         2017-04-10  0.944047  -0.057579
#         2017-04-11  0.069089  -2.672364
#         2017-04-12  0.707484  -0.346040
#         2017-04-13  1.368786   0.313924
#         2017-04-14  0.034902  -3.355199
#         2017-04-15  0.462375  -0.771380
# CSX     2017-04-06  0.676962  -0.390141
#         2017-04-07  1.528759   0.424456
#         2017-04-08  0.038463  -3.258052
#         2017-04-09  0.387486  -0.948075
#         2017-04-10  0.652780  -0.426515
You can set the value like this:
Code:
df['ln price'] = np.log(df['price'])
Test code:
from io import StringIO
import numpy as np
import pandas as pd

# The header is the first line of the text, so the default header row applies
df = pd.read_fwf(StringIO("""\
sym   i_date      price
MSFT  2017-04-04  100.78
MSFT  2017-04-05  100.03
MSFT  2017-04-06  100.76
MSFT  2017-04-07  100.76
AAPL  2017-04-04  144.77
AAPL  2017-04-05  144.02
AAPL  2017-04-06  143.66
AAPL  2017-04-07  143.66""")).set_index(['sym', 'i_date'])
df['ln price'] = np.log(df['price'])
print(df)
Results:
                  price  ln price
sym  i_date
MSFT 2017-04-04  100.78  4.612940
     2017-04-05  100.03  4.605470
     2017-04-06  100.76  4.612741
     2017-04-07  100.76  4.612741
AAPL 2017-04-04  144.77  4.975146
     2017-04-05  144.02  4.969952
     2017-04-06  143.66  4.967449
     2017-04-07  143.66  4.967449
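Plain assignment always appends the new column at the end, which matches the question when price is the only column. If the frame had more columns and the log had to sit directly after price, DataFrame.insert is one option. The following is a minimal sketch under that assumption; the volume column and the shortened data are hypothetical and only there to make the difference visible.

```python
import numpy as np
import pandas as pd

# Small MultiIndex frame; 'volume' is a hypothetical extra column.
index = pd.MultiIndex.from_tuples(
    [('MSFT', '2017-04-04'), ('MSFT', '2017-04-05'),
     ('AAPL', '2017-04-04'), ('AAPL', '2017-04-05')],
    names=['sym', 'i_date'])
df = pd.DataFrame({'price': [100.78, 100.03, 144.77, 144.02],
                   'volume': [10, 20, 30, 40]}, index=index)

# Plain assignment appends the new column after all existing columns.
appended = df.copy()
appended['ln price'] = np.log(appended['price'])

# DataFrame.insert puts the column at a chosen position, here right after 'price'.
inserted = df.copy()
position = inserted.columns.get_loc('price') + 1
inserted.insert(position, 'ln price', np.log(inserted['price']))

print(appended.columns.tolist())
print(inserted.columns.tolist())
```

In both cases the MultiIndex on the rows is left untouched; only the column order differs.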
Thanks. If I want to add another column that uses only that stock's/company's prices (say, a moving average), how would I add it?
I figured it out. I had to use groupby['stk_sym'], without the MultiIndex.
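For the per-stock follow-up, it may also be possible to keep the MultiIndex and group on the index level by name. This is a sketch rather than the commenter's actual solution: the 2-day window and the column name ma2 are assumptions chosen for illustration, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with a (sym, i_date) MultiIndex.
dates = ['2017-04-04', '2017-04-05', '2017-04-06', '2017-04-07']
index = pd.MultiIndex.from_product([['MSFT', 'AAPL'], dates],
                                   names=['sym', 'i_date'])
df = pd.DataFrame({'price': [100.78, 100.03, 100.76, 100.76,
                             144.77, 144.02, 143.66, 143.66]},
                  index=index)

# Group on the 'sym' index level so the rolling window never spans two stocks.
# transform returns a Series aligned with df's index, so it can be assigned directly.
df['ma2'] = (df.groupby(level='sym')['price']
               .transform(lambda s: s.rolling(2).mean()))

print(df)
```

The first row of each symbol has no previous price inside its group, so its moving average is NaN instead of borrowing a value from the other stock.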