Python multidimensional DataFrame
I have just started learning machine learning and Scikit. I have been following a tutorial in which the instructor uses Quandl to fetch Google stock price data. As far as I understand, quandl.get returns a DataFrame. What confuses me about this DataFrame is that one line of code appears to add columns in the second dimension of the DataFrame, while another line accesses the same column through the first dimension. How is that possible? What is going on with this DataFrame?
import quandl
df = quandl.get('WIKI/GOOGL')
df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]
# How is df['Adj. Open'] working here? Wasn't 'Adj. Open' added in the
# second dimension of the DataFrame on the line above?
df['HCL_PCT'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open']
My goal is to get a basic understanding of machine learning jargon and concepts before diving into TensorFlow.
I added df.head() to print output that shows the data:
# read data
df = quandl.get('WIKI/GOOGL')
print(df.head())
              Open    High     Low    Close      Volume  Ex-Dividend  \
Date
2004-08-19  100.01  104.06   95.96  100.335  44659000.0          0.0
2004-08-20  101.01  109.08  100.50  108.310  22834300.0          0.0
2004-08-23  110.76  113.48  109.05  109.400  18256100.0          0.0
2004-08-24  111.24  111.60  103.57  104.870  15247300.0          0.0
2004-08-25  104.76  108.00  103.88  106.000   9188600.0          0.0

            Split Ratio  Adj. Open  Adj. High   Adj. Low  Adj. Close  \
Date
2004-08-19          1.0  50.159839  52.191109  48.128568   50.322842
2004-08-20          1.0  50.661387  54.708881  50.405597   54.322689
2004-08-23          1.0  55.551482  56.915693  54.693835   54.869377
2004-08-24          1.0  55.792225  55.972783  51.945350   52.597363
2004-08-25          1.0  52.542193  54.167209  52.100830   53.164113

            Adj. Volume
Date
2004-08-19   44659000.0
2004-08-20   22834300.0
2004-08-23   18256100.0
2004-08-24   15247300.0
2004-08-25    9188600.0
Selecting the column Adj. Close:
print(df['Adj. Close'])
Date
2004-08-19 50.322842
2004-08-20 54.322689
2004-08-23 54.869377
2004-08-24 52.597363
2004-08-25 53.164113
2004-08-26 54.122070
2004-08-27 53.239345
2004-08-30 51.162935
2004-08-31 51.343492
2004-09-01 50.280210
2004-09-02 50.912161
2004-09-03 50.159839
2004-09-07 50.947269
2004-09-08 51.308384
2004-09-09 51.313400
2004-09-10 52.828075
2004-09-13 53.916435
2004-09-14 55.917612
2004-09-15 56.173402
2004-09-16 57.161452
2004-09-17 58.926902
2004-09-20 59.864797
2004-09-21 59.102444
2004-09-22 59.373280
2004-09-23 60.597057
2004-09-24 60.100525
2004-09-27 59.313094
2004-09-28 63.626409
2004-09-29 65.742942
2004-09-30 65.000651
...
2017-04-13 840.180000
2017-04-17 855.130000
2017-04-18 853.990000
2017-04-19 856.510000
2017-04-20 860.080000
2017-04-21 858.950000
2017-04-24 878.930000
2017-04-25 888.840000
2017-04-26 889.140000
2017-04-27 891.440000
2017-04-28 924.520000
2017-05-01 932.820000
2017-05-02 937.090000
2017-05-03 948.450000
2017-05-04 954.720000
2017-05-05 950.280000
2017-05-08 958.690000
2017-05-09 956.710000
2017-05-10 954.840000
2017-05-11 955.890000
2017-05-12 955.140000
2017-05-15 959.220000
2017-05-16 964.610000
2017-05-17 942.170000
2017-05-18 950.500000
2017-05-19 954.650000
2017-05-22 964.070000
2017-05-23 970.550000
2017-05-24 977.610000
2017-05-25 991.860000
Name: Adj. Close, Length: 3215, dtype: float64
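To make the distinction concrete, here is a minimal sketch on a small made-up DataFrame. The two rows are copied from the output above, and the frame itself is only an assumed stand-in for the Quandl result. It suggests that a single label inside the brackets returns a one-dimensional Series, while a list of labels returns a two-dimensional DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the Quandl data: two rows taken from the output above.
df = pd.DataFrame(
    {
        "Adj. Open": [50.159839, 50.661387],
        "Adj. Close": [50.322842, 54.322689],
    },
    index=pd.to_datetime(["2004-08-19", "2004-08-20"]),
)
df.index.name = "Date"

single = df["Adj. Close"]       # one label -> Series (a single column)
several = df[["Adj. Close"]]    # list of labels -> DataFrame, even with one label

print(type(single).__name__, single.ndim)    # Series 1
print(type(several).__name__, several.ndim)  # DataFrame 2
```

In both cases the label names a column; the only difference is whether one label or a list of labels is passed.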
EDIT:
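index: Index or array-like
In a DataFrame, indexing with a single label returns that column, and indexing with an array or list of labels returns several columns. The latter is equivalent to df.loc[:, [...]] (all rows, with the listed columns selected).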
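A small sketch of that equivalence, assuming the explicit form meant here is pandas' `.loc[:, [...]]` indexer. The frame and its column names `a`, `b`, `c` are made up for illustration.

```python
import pandas as pd

# Hypothetical three-column frame; the values are made up.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

by_brackets = df[["a", "c"]]      # list of labels inside the indexing brackets
by_loc = df.loc[:, ["a", "c"]]    # explicit form: all rows, listed columns

print(by_brackets.equals(by_loc))       # True
print(df["a"].equals(df.loc[:, "a"]))   # True
```

The bracket form is shorthand for column selection; `.loc` spells out the row part (`:`) that the shorthand leaves implicit.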
Shouldn't df['Adj. Open'] return a garbage value or throw an error outright, since it was declared in the second dimension of the Pandas DataFrame list on the second line of the code?
No, it only selects data and returns a Series (a column). Multiple columns can be selected as well. The second line just selects the listed columns and assigns the result back to df; the third line makes three separate single-column selections for the subtraction and division.
I don't understand. df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']] is declaring Adj. Open in the second dimension of the DataFrame. So how can we access Adj. Open as if it were in the first? It doesn't make sense.
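One way to see why there is no conflict: the inner brackets in df[[...]] are an ordinary Python list of column labels passed to the indexing operator, not a second dimension of the DataFrame. The following is a minimal sketch of the three lines from the question, with a made-up frame standing in for the quandl.get result (the values and the reduced column set are assumptions for illustration).

```python
import pandas as pd

# Hypothetical stand-in for quandl.get('WIKI/GOOGL'); values are made up.
df = pd.DataFrame(
    {
        "Open": [100.0, 101.0],
        "Adj. Open": [50.0, 40.0],
        "Adj. Close": [55.0, 38.0],
    }
)

# The inner brackets build a plain list; naming it first changes nothing.
wanted = ["Adj. Open", "Adj. Close"]
df = df[wanted]                  # same as df[['Adj. Open', 'Adj. Close']]
print(list(df.columns))          # ['Adj. Open', 'Adj. Close']

# Each df['...'] below is a Series; arithmetic on Series is element-wise.
df["HCL_PCT"] = (df["Adj. Close"] - df["Adj. Open"]) / df["Adj. Open"]
print(df["HCL_PCT"].tolist())    # [0.1, -0.05]
```

After the selection, 'Adj. Open' is simply one of the remaining column labels, so df['Adj. Open'] looks it up the same way it would have before the selection.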