Python: Extract a subset of data from a multi-index DataFrame and compute the difference between columns

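I have a pandas DataFrame whose columns have two header levels: the first level has several distinct entries (A, B, C), while the second level repeats the same column names under each of them.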

               A                    B                 C
Date           open    r    close   open    r  close  open    r   close      
2000-07-03     19.7    5    17.1    66.26   4  6.22   23.26   1   9.9
2000-07-05     49.8    2    8.3     78.81   6  4.34   39.81   5   5.1
2000-07-15     89.5    3    4.1     43.45   7  2.45   29.3    8   1.2
2000-08-13     74.7    6    7.4     34.26   8  6.4    72.26   9   5.4
2000-08-25     39.84   1    8.4     95.43   3  4.3    69.81   0   5.2
2000-08-28     61.8    4    4.2     43.81   1  2.2    129.81  6   1.3
2000-09-11     82.79   7    7.4     66.26   1  6.5    72.25   6   5.6
2000-09-16     64.8    8    8.7     73.45   5  4.7    69.45   4   5.4
2000-09-22     58.5    9    3.3     13.81   8  2.9    777.8   8   1.4
I want to extract the data for month 7 of 2000 and find which of A, B, or C has the lowest (open - close).

My plan:

s=data.stack(level=0)
values = s[s.index.get_level_values(1)]['open', 'close'].reset_index()
values['Date'] = pd.to_datetime(values['Date'])
start_date = 2000-07-01
end_date = 2000-08-01
mask = (data['date'] > start_date) & (data['date'] <= end_date)
df = data.loc[mask]
df['Val_Diff'] = df['open'] - df['close']
print(df['Val_Diff'].max()) 

Why is the multi-index a problem for this code?

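I think this is caused by the unnamed column in the index when stack reshapes the data vertically.
Process:

  • Flatten the multi-index column names
  • Use wide_to_long to convert from wide (horizontal) to long (vertical) format
  • Convert the date column to datetime so it can be filtered by condition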
    
    KeyError: "None of [Index are in the [columns]"
    
    import pandas as pd
    import numpy as np
    import io
    import datetime
    
    data = '''
    Date open r close open r close open r close  
    2000-07-03 19.7 5 17.1 66.26 4 6.22 23.26 1 9.9
    2000-07-05 49.8 2 8.3 78.81 6 4.34 39.81 5 5.1
    2000-07-15 89.5 3 4.1 43.45 7 2.45 29.3 8 1.2
    2000-08-13 74.7 6 7.4 34.26 8 6.4 72.26 9 5.4
    2000-08-25 39.84 1 8.4 95.43 3 4.3 69.81 0 5.2
    2000-08-28 61.8 4 4.2 43.81 1 2.2 129.81 6 1.3
    2000-09-11 82.79 7 7.4 66.26 1 6.5 72.25 6 5.6
    2000-09-16 64.8 8 8.7 73.45 5 4.7 69.45 4 5.4
    2000-09-22 58.5 9 3.3 13.81 8 2.9 777.8 8 1.4
    '''
    
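    # Raw string avoids an invalid-escape warning for the regex separator
    data = pd.read_csv(io.StringIO(data), sep=r'\s+')
    # Rebuild the two-level header: group (A/B/C) on top, field name below
    idx = pd.MultiIndex.from_arrays([['','A','A','A','B','B','B','C','C','C'], ['Date','open','r','close','open','r','close','open','r','close']])
    data.columns = idx
    # Flatten to single-level names such as 'open_A', keeping 'Date' as-is
    new_cols = [k[1]+'_'+k[0] for k in data.columns[1:]]
    new_cols.insert(0, 'Date')
    data.columns = new_cols
    # Reshape wide -> long: one row per (Date, item) with open/r/close columns
    data = pd.wide_to_long(data,['open','r','close'], i='Date', j='item', sep='_', suffix='\\w+')
    data.reset_index(inplace=True)
    data['Date'] = pd.to_datetime(data['Date'])
    # Keep only the rows that fall in July 2000
    start_date = datetime.datetime(2000,7,1)
    end_date = datetime.datetime(2000,8,1)
    mask = (data.Date > start_date) & (data.Date <= end_date)
    # .copy() so that adding a column later does not trigger SettingWithCopyWarning
    data = data.loc[mask].copy()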
    data
        Date    item    open    r   close
    0   2000-07-03  A   19.70   5   17.10
    1   2000-07-05  A   49.80   2   8.30
    2   2000-07-15  A   89.50   3   4.10
    9   2000-07-03  B   66.26   4   6.22
    10  2000-07-05  B   78.81   6   4.34
    11  2000-07-15  B   43.45   7   2.45
    18  2000-07-03  C   23.26   1   9.90
    19  2000-07-05  C   39.81   5   5.10
    20  2000-07-15  C   29.30   8   1.20
    
    data['Val_Diff'] = data['open'] - data['close']
    print(data['Val_Diff'].max()) 
    85.4
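
The snippet above prints the maximum difference, whereas the question asks which of A, B, or C has the lowest (open - close). As a rough sketch of an alternative that keeps the two-level columns instead of flattening them, the same sample data can be filtered by month and compared per group with `xs`. The names `raw`, `df`, `july`, and `diff` are illustrative only, and reading "lowest" as the smallest open - close over the July rows is an assumption about the intent.

```python
import io

import pandas as pd

# Same sample values as in the question.
raw = """\
Date open r close open r close open r close
2000-07-03 19.7 5 17.1 66.26 4 6.22 23.26 1 9.9
2000-07-05 49.8 2 8.3 78.81 6 4.34 39.81 5 5.1
2000-07-15 89.5 3 4.1 43.45 7 2.45 29.3 8 1.2
2000-08-13 74.7 6 7.4 34.26 8 6.4 72.26 9 5.4
2000-08-25 39.84 1 8.4 95.43 3 4.3 69.81 0 5.2
2000-08-28 61.8 4 4.2 43.81 1 2.2 129.81 6 1.3
2000-09-11 82.79 7 7.4 66.26 1 6.5 72.25 6 5.6
2000-09-16 64.8 8 8.7 73.45 5 4.7 69.45 4 5.4
2000-09-22 58.5 9 3.3 13.81 8 2.9 777.8 8 1.4
"""

# Skip the text header row (its names repeat) and parse the first
# column as a DatetimeIndex.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, skiprows=1,
                 index_col=0, parse_dates=True)
df.index.name = "Date"

# Two-level columns: group (A/B/C) on level 0, field name on level 1.
df.columns = pd.MultiIndex.from_product([["A", "B", "C"],
                                         ["open", "r", "close"]])

# Keep only the rows from July 2000.
july = df[(df.index.year == 2000) & (df.index.month == 7)]

# Select one field across all groups, then subtract column-wise.
diff = july.xs("open", axis=1, level=1) - july.xs("close", axis=1, level=1)

lowest_per_group = diff.min()              # smallest open - close per group
lowest_group = lowest_per_group.idxmin()   # group holding the overall minimum
lowest_date = diff[lowest_group].idxmin()  # date on which it occurs

print(lowest_per_group)
print(lowest_group, lowest_date.date())
```

Because `xs` with `axis=1, level=1` returns one column per group, no reshaping is needed and the group label comes straight from `idxmin`. Swapping `min`/`idxmin` for `max`/`idxmax` gives the counterpart of the `85.4` result printed above.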