Python: applying qcut to the level-0 groups of a MultiIndex in a MultiIndex DataFrame

python pandas

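I have a MultiIndex DataFrame in pandas (indexed by date and entity id), and for each date/entity I have observations of a number of variables (A, B, ...). My goal is to create a DataFrame of the same shape in which each value is replaced by its decile.

My test data looks like this: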

I want to apply qcut to each column, grouped by level 0 of the MultiIndex. The problem I am running into is building the result DataFrame.
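As a small illustration of what "replaced by its decile" means here, the sketch below runs `pd.qcut` with `labels=False` on a single group of four values. The values are hypothetical and not taken from the question's data; they only show that a four-row group ends up with the integer codes 0, 3, 6 and 9, the same pattern that appears in the four-row groups of the printed output.

```python
import pandas as pd

# Hypothetical values for one date group (not from the question's data).
group = pd.Series([1.0, 4.0, 7.0, 10.0], name="A")

# labels=False returns integer bin codes (0..9) instead of Interval labels.
codes = pd.qcut(group, 10, labels=False)
print(codes.tolist())
```

With only four observations, most of the ten bins stay empty, so the codes are spread out rather than consecutive.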

This code prints the result for each column and then raises a ValueError:

def qcut_sub_index(df_with_sub_index):
#     create empty return value same shape as passed dataframe
    df_return=pd.DataFrame()
    
    for date, sub_df in df_with_sub_index.groupby(level=0):
            df_return=df_return.append(pd.DataFrame(pd.qcut(sub_df, 10, labels=False, duplicates='drop')))
    print(df_return)
    return df_return
    
print(df_values.apply(lambda x: qcut_sub_index(x), axis=0))

                      A
as_at_date entity_id   
2008-01-27 2928       0
           2932       3
           3083       6
           3333       9
2008-02-27 2928       3
           2935       9
           3333       0
           3874       6
2008-03-27 2928       1
           2932       2
           2934       0
           2936       9
           2937       4
           2939       9
           2940       7
           2943       3
           2944       0
           2945       8
           2946       6
           2947       5
           2949       4
                      B
as_at_date entity_id   
2008-01-27 2928       9
           2932       6
           3083       0
           3333       3
2008-02-27 2928       6
           2935       0
           3333       3
           3874       9
2008-03-27 2928       0
           2932       9
           2934       2
           2936       8
           2937       7
           2939       6
           2940       3
           2943       1
           2944       4
           2945       9
           2946       5
           2947       4
           2949       0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-104-72ff0e6da288> in <module>
     11 
     12 
---> 13 print(df_values.apply(lambda x: qcut_sub_index(x), axis=0))

~\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7546             kwds=kwds,
   7547         )
-> 7548         return op.get_result()
   7549 
   7550     def applymap(self, func) -> "DataFrame":

~\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    178             return self.apply_raw()
    179 
--> 180         return self.apply_standard()
    181 
    182     def apply_empty_result(self):

~\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    272 
    273         # wrap results
--> 274         return self.wrap_results(results, res_index)
    275 
    276     def apply_series_generator(self) -> Tuple[ResType, "Index"]:

~\Anaconda3\lib\site-packages\pandas\core\apply.py in wrap_results(self, results, res_index)
    313         # see if we can infer the results
    314         if len(results) > 0 and 0 in results and is_sequence(results[0]):
--> 315             return self.wrap_results_for_axis(results, res_index)
    316 
    317         # dict of scalars

~\Anaconda3\lib\site-packages\pandas\core\apply.py in wrap_results_for_axis(self, results, res_index)
    369 
    370         try:
--> 371             result = self.obj._constructor(data=results)
    372         except ValueError as err:
    373             if "arrays must all be same length" in str(err):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    466 
    467         elif isinstance(data, dict):
--> 468             mgr = init_dict(data, index, columns, dtype=dtype)
    469         elif isinstance(data, ma.MaskedArray):
    470             import numpy.ma.mrecords as mrecords

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
    281             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    282         ]
--> 283     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    284 
    285 

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     76         # figure out the index, if necessary
     77         if index is None:
---> 78             index = extract_index(arrays)
     79         else:
     80             index = ensure_index(index)

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
    385 
    386         if not indexes and not raw_lengths:
--> 387             raise ValueError("If using all scalar values, you must pass an index")
    388 
    389         if have_series:

ValueError: If using all scalar values, you must pass an index

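So something is preventing the second application of the lambda function.

Thanks for your help, and thanks for taking a look.

P.S. If this can be done simply without using apply, I would be happy to hear about it. Thanks.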
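One possible reading of the traceback is that `DataFrame.apply` collects one result per column and then tries to build a DataFrame from them, which fails because the function returns a DataFrame rather than a Series. The sketch below is an assumption-laden variant, not the asker's code: `qcut_by_date` is a hypothetical helper that returns a Series, and the sample frame is made up (only the index names mirror the output above).

```python
import pandas as pd

# Hypothetical sample frame; only the index names mirror the question.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2008-01-27", "2008-02-27"]), [2928, 2932, 3083, 3333]],
    names=["as_at_date", "entity_id"],
)
df_values = pd.DataFrame(
    {
        "A": [1.0, 4.0, 7.0, 10.0, 4.0, 10.0, 1.0, 7.0],
        "B": [9.5, 6.5, 0.5, 3.5, 6.5, 0.5, 3.5, 9.5],
    },
    index=index,
)


def qcut_by_date(column):
    # Bin one column into deciles separately for each level-0 (date) group.
    # Returning a Series (not a DataFrame) lets apply reassemble the columns.
    pieces = [
        pd.qcut(sub, 10, labels=False, duplicates="drop")
        for _, sub in column.groupby(level=0)
    ]
    return pd.concat(pieces)


result = df_values.apply(qcut_by_date, axis=0)
print(result)
```

Because each returned Series carries the same MultiIndex as its input column, the result keeps the shape of `df_values`.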

Using concat in an iteration over the original DataFrame achieves this, but is there a smarter way?

Thanks

import pandas as pd


def qcut_sub_index(df_with_sub_index):
    # Bin one column into deciles separately for each level-0 (date) group.
    # DataFrame.append was removed in pandas 2.0, so collect the pieces
    # and concatenate them once.
    pieces = [
        pd.qcut(sub_df, 10, labels=False, duplicates='drop')
        for date, sub_df in df_with_sub_index.groupby(level=0)
    ]
    return pd.concat(pieces)


# DataFrame.iteritems was removed in pandas 2.0; items() replaces it.
# df_values is the MultiIndex DataFrame from the question.
df_x = pd.concat(
    [qcut_sub_index(columnData) for columnName, columnData in df_values.items()],
    axis=1,
    join="outer",
)
df_x
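As one candidate for a "smarter way", the loop over columns can probably be dropped altogether by letting `groupby(level=0).transform` call `pd.qcut` on every column of every date group. This is a sketch under assumptions: the sample frame is hypothetical (only the index names follow the question), and `duplicates="drop"` is carried over from the code above.

```python
import pandas as pd

# Hypothetical sample frame; only the index names mirror the question.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2008-01-27", "2008-02-27"]), [2928, 2932, 3083, 3333]],
    names=["as_at_date", "entity_id"],
)
df_values = pd.DataFrame(
    {
        "A": [1.0, 4.0, 7.0, 10.0, 4.0, 10.0, 1.0, 7.0],
        "B": [9.5, 6.5, 0.5, 3.5, 6.5, 0.5, 3.5, 9.5],
    },
    index=index,
)

# transform passes each column of each date group to the lambda and
# stitches the results back together on the original index.
df_x = df_values.groupby(level=0).transform(
    lambda s: pd.qcut(s, 10, labels=False, duplicates="drop")
)
print(df_x)
```

The result has the same index and columns as `df_values`, so no explicit concat step is needed.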

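Your solution seems overly complex, and your terminology is non-standard: a MultiIndex has levels. The task can be stated as qcut() by level 0 of the MultiIndex (without talking about sub-frames, which are not a pandas concept).

Bringing it all back together:

- for all columns in the DataFrame, use the **kwargs approach to pass arguments to assign()
- groupby(level=0) groups by as_at_date
- transform() returns a row for every entry in the index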
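The code for this answer is not reproduced here, so the following is only a sketch of how the three points might fit together. The rows are copied from the df printed below; the helper name `to_decile` and the exact `qcut` arguments are assumptions.

```python
import pandas as pd

# Rows copied from the df printed in this answer.
rows = [
    ("2020-01-31", 2926, 0.770121, 2.883519e+07),
    ("2020-01-31", 2943, 0.187747, 1.167975e+08),
    ("2020-01-31", 2973, 0.371721, 3.133071e+07),
    ("2020-01-31", 3104, 0.243347, 4.497294e+08),
    ("2020-01-31", 3253, 0.591022, 7.796131e+08),
    ("2020-01-31", 3362, 0.810001, 6.438441e+08),
    ("2020-02-29", 3185, 0.690875, 4.513044e+08),
    ("2020-02-29", 3304, 0.311436, 4.561929e+07),
    ("2020-03-31", 2953, 0.325846, 7.770111e+08),
    ("2020-03-31", 2981, 0.918461, 7.594753e+08),
    ("2020-03-31", 3034, 0.133053, 6.767501e+08),
    ("2020-03-31", 3355, 0.624519, 6.318104e+07),
]
df = pd.DataFrame(rows, columns=["as_at_date", "entity_id", "A", "B"])
df["as_at_date"] = pd.to_datetime(df["as_at_date"])
df = df.set_index(["as_at_date", "entity_id"])


def to_decile(s):
    # Hypothetical helper: integer decile codes (0..9) within one date group.
    return pd.qcut(s, 10, labels=False)


# Build one keyword argument per column and hand them all to assign().
df2 = df.assign(
    **{col: df.groupby(level=0)[col].transform(to_decile) for col in df.columns}
)
print(df2)
```

The dict comprehension is what makes this work for any number of columns, as long as the column names are strings that can serve as keyword arguments.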

The input df, followed by the decile result df2:

                             A             B
as_at_date entity_id                        
2020-01-31 2926       0.770121  2.883519e+07
           2943       0.187747  1.167975e+08
           2973       0.371721  3.133071e+07
           3104       0.243347  4.497294e+08
           3253       0.591022  7.796131e+08
           3362       0.810001  6.438441e+08
2020-02-29 3185       0.690875  4.513044e+08
           3304       0.311436  4.561929e+07
2020-03-31 2953       0.325846  7.770111e+08
           2981       0.918461  7.594753e+08
           3034       0.133053  6.767501e+08
           3355       0.624519  6.318104e+07

                      A  B
as_at_date entity_id      
2020-01-31 2926       7  0
           2943       0  3
           2973       3  1
           3104       1  5
           3253       5  9
           3362       9  7
2020-02-29 3185       9  9
           3304       0  0
2020-03-31 2953       3  9
           2981       9  6
           3034       0  3
           3355       6  0
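To see why a group with only two rows ends up with the codes 9 and 0, it may help to run `pd.qcut` on that group alone. This is a minimal sketch using the two A values of the 2020-02-29 group from the df above; the variable names are illustrative.

```python
import pandas as pd

# The two A values of the 2020-02-29 group from the df shown above.
group = pd.Series([0.690875, 0.311436], index=[3185, 3304], name="A")

# Ten quantile bins over two points: the smaller value falls into the
# first bin and the larger one into the last.
codes = pd.qcut(group, 10, labels=False)
print(codes.tolist())
```

The codes agree with the A column of df2 for that date, which suggests the deciles are computed within each date rather than across the whole frame.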