Clean pandas apply with a function that cannot use pandas.Series and a non-unique index


numpy pandas


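In the following, func represents a function that uses multiple columns (with coupling across the group) and cannot operate directly on pandas.Series. The 0*d['x'] syntax is the lightest coercion I could think of, but I find it awkward.

Additionally, the resulting pandas.Series (s) still includes the group index, which must be removed before it can be added as a column to the pandas.DataFrame. The s.reset_index(...) index manipulation seems fragile and error-prone, so I'm curious whether it can be avoided. Is there an idiom for doing this?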

import pandas
import numpy

df = pandas.DataFrame(dict(i=[1]*8,j=[1]*4+[2]*4,x=list(range(4))*2))
df['y'] = numpy.sin(df['x']) + 1000*df['j']
df = df.set_index(['i','j'])
print('# df\n', df)

def func(d):
    x = numpy.array(d['x'])
    y = numpy.array(d['y'])
    # I want to do math with x,y that cannot be applied to
    # pandas.Series, so explicitly convert to numpy arrays.
    #
    # We have to return an appropriately-indexed pandas.Series
    # in order for it to be admissible as a column in the
    # pandas.DataFrame.  Instead of simply "return x + y", we
    # have to make the conversion.
    return 0*d['x'] + x + y

s = df.groupby(df.index).apply(func)

# The Series is still adorned with the (unnamed) group index,
# which will prevent adding as a column of df due to
# Exception: cannot handle a non-unique multi-index!
s = s.reset_index(level=0, drop=True)
print('# s\n', s)

df['z'] = s
print('# df\n', df)

Instead of

0*d['x'] + x + y

you can use

pd.Series(x+y, index=d.index)
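To make the substitution concrete, here is a minimal sketch comparing the two expressions on a single hypothetical group. The small frame d and its index labels are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one group: note the repeated index label 7.
d = pd.DataFrame({'x': [0, 1, 2], 'y': [10.0, 20.0, 30.0]}, index=[7, 7, 8])

x = np.array(d['x'])
y = np.array(d['y'])

# The question's workaround: borrow the index from d['x'] via 0*d['x'].
coerced = 0 * d['x'] + x + y

# The suggested replacement: build the Series explicitly from the arrays.
explicit = pd.Series(x + y, index=d.index)

# Both carry the same values and the same (non-unique) index.
print(np.allclose(coerced.to_numpy(), explicit.to_numpy()))
print(explicit.index.equals(coerced.index))
```

One small difference worth knowing: the coerced version inherits the name 'x' from d['x'], while the explicitly constructed Series is unnamed.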

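When using groupby-apply, instead of dropping the group key index with:
s = df.groupby(df.index).apply(func)
s = s.reset_index(level=0, drop=True)
df['z'] = s

you can tell groupby to drop the keys by passing group_keys=False:
df['z'] = df.groupby(df.index, group_keys=False).apply(func)

This yields: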
     x            y            z
i j                             
1 1  0  1000.000000  1000.000000
  1  1  1000.841471  1001.841471
  1  2  1000.909297  1002.909297
  1  3  1000.141120  1003.141120
  2  0  2000.000000  2000.000000
  2  1  2000.841471  2001.841471
  2  2  2000.909297  2002.909297
  2  3  2000.141120  2003.141120
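As a rough illustration of what group_keys changes, the sketch below applies the same func with the keys kept and with the keys dropped, then compares the resulting indexes. It assumes a reasonably recent pandas and, as a deliberate substitution, groups by the index level names (level=['i', 'j']) rather than by df.index as in the answer.

```python
import numpy as np
import pandas as pd

# Same toy data as the question: a non-unique (i, j) MultiIndex.
df = pd.DataFrame(dict(i=[1]*8, j=[1]*4 + [2]*4, x=list(range(4))*2))
df['y'] = np.sin(df['x']) + 1000*df['j']
df = df.set_index(['i', 'j'])

def func(d):
    # Do the math on plain arrays, then re-attach the group's own index.
    x = np.array(d['x'])
    y = np.array(d['y'])
    return pd.Series(x + y, index=d.index)

# Assumption: grouping by level names instead of df.groupby(df.index).
keyed = df.groupby(level=['i', 'j'], group_keys=True).apply(func)
plain = df.groupby(level=['i', 'j'], group_keys=False).apply(func)

# With the keys kept, extra index levels are prepended to the result.
print(keyed.index.nlevels, plain.index.nlevels)

# With the keys dropped, the index matches df.index, so direct assignment works.
print(plain.index.equals(df.index))
df['z'] = plain
```

Because the result with group_keys=False carries exactly the original index, no reset_index step is needed before the column assignment.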

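You don't need to convert to arrays; they already exist in the DataFrame as the .values attribute: x = d['x'].values, etc.

Thanks. That saves a copy, but it doesn't change the semantics of using the result. I thought the example would be more accessible with an explicit reference to numpy.array.
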
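The difference between the two conversions mentioned in these comments can be sketched as follows. The frame d here is a hypothetical example; whether .values shares memory with the frame depends on the pandas version and dtype, so the sketch only checks the part that is certain: numpy.array returns an independent copy.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame for illustration.
d = pd.DataFrame({'x': [0, 1, 2, 3]})

x_copy = np.array(d['x'])   # explicit conversion: a fresh, independent array
x_vals = d['x'].values      # attribute access: avoids the extra copy where possible

# Mutating the explicit copy leaves the DataFrame untouched.
x_copy[0] = 99
print(d['x'].iloc[0])

# Either way, the numbers available for the computation are the same.
print((x_vals == [0, 1, 2, 3]).all())
```
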
import pandas as pd
import numpy as np

df = pd.DataFrame(dict(i=[1]*8,j=[1]*4+[2]*4,x=list(range(4))*2))
df['y'] = np.sin(df['x']) + 1000*df['j']
df = df.set_index(['i','j'])

def func(d):
    x = np.array(d['x'])
    y = np.array(d['y'])
    return pd.Series(x+y, index=d.index)

df['z'] = df.groupby(df.index, group_keys=False).apply(func)
print(df)
