Python: using selected columns with a function to create a matrix


python pandas numpy array-broadcasting numpy-ndarray


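I am trying to create a matrix of function results that involves crosstabs of DataFrame columns. The function operates on one pair of DataFrame columns at a time, so the final result is a matrix holding the result for every pair of columns. The indices of the columns I want to pass to pd.crosstab are in the list cols_index. Here is my code: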

cols_index # list of dataframe column indices. All fine. 

res_matrix = np.zeros([len(cols_index),len(cols_index)]) # square matrix of zeros, each dimension is the length of the number of columns

for i in cols_index:
    for j in cols_index:
        confusion_matrix = pd.crosstab(df.columns.get_values()[i], df.columns.get_values()[j]) # df.columns.get_values()[location]
        result = my_function(confusion_matrix) # a scalar
        res_matrix[i, j] = result
return res_matrix

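However, I get the following error:

ValueError: If using all scalar values, you must pass an index

There is nothing wrong with my_function, because it works when I run it on two columns of the DataFrame directly: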
confusion_matrix = pd.crosstab(df['colA'], df['colB'])
result = my_function(confusion_matrix) # returns 0.29999 which is fine

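I have tried various ways to solve this, including looking at the following post:

but I cannot see how to broadcast over pandas columns in this case.
Any ideas are welcome, thanks.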
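There are a few problems in your code:
1. i and j should be numbers, because you use them as indices into res_matrix.
2. pd.crosstab needs to be given pandas.Series objects; you are giving it strings (even when the values of i and j are correct).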
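The second point can be illustrated with a small sketch. The DataFrame below is toy data invented for this illustration; only pd.crosstab and the df[df.columns[...]] lookup come from the thread. It shows that an entry of df.columns is just a label, while indexing df with that label returns the Series that pd.crosstab can tabulate.

```python
import pandas as pd

# Toy data; the column names and values are made up for illustration.
df = pd.DataFrame({
    "colA": ["x", "y", "x", "y"],
    "colB": ["u", "u", "v", "v"],
})

label = df.columns[0]   # a column label (here the string "colA"), not data
series = df[label]      # the column's data as a pandas.Series

print(type(label).__name__)
print(type(series).__name__)

# Passing Series gives crosstab actual values to cross-tabulate.
table = pd.crosstab(df[df.columns[0]], df[df.columns[1]])
print(table)
```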

See the changes in the following code:
import numpy as np
import pandas as pd

def fun():
    # cols_index: list of DataFrame column indices (df and my_function are defined elsewhere)
    n = len(cols_index)
    res_matrix = np.zeros([n, n])  # square matrix of zeros, one row and one column per selected column
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, so i == j never occurs
            # look up the Series for each selected column before calling crosstab
            confusion_matrix = pd.crosstab(df[df.columns[cols_index[i]]], df[df.columns[cols_index[j]]])
            result = my_function(confusion_matrix)  # a scalar
            res_matrix[i, j] = result
    return res_matrix
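For anyone who wants to run the loop end to end, here is a minimal self-contained sketch. It is not the original poster's code: the toy DataFrame, the function name pairwise_matrix, and the body of my_function (the share of observations in the largest crosstab cell) are all assumptions made up for this example, and df, cols_index and the function are passed as arguments instead of being read from globals.

```python
import numpy as np
import pandas as pd


def my_function(table):
    # Hypothetical stand-in for the question's my_function:
    # the share of observations that fall in the largest cell.
    values = table.to_numpy()
    return values.max() / values.sum()


def pairwise_matrix(df, cols_index, func):
    # Hypothetical helper: apply func to the crosstab of each column pair.
    n = len(cols_index)
    res_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only
            a = df[df.columns[cols_index[i]]]
            b = df[df.columns[cols_index[j]]]
            res_matrix[i, j] = func(pd.crosstab(a, b))
    return res_matrix


# Toy data, invented for illustration.
df = pd.DataFrame({
    "c0": [1, 1, 2, 2],
    "c1": ["a", "a", "b", "b"],
    "c2": ["p", "q", "p", "q"],
    "c3": ["s", "s", "s", "t"],
})
cols_index = [1, 2, 3]  # positions of the columns to compare

res = pairwise_matrix(df, cols_index, my_function)
print(res)
```

Passing the inputs as parameters is only a design choice for the sketch; it makes the function easier to test than a version that depends on module-level names.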

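I have modified the code based on the OP's comment that cols_index is a list of column indices. I am also assuming that my_function is commutative, so I only fill the upper triangle of the matrix. This saves computation time and avoids the i == j problem.

Thanks @Aritesh for your help. The problem with for i in range(len(cols_index)) is that it starts i at zero, whereas the cols_index list holds the columns selected from the DataFrame, e.g. [10,17,23,24,26,52,56]. So I think I do need for i in cols_index, because I need i to be [10,17,23,24,26,52,56] and not [0,1,2,3,4,5,6]; otherwise crosstab will be called on the wrong DataFrame columns. To be clear, cols_index is a list of integers. My next problem is that pd.crosstab does not seem to like being called on the same column, i.e. confusion_matrix = pd.crosstab(df[df.columns[i]], df[df.columns[j]]) when i == j.

@LucieCBurgess then I would add a conditional statement, if (i != j). Also, if your function is commutative (i.e. the result does not change with the order of the operands), run it only when j > i.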
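The two kinds of index discussed in these comments can be kept apart with enumerate: one number addresses the result matrix, the other selects the DataFrame column. The sketch below is only an illustration under stated assumptions: it uses the first three values of the example cols_index from the comment, and a placeholder sum stands in for my_function(pd.crosstab(...)) so that it runs without any data.

```python
import numpy as np

cols_index = [10, 17, 23]  # DataFrame column positions (example values from the comment)
n = len(cols_index)
res_matrix = np.zeros((n, n))

for i, col_i in enumerate(cols_index):
    for j, col_j in enumerate(cols_index):
        if j <= i:
            continue  # skip the diagonal (i == j) and the lower triangle
        # i and j address res_matrix; col_i and col_j would select the columns,
        # e.g. df[df.columns[col_i]] and df[df.columns[col_j]].
        res_matrix[i, j] = col_i + col_j  # placeholder for my_function(pd.crosstab(...))

# If the function is commutative, the lower triangle mirrors the upper one.
full_matrix = res_matrix + res_matrix.T
print(full_matrix)
```

Indexing res_matrix directly with 10 or 17 would fall outside a 3 x 3 array, which is why the matrix position and the column position need separate variables.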