How to get rid of 'ValueError: all the input array dimensions for the concatenation axis must match exactly' during Pearson correlation calculation? (pandas, numpy, scipy, feature-selection, pearson-correlation)


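I am trying to compute the Pearson correlation using the function from the provided gist. Oddly, I get this error (the DataFrame has 52 records):
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 52 and the array at index 1 has size 1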

Here is the provided function:

def cor_selector(X, y, num_feats):
    cor_list = []
    feature_name = X.columns.tolist()
    # calculate the correlation with y for each feature
    for i in X.columns.tolist():
        cor = np.corrcoef(X[i], y)[0, 1] # error happens during the 2nd call to here
        cor_list.append(cor)
    # replace NaN with 0
    cor_list = [0 if np.isnan(i) else i for i in cor_list]
    # feature name
    cor_feature = X.iloc[:, np.argsort(np.abs(cor_list))[-num_feats:]].columns.tolist()
    # feature selection? 0 for not select, 1 for select
    cor_support = [True if i in cor_feature else False for i in feature_name]
    return cor_support, cor_feature

Here is my script:

df = pd.read_csv(DATA_CSV) # shape: (52, 5)
X = df[['a', 'b', 'c']]
y = df[['d']]
num_feats = 3
cor_support, cor_feature = cor_selector(X, y, num_feats)
print(str(len(cor_feature)), 'selected features')

Full stack trace:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/talha/PycharmProjects/covid19/store_data.py", line 275, in <module>
    cor_support, cor_feature = cor_selector(X, y, num_feats)
  File "/Users/talha/PycharmProjects/covid19/store_data.py", line 254, in cor_selector
    cor = np.corrcoef(X[i], y)[0, 1]
  File "<__array_function__ internals>", line 6, in corrcoef
  File "/Users/talha/.local/share/virtualenvs/covid19-g87yyZJK/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2526, in corrcoef
    c = cov(x, y, rowvar)
  File "<__array_function__ internals>", line 6, in cov
  File "/Users/talha/.local/share/virtualenvs/covid19-g87yyZJK/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2390, in cov
    X = np.concatenate((X, y), axis=0)
  File "<__array_function__ internals>", line 6, in concatenate

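It looks like you are passing a Series as the first argument and a DataFrame as the second argument to np.corrcoef. In your script, change y = df[['d']] to y = df['d'] and it should work.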
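
To make the shape mismatch concrete, here is a minimal sketch. It uses synthetic data rather than the asker's CSV, so the column values and the names y_frame and y_series are assumptions for illustration only. Selecting with a list (df[['d']]) gives a (52, 1) DataFrame, while selecting with a scalar (df['d']) gives a (52,) Series, and np.corrcoef can only stack the latter with another 1-D column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV in the question: 52 rows, made-up values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=52)})
df["d"] = 2.0 * df["a"] + 1.0  # exactly linear in "a"

y_frame = df[["d"]]   # list selector   -> DataFrame, shape (52, 1)
y_series = df["d"]    # scalar selector -> Series, shape (52,)
print(y_frame.shape, y_series.shape)

# np.corrcoef treats each row as a variable: the (52,) Series is promoted
# to (1, 52), the DataFrame stays (52, 1), and stacking them along axis 0
# fails because the second dimension differs (52 vs 1).
try:
    np.corrcoef(df["a"], y_frame)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# With a 1-D target the call returns a 2x2 matrix; [0, 1] is the coefficient.
cor = np.corrcoef(df["a"], y_series)[0, 1]
print(cor)
```

Because d is built as an exact linear function of a here, the printed coefficient should be very close to 1.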

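The first step is to determine which line produces the problem and what the inputs to that call are. Do you know what we mean by a full traceback?

It happens during the second execution of cor = np.corrcoef(X[i], y)[0, 1]. I have added the full stack trace to the OP.

Try concatenate((X[i], y), axis=1), and check the shapes of X[i] and y.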
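
Following the suggestion to check the shapes, one defensive option is to coerce the target to one dimension before calling np.corrcoef. The sketch below is not part of the original gist: to_1d is a hypothetical helper name, and the four-row DataFrame is made-up data.

```python
import numpy as np
import pandas as pd


def to_1d(y):
    """Hypothetical helper: return y as a 1-D NumPy array.

    Accepts a Series, a 1-D array, or a single-column DataFrame/2-D array.
    """
    arr = np.asarray(y)
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]  # drop the singleton column axis
    if arr.ndim != 1:
        raise ValueError(f"expected a single column, got shape {arr.shape}")
    return arr


# Made-up data, only to show the shapes involved.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "d": [2.0, 4.0, 5.0, 9.0]})
x = df["a"]    # Series
y = df[["d"]]  # single-column DataFrame, as in the question

print(x.shape, y.shape)  # inspect the shapes before calling np.corrcoef

y_flat = to_1d(y)
cor = np.corrcoef(x, y_flat)[0, 1]
print(y_flat.shape, cor)
```

Calling to_1d(y) at the top of cor_selector would let the function accept either df['d'] or df[['d']], while still failing loudly if a multi-column target is passed.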