Python: Reshape your data using array.reshape(-1, 1)
I have a DataFrame named data in which I am trying to identify any anomalous prices.
The head of the DataFrame looks like:
         Date  Last Price
0  29/12/2017      487.74
1  28/12/2017      422.85
2  27/12/2017      420.64
3  22/12/2017      492.76
4  21/12/2017      403.95
I found some code that I need to tweak slightly so that it loads my data and then compares the time series using a scaler. The code looks like this:
data = pd.read_csv(path)
data = rawData['Last Price']
data = data['Last Price']
scaler = StandardScaler()
np_scaled = scaler.fit_transform(data)
data = pd.DataFrame(np_scaled)
# train oneclassSVM
outliers_fraction = 0.01
model = OneClassSVM(nu=outliers_fraction, kernel="rbf", gamma=0.01)
model.fit(data)
data['anomaly3'] = pd.Series(model.predict(data))
fig, ax = plt.subplots(figsize=(10,6))
a = data.loc[data['anomaly3'] == -1, ['date_time_int', 'Last Price']] #anomaly
ax.plot(data['date_time_int'], data['Last Price'], color='blue')
ax.scatter(a['date_time_int'],a['Last Price'], color='red')
plt.show();
def getDistanceByPoint(data, model):
    distance = pd.Series()
    for i in range(0,len(data)):
        Xa = np.array(data.loc[i])
        Xb = model.cluster_centers_[model.labels_[i]-1]
        distance.set_value(i, np.linalg.norm(Xa-Xb))
    return distance
However, I get the following error message:
ValueError: Expected 2D array, got 1D array instead:
array=[487.74 422.85 420.64 ... 461.57 444.33 403.84].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I am not sure where I need to reshape the array.
For information, here is the traceback:
File "<ipython-input-23-628125407694>", line 1, in <module>
runfile('C:/Users/stacey/Downloads/techJob.py', wdir='C:/Users/stacey/Downloads')
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/staceyDownloads/techJob.py", line 92, in <module>
main()
File "C:/Users/stacey/Downloads/techJob.py", line 56, in main
np_scaled = scaler.fit_transform(data)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\sklearn\base.py", line 464, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\sklearn\preprocessing\data.py", line 645, in fit
return self.partial_fit(X, y)
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\sklearn\preprocessing\data.py", line 669, in partial_fit
force_all_finite='allow-nan')
File "C:\Anaconda_Python 3.7\2019.03\lib\site-packages\sklearn\utils\validation.py", line 552, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[7687.77 7622.88 7620.68 ... 5261.57 5244.37 5203.89].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
You should be able to fix the error by changing this line:
np_scaled = scaler.fit_transform(data)
to this:
np_scaled = scaler.fit_transform(data.values.reshape(-1,1))
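As a minimal sketch of why this works, the example below uses the five prices from the DataFrame head in the question. The names prices and X are illustrative and not part of the original script. StandardScaler expects a 2D input of shape (n_samples, n_features), while a single column selected from a DataFrame is a 1D Series, so reshape(-1, 1) turns it into one column with as many rows as there are values.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample values copied from the DataFrame head in the question.
prices = pd.Series([487.74, 422.85, 420.64, 492.76, 403.95], name="Last Price")

# A single selected column is one-dimensional.
print(prices.values.shape)  # (5,)

# reshape(-1, 1): exactly one column, NumPy infers the number of rows.
X = prices.values.reshape(-1, 1)
print(X.shape)  # (5, 1)

# The scaler now accepts the input and returns an array of the same 2D shape.
scaler = StandardScaler()
np_scaled = scaler.fit_transform(X)
print(np_scaled.shape)
```

Here -1 means "infer this dimension from the length of the array", so the same call works regardless of how many rows the CSV contains.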
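Python always provides a traceback that shows where the problem originates. Please copy it into your question.
This is your problem. You can find the exact line in the traceback:
File "C:/Users/stacey/Downloads/SIGtechJob.py", line 56, in main np_scaled = scaler.fit_transform(data)
Do you understand what the sklearn function expects? It has certain conventions for its input data. It is a good idea to study these; otherwise you may end up patching the code one error at a time without understanding why.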
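The input convention mentioned above is that scikit-learn estimators and transformers take a 2D array-like of shape (n_samples, n_features). The sketch below, which assumes a small DataFrame built from the first three rows of the question's sample (the names df, one_d and two_d are illustrative), shows how the two common ways of selecting a column differ in dimensionality.

```python
import pandas as pd

# Hypothetical frame built from the first rows shown in the question.
df = pd.DataFrame({
    "Date": ["29/12/2017", "28/12/2017", "27/12/2017"],
    "Last Price": [487.74, 422.85, 420.64],
})

# Single brackets return a 1D Series: this is what triggers the ValueError.
one_d = df["Last Price"]
print(one_d.ndim, one_d.shape)  # 1 (3,)

# Double brackets return a 2D DataFrame with one column,
# which already matches the (n_samples, n_features) layout.
two_d = df[["Last Price"]]
print(two_d.ndim, two_d.shape)  # 2 (3, 1)
```

Selecting with double brackets is one alternative to calling reshape(-1, 1) on the underlying array; both produce a single-feature 2D input.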