Python: How can we parallelize a Python program to take advantage of a GPU server?

python python-3.x gpu multi-gpu tesla

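In our lab we have an NVIDIA Tesla K80 GPU accelerator compute server with the following characteristics:

Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz, 48 CPU processors, 128GB RAM, 12 CPU cores, running 64-bit Linux.

I am running the code below, which performs a GridSearchCV on a RandomForestRegressor model after vertically appending several sets of dataframes into a single series. The two sample datasets I am considering are located at [link missing].

When I run this program on a huge dataset (around 2 million rows), the GridSearchCV takes more than 3 days. So I would like to know whether Python threads can use more than one CPU. How can we make this (or any other Python program) use multiple CPUs so that it finishes the task in less time? Thanks for any hints.
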
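You can use concurrent.futures for multiprocessing or multithreading; there is also PyCuda for using the GPU.

OK, thanks, I will read up on it now. Do you have a reproducible example?

For which approach? I listed 3, and I have almost no experience with PyCuda.

I think using multiprocessing sounds better. I found a simple example here: [link missing]. How can we pass all the tasks I posted in the question as one or two functions? I tried putting all my GridSearchCV tasks into one function and calling it, but it gave me an error: TypeError: 'function' object is not iterable

concurrent.futures works the same way as multiprocessing. Whatever function you create usually takes a single argument as input, and you pass in an iterable of those arguments. I don't have any machine learning examples because I don't work in that field.
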
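To illustrate the last comment, here is a minimal sketch of the "one function, one iterable" pattern with concurrent.futures. The names `square` and `run_parallel` are hypothetical and not from the thread. The failing call was never posted, so the remark about the TypeError is only a guess at one plausible cause: a function ending up where the iterable is expected.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(n):
    """Worker function: takes ONE argument and returns one result."""
    return n * n


def run_parallel(values, executor_cls=ProcessPoolExecutor, max_workers=2):
    # executor.map expects the function first and the iterable second.
    # Swapping them, or passing a function where the iterable belongs,
    # is one plausible way to end up with a "not iterable" TypeError.
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    # Processes sidestep the GIL for CPU-bound work. ThreadPoolExecutor
    # has the same interface and can be passed as executor_cls instead.
    print(run_parallel(range(5)))
```

The worker is defined at module level so it can be pickled and sent to the worker processes, and the `if __name__ == "__main__":` guard keeps child processes from re-running the driver code on platforms that start workers by re-importing the script.
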
import sys
import glob
import os
import pandas as pd
import math
from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import matplotlib
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score, mean_squared_error, make_scorer
from sklearn.model_selection import train_test_split
from math import sqrt


df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "cubic*.csv"))), ignore_index=True)
#df = pd.read_csv('cubic31.csv')

for i in range(1,3):
    df['X_t'+str(i)] = df['X'].shift(i)

print(df)

df.dropna(inplace=True)

# Features: every column except the target 'Y' (includes the lagged columns X_t1 and X_t2 added above)
X = df.drop('Y', axis=1)
y = df['Y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

X_train = X_train.drop('time', axis=1)
X_test = X_test.drop('time', axis=1)

#Fit models with some grid search CV=5 (not too low), use the best model
parameters = {'n_estimators': [10,30,100,500,1000]}
clf_rf = RandomForestRegressor(random_state=1)
clf = GridSearchCV(clf_rf, parameters, cv=5, scoring='neg_mean_squared_error')
model = clf.fit(X_train, y_train)
model.cv_results_['params'][model.best_index_]
math.sqrt(model.best_score_*-1)
# grid_scores_ no longer exists in current scikit-learn; cv_results_ replaces it
model.cv_results_['mean_test_score']

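#####
print()
# Mean cross-validated score for each parameter setting (grid_scores_ was removed from scikit-learn)
print(model.cv_results_['mean_test_score'])

print(math.sqrt(model.best_score_*-1))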

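#reg = RandomForestRegressor(criterion='mse')
# Note: this refits clf_rf with its default settings; the tuned forest found by
# the grid search is available as model.best_estimator_
clf_rf.fit(X_train,y_train)
modelPrediction = clf_rf.predict(X_test)
print(modelPrediction)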

print("Number of predictions:",len(modelPrediction))

meanSquaredError=mean_squared_error(y_test, modelPrediction)
print("Mean Square Error (MSE):", meanSquaredError)
rootMeanSquaredError = sqrt(meanSquaredError)
print("Root-Mean-Square Error (RMSE):", rootMeanSquaredError)


####### to add the trendline
fig, ax = plt.subplots()
#df.plot(x='time', y='Y', ax=ax)
ax.plot(df['time'].values, df['Y'].values)


fig, ax = plt.subplots()
index_values=range(0,len(y_test))

y_test.sort_index(inplace=True)
X_test.sort_index(inplace=True)

modelPred_test = clf_rf.predict(X_test)
ax.plot(pd.Series(index_values), y_test.values)


PlotInOne=pd.DataFrame(pd.concat([pd.Series(modelPred_test), pd.Series(y_test.values)], axis=1))

plt.figure(); PlotInOne.plot(); plt.legend(loc='best')
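
As for using multiple CPUs with the code above: GridSearchCV takes an `n_jobs` parameter, and `n_jobs=-1` asks scikit-learn to spread the cross-validation fits over all available cores. The sketch below shows the idea on a small synthetic dataset. The helper name `tune_forest`, the reduced grid, and the random data are assumptions for illustration, not the asker's setup. Note that scikit-learn's RandomForestRegressor runs on the CPU, so this uses the Xeon cores rather than the Tesla K80.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_forest(X, y, n_jobs=-1):
    """Grid-search n_estimators, running the CV fits in parallel."""
    parameters = {'n_estimators': [10, 30]}  # reduced grid, for the demo only
    search = GridSearchCV(
        RandomForestRegressor(random_state=1),
        parameters,
        cv=3,
        scoring='neg_mean_squared_error',
        n_jobs=n_jobs,  # -1 means "use all available CPU cores"
    )
    return search.fit(X, y)


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): 200 rows, 3 features.
    rng = np.random.RandomState(0)
    X_demo = rng.rand(200, 3)
    y_demo = X_demo.sum(axis=1)

    fitted = tune_forest(X_demo, y_demo)
    print(fitted.best_params_)
    print(np.sqrt(-fitted.best_score_))  # best cross-validated RMSE
```

In the original script the equivalent change would be adding `n_jobs=-1` to the existing GridSearchCV call. How much time that saves on 2 million rows depends on memory and on the largest `n_estimators` values, so it is worth timing on a subset first.
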