Dask XGBoost does not scale / performs very poorly

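On a single node with 32 cores, native XGBoost takes about 30 seconds to run, while Dask XGBoost is slower by roughly a factor equal to the number of workers. The only way I have found to match native XGBoost performance with Dask is to create a cluster with a single worker that has 32 threads.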

My question is: why? Shouldn't 8 workers with 4 threads each perform about the same, rather than 8 times slower?

Native XGBoost

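import xgboost as xgb

# DMatrix objects for the native training API; the scikit-learn
# wrapper below is fitted on X_train / y_train directly and does not use them.
dtrain = xgb.DMatrix(X_train, y_train)
dtest = xgb.DMatrix(X_test, y_test)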

model = xgb.XGBClassifier(
    objective='multi:softmax',
    tree_method='hist',
    booster='gbtree',
    n_estimators=5,
    max_depth=5
)

model.fit(
    X_train,
    y_train
)

Dask XGBoost

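import xgboost as xgb
from dask.distributed import Client, LocalCluster

# One worker process with 32 threads: the only layout that matched native speed.
cluster = LocalCluster(
    n_workers=1,
    threads_per_worker=32
)
client = Client(cluster)

# DaskDMatrix objects for xgb.dask.train; the estimator below does not use them.
dtrain = xgb.dask.DaskDMatrix(client, X_train, y_train)
dtest = xgb.dask.DaskDMatrix(client, X_test, y_test)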

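# Use the Dask estimator: setting `client` on a plain xgb.XGBClassifier
# does not make it train through Dask.
model = xgb.dask.DaskXGBClassifier(
    objective='multi:softmax',
    tree_method='hist',
    booster='gbtree',
    n_estimators=5,
    max_depth=5
)

model.client = client

# X_train and y_train are assumed to be Dask collections (array or DataFrame).
model.fit(
    X_train,
    y_train
)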

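Another test I ran was training XGBoost on Dask with 1 worker versus 2 workers (same number of threads per worker in both cases). With one worker the test above took only 28 seconds, while with two workers it took 141 seconds.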
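To make the comparison easier to reproduce, the timings could be collected with something like the sketch below. The helper names `worker_layouts`, `slowdown`, and `time_dask_fit` are hypothetical and not part of Dask or XGBoost, and `X` and `y` are assumed to be Dask collections. The data is persisted on the workers before the timer starts, so only `fit` is measured.

```python
import time


def worker_layouts(total_cores):
    """Return every (n_workers, threads_per_worker) pair using exactly total_cores threads."""
    return [(n, total_cores // n) for n in range(1, total_cores + 1) if total_cores % n == 0]


def slowdown(baseline_seconds, other_seconds):
    """How many times slower `other_seconds` is than `baseline_seconds`."""
    return other_seconds / baseline_seconds


def time_dask_fit(n_workers, threads_per_worker, X, y):
    """Time one DaskXGBClassifier fit on a fresh LocalCluster (hypothetical helper).

    X and y are assumed to be Dask arrays or DataFrames.
    """
    # Imported lazily so the pure helpers above work without dask/xgboost installed.
    from dask.distributed import Client, LocalCluster, wait
    import xgboost as xgb

    with LocalCluster(n_workers=n_workers, threads_per_worker=threads_per_worker) as cluster:
        with Client(cluster) as client:
            # Materialise the data on the workers first so loading is not timed.
            X, y = client.persist([X, y])
            wait([X, y])
            model = xgb.dask.DaskXGBClassifier(
                tree_method="hist", n_estimators=5, max_depth=5
            )
            model.client = client
            start = time.perf_counter()
            model.fit(X, y)
            return time.perf_counter() - start


# Example usage (requires dask, distributed, and xgboost, plus real data):
# for n, t in worker_layouts(32):
#     print(n, t, time_dask_fit(n, t, X_train, y_train))
```

With the numbers reported above, `slowdown(28, 141)` comes out at roughly 5, and `worker_layouts(32)` includes both the `(1, 32)` layout that matches native speed and the `(8, 4)` layout from the question.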