Python parallel library takes longer than sequential execution


I am trying to take advantage of multiprocessing by using Python's parallel libraries. Strangely, though, I find that sequential execution finishes faster than the parallel version. Below is the code I ran for the comparison:

import time
from joblib import Parallel, delayed


def compute_features(summary, article):
    feature_dict = {}
    feature_dict["f1"] = summary
    feature_dict["f2"] = article
    return feature_dict


def construct_input(n):
    summaries = []
    articles = []
    for i in range(n):
        summaries.append("summary_" + str(i))
        articles.append("articles_" + str(i))
    return summaries, articles


def sequential_test(n):
    print("Sequential test")
    start_time = time.time()
    summaries, articles = construct_input(n)
    feature_list = []
    for i in range(n):
        feature_list.append(compute_features(summaries[i], articles[i]))
    total_time = time.time() - start_time
    print("Total Time Sequential : %s" % total_time)

    # print(feature_list)


def parallel_test(n):
    print("Parallel test")
    start_time = time.time()
    summaries, articles = construct_input(n)
    feature_list = []
    executor = Parallel(n_jobs=8, backend="multiprocessing", prefer="processes", verbose=True)
    # executor = Parallel(n_jobs=4, prefer="threads")
    tasks = (delayed(compute_features)(summaries[i], articles[i]) for i in range(n))
    results = executor(tasks)

    for result in results:
        feature_list.append(result)

    total_time = time.time() - start_time
    print("Total Time Parallel : %s" % total_time)

    # print(feature_list)


if __name__ == "__main__":
    n = 500000
    sequential_test(n)
    parallel_test(n)
When I run the code above, I get the following output:

Sequential test
Total Time Sequential : 1.200118064880371
Parallel test
[Parallel(n_jobs=8)]: Using backend MultiprocessingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  56 tasks      | elapsed:    0.0s
[Parallel(n_jobs=8)]: Done 49136 tasks      | elapsed:    1.0s
[Parallel(n_jobs=8)]: Done 500000 out of 500000 | elapsed:    4.7s finished
Total Time Parallel : 5.427206039428711
I am running this code on a Mac with the following configuration:


Can you help me understand why this happens? Would the code run faster on different hardware, say a GPU? Thanks in advance for your replies.

Starting worker processes takes time, so that startup cost alone accounts for part of the overhead you are seeing. More importantly, your tasks are too small and too short: compute_features only builds a two-entry dict, so the cost of pickling arguments, dispatching each of the 500000 tasks to a worker, and collecting the results dominates the total runtime.

Thanks for the reply. I tried simulating a compute-intensive method inside compute_features, and it does seem to offset the parallel overhead.
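To make that crossover concrete, here is a minimal sketch (compute_heavy, its iteration count, and the task count are illustrative choices, not from the original post) that replaces the trivial dict construction with a CPU-bound loop. With enough work per task, the per-task dispatch overhead is amortized and the parallel version has a chance to win:

```python
import math
import time
from joblib import Parallel, delayed


def compute_heavy(summary, article):
    # Simulate a CPU-bound feature: burn cycles so each task does real work.
    acc = 0.0
    for i in range(10000):
        acc += math.sqrt(i)
    return {"f1": summary, "f2": article, "f3": acc}


if __name__ == "__main__":
    n = 500
    args = [("summary_" + str(i), "articles_" + str(i)) for i in range(n)]

    start = time.time()
    seq = [compute_heavy(s, a) for s, a in args]
    seq_time = time.time() - start

    start = time.time()
    par = Parallel(n_jobs=8)(delayed(compute_heavy)(s, a) for s, a in args)
    par_time = time.time() - start

    # Both versions compute the same features; only the timing differs.
    assert par == seq
    print("sequential: %.3fs, parallel: %.3fs" % (seq_time, par_time))
```

The timings depend on the machine and the cost of spawning workers, so the parallel run is not guaranteed to be faster at small n; the point is that the gap closes as per-task work grows, whereas in the original benchmark the task body was essentially free.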