Why does this CPU computation test give different results on Windows and on Linux (using WSL on Windows) running Python 3 on the same machine?


I wrote a simple test to understand single-threaded, multithreaded, and multiprocess execution in Python 3. The code is as follows:

#import libraries
from multiprocessing import Pool
import time
import threading

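def calculate_sum_upto(n):
    # Sum the integers 0..n-1 in a pure-Python loop (deliberately CPU-bound)
    sum = 0
    for i in range(n):
        sum += i
    # Return the result so that pool.map() collects real values instead of None
    return sum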

def test_all(limit):
    print("\nFor sum of series upto : " + str(limit))
    # Define input case, that is an array of numbers
    array_of_numbers = [limit for i in range(8)]

    # Adding time for performance calculation
    start_time_1 = time.time()

    # First, let's try using raw approach
    # print("\nStarting Raw approach...\n")
    for num in array_of_numbers:
        calculate_sum_upto(num)
    # print("result obtained using raw approach : " + str(super_sum_raw))
    # print("\nRaw approach finished.")

    end_time_1 = time.time()

    start_time_2 = time.time()

    # Now trying using parallel processing
    # print("\n\nStarting multiprocessing approach...\n")
    pool = Pool()
    super_sum_optimized_values = pool.map(calculate_sum_upto, array_of_numbers)
    pool.close()
    pool.join()
    # print("result obtained using parallel processing approach : " + str(super_sum_optimized))
    # print("\nParallel Processing approach finished.")

    end_time_2 = time.time()

    start_time_3 = time.time()
    # Trying using general threading approach
    # print("\n\nStarting Threading approach...\n")
    thread_array = [threading.Thread(target=calculate_sum_upto, args=(num,)) for num in array_of_numbers]
    for thread in thread_array:
        thread.start()

    for thread in thread_array:
        thread.join()
    # print("\nThreading approach finished.\n\n")
    end_time_3 = time.time()

    # Printing results
    print("\nRaw approach : {:10.5f}".format(end_time_1 - start_time_1))
    print("Multithreading approach : {:10.5f}".format(end_time_3 - start_time_3))
    print("Multiprocessing approach : {:10.5f}".format(end_time_2 - start_time_2))

if __name__ == "__main__":
    # print("This test bench records time for calculating sum of series upto n terms for 4 numbers using 3 approaches : \n1 : Linear calculation for each number one after the other.\n2 : Calculating sum of series for 4 numbers on 4 different threads.\n3 : Calculating sum of series for 4 numbers on 4 different processes.")
    # print("For simplicity, all 4 numbers have the same value, i.e. sum of series upto n terms for m, 4 times.")
    n = 10000
    # for i in range(5):
    #     test_all(n)
    #     n *= 10
    test_all(10000000)

    print("\n\nEnd of test.")

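I ran this test in two ways:

Directly from PowerShell on Windows 10

From an Ubuntu 18.04 terminal under WSL on the same machine

The run under Ubuntu was more than 1 second faster. Why is that? Since it is the same machine, shouldn't the results be the same?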

TESTING ON A QUAD CORE CPU
[AMD Ryzen 3 3200G 3.6 GHz, 4 Core(s), 4 Logical Processor(s)]

Windows :

For sum of series upto : 10000000
Raw approach :    5.08537
Multithreading approach :    5.52041
Multiprocessing approach :    1.40911

Ubuntu Linux using WSL :

For sum of series upto : 10000000
Raw approach :    3.60763
Multithreading approach :    3.70080
Multiprocessing approach :    0.93371
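
As a quick sanity check on the numbers above, the sketch below divides each Windows timing by the corresponding WSL timing and computes the multiprocessing speedup on each platform. The timings are copied from the output above; the dictionary and variable names are only illustrative.

```python
# Timings (seconds) copied from the two runs reported above.
windows = {"raw": 5.08537, "threads": 5.52041, "processes": 1.40911}
wsl = {"raw": 3.60763, "threads": 3.70080, "processes": 0.93371}

# How much slower Windows is than WSL for each approach.
ratios = {name: windows[name] / wsl[name] for name in windows}

# Speedup of multiprocessing over the plain loop on each platform.
speedup_windows = windows["raw"] / windows["processes"]
speedup_wsl = wsl["raw"] / wsl["processes"]

for name, ratio in ratios.items():
    print(f"{name:10s} Windows/WSL = {ratio:.2f}")
print(f"multiprocessing speedup: Windows {speedup_windows:.2f}x, WSL {speedup_wsl:.2f}x")
```

The ratio comes out at roughly 1.4 to 1.5 for all three approaches, and the multiprocessing speedup is between 3.5x and 4x on both platforms. That pattern suggests the gap comes from how fast each interpreter runs the loop itself rather than from threads or process creation, though one run per platform is too little data to be sure.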

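Some of the difference may be because Linux has a different kind of threads, and because Linux performs a fork to start a process whereas Windows has to launch a new one.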
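
To see whether the two environments really start processes differently, one can ask `multiprocessing` directly and time pool startup separately from the computation. This is a minimal sketch: `noop` and `time_pool_startup` are hypothetical helpers, not part of the question's code, and the pool size of 4 is assumed to match the quad-core machine.

```python
import multiprocessing
import time


def noop(_):
    # Trivial task, so the measured time is dominated by process startup.
    return None


def time_pool_startup(workers=4):
    # Time creating a pool, running one trivial task per worker, and tearing it down.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        pool.map(noop, range(workers))
    return time.perf_counter() - start


if __name__ == "__main__":
    # "spawn" on Windows; on Linux with Python 3.7 the default is "fork".
    print("start method:", multiprocessing.get_start_method())
    print("pool startup: {:.5f} s".format(time_pool_startup()))
```

If the startup time is a small fraction of a second on both systems, it cannot account for the gap in the raw, single-process timing, which creates no processes at all.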

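It might matter to know exactly which Python versions you tested on Windows and in WSL.

As mentioned, it is Python 3. The exact version of both is 3.7. But why would that matter, especially for single-threaded performance? Some implementation details of threading may have changed, but single-threaded performance should stay the same, since the calculation is simple math with no fancy functions.

There used to be an optimization for summing integers that, where possible, unboxed the values into a native CPU data type. IIRC, the data type the optimization used was C long, which is always 32 bits on Windows, regardless of whether the OS is 32-bit or 64-bit. On 64-bit Linux, however, it is a 64-bit value. So on Linux the optimization would apply to all values in the sum += i operation, while on Windows it would have to fall back to the slower summing function for the variable-size int type.

So if I replace the sum function with a multiplication function and reduce the size of the input values, maybe I should see similar results? Or is the optimization for summing also used in the backend for multiplication?

The key phrase here is "used to": there was an optimization along those lines. A core developer removed it, but I don't remember whether that was in 3.8 or 3.7.

I ran Task Manager during both the Windows run and the Linux run. Both times the usage across all CPU cores was similar, so both do seem to create 4 computing processes. Also, even if fork and CreateProcess work differently, that would only affect load time. Once the processes are created, they carry out the same operations, so this should not affect execution time.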
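
The long hypothesis in the comments can be checked in part without reading CPython's source. The sketch below prints the interpreter version and the size of a C long on the current platform, then finds the first loop index at which the running total no longer fits in a signed 32-bit value. It only illustrates the arithmetic behind the hypothesis; whether a given CPython build actually takes such a fast path is an assumption it does not test. `first_index_exceeding` is a hypothetical helper name.

```python
import ctypes
import sys

LIMIT = 10_000_000          # the value passed to test_all() in the question
INT32_MAX = 2**31 - 1       # largest value a signed 32-bit C long can hold
INT64_MAX = 2**63 - 1       # largest value a signed 64-bit C long can hold


def first_index_exceeding(bound):
    # Return the first i at which the running total 0 + 1 + ... + i exceeds bound.
    total = 0
    i = 0
    while True:
        total += i
        if total > bound:
            return i
        i += 1


final_total = LIMIT * (LIMIT - 1) // 2   # closed form for sum(range(LIMIT))

print("Python version:", sys.version.split()[0])
print("C long size in bytes:", ctypes.sizeof(ctypes.c_long))
print("final total:", final_total)
print("fits in 64 bits:", final_total <= INT64_MAX)
print("first index past 32 bits:", first_index_exceeding(INT32_MAX))
```

The running total passes the 32-bit limit after about 65,000 of the 10,000,000 iterations but never approaches the 64-bit limit. A fast path tied to the width of C long would therefore be usable for all of the additions on 64-bit Linux and for well under 1% of them on Windows, which is consistent with the hypothesis but does not prove it.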