Python: for loop to find concordance takes a very long time on large data (0.15 million × 36k rows, 14 hours)

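I am running this code in Python 3.5 to compute the concordance of a logistic regression model.

zeros2 (a pandas DataFrame) has 0.15 mln rows and ones2 (a pandas DataFrame) has 36k rows. Both tables have two variables:

[i] Responder (Responder0 = 0 in zeros, Responder1 = 1 in ones)

[ii] Probability (Probability0 in zeros and Probability1 in ones)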
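To experiment without the real data, a minimal sketch of dummy data in the shape described above may help. Everything here is an assumption for illustration: the helper name `make_dummy_groups`, the use of plain (responder, probability) tuples instead of DataFrames, and the beta distributions used to make responders score higher on average.

```python
import random

def make_dummy_groups(n_zeros, n_ones, seed=0):
    """Hypothetical stand-ins for zeros2 / ones2 as (responder, probability) rows."""
    rng = random.Random(seed)
    # Assumption: non-responders tend to score lower, responders higher.
    # Rounding to 3 decimals deliberately produces some tied probabilities.
    zeros = [(0, round(rng.betavariate(2, 5), 3)) for _ in range(n_zeros)]
    ones = [(1, round(rng.betavariate(5, 2), 3)) for _ in range(n_ones)]
    return zeros, ones

# Scaled down 100x from the question's 0.15 mln / 36k rows.
zeros2, ones2 = make_dummy_groups(1500, 360)
print(len(zeros2), len(ones2))  # 1500 360
```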


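My problem is that the for loop has already run for 12 hours and is still running as I post this question. I need help: how can I make this faster? I am running it on a 64-bit Windows machine with 8 GB of RAM.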

Because of the two nested for loops (0.15 million × 36k), your code is performing 5.4 billion computations.
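The question's original loop is not shown here, so the following is only an assumed reconstruction of the kind of nested loop being described: every zero is compared with every one, which at the question's sizes means 150,000 × 36,000 = 5.4 × 10^9 comparisons. The name `concordance_counts_brute` is hypothetical, and the sketch assumes the usual convention that a pair is concordant when the responder has the higher probability. It is mainly useful as a correctness reference on small inputs.

```python
def concordance_counts_brute(zero_probs, one_probs):
    """O(n*m) reference: compare every (zero, one) pair directly."""
    conc = disc = ties = 0
    for p0 in zero_probs:
        for p1 in one_probs:
            if p1 > p0:      # responder scored higher: concordant
                conc += 1
            elif p1 < p0:    # responder scored lower: discordant
                disc += 1
            else:            # equal scores: tied
                ties += 1
    return conc, disc, ties

print(concordance_counts_brute([0.1, 0.4, 0.7], [0.4, 0.9]))  # (4, 1, 1)
```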

I would do it like this (thanks to @Leon for helping me improve this answer):

Or the other way round, like this:

from bisect import bisect_left, bisect_right

# Sort the predicted probabilities (second column) of each group once.
zeros2_list = sorted(zeros2.iloc[:, 1])
ones2_list = sorted(ones2.iloc[:, 1])
zeros2_length = len(zeros2_list)
ones2_length = len(ones2_list)

conc = disc = ties = 0
for p0 in zeros2.iloc[:, 1]:
    # ones2_list is sorted, so two binary searches split it into the ones
    # scored below, equal to and above this zero's probability.
    cur_disc = bisect_left(ones2_list, p0)              # one scored lower: discordant
    cur_ties = bisect_right(ones2_list, p0) - cur_disc  # equal scores: tied
    disc += cur_disc
    ties += cur_ties
    conc += ones2_length - cur_ties - cur_disc          # one scored higher: concordant

# Equivalently, iterate over the already sorted list of zeros:
# for p0 in zeros2_list:
#     cur_disc = bisect_left(ones2_list, p0)
#     cur_ties = bisect_right(ones2_list, p0) - cur_disc
#     disc += cur_disc
#     ties += cur_ties
#     conc += ones2_length - cur_ties - cur_disc

pairs_tested = zeros2_length * ones2_length

concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested

print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)

I followed Sreyantha Chary's answer. It is elegant, but in the first part of the answer the concordance and discordance percentages are mixed up.

Please provide runnable code (for example, add dummy data so that it can actually run).

Why don't you exploit the sortedness of `zeros2_list` to compute `disc`? In C++ I would use `std::lower_bound`. `disc += bisect_left(zeros2_list, ones2.iloc[i,1])` should work correctly for both `ties == 0` and `ties != 0`.

You can also compute `ties` in O(log(N)) time instead of O(N) time, as follows:
`ties = bisect_right(zeros2_list, ones2.iloc[i,1]) - bisect_left(zeros2_list, ones2.iloc[i,1])`

Yes, +1. Without the if condition, life is easier.
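A small worked example of the tie-counting trick from the comments, with made-up values: on a sorted list, `bisect_left` returns how many elements are strictly below `p` and `bisect_right` how many are less than or equal to `p`, so their difference is the number of ties, each found with an O(log N) binary search rather than an O(N) scan.

```python
from bisect import bisect_left, bisect_right

zeros_sorted = [0.1, 0.2, 0.2, 0.2, 0.5]  # illustrative sorted probabilities
p = 0.2

below = bisect_left(zeros_sorted, p)   # 1: only 0.1 is strictly below 0.2
upto = bisect_right(zeros_sorted, p)   # 4: 0.1 and the three 0.2s are <= 0.2
ties = upto - below                    # 3 tied values
above = len(zeros_sorted) - upto       # 1: only 0.5 is strictly above
print(below, ties, above)  # 1 3 1
```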
import pandas as pd
from bisect import bisect_left, bisect_right

# Assumes model (a fitted classifier with predict_proba), data, predictors and
# outcome (== 'Loan_Status') are already defined.
Probability = model.predict_proba(data[predictors])
Probability1 = pd.DataFrame(Probability)
Probability1.columns = ['Prob_LoanStatus_0', 'Prob_LoanStatus_1']
TruthTable = pd.merge(data[[outcome]], Probability1[['Prob_LoanStatus_1']],
                      how='inner', left_index=True, right_index=True)
zeros = TruthTable[TruthTable['Loan_Status'] == 0].reset_index(drop=True)
ones = TruthTable[TruthTable['Loan_Status'] == 1].reset_index(drop=True)

# Sorted probabilities of the non-responders (column 1 = Prob_LoanStatus_1).
zeros_list = sorted(zeros.iloc[:, 1])
zeros_length = len(zeros_list)
ties = 0
conc = 0
for p1 in ones.iloc[:, 1]:
    # Zeros scored strictly below this one form concordant pairs.
    cur_conc = bisect_left(zeros_list, p1)
    # Zeros with exactly the same score form tied pairs.
    cur_ties = bisect_right(zeros_list, p1) - cur_conc
    conc += cur_conc
    ties += cur_ties
pairs_tested = zeros_length * len(ones.index)
# Every remaining pair is discordant.
disc = pairs_tested - conc - ties

print("Pairs = ", pairs_tested)
print("Conc = ", conc)
print("Disc = ", disc)
print("Tied = ", ties)
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested

print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)