Python: for loop to find concordance takes a huge amount of time for large data (0.15 million * 36k rows, 14 hours)

I am running this code in Python 3.5 to find concordance (logistic regression). There are 0.15 mln rows in zeros2 (a pandas DataFrame) and 36k rows in ones2 (a pandas DataFrame). Both tables have two variables: [i] responder (responder = 0 in zeros2, responder = 1 in ones2) and [ii] probability. My problem: the for loop has been running for 12 hours and is still running as I post this question. I need help: how can I do this faster? I am running this on a Windows 64-bit machine with 8 GB of RAM.

Your code is doing 5.4 billion computations because of the two nested for loops (0.15 mil * 36k). I would do it like this (thanks to @Leon for helping me answer this better) — or, the other way around, as in the commented-out variant inside the code:
from bisect import bisect_left, bisect_right

zeros2_list = sorted([zeros2.iloc[j, 1] for j in zeros2.index])
ones2_list = sorted([ones2.iloc[i, 1] for i in ones2.index])
zeros2_length = len(zeros2_list)
ones2_length = len(ones2_list)

conc = ties = disc = 0
for i in zeros2.index:
    # ones with a probability strictly below this zero's probability
    cur_conc = bisect_left(ones2_list, zeros2.iloc[i, 1])
    # ones with exactly this probability
    cur_ties = bisect_right(ones2_list, zeros2.iloc[i, 1]) - cur_conc
    conc += cur_conc
    ties += cur_ties
    disc += ones2_length - cur_ties - cur_conc

# We could also achieve the above by iterating over the sorted list directly:
# for p in zeros2_list:
#     cur_conc = bisect_left(ones2_list, p)
#     cur_ties = bisect_right(ones2_list, p) - cur_conc
#     conc += cur_conc
#     ties += cur_ties
#     disc += ones2_length - cur_ties - cur_conc

pairs_tested = zeros2_length * ones2_length
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested
print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
print("Pairs = %r" % pairs_tested)
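Since the snippet above depends on the asker's DataFrames, here is a self-contained sketch of the same bisect-based counting on small dummy data (the probability values are made up purely for illustration; it counts in the direction used in the final code further down, i.e. concordant means a class-1 probability above a class-0 probability):

```python
from bisect import bisect_left, bisect_right

# Dummy predicted probabilities (made-up values, just for illustration).
zero_probs = sorted([0.10, 0.20, 0.20, 0.40, 0.70])  # actual class 0
one_probs = [0.20, 0.50, 0.90]                       # actual class 1

conc = ties = disc = 0
for p in one_probs:
    # zeros ranked strictly below this class-1 probability -> concordant pairs
    lo = bisect_left(zero_probs, p)
    # zeros with exactly this probability -> tied pairs
    hi = bisect_right(zero_probs, p)
    conc += lo
    ties += hi - lo
    disc += len(zero_probs) - hi

pairs = len(zero_probs) * len(one_probs)
print(conc, ties, disc, pairs)  # → 10 2 3 15
```

Each of the 3 class-1 rows costs two O(log N) binary searches instead of a full pass over the 5 class-0 rows, which is what turns the 5.4-billion-step double loop into roughly 150k binary searches.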
I followed Sreyantha Chary's answer, which is elegant, but in the first part of the answer the concordance and discordance percentages are swapped. Please give runnable code (e.g. add dummy data so that it can actually be run).

Why don't you exploit the sortedness of zeros2_list to compute disc? In C++ I would use std::lower_bound. disc += bisect_left(zeros2_list, ones2.iloc[i,1]) will work correctly both for ties == 0 and for ties != 0. You can also compute ties in O(log(N)) time instead of O(N), like this: ties = bisect_right(zeros2_list, ones2.iloc[i,1]) - bisect_left(zeros2_list, ones2.iloc[i,1])

Yes, +1. That makes life easier by removing the if condition.
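To illustrate the comment above: on a sorted list, bisect_left returns the count of elements strictly below the probe, and bisect_right - bisect_left gives the tie count in O(log N), with no special-casing needed when the probe value is absent:

```python
from bisect import bisect_left, bisect_right

sorted_probs = [0.1, 0.3, 0.3, 0.3, 0.8]

# Probe value present three times: left/right bracket the run of ties.
print(bisect_left(sorted_probs, 0.3))                                    # → 1
print(bisect_right(sorted_probs, 0.3) - bisect_left(sorted_probs, 0.3))  # → 3

# Probe value absent: the same expressions still work, tie count is 0.
print(bisect_left(sorted_probs, 0.5))                                    # → 4
print(bisect_right(sorted_probs, 0.5) - bisect_left(sorted_probs, 0.5))  # → 0
```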
My final working code, looping over the ones and counting against the sorted zeros:

import pandas as pd
from bisect import bisect_left, bisect_right

Probability = model.predict_proba(data[predictors])
Probability1 = pd.DataFrame(Probability)
Probability1.columns = ['Prob_LoanStatus_0', 'Prob_LoanStatus_1']
TruthTable = pd.merge(data[[outcome]], Probability1[['Prob_LoanStatus_1']], how='inner', left_index=True, right_index=True)
zeros = TruthTable[TruthTable['Loan_Status'] == 0].reset_index(drop=True)
ones = TruthTable[TruthTable['Loan_Status'] == 1].reset_index(drop=True)

zeros_list = sorted([zeros.iloc[j, 1] for j in zeros.index])
zeros_length = len(zeros_list)

conc = ties = 0
for i in ones.index:
    # zeros with a lower predicted probability than this one -> concordant pairs
    cur_conc = bisect_left(zeros_list, ones.iloc[i, 1])
    # zeros with exactly the same predicted probability -> tied pairs
    cur_ties = bisect_right(zeros_list, ones.iloc[i, 1]) - cur_conc
    conc += cur_conc
    ties += cur_ties

pairs_tested = zeros_length * len(ones.index)
disc = pairs_tested - conc - ties
print("Pairs = ", pairs_tested)
print("Conc = ", conc)
print("Disc = ", disc)
print("Tied = ", ties)
concordance = conc / pairs_tested
discordance = disc / pairs_tested
ties_perc = ties / pairs_tested
print("Concordance = %r" % concordance)
print("Discordance = %r" % discordance)
print("Tied = %r" % ties_perc)
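As a further speed-up not discussed in the thread, the remaining per-row Python loop can be replaced entirely by NumPy's searchsorted, which performs all the binary searches in one vectorized call. This is a sketch under the same assumption as above: two arrays of predicted probabilities split by actual class (the function name and the sample numbers are mine, not from the thread):

```python
import numpy as np

def concordance_stats(zero_probs, one_probs):
    """Concordance / discordance / tie fractions via vectorized binary search."""
    zeros = np.sort(np.asarray(zero_probs, dtype=float))
    ones = np.asarray(one_probs, dtype=float)
    # For every class-1 probability, count zeros strictly below / not above it.
    lo = np.searchsorted(zeros, ones, side='left')   # zeros <  p -> concordant
    hi = np.searchsorted(zeros, ones, side='right')  # zeros <= p
    conc = int(lo.sum())
    ties = int((hi - lo).sum())
    pairs = zeros.size * ones.size
    disc = pairs - conc - ties
    return conc / pairs, disc / pairs, ties / pairs

# Sanity check on small made-up numbers: 10 concordant, 3 discordant,
# 2 tied pairs out of 5 * 3 = 15.
print(concordance_stats([0.10, 0.20, 0.20, 0.40, 0.70], [0.20, 0.50, 0.90]))
```

On arrays of the sizes in the question (0.15 mln and 36k), this moves both the sort and the 36k binary searches into compiled code, so it should run in seconds rather than minutes.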