Python: is (=.at), (=.loc), (.drop), or (.append) the faster way to filter a large DataFrame?

python performance pandas dataframe

I want to go through a DataFrame of about 400,000 rows and 4 columns and take out roughly half of the rows using an if statement:

    for a in range (0, howmanytimestorunthrough): 
        if ('Primary' not in DataFrameexample[a]):
            #take out row

So far I have been testing each of the following four approaches:

newdf.append(emptyline,)
newdf.at[b,'column1'] = DataFrameexample.at[a,'column1']
newdf.at[b,'column2'] = DataFrameexample.at[a,'column2']
newdf.at[b,'column3'] = DataFrameexample.at[a,'column3']
newdf.at[b,'column4'] = DataFrameexample.at[a,'column4']
b = b + 1

Or the same thing with .loc:

newdf.append(emptyline,)
newdf.loc[b,:] = DataFrameexample.loc[a,:]
b = b + 1

Or changing the if (not in) to if (in) and using:

DataFrameexample = DataFrameexample.drop([k])

Or trying to give emptyline actual values and then appending it:

notemptyline = pd.Series(DataFrameexample.loc[a,:].values, index = ['column1', 'column2', ...])
newdf.append(notemptyline, ignore_index=True)

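From the testing I have done so far, they all seem to work fine on a small number of rows (2,000), but once I start feeding in more rows, the time they take grows exponentially. .at seems somewhat faster than .loc, even though I need to call it 4 times per row, but it still gets slow (10 times the rows takes more than 10 times as long). I think .drop tries to copy the DataFrame on every call, so that does not really work either? I cannot get .append(notemptyline) to work properly; it just keeps replacing index 0 over and over.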

I know there must be an efficient way to do this; I just cannot seem to get there. Any help?

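Your speed problem has nothing to do with .loc vs. .at vs. ... (for a comparison of .loc and .at, see this question). It comes from explicitly looping over every row of the DataFrame. pandas is all about vectorizing your operations.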

You want to filter the DataFrame based on a comparison. You can express that as a boolean indexer:

indexer = df!='Primary'

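This gives you a DataFrame of boolean values with the same shape as df (n rows by 4 columns). Now you want to reduce it to a single boolean per row, so that the value is True only if all values in the row (axis 1) are True: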

indexer = indexer.all(axis=1)

Now we can use .loc to get only the rows where the indexer is True:

df = df.loc[indexer]
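Put together, the three steps above can be sketched on a small frame. The sample data is made up for illustration, and the column names simply follow column1 to column4 from the question:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's column1..column4.
df = pd.DataFrame({
    "column1": ["Primary", "Secondary", "Other", "Test"],
    "column2": ["a", "b", "Primary", "d"],
    "column3": ["x", "y", "z", "w"],
    "column4": ["1", "2", "3", "4"],
})

# Step 1: element-wise comparison gives a boolean frame with the same shape as df.
indexer = df != "Primary"

# Step 2: collapse each row to one boolean (True if no cell equals 'Primary').
row_mask = indexer.all(axis=1)

# Step 3: keep only the rows where the mask is True.
filtered = df.loc[row_mask]

print(filtered.index.tolist())  # [1, 3]
```

Note that this keeps the rows that do not contain 'Primary'. If the goal is the opposite (keep only rows with a cell equal to 'Primary'), inverting the mask with df.loc[~row_mask] should do it.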

This will be much faster than iterating over the rows.
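To get a feel for the difference, here is a rough timing sketch. It assumes made-up data of 5,000 rows, and the row loop is only an approximation of the loop in the question, not the asker's exact code. Absolute timings depend on the machine and pandas version, so none are quoted here:

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: 5,000 rows, about half with 'Primary' in column1.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "column1": rng.choice(["Primary", "Secondary"], size=n),
    "column2": "a",
    "column3": "b",
    "column4": "c",
})

# Row-by-row version: one .loc lookup per row, collecting the labels to keep.
start = time.perf_counter()
keep = []
for i in range(len(df)):
    if "Primary" not in df.loc[i].tolist():
        keep.append(i)
looped = df.loc[keep]
loop_seconds = time.perf_counter() - start

# Vectorized version: one comparison over the whole frame.
start = time.perf_counter()
vectorized = df.loc[(df != "Primary").all(axis=1)]
vectorized_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
print(looped.equals(vectorized))
```

Both versions select the same rows; only the amount of per-row Python overhead differs.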

Edit:

To check whether the df entries contain a string, you can replace the first line with:

indexer = df.apply(lambda x: x.str.contains('Primary'))

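Note that you usually do not want to use apply (internally it uses a for loop for custom functions) to iterate over many elements. In this case we are only looping over the columns, which is fine if you have just a few of them.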
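A minimal sketch of the substring variant, using the example values from the comment below (testprimary, Test, something else, some Primary). Two things here are assumptions rather than part of the answer: case=False is added so that the lowercase 'testprimary' also matches, and any(axis=1) is used so that a match in any single column is enough to select the row:

```python
import pandas as pd

# Hypothetical sample data built from the values mentioned in the comment.
df = pd.DataFrame({
    "column1": ["testprimary", "Test", "something else", "some Primary"],
    "column2": ["a", "b", "c", "d"],
})

# Per-column substring test. case=False makes the match case-insensitive;
# na=False treats missing values as "no match" instead of propagating NaN.
contains = df.apply(lambda col: col.str.contains("Primary", case=False, na=False))

# True for rows where at least one column contains the substring.
has_primary = contains.any(axis=1)

rows_with = df.loc[has_primary]      # rows that mention 'Primary' somewhere
rows_without = df.loc[~has_primary]  # the complement

print(rows_with["column1"].tolist())  # ['testprimary', 'some Primary']
```

If the match should be case-sensitive, dropping case=False would leave only 'some Primary' selected.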

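Thanks, I think I am slowly getting there. How do I change the first line to test whether 'Primary' is part of a string? I have values like: testprimary, Test, something else, some Primary, and this line seems to test only whether Primary is the entire string?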