Python: Filtering out rows based on a list of strings in Pandas

python pandas filter


I have a large time-series DataFrame (called df), and the first 5 records look like this:

df

                          stn  years_of_data  total_minutes  avg_daily  TOA_daily  K_daily
date
1900-01-14  AlberniElementary              4           5745     34.100    114.600    0.298
1900-01-14     AlberniWeather              6           7129     29.500    114.600    0.257
1900-01-14            Arbutus              8          11174     30.500    114.600    0.266
1900-01-14          Arrowview              7          10080     27.600    114.600    0.241
1900-01-14            Bayside              7           9745     33.800    114.600    0.295


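Goal: I am trying to remove the rows in which any of the strings in a list appears in the 'stn' column. In other words, I want to filter this dataset so that it excludes every row containing any of the strings in the following list:

remove_list = ['Arbutus','Bayside']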
My attempt:

cleaned = df[df['stn'].str.contains('remove_list')]

Returns:

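Out[78]:

stn years_of_data total_minutes avg_daily TOA_daily K_daily date
Nothing!
I have tried a few combinations of quotes, brackets and even a lambda function; I'm fairly new to this, though, so I'm probably not using the syntax properly.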
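Use .isin, and negate the resulting mask with ~ so that only the rows whose 'stn' value is not in the list are kept.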
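A minimal sketch of the .isin approach, assuming the df and remove_list from the question. The sample frame below is rebuilt from the five rows shown above (only three of the columns, for brevity) so that the snippet runs on its own:

```python
import pandas as pd

# Sample frame rebuilt from the five rows shown in the question (assumption:
# only three of the original columns are reproduced here)
df = pd.DataFrame(
    {
        "stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"],
        "years_of_data": [4, 6, 8, 7, 7],
        "total_minutes": [5745, 7129, 11174, 10080, 9745],
    },
    index=pd.DatetimeIndex(["1900-01-14"] * 5, name="date"),
)

remove_list = ["Arbutus", "Bayside"]

# isin() marks rows whose 'stn' exactly equals an entry of the list;
# ~ inverts the boolean mask, so those rows are dropped
cleaned = df[~df["stn"].isin(remove_list)]
print(cleaned["stn"].tolist())
```

Note that .isin only tests for exact equality, so an entry such as 'Arb' would not remove 'Arbutus'.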

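I had a similar question and found this old thread; I think there are other ways to get the same result. My issue with @EdChum's solution, for my particular application, is that I don't have a list that will be matched exactly. If you have the same issue, .isin isn't meant for that application.
Instead, you can try a few other options, including numpy.where: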

import numpy

remove_list = ['ayside','rrowview']
# Flag with 1 every row whose 'stn' contains any of the substrings, otherwise 0
df['flagCol'] = numpy.where(df.stn.str.contains('|'.join(remove_list)),1,0)
Note that this solution doesn't actually remove the matching rows; it just flags them. You can then copy, slice or drop as you like.

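This solution is useful when you don't know, for example, whether the station names are capitalized and you don't want to standardize the text beforehand: since the substrings above omit the first letter, they match 'Bayside' and 'bayside' alike. numpy.where is usually quite fast as well, probably not much different from .isin.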
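If the goal is to drop the partially matching rows rather than flag them, the same str.contains mask can simply be negated. The following is a minimal sketch on a small sample frame built from the question's station names; the use of case=False, na=False and re.escape is an addition here, not part of the answer above:

```python
import re

import pandas as pd

# Sample data: station names taken from the question
df = pd.DataFrame(
    {"stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"]}
)

# Lower case on purpose, to show the case-insensitive match
remove_list = ["bayside", "arrowview"]

# Escape each entry so it is matched literally, then join into one regex alternation
pattern = "|".join(re.escape(s) for s in remove_list)

# case=False ignores capitalization; na=False treats missing names as non-matches
mask = df["stn"].str.contains(pattern, case=False, na=False)

# Keep only the rows that did not match any entry
cleaned = df[~mask]
print(cleaned["stn"].tolist())
```

Escaping matters when an entry contains regex metacharacters such as '.' or '(', which would otherwise be interpreted as part of the pattern.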
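I just want to add my 2 cents to this very important use case (filtering out a list of items, indexed by string values). The argument of the .isin() method doesn't need to be a list! It can be a pd.Series! Then you can do this: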


df[~df['stn'].isin(another_df['stn_to_remove_column_there'])]

See what I mean? You can use this construct without calling the .to_list() method.
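A small runnable sketch of this, where another_df is a hypothetical second frame holding the stations to remove; its index is deliberately different from df's, to show that Series.isin compares against the values only:

```python
import pandas as pd

# Sample data: station names taken from the question
df = pd.DataFrame(
    {
        "stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"],
        "years_of_data": [4, 6, 8, 7, 7],
    }
)

# Hypothetical second frame; the column name follows the answer above
another_df = pd.DataFrame(
    {"stn_to_remove_column_there": ["Bayside", "Arbutus"]},
    index=[100, 200],  # unrelated index on purpose
)

# The Series is passed straight to isin(); no .to_list() call is needed
cleaned = df[~df["stn"].isin(another_df["stn_to_remove_column_there"])]
print(cleaned["stn"].tolist())
```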
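Are there other ways? How about using lambda x: ... something or, rather, setting up some function? And what about the way I tried it: was I close, or can I do what I was trying to do? Teach me to fish, don't just hand me a snapper! :)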
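For what it's worth, here is a hedged sketch of how a lambda-based version might look, and of why the original attempt matched nothing. The sample frame is rebuilt from the question's station names; the variable names literal, exact and partial are illustrative only:

```python
import pandas as pd

# Sample data: station names taken from the question
df = pd.DataFrame(
    {"stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"]}
)
remove_list = ["Arbutus", "Bayside"]

# The original attempt passed the quoted text 'remove_list', so pandas looked
# for that literal string in each station name and found it nowhere
literal = df[df["stn"].str.contains("remove_list")]

# Lambda version, exact match: keep rows whose name is not in the list
exact = df[df["stn"].apply(lambda s: s not in remove_list)]

# Lambda version, substring match: keep rows containing none of the entries
partial = df[df["stn"].apply(lambda s: not any(r in s for r in remove_list))]

print(len(literal), exact["stn"].tolist(), partial["stn"].tolist())
```

The lambda versions call a Python function once per row, so on a large frame the vectorized .isin or .str.contains forms are likely to be faster.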