Python: save a row to another dataframe if its value contains an item from a list as a substring


python pandas



Input dataframe:

index    link
1      https://zeewhois.com/en/
2      https://www.phpfk.de/domain
3      https://www.phpfk.de/de/domain
4      https://laseguridad.online/questions/1040/pued

list = ['verizon', 'zeewhois', 'idad']

If df['link'] contains any of the list items as a substring, we need to put that particular link into a different, new dataframe.

So far, I have preprocessed the link column and got the following format:

index    link
1      httpszeewhoiscomenwww
2      httpswwwphpfkdedomain
3      httpswwwphpfkdededomain
4      httpslaseguridadonlinequestions1040pued

To find which row values contain an item from the list as a substring, I tried:
df["TRUEFALSE"] = df['link'].apply(lambda x: 1 if any(i in x for i in list) else 0)

But I got an error:
TypeError: 'in <string>' requires string as left operand, not float
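A likely cause, inferred from the error message rather than confirmed in the thread: in i in x, the left operand is the list item i, so the message says that the list itself contains a float, most often NaN (for example, when the list is built from a column with empty cells). The later "sequence item 350: expected str instance, float found" error from '|'.join points the same way. The list shown above contains only strings, so the NaN entry below is a hypothetical stand-in. A minimal sketch that drops non-string items before running the same lambda:

```python
import pandas as pd

df = pd.DataFrame(
    {"link": [
        "https://zeewhois.com/en/",
        "https://www.phpfk.de/domain",
        "https://www.phpfk.de/de/domain",
        "https://laseguridad.online/questions/1040/pued",
    ]},
    index=pd.Index([1, 2, 3, 4], name="index"),
)

# Hypothetical: a NaN (a float) slipped into the search list,
# e.g. because it was read from a column with empty cells.
raw_terms = ["verizon", "zeewhois", float("nan"), "idad"]

# With the NaN in place, `nan in "https://..."` raises:
# TypeError: 'in <string>' requires string as left operand, not float
# Keep only genuine strings before searching.
terms = [t for t in raw_terms if isinstance(t, str)]

# Same one-liner as in the question, now over string-only terms.
df["TRUEFALSE"] = df["link"].apply(lambda x: 1 if any(i in x for i in terms) else 0)
new_df = df[df["TRUEFALSE"] == 1].copy()
print(new_df)
```

Note that 'idad' matches row 4 because it is a substring of "laseguridad".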

You don't need to preprocess the links. You can simply do the following:

import numpy as np

# Given df:
#                                                  link
# index
# 1                            https://zeewhois.com/en/
# 2                         https://www.phpfk.de/domain
# 3                      https://www.phpfk.de/de/domain
# 4      https://laseguridad.online/questions/1040/pued

l = ['verizon', 'zeewhois', 'idad']  # It's not nice to have variables with names like list, dict, etc.

def match(x):
    # Return the first list item found in x (case-insensitive), or NaN if none is found
    for i in l:
        if i.lower() in x.lower():
            return i
    return np.nan

new_df = df[df['link'].apply(match).notna()]

# new_df:
#                                                  link
# index
# 1                            https://zeewhois.com/en/
# 4      https://laseguridad.online/questions/1040/pued
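The match function above assumes every cell is a string; x.lower() would raise AttributeError on a NaN cell. A sketch of a more defensive variant, assuming the column may contain missing values (the extra NaN row is hypothetical). Because match returns the matched term rather than just True, the term can also be kept as a column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"link": [
        "https://zeewhois.com/en/",
        "https://www.phpfk.de/domain",
        "https://www.phpfk.de/de/domain",
        "https://laseguridad.online/questions/1040/pued",
        np.nan,  # hypothetical missing link
    ]},
    index=pd.Index([1, 2, 3, 4, 5], name="index"),
)

l = ["verizon", "zeewhois", "idad"]


def match(x):
    """Return the first term of l found in x (case-insensitive), else NaN."""
    if not isinstance(x, str):  # NaN and other non-strings cannot contain a term
        return np.nan
    for term in l:
        # Skip non-string terms too, in case the list contains NaN
        if isinstance(term, str) and term.lower() in x.lower():
            return term
    return np.nan


df["matched"] = df["link"].apply(match)
new_df = df[df["matched"].notna()].copy()
print(new_df)
```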


You can use str.contains:
list_strings =['verizon','zeewhois','idad']

df.loc[df.link.str.contains('|'.join(list_strings),case=False), 'TRUE_FALSE'] = True



 index             link                                TRUE_FALSE
    1   https://zeewhois.com/en/                        True
    2   https://www.phpfk.de/domain                     NaN
    3   https://www.phpfk.de/de/domain                  NaN
    4   https://laseguridad.online/questions/1040/pued  True

Then just filter on True to get the new dataframe:
new_df = df.loc[df.TRUE_FALSE == True].copy()

index               link                        TRUE_FALSE
1   https://zeewhois.com/en/                        True
4   https://laseguridad.online/questions/1040/pued  True
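A variant of the same str.contains idea, offered as a sketch rather than as the answerer's code: build a boolean mask directly instead of assigning True/NaN, escape each term with re.escape so characters such as '.' or '+' are matched literally, skip non-string items (a float in the list is what makes '|'.join fail with "expected str instance, float found"), and pass na=False so missing links count as non-matches. The NaN entries below are hypothetical.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"link": [
        "https://zeewhois.com/en/",
        "https://www.phpfk.de/domain",
        "https://www.phpfk.de/de/domain",
        "https://laseguridad.online/questions/1040/pued",
        np.nan,  # hypothetical missing link
    ]},
    index=pd.Index([1, 2, 3, 4, 5], name="index"),
)

# Hypothetical NaN entry standing in for a float in a longer list.
list_strings = ["verizon", "zeewhois", "idad", float("nan")]

# Escape each term so it matches literally; drop anything that isn't a string.
pattern = "|".join(re.escape(s) for s in list_strings if isinstance(s, str))

# Boolean mask; NaN links are treated as non-matches via na=False.
mask = df["link"].str.contains(pattern, case=False, na=False)
new_df = df[mask].copy()
print(new_df)
```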


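In def match(): is l the list?
Yes, l is the list. It's not good to use variables named list, dict, etc., so I kept it as l.
Your solution gives me an empty df, even though the list has strings that appear as substrings... strange.
I've posted the solution I tested, and it works fine on my machine. Please make sure you copy-pasted it correctly.

If I use this, I get the following error: AttributeError: 'DataFrame' object has no attribute 'link'
Oh, I think you need to reset your index. Try df.reset_index(inplace=True), then df.drop(columns='index', inplace=True). What do you see when you print(df.columns)?
I don't have an index column... Still, I tried what you asked; it doesn't work.
Does print(df.columns) show 'link'? There may be some hidden whitespace in your column names; in that case you can try df.columns = df.columns.str.strip() to fix it.
Yes, the error was due to a space before link. That particular error is gone now, but I get a new error: TypeError: sequence item 350: expected str instance, float found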
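The AttributeError in this thread came from a leading space in the column name. A small sketch that reproduces and fixes it; the " link" header is a hypothetical stand-in for whatever the actual file contained:

```python
import pandas as pd

# Hypothetical: the header was read as " link" (leading space), so df.link fails.
df = pd.DataFrame({" link": ["https://zeewhois.com/en/", "https://www.phpfk.de/domain"]})
print(list(df.columns))  # [' link']

# Strip surrounding whitespace from every column name.
df.columns = df.columns.str.strip()
print(list(df.columns))  # ['link']

# Attribute access now works.
print(df.link.tolist())
```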