Python: how to select the finest-grained string items in a list


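I have a list of strings:

[u'This', u'the third largest earthquake in recorded history', u'the third largest earthquake', u'recorded history', u'massive tsunamis , which caused widespread devastation when they hit land , leaving an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'massive tsunamis', u'widespread devastation', u'they', u'land', u'an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'an estimated 230,000 people', u'countries around the Bay of Bengal and the Indian Ocean', u'countries', u'the Bay of Bengal and the Indian Ocean', u'the Bay', u'Bengal and the Indian Ocean', u'Bengal', u'the Indian Ocean']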

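As you can see, some elements contain other elements. For example,

u'the third largest earthquake in recorded history'

contains both

u'the third largest earthquake'

and

u'recorded history'

How can I select only the finest-grained elements, such as u'recorded history', and discard the rest?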

I believe this does what you want:

In [14]: allstrings = [u'This', u'the third largest earthquake in recorded history', u'the third largest earthquake', u'recorded history', u'massive tsunamis , which caused widespread devastation when they hit land , leaving an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'massive tsunamis', u'widespread devastation', u'they', u'land', u'an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'an estimated 230,000 people', u'countries around the Bay of Bengal and the Indian Ocean', u'countries', u'the Bay of Bengal and the Indian Ocean', u'the Bay', u'Bengal and the Indian Ocean', u'Bengal', u'the Indian Ocean']

In [15]: [s for s in allstrings if not any(t in s for t in allstrings if t != s)]
Out[15]: 
[u'This',
 u'the third largest earthquake',
 u'recorded history',
 u'massive tsunamis',
 u'widespread devastation',
 u'they',
 u'land',
 u'an estimated 230,000 people',
 u'countries',
 u'the Bay',
 u'Bengal',
 u'the Indian Ocean']

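The list comprehension starts out simply. It selects the strings from the main list, allstrings, that satisfy some condition:

[s for s in allstrings if ...]

The condition that a string s must satisfy is:

not any(t in s for t in allstrings if t != s)

This tests whether any other string t in allstrings is contained in s. If there is no such string t, then s is included in the final list.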
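For reuse, the same comprehension can be wrapped in a function. This is only a sketch: the name finest_grained and the three-item sample list are mine, chosen for illustration, and the logic is exactly the comprehension above.

```python
def finest_grained(strings):
    """Keep each string that contains no other, different string from the list."""
    return [
        s for s in strings
        # Drop s if some other string t occurs inside it as a substring.
        if not any(t in s for t in strings if t != s)
    ]


# Small illustrative sample taken from the question's list.
sample = [
    'the third largest earthquake in recorded history',
    'the third largest earthquake',
    'recorded history',
]

print(finest_grained(sample))
# ['the third largest earthquake', 'recorded history']
```

Note that every string is compared against every other string, so the running time grows quadratically with the length of the list. For a list of this size that is not a concern.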

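Possible improvement

Is the entity 'the' contained in the entity 'they'? The answer depends on what we mean by an entity. If we decide that the answer is no, then we should make a small change to the algorithm. The simplest approach seems to be to pad each string with spaces. For example: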

In [25]: u'the' in u'they'
Out[25]: True

In [26]: u' the ' in u' they '
Out[26]: False

To implement this, we add a step that pads each string with spaces, run the containment check, and then strip the extra spaces:

In [30]: allstrings = [u'This', u'the third largest earthquake in recorded history', u'the third largest earthquake', u'recorded history', u'massive tsunamis , which caused widespread devastation when they hit land , leaving an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'massive tsunamis', u'widespread devastation', u'they', u'land', u'an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'an estimated 230,000 people', u'countries around the Bay of Bengal and the Indian Ocean', u'countries', u'the Bay of Bengal and the Indian Ocean', u'the Bay', u'Bengal and the Indian Ocean', u'Bengal', u'the Indian Ocean']

In [31]: allstr2 = [u' {} '.format(s.strip()) for s in allstrings]

In [32]: [s.strip() for s in allstr2 if not any(t in s for t in allstr2 if t != s)]
Out[32]: 
[u'This',
 u'the third largest earthquake',
 u'recorded history',
 u'massive tsunamis',
 u'widespread devastation',
 u'they',
 u'land',
 u'an estimated 230,000 people',
 u'countries',
 u'the Bay',
 u'Bengal',
 u'the Indian Ocean']

As you can see, this refinement has no effect on the given strings, but it could make a difference for other strings.
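To see a case where the padding does change the result, here is a small sketch that compares the two variants on a made-up list. The function names and the three test strings are hypothetical and not part of the question's data.

```python
def finest_grained(strings):
    """Plain substring check: 'the' counts as being inside 'they'."""
    return [s for s in strings
            if not any(t in s for t in strings if t != s)]


def finest_grained_words(strings):
    """Space-padded check: a string only matches at whole-word boundaries."""
    # Pad every string with one leading and one trailing space.
    padded = [' {} '.format(s.strip()) for s in strings]
    # Run the same containment test on the padded strings, then unpad.
    return [s.strip() for s in padded
            if not any(t in s for t in padded if t != s)]


# Hypothetical input in which one entity is a prefix of another word.
words = ['the', 'they', 'they hit land']

print(finest_grained(words))        # ['the']
print(finest_grained_words(words))  # ['the', 'they']
```

With the plain check, 'they' is discarded because it contains 'the'. With padding, ' the ' does not occur in ' they ', so both survive. The padding only treats spaces as word boundaries, so it assumes punctuation is separated from words by spaces, as it is in the list above.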
