Python: How to select the most fine-grained string items in a list

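I have a list of strings:

[u'This', u'the third largest earthquake in recorded history', u'the third largest earthquake', u'recorded history', u'massive tsunamis , which caused widespread devastation when they hit land , leaving an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'massive tsunamis', u'widespread devastation', u'they', u'land', u'an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'an estimated 230,000 people', u'countries around the Bay of Bengal and the Indian Ocean', u'countries', u'the Bay of Bengal and the Indian Ocean', u'the Bay', u'Bengal and the Indian Ocean', u'Bengal', u'the Indian Ocean']

As you can see, some elements contain other elements. For example,

u'the third largest earthquake in recorded history'

contains:

u'the third largest earthquake'

u'recorded history'

How can I select only the most fine-grained elements, such as u'recorded history', and discard the rest?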

I believe this does what you want:

In [14]: allstrings = [u'This', u'the third largest earthquake in recorded history', u'the third largest earthquake', u'recorded history', u'massive tsunamis , which caused widespread devastation when they hit land , leaving an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'massive tsunamis', u'widespread devastation', u'they', u'land', u'an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'an estimated 230,000 people', u'countries around the Bay of Bengal and the Indian Ocean', u'countries', u'the Bay of Bengal and the Indian Ocean', u'the Bay', u'Bengal and the Indian Ocean', u'Bengal', u'the Indian Ocean']

In [15]: [s for s in allstrings if not any(t in s for t in allstrings if t != s)]
Out[15]: 
[u'This',
 u'the third largest earthquake',
 u'recorded history',
 u'massive tsunamis',
 u'widespread devastation',
 u'they',
 u'land',
 u'an estimated 230,000 people',
 u'countries',
 u'the Bay',
 u'Bengal',
 u'the Indian Ocean']
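
The list comprehension starts out simply. It selects from the main list, allstrings, the strings that satisfy some condition:

[s for s in allstrings if ...]

The condition that a string s must satisfy is:

not any(t in s for t in allstrings if t != s)

This tests whether any other string t in allstrings is contained in s. If there is no such string t, then s is included in the final list.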
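
The same filter can be wrapped in a small function so it is easier to reuse and test. This is only a minimal sketch for Python 3, where the u prefix is optional. The name `finest_grained` and the shortened sample list are assumptions made for illustration and are not part of the original answer.

```python
def finest_grained(strings):
    """Keep only the strings that contain no other string from the list."""
    return [
        s for s in strings
        # Reject s as soon as some different string t occurs inside it.
        if not any(t in s for t in strings if t != s)
    ]


# Shortened, hypothetical sample based on the list in the question.
sample = [
    'the third largest earthquake in recorded history',
    'the third largest earthquake',
    'recorded history',
    'they',
    'land',
]

result = finest_grained(sample)
print(result)
```

Because every string is compared with every other string, the work grows roughly with the square of the list length, which should be fine for short lists like this one. Note also that the `t != s` test compares values, so two identical strings in the list would not eliminate each other.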

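Possible improvement

Is the entity 'the' contained in the entity 'they'? The answer depends on what we mean by an entity. If we decide that the answer is no, then we should make a small modification to the algorithm. The simplest approach seems to be to pad each string with spaces. For example: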

In [25]: u'the' in u'they'
Out[25]: True

In [26]: u' the ' in u' they '
Out[26]: False

To implement this, we add the spaces in an extra step, run the entity check, and then strip the extra spaces:

In [30]: allstrings = [u'This', u'the third largest earthquake in recorded history', u'the third largest earthquake', u'recorded history', u'massive tsunamis , which caused widespread devastation when they hit land , leaving an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'massive tsunamis', u'widespread devastation', u'they', u'land', u'an estimated 230,000 people dead in countries around the Bay of Bengal and the Indian Ocean', u'an estimated 230,000 people', u'countries around the Bay of Bengal and the Indian Ocean', u'countries', u'the Bay of Bengal and the Indian Ocean', u'the Bay', u'Bengal and the Indian Ocean', u'Bengal', u'the Indian Ocean']

In [31]: allstr2 = [u' {} '.format(s.strip()) for s in allstrings]

In [32]: [s.strip() for s in allstr2 if not any(t in s for t in allstr2 if t != s)]
Out[32]: 
[u'This',
 u'the third largest earthquake',
 u'recorded history',
 u'massive tsunamis',
 u'widespread devastation',
 u'they',
 u'land',
 u'an estimated 230,000 people',
 u'countries',
 u'the Bay',
 u'Bengal',
 u'the Indian Ocean']

As you can see, this refinement makes no difference for the given strings, but it could for other strings.
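
To illustrate a case where the padding does change the result, here is a small sketch with a list that contains both 'the' and 'they'. The function names `finest_substring` and `finest_whole_word` and the sample list are hypothetical and chosen only for this example.

```python
def finest_substring(strings):
    """Plain substring test: 'the' counts as being inside 'they'."""
    return [s for s in strings if not any(t in s for t in strings if t != s)]


def finest_whole_word(strings):
    """Space-padded test: a match must start and end at a space boundary."""
    padded = [' {} '.format(s.strip()) for s in strings]
    return [
        s.strip() for s in padded
        if not any(t in s for t in padded if t != s)
    ]


# Hypothetical sample chosen so that the two variants disagree.
sample = ['the', 'they', 'the Bay']

plain = finest_substring(sample)
whole = finest_whole_word(sample)

print(plain)  # 'they' and 'the Bay' are dropped because both contain 'the'
print(whole)  # 'they' survives because ' the ' is not inside ' they '
```

The padding only recognises spaces as boundaries. A word followed directly by punctuation, as in 'land, sea', would not match ' land ', so text with attached punctuation may need extra normalisation first.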

Define what you mean by most fine-grained. It sounds like you are trying to do lexical analysis (that is, to understand the meaning of words, and perhaps even of the phrases that contain them).

That is exactly what I want to do. I only want to select the smallest entities that do not contain other entities.

Define entity. All python will see is strings and characters. You could get python to "see" words by splitting on whitespace or punctuation, but even that takes a small amount of code. Why is u'recorded history' the smallest entity, and why not u'history' or even 'o'?

Let me say a prayer before I try to understand that :)

all is a built-in function; try not to actually use it.

@John1024 Thank you very much. It all came down to the logic of how to write the one-liner, which I struggled with for days.

@John1024 Can you tell me where I can learn to write beautiful code like yours, such as the complex list comprehension you showed?

@Sean Thanks for the compliment! I have no particular advice: just read and practice. You can learn a lot simply by following the stackoverflow questions tagged python.