Python: get the top n results from a tuple

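After running a series of functions that search HTML for text and then find keywords and their scores, I end up with a tuple that looks like this: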

test_new = extract_keywords(test_test)

('keywords: ',
 [('single high-level impulse noise', 23.5),
  ('cable replacement programme failed', 16.0),
  ('meet current british standards', 16.0),
  ('engineer michael jones', 8.333333333333334),
  ('18 months engineers began', 8.25),
  ('embarrassed householder promised', 8.0),
  ('second-hand television', 8.0),
  ('openreach chief engineer', 7.75),
  ('electrical interference emitted', 7.583333333333334),
  ('entire village lost', 7.0),
  ('stable broadband signal', 6.714285714285714),
  ('problem television fixed', 6.6),
  ('electrical noise', 5.75),
  ('electrical interference', 4.583333333333334),
  ('mr jones', 4.333333333333334),
  ('engineers discovered', 4.25))
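I thought I could use Counter to find the n largest values, but it doesn't seem to work on tuples. I also tried slicing it with test_new[:3] to get the top values, since it is already ordered, but that didn't work either.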

Ideally, I need to pass it through a function:

def top_keywords(rake_keywords, n=3):
    # get top n keywords
    return
so that I can get values back based on the value of n. I tried:

sorted(test_new, key=lambda t: t[1], reverse=True)[:5]

but got

'<' not supported between instances of 'str' and 'tuple'
Function to get the top n items from a tuple

If you want a function that gets the top n items from the tuple, you can use the following:

def top_n_tups(tups, n=3):
    # Sort by score (the second item of each pair), highest first.
    sorted_tup = sorted(tups, key=lambda tup: tup[1], reverse=True)
    return sorted_tup[:n]

top_n_tups(test_new[1])
This gives a result like the one below, assuming the input is a tuple that contains a list of tuples:

[('single high-level impulse noise', 23.5), ('cable replacement programme failed', 16.0), ('meet current british standards', 16.0)]
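You can also call the function with a value for n. If n is not given, it defaults to the top 3; if you pass n=6, you get the top 6. The example below shows this: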

>>> top_n_tups(test_new[1],6)

[('single high-level impulse noise', 23.5), ('cable replacement programme failed', 16.0), ('meet current british standards', 16.0), ('engineer michael jones', 8.333333333333334), ('18 months engineers began', 8.25), ('embarrassed householder promised', 8.0)]
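As an alternative sketch (not part of the answer above), the standard library's heapq.nlargest can pick the n highest-scoring pairs without sorting the whole list first. The shortened test_new below is only sample data copied from the question.

```python
import heapq

# Shortened sample data in the same shape as the question's result.
test_new = ('keywords: ', [
    ('single high-level impulse noise', 23.5),
    ('cable replacement programme failed', 16.0),
    ('meet current british standards', 16.0),
    ('engineer michael jones', 8.333333333333334),
    ('18 months engineers began', 8.25),
])


def top_n_tups(tups, n=3):
    """Return the n (keyword, score) pairs with the highest scores."""
    return heapq.nlargest(n, tups, key=lambda tup: tup[1])


print(top_n_tups(test_new[1], 2))
```

For a list this small the difference from sorted(...)[:n] is negligible; nlargest mainly pays off when n is small compared with the length of the list.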
Tuple containing a list of tuples

If the tuple is stored in a variable like this, you can use indexing to retrieve the items:

test_new = ('keywords: ',
 [('single high-level impulse noise', 23.5),
  ('cable replacement programme failed', 16.0),
  ('meet current british standards', 16.0),
  ('engineer michael jones', 8.333333333333334),
  ('18 months engineers began', 8.25),
  ('embarrassed householder promised', 8.0),
  ('second-hand television', 8.0),
  ('openreach chief engineer', 7.75),
  ('electrical interference emitted', 7.583333333333334),
  ('entire village lost', 7.0),
  ('stable broadband signal', 6.714285714285714),
  ('problem television fixed', 6.6),
  ('electrical noise', 5.75),
  ('electrical interference', 4.583333333333334),
  ('mr jones', 4.333333333333334),
  ('engineers discovered', 4.25)])
Then you can use something like this:

>>> test_new[1][:3]
[('single high-level impulse noise', 23.5), ('cable replacement programme failed', 16.0), ('meet current british standards', 16.0)]
You can also get a specific value like this:

>>> test_new[1][0][0]
'single high-level impulse noise'

>>> test_new[1][0][1]
23.5
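The same lookups can be written with tuple unpacking instead of chained indexes, which some readers find easier to follow. This is only a sketch on shortened sample data; the names label, keywords, top_keyword and top_score are illustrative and do not come from the question.

```python
test_new = ('keywords: ', [
    ('single high-level impulse noise', 23.5),
    ('cable replacement programme failed', 16.0),
    ('meet current british standards', 16.0),
])

# Unpack the outer tuple: a label string and the list of pairs.
label, keywords = test_new

# Unpack the first pair into its keyword and its score.
top_keyword, top_score = keywords[0]

print(top_keyword)
print(top_score)
```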
Tuple containing only tuples

However, if the data has no list and contains only tuples, like this, it is even easier to retrieve:

>>> test_new = ('keywords: ',
  ('single high-level impulse noise', 23.5),
  ('cable replacement programme failed', 16.0),
  ('meet current british standards', 16.0),
  ('engineer michael jones', 8.333333333333334),
  ('18 months engineers began', 8.25),
  ('embarrassed householder promised', 8.0),
  ('second-hand television', 8.0),
  ('openreach chief engineer', 7.75),
  ('electrical interference emitted', 7.583333333333334),
  ('entire village lost', 7.0),
  ('stable broadband signal', 6.714285714285714),
  ('problem television fixed', 6.6),
  ('electrical noise', 5.75),
  ('electrical interference', 4.583333333333334),
  ('mr jones', 4.333333333333334),
  ('engineers discovered', 4.25))
Then you can retrieve it as follows:

>>> test_new[1]
('single high-level impulse noise', 23.5)

>>> test_new[:3]
('keywords: ', ('single high-level impulse noise', 23.5), ('cable replacement programme failed', 16.0))
Note that test_new[0] is 'keywords: '.
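Because index 0 holds the label in this flat layout, a slice for the top n pairs probably wants to start at index 1. A minimal sketch on shortened sample data (the variable names n and top_n are illustrative):

```python
test_new = ('keywords: ',
            ('single high-level impulse noise', 23.5),
            ('cable replacement programme failed', 16.0),
            ('meet current british standards', 16.0),
            ('engineer michael jones', 8.333333333333334))

n = 3
# Start at index 1 so the 'keywords: ' label is left out.
top_n = test_new[1:n + 1]
print(top_n)
```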

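"I thought I could use Counter to find the n largest values, but it doesn't seem to work on tuples"

It works on dicts, and dict works on your tuples:

from collections import Counter

Counter(dict(test_new[1])).most_common(3)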
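Spelled out as a runnable sketch on shortened sample data, the one-liner works like this. One caveat worth keeping in mind: dict() keeps only the last score if the same keyword appears twice.

```python
from collections import Counter

test_new = ('keywords: ', [
    ('single high-level impulse noise', 23.5),
    ('cable replacement programme failed', 16.0),
    ('engineer michael jones', 8.333333333333334),
    ('18 months engineers began', 8.25),
])

# dict() turns the (keyword, score) pairs into {keyword: score}.
# Counter accepts any mapping, and most_common(n) returns the n
# entries with the largest values, highest first.
scores = Counter(dict(test_new[1]))
top_two = scores.most_common(2)
print(top_two)
```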

If the value of test_new is stored like this:

test_new = ('keywords: ', [
    ('single high-level impulse noise', 23.5),
    ('cable replacement programme failed', 16.0),
    ('meet current british standards', 16.0),
    ('engineer michael jones', 8.333333333333334),
    ('18 months engineers began', 8.25),
    ('embarrassed householder promised', 8.0),
    ('second-hand television', 8.0),
    ('openreach chief engineer', 7.75),
    ('electrical interference emitted', 7.583333333333334),
    ('entire village lost', 7.0),
    ('stable broadband signal', 6.714285714285714),
    ('problem television fixed', 6.6),
    ('electrical noise', 5.75),
    ('electrical interference', 4.583333333333334),
    ('mr jones', 4.333333333333334),
    ('engineers discovered', 4.25)
])
then you can do:

def top_keywords(rake_keywords, n=3):
    return sorted(rake_keywords[1], key=lambda t: t[1], reverse=True)[:n]
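Sorting inside the function means it does not rely on the pairs already being in score order. The sketch below illustrates that with a hypothetical, deliberately shuffled result; unsorted_result is made-up sample input, not output from the question's code.

```python
def top_keywords(rake_keywords, n=3):
    return sorted(rake_keywords[1], key=lambda t: t[1], reverse=True)[:n]


# Hypothetical result whose pairs are NOT already ordered by score.
unsorted_result = ('keywords: ', [
    ('electrical noise', 5.75),
    ('single high-level impulse noise', 23.5),
    ('mr jones', 4.333333333333334),
    ('entire village lost', 7.0),
])

print(top_keywords(unsorted_result, 2))
```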

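If your extract_keywords function returns a tuple of the form ('keywords: ', [...]), with the actual dataset inside the tuple, then just index the dataset with test_new[1] and put that into your sorting code instead of the whole tuple: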

sorted(test_new[1], key=lambda t: t[1], reverse=True)[:5]
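To see why the index matters, here is a small sketch on shortened sample data. Sorting the outer tuple makes Python compare a string with a tuple, which raises the TypeError from the question, while sorting the inner list compares the scores.

```python
test_new = ('keywords: ', [
    ('single high-level impulse noise', 23.5),
    ('cable replacement programme failed', 16.0),
    ('meet current british standards', 16.0),
])

# Sorting the outer tuple compares 'keywords: '[1] (a str) with the
# list's second element (a tuple), which Python 3 refuses to order.
try:
    sorted(test_new, key=lambda t: t[1], reverse=True)
    failed = False
except TypeError as exc:
    failed = True
    print(exc)

# Sorting the inner list compares the float scores, as intended.
top_two = sorted(test_new[1], key=lambda t: t[1], reverse=True)[:2]
print(top_two)
```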
However, I think this problem really stems from the extract_keywords function. If I had to guess, its return statement is:

return 'keywords: ', keywords
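If that is the case, it obscures the real data: the function now returns a tuple containing the string 'keywords: ' followed by the actual keywords, so you have to index the tuple to get at the data. You don't need to spell out in the return statement that the value is keywords; the function name and the name of the returned variable already document that. Replace that line with return keywords and you can run your sort as normal, without writing test_new[1].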
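A rough sketch of that change follows. The body of extract_keywords here is purely a hypothetical stand-in (it scores each word by its length), since the real function is not shown in the question; only the shape of the return value is the point.

```python
def extract_keywords(text):
    """Hypothetical stand-in; the real function's body is not shown."""
    # Placeholder scoring: each word is scored by its length.
    keywords = [(word, float(len(word))) for word in text.split()]
    # Return the list itself rather than ('keywords: ', keywords).
    return keywords


test_new = extract_keywords('stable broadband signal')

# No test_new[1] needed: the result is already the list of pairs.
top = sorted(test_new, key=lambda t: t[1], reverse=True)[:2]
print(top)
```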

If you want help turning the sort statement into a function, the other answers have that covered.

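From your example, I initially assumed the problem lay in the dataset itself. With your clarification of what the data looks like, that does not seem to be the case.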

Your sample data is missing a closing ] in the list, but it looks like you were on the right track with your first attempt at slicing:

test_new[1][:3]
gives you the first 3 tuples, and then you just need to extract the keywords from them:

top_keywords = [kw[0] for kw in test_new[1][:3]]
Or factored out into a function:

def top_keywords(rake_keywords, n=3):
    keyword_list = rake_keywords[1]
    top_keyword_items = keyword_list[:n]
    top_keywords = [kw[0] for kw in top_keyword_items]
    return top_keywords
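A variant sketch of the same function, with the local list renamed so it no longer shares the function's name, plus a quick check of what happens when n is larger than the list. It still assumes the pairs arrive already sorted by score; the names keyword_items and names are illustrative.

```python
def top_keywords(rake_keywords, n=3):
    """Return the first n keyword strings, assuming sorted input."""
    keyword_items = rake_keywords[1][:n]
    # 'names' avoids reusing the function's own name for a local.
    names = [keyword for keyword, _score in keyword_items]
    return names


test_new = ('keywords: ', [
    ('single high-level impulse noise', 23.5),
    ('cable replacement programme failed', 16.0),
    ('meet current british standards', 16.0),
])

print(top_keywords(test_new, 2))
# Slicing past the end is safe: this returns all three keywords.
print(top_keywords(test_new, 10))
```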

In sorted, change it to test_new[1]. Note that test_new[0] == 'keywords: ', so you need to look at test_new[1] if you want to retrieve the values in the list.

I just realised I used a local variable with the same name as the function, but you get the idea.