Python: what if the selected column does not hold just a single element?


The code below was written for data like this:

[['slashdot', 'USA', 'yes', 18, 'None'],
 ['google', 'France', 'yes', 23, 'Premium'],
 ['digg', 'USA', 'yes', 24, 'Basic'],
 ...
]

My data, on the other hand, looks like this:

[[(0,'y'), (0,'n'), (1,'n'), (2,'y'), (3,'y')],
 [(0,'y'), (0,'y'), (1,'y'), (2,'y'), (3,'n')], ...]

Here is the code:

def divideset(rows,column,value):
   # Make a function that tells us if a row is in the first group (true) or the second group (false)
   split_function=None
   if isinstance(value,int) or isinstance(value,float): # check if the value is a number i.e int or float
      split_function=lambda row:row[column]>=value
   else:
      split_function=lambda row:row[column]==value

   # Divide the rows into two sets and return them
   set1=[row for row in rows if split_function(row)]
   set2=[row for row in rows if not split_function(row)]
   return (set1,set2)

Applied to the first data set, the function returns:

([['slashdot', 'USA', 'yes', 18, 'None'],
  ['google', 'France', 'yes', 23, 'Premium'],
  ['digg', 'USA', 'yes', 24, 'Basic'],
  ['kiwitobes', 'France', 'yes', 23, 'Basic'],
  ['slashdot', 'France', 'yes', 19, 'None'],
  ['digg', 'New Zealand', 'yes', 12, 'Basic'],
  ['google', 'UK', 'yes', 18, 'Basic'],
  ['kiwitobes', 'France', 'yes', 19, 'Basic']],
 [['google', 'UK', 'no', 21, 'Premium'],
  ['(direct)', 'New Zealand', 'no', 12, 'None'],
  ['(direct)', 'UK', 'no', 21, 'Basic'],
  ['google', 'USA', 'no', 24, 'Premium'],
  ['digg', 'USA', 'no', 18, 'None'],
  ['google', 'UK', 'no', 18, 'None'],
  ['kiwitobes', 'UK', 'no', 19, 'None'],
  ['slashdot', 'UK', 'no', 21, 'None']])

If I apply it to my data, I get one empty set and another set that contains all the data.

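In the code below, we first redefine dividedata to make it more flexible (and more efficient). We then test the new implementation on the two different kinds of data you showed us. As for the criterion to use with your data, I had to improvise.

The new dividedata no longer has a fixed test chosen according to the type of the value passed to the function. Instead, it takes advantage of Python's ability to treat functions as data: test must be a function that you define elsewhere in your code, or one defined on the fly when you call dividedata, using the lambda syntax.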

from __future__ import print_function

def dividedata(data, col_num, test=None):
    "Split data into the rows where test(row[col_num]) is true and the remaining rows."

    # if no test function was passed, return (quite arbitrarily)
    # a shallow copy of the original list and an empty list
    if test is None:
        return data[:], []

    set0, set1 = [], []
    for row in data:
        if test(row[col_num]):
            set0.append(row)
        else:
            set1.append(row)

    return set0, set1

my_data = [[(0, 'y'), (0, 'n'), (1, 'n'), (2, 'y'), (3, 'y')],
           [(0, 'y'), (0, 'y'), (1, 'y'), (2, 'y'), (3, 'n')]]

their_data = [['slashdot', 'USA', 'yes', 18, 'None'],
              ['google', 'France', 'yes', 23, 'Premium'],
              ['digg', 'USA', 'yes', 24, 'Basic'],
              ['kiwitobes', 'France', 'yes', 23, 'Basic'],
              ['slashdot', 'France', 'yes', 19, 'None'],
              ['digg', 'New Zealand', 'yes', 12, 'Basic'],
              ['google', 'UK', 'yes', 18, 'Basic'],
              ['kiwitobes', 'France', 'yes', 19, 'Basic'],
              ['google', 'UK', 'no', 21, 'Premium'],
              ['(direct)', 'New Zealand', 'no', 12, 'None'],
              ['(direct)', 'UK', 'no', 21, 'Basic'],
              ['google', 'USA', 'no', 24, 'Premium'],
              ['digg', 'USA', 'no', 18, 'None'],
              ['google', 'UK', 'no', 18, 'None'],
              ['kiwitobes', 'UK', 'no', 19, 'None'],
              ['slashdot', 'UK', 'no', 21, 'None']]

c2y, c2n = dividedata(their_data, 2, lambda answer: answer=='yes')
c1y, c1n = dividedata(my_data, 1, lambda tup: tup[1]=='y')

print(c2y, c2n, sep='\n')
print(c1y, c1n, sep='\n')
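
As a small complement to the lambda calls above, the test can equally be an ordinary named function defined elsewhere. The sketch below repeats a compact dividedata so that it runs on its own; at_least_20 and the three sample rows are illustrative assumptions, not part of the original answer.

```python
def dividedata(data, col_num, test=None):
    """Compact copy of the function above so this sketch runs standalone."""
    if test is None:
        return data[:], []
    set0, set1 = [], []
    for row in data:
        (set0 if test(row[col_num]) else set1).append(row)
    return set0, set1


# Hypothetical named test: true for ages of 20 or more.
def at_least_20(age):
    return age >= 20


sample = [['slashdot', 'USA', 'yes', 18, 'None'],
          ['google', 'France', 'yes', 23, 'Premium'],
          ['digg', 'USA', 'yes', 24, 'Basic']]

# Pass the function object itself (no parentheses after its name).
older, younger = dividedata(sample, 3, at_least_20)
print(older)    # the google and digg rows
print(younger)  # the slashdot row
```

A named function is easier to reuse and to test than a lambda, while a lambda keeps a one-off criterion next to the call that uses it.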

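It would be easier to help you if you showed us 1. how you call the function on the original data, 2. how you call it on your data and 3. If you decide to complete your question with these important details, please do not post them as comments; edit your question instead. While we are at it, allow me to point out that the function you are trying to reuse is poorly designed and also inefficient, because it scans all the data twice.

Is it possible that you forgot to take into account that the code you provided will (now) compare the tuples in the list for equality? Unless you pass a tuple to the function call, the condition will always return False, which is why set2 will be a copy of the input data and set1 will be empty. Depending on what you want to filter on, you may only need to add one more level of indexing to the lambda, e.g. row[column][tuple_index].

I will try it in a few hours.

Yes, judging by the time it takes, it does seem more efficient.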
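
A minimal sketch of the extra indexing level suggested in the comments, assuming the split should look at the second element of each tuple (the 'y'/'n' flag). The tuple_index parameter is a hypothetical addition that is not part of the original divideset, and the rows are the two shown in the question.

```python
def divideset(rows, column, value, tuple_index=None):
    """Split rows in two sets.

    tuple_index is a hypothetical extra argument: when given, the
    comparison uses that element of the tuple stored in the column.
    """
    def cell(row):
        item = row[column]
        return item if tuple_index is None else item[tuple_index]

    if isinstance(value, (int, float)):
        split_function = lambda row: cell(row) >= value
    else:
        split_function = lambda row: cell(row) == value

    set1, set2 = [], []
    for row in rows:  # single pass over the data
        (set1 if split_function(row) else set2).append(row)
    return set1, set2


my_data = [[(0, 'y'), (0, 'n'), (1, 'n'), (2, 'y'), (3, 'y')],
           [(0, 'y'), (0, 'y'), (1, 'y'), (2, 'y'), (3, 'n')]]

# Without the extra index, a tuple such as (0, 'n') is compared with 'y',
# which is False for every row: the first set is empty, the second holds all rows.
print(divideset(my_data, 1, 'y'))

# With tuple_index=1 the flag inside each tuple is compared instead.
yes_rows, no_rows = divideset(my_data, 1, 'y', tuple_index=1)
print(yes_rows)  # only the second row
print(no_rows)   # only the first row
```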