Python: append list items according to the number of hyphens

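I have a list called mylist. It consists of tuples of words and their random tags. I don't want to use regex. The minimum number of tags is 1 and the maximum is 5. I want 5 different lists, according to the number of tags.

For tuples with one tag, I tried the following:

one = []
for i in mylist:
    if '-' not in i[1]:
        one.append(i)
print(one)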

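This correctly prints:

[('country', 'NN'), ('receive', 'VBZ')]

For two tags, I want it to print:

[('threats', 'NN-JJ'), ('former', 'NN-RB')]

and likewise for the sets of three, four and five tags. I don't know how to do this.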

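My actual file has … tags and consists of about 10 million words with their tags. Is there a way to find out which word has the most different tags?

That would be very helpful!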
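The second question (which word has the most different tags) is not answered in the thread. The following is a minimal sketch, assuming "most different tags" means the most distinct tags seen for a word across all of its occurrences, and that the data has the same (word, tags) tuple shape as mylist. The function name words_with_most_tags is hypothetical. It reads the input in a single pass, so it also works on a generator over a large file.

```python
from collections import defaultdict


def words_with_most_tags(pairs):
    """Return (n, words): the largest number of distinct tags seen for any
    word, and every word that reaches it (sorted). `pairs` is any iterable
    of (word, 'TAG1-TAG2-...') tuples and is consumed in a single pass."""
    tags_per_word = defaultdict(set)
    for word, tags in pairs:
        # Union the tags from every occurrence of the same word.
        tags_per_word[word].update(tags.split('-'))
    if not tags_per_word:
        return 0, []
    best = max(len(s) for s in tags_per_word.values())
    winners = sorted(w for w, s in tags_per_word.items() if len(s) == best)
    return best, winners


mylist = [('country', 'NN'), ('shoot', 'NN-DT-PPL'), ('threats', 'NN-JJ'),
          ('both', 'RB-JJ-NN'), ('during', 'NN-VBD-JJ-RB'), ('former', 'NN-RB'),
          ('school', 'NN-CC-JJ-DT'), ('teacher', 'NN-VBZ-PPL-JJ-DT'),
          ('receive', 'VBZ'), ('batman', 'NN-IN-ABX-CD-RB')]

print(words_with_most_tags(mylist))  # (5, ['batman', 'teacher'])
```

Using a set per word means a tag repeated within one entry or across entries is counted once; if you instead want the raw number of tags in a single entry, replace the set with max(tags.count('-') + 1 ...).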

You can split the tag string using '-' as the delimiter and count the elements in the resulting list, for example for 3 tags:

>>> [t for t in mylist if len(t[1].split('-')) == 3]
[('shoot', 'NN-DT-PPL'), ('both', 'RB-JJ-NN')]

However, there is a more efficient way to do this in a single pass, using a dictionary:

dash_dict = dict()
for i in mylist:
    count = i[1].count('-') + 1
    if count in dash_dict:
        dash_dict[count].append(i)
    else:
        dash_dict[count] = [i]

After that, you have a dictionary of lists that you can easily iterate over. The largest number of tags on any entry is:

max_dash_count = max(i[1].count('-') for i in mylist) + 1

You can use a defaultdict to organize the data and .count to count the number of dashes:

from collections import defaultdict

mylist = [('country', 'NN'), ('shoot', 'NN-DT-PPL'), ... ]
res = defaultdict(list)

for item, tags in mylist:
    res[tags.count('-') + 1].append((item, tags))

You can print the results with the following code (sorted so the tag counts appear in order):

for k, v in sorted(res.items()):
    print(str(k) + ": " + str(v))

Prints:

brunsgaard@archbook /tmp> python test2.py
1: [('country', 'NN'), ('receive', 'VBZ')]
2: [('threats', 'NN-JJ'), ('former', 'NN-RB')]
3: [('shoot', 'NN-DT-PPL'), ('both', 'RB-JJ-NN')]
4: [('during', 'NN-VBD-JJ-RB'), ('school', 'NN-CC-JJ-DT')]
5: [('teacher', 'NN-VBZ-PPL-JJ-DT'), ('batman', 'NN-IN-ABX-CD-RB')]
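The question asks for five separate lists rather than a dictionary. A minimal sketch of getting them from a one-pass grouping, assuming the question's fixed range of 1 to 5 tags; the helper name split_by_tag_count and the constant MAX_TAGS are hypothetical. Using .get(n, []) gives an empty list for a tag count that never occurs instead of raising KeyError.

```python
from collections import defaultdict

MAX_TAGS = 5  # assumption taken from the question: 1 to 5 tags per entry


def split_by_tag_count(pairs, max_tags=MAX_TAGS):
    """Group (word, tags) tuples by their number of '-'-separated tags.
    Returns a list of lists: result[0] holds 1-tag entries, result[1]
    2-tag entries, and so on. Entries with more than max_tags tags are
    not included in the result."""
    groups = defaultdict(list)
    for word, tags in pairs:
        groups[tags.count('-') + 1].append((word, tags))
    # .get does not trigger the default factory, so missing counts give []
    return [groups.get(n, []) for n in range(1, max_tags + 1)]


mylist = [('country', 'NN'), ('shoot', 'NN-DT-PPL'), ('threats', 'NN-JJ'),
          ('both', 'RB-JJ-NN'), ('during', 'NN-VBD-JJ-RB'), ('former', 'NN-RB'),
          ('school', 'NN-CC-JJ-DT'), ('teacher', 'NN-VBZ-PPL-JJ-DT'),
          ('receive', 'VBZ'), ('batman', 'NN-IN-ABX-CD-RB')]

one, two, three, four, five = split_by_tag_count(mylist)
print(three)  # [('shoot', 'NN-DT-PPL'), ('both', 'RB-JJ-NN')]
```

Unpacking into exactly five names only works because the result always has max_tags elements, even when some counts are empty.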



Another way to do this:

#!/usr/bin/python

mylist = [('country', 'NN'), ('shoot', 'NN-DT-PPL'), ('threats', 'NN-JJ'), ('both','RB-JJ-NN'), ('during', 'NN-VBD-JJ-RB'), ('former', 'NN-RB'), ('school', 'NN-CC-JJ-DT'), ('teacher', 'NN-VBZ-PPL-JJ-DT'), ('receive', 'VBZ'), ('batman', 'NN-IN-ABX-CD-RB')]
MAX_TAG = 5


def findTag():
    d = {}  # maps dash count (number of tags - 1) to the matching tuples
    for tup in mylist:
        a, b = tup
        n = b.count('-')
        # skip entries outside the expected range of 1..MAX_TAG tags
        if not 0 <= n <= MAX_TAG - 1:
            continue
        if n not in d:
            d[n] = []
        d[n].append(tup)

    for k in sorted(d.keys()):
        print('{} => {}'.format(k + 1, d[k]))


if __name__ == '__main__':
    findTag()

Output:

1 => [('country', 'NN'), ('receive', 'VBZ')]
2 => [('threats', 'NN-JJ'), ('former', 'NN-RB')]
3 => [('shoot', 'NN-DT-PPL'), ('both', 'RB-JJ-NN')]
4 => [('during', 'NN-VBD-JJ-RB'), ('school', 'NN-CC-JJ-DT')]
5 => [('teacher', 'NN-VBZ-PPL-JJ-DT'), ('batman', 'NN-IN-ABX-CD-RB')]
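With about 10 million words, keeping every tuple in per-count lists can use a lot of memory. If only the number of entries per tag count is needed, a collections.Counter sketch like the following stores just one integer per tag count. This is a swapped-in technique, not part of the answers above, and the function name tag_count_histogram is hypothetical.

```python
from collections import Counter


def tag_count_histogram(pairs):
    """Count how many (word, tags) entries have 1, 2, 3, ... tags,
    without storing the entries themselves."""
    return Counter(tags.count('-') + 1 for _word, tags in pairs)


mylist = [('country', 'NN'), ('shoot', 'NN-DT-PPL'), ('threats', 'NN-JJ'),
          ('both', 'RB-JJ-NN'), ('during', 'NN-VBD-JJ-RB'), ('former', 'NN-RB'),
          ('school', 'NN-CC-JJ-DT'), ('teacher', 'NN-VBZ-PPL-JJ-DT'),
          ('receive', 'VBZ'), ('batman', 'NN-IN-ABX-CD-RB')]

hist = tag_count_histogram(mylist)
for n in sorted(hist):
    print(n, hist[n])  # one line per tag count: the count, then how many entries have it
```

Because the generator expression is consumed lazily, pairs can also be a generator that parses the file line by line.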
