Python: convert a str to a dict, with each word's length as the key and the list of words of that length as the value

I have a string here:

str_files_txt = "A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

'Text file' refers to a type of container, while plain text refers to a type of content.

At a generic level of description, there are two kinds of computer files: text files and binary files"
I need to create a dictionary in which each key is a word length and each value is all the words of that length, stored in a list.

This is what I tried. It works, but I'm not sure how to do this efficiently with a loop. Could anyone share an answer?

files_dict_values = {}
files_list = list(set(str_files_txt.split()))

values_1=[]
values_2=[]
values_3=[]
values_4=[]
values_5=[]
values_6=[]
values_7=[]
values_8=[]
values_9=[]
values_10=[]
values_11=[]


for ele in files_list:
  if len(ele) == 1:
    values_1.append(ele)
    files_dict_values.update({len(ele):values_1})
  elif len(ele) == 2:
    values_2.append(ele)
    files_dict_values.update({len(ele):values_2})
  elif len(ele) == 3:
    values_3.append(ele)
    files_dict_values.update({len(ele):values_3})
  elif len(ele) == 4:
    values_4.append(ele)
    files_dict_values.update({len(ele):values_4})
  elif len(ele) == 5:
    values_5.append(ele)
    files_dict_values.update({len(ele):values_5})
  elif len(ele) == 6:
    values_6.append(ele)
    files_dict_values.update({len(ele):values_6})
  elif len(ele) == 7:
    values_7.append(ele)
    files_dict_values.update({len(ele):values_7})
  elif len(ele) == 8:
    values_8.append(ele)
    files_dict_values.update({len(ele):values_8})
  elif len(ele) == 9:
    values_9.append(ele)
    files_dict_values.update({len(ele):values_9})
  elif len(ele) == 10:
    values_10.append(ele)
    files_dict_values.update({len(ele):values_10})

print(files_dict_values)
Here is the output I got:

{6: ['modern', 'bytes,', 'stored', 'within', 'exists', 'bytes.', 'system', 'binary', 'length', 'files:', 'refers'], 8: ['sequence', 'content.', 'variable', 'records.', 'systems,', 'computer'], 10: ['container,', 'electronic', 'delimiters', 'structured', '(sometimes', 'character,'], 1: ['A', 'a'], 4: ['will', 'line', 'data', 'done', 'last', 'more', 'kind', 'such', 'text', 'Some', 'size', 'need', 'ways', 'have', 'file', 'CP/M', 'with', 'that', 'most', 'name', 'type', 'keep', 'does'], 5: ['store', 'after', 'files', 'while', 'file"', 'known', 'those', 'plain', 'there', 'fixed', 'which', '"Text', 'file.', 'level', 'where', 'track', 'lines', 'kinds', 'text.', 'There'], 9: ['depending', 'Unix-like', 'primarily', 'textfile;', 'separated', 'Microsoft', 'flatfile)', 'operating', 'different'], 3: ['EOF', 'may', 'one', 'and', 'use', 'are', 'two', 'new', 'the', 'end', 'any', 'for', 'few', 'old', 'not'], 7: ['systems', 'denoted', 'Windows', 'because', 'spelled', 'marker,', 'padding', 'special', 'MS-DOS,', 'generic', 'contain', 'system.', 'placing'], 2: ['At', 'do', 'of', 'on', 'as', 'in', 'an', 'or', 'is', 'In', 'On', 'by', 'to']}

How about using a loop and letting the dict create the keys as they are needed?

str_files_txt = "A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. There are for most text files a need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records. 'Text file' refers to a type of container, while plain text refers to a type of content. At a generic level of description, there are two kinds of computer files: text files and binary files"
op={}
for items in str_files_txt.split():
    if len(items) not in op:
        op[len(items)]=[]
    op[len(items)].append(items)
for items in op:
    op[items]=list(set(op[items]))

You have two problems: cleaning the data and creating the dictionary.

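Use a defaultdict(list) after stripping the words of characters that are not part of a word. (This is similar to the approach in the duplicate question.)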
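
The code for this answer is not shown, so here is a minimal sketch of the idea rather than the answerer's actual code. It assumes that "cleaning" means stripping leading and trailing punctuation with str.strip(string.punctuation). The name group_words_by_length is hypothetical. Because this also strips parentheses, a word such as '(sometimes' would land under a different key than in the output listed below.

```python
import string
from collections import defaultdict


def group_words_by_length(text):
    """Map each cleaned word length to a list of the unique cleaned words."""
    groups = defaultdict(list)
    for raw in text.split():
        # Assumed cleaning step: drop punctuation at both ends of the token.
        word = raw.strip(string.punctuation)
        if not word:
            continue  # the token was punctuation only
        if word not in groups[len(word)]:
            groups[len(word)].append(word)
    return dict(groups)


sample = "A text file (sometimes spelled textfile; an old name is flatfile) is a kind of file."
for length, words in sorted(group_words_by_length(sample).items()):
    print(length, words)
```

defaultdict(list) creates an empty list the first time a length is seen, so no explicit "is the key already present" check is needed.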

Output:

1 ['A', 'a']
2 ['to', 'an', 'At', 'do', 'on', 'In', 'On', 'as', 'by', 'or', 'of', 'in', 'is']
3 ['use', 'the', 'one', 'and', 'few', 'not', 'EOF', 'may', 'any', 'for', 'are', 'two', 'end', 'new', 'old']
4 ['have', 'that', 'such', 'type', 'need', 'text', 'more', 'done', 'kind', 'Some', 'does', 'most', 'file', 'with', 'line', 'ways', 'keep', 'CP/M', 'name', 'will', 'Text', 'data', 'last', 'size']
5 ['track', 'those', 'bytes', 'fixed', 'known', 'where', 'which', 'there', 'while', 'There', 'lines', 'kinds', 'store', 'files', 'plain', 'after', 'level']
6 ['exists', 'modern', 'MS-DOS', 'system', 'within', 'refers', 'length', 'marker', 'stored', 'binary']
7 ['because', 'placing', 'content', 'Windows', 'padding', 'systems', 'records', 'contain', 'special', 'generic', 'denoted', 'spelled']
8 ['computer', 'sequence', 'textfile', 'variable']
9 ['Microsoft', 'depending', 'different', 'Unix-like', 'flatfile)', 'primarily', 'container', 'character', 'separated', 'operating']
10 ['delimiters', 'characters', 'electronic', '(sometimes', 'structured']
11 ['end-of-file', 'alternative', 'end-of-line', 'description']
17 ['record-orientated']
Here is my solution. It gets the length of each word without special characters such as punctuation.

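To get the absolute length of each word (punctuation included), use the unmodified word in place of the word stripped of special characters.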
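
The code for this answer is also missing, so the following is only a sketch under stated assumptions. In the dictionary output for this answer, the original words (punctuation and duplicates included) are filed under the length of their stripped form, for example 'CP/M' under 3 and 'end-of-file' under 9. The sketch assumes "special characters" means anything that is not a letter or a digit. The names group_by_stripped_length and stripped are hypothetical.

```python
from collections import defaultdict


def group_by_stripped_length(text):
    """Group the original words by their length after removing special characters."""
    groups = defaultdict(list)
    for word in text.split():
        # Assumed cleaning step: keep letters and digits only.
        stripped = "".join(ch for ch in word if ch.isalnum())
        # Use len(word) here instead to count punctuation as well.
        groups[len(stripped)].append(word)
    return dict(groups)


print(group_by_stripped_length("In CP/M and MS-DOS, an end-of-file marker, is used"))
```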

Output:

{1: ['A', 'a', 'a', 'A', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'], 4: ['text', 'file', 'name', 'kind', 'file', 'that', 'text.', 'text', 'file', 'data', 'file', 'such', 'does', 'keep', 'file', 'size', 'text', 'file', 'more', 'last', 'line', 'text', 'file.', 'such', 'text', 'file', 'keep', 'file', 'size', 'most', 'text', 'need', 'have', 'done', 'ways', 'Some', 'with', 'file', 'line', 'will', 'text', 'with', "'Text", "file'", 'type', 'text', 'type', 'text'], 9: ['(sometimes', 'operating', 'operating', 'end-of-file', 'operating', 'Microsoft', 'character,', 'operating', 'end-of-line', 'different', 'depending', 'operating', 'operating', 'primarily', 'separated', 'container,'], 7: ['spelled', 'systems', 'denoted', 'placing', 'special', 'padding', 'systems', 'Windows', 'systems,', 'contain', 'special', 'because', 'systems', 'systems', 'systems', 'systems', 'records.', 'content.', 'generic'], 8: ['textfile;', 'flatfile)', 'computer', 'sequence', 'computer', 'Unix-like', 'variable', 'computer'], 2: ['an', 'is', 'is', 'of', 'is', 'as', 'of', 'of', 'as', 'In', 'as', 'of', 'in', 'of', 'is', 'by', 'or', 'as', 'an', 'as', 'in', 'On', 'as', 'do', 'on', 'of', 'in', 'to', 'in', 'on', 'as', 'or', 'to', 'of', 'to', 'of', 'At', 'of', 'of'], 3: ['old', 'CP/M', 'and', 'the', 'not', 'the', 'the', 'end', 'one', 'the', 'and', 'not', 'any', 'EOF', 'the', 'are', 'for', 'are', 'few', 'may', 'not', 'use', 'new', 'and', 'are', 'two', 'and'], 11: ['alternative', 'description,'], 10: ['structured', 'electronic', 'characters,', 'delimiters,', 'delimiters'], 5: ['lines', 'MS-DOS,', 'where', 'track', 'bytes,', 'known', 'after', 'files', 'those', 'track', 'bytes.', 'There', 'files', 'which', 'store', 'files', 'lines', 'fixed', 'while', 'plain', 'level', 'there', 'kinds', 'files:', 'files', 'files'], 6: ['exists', 'stored', 'within', 'system.', 'system', 'marker,', 'modern', 'system.', 'length', 'refers', 'refers', 'binary'], 16: ['record-orientated']}

You can add each string directly to the right place in the dictionary, like this:

res = {}
for ele in list(set(str_files_txt.split())):
  if len(ele) in res:
    res[len(ele)].append(ele)
  else:
    res[len(ele)] = [ele]
print(res)