Python: create a CSV file from multiple dictionaries?


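I am computing word frequencies across many text files (140 documents). The final step of my work is to create a CSV file in which I can sort the frequency of each word by individual document and across all documents.

Suppose I have: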

absolut_freq= {u'hello':0.001, u'world':0.002, u'baby':0.005}
doc_1= {u'hello':0.8, u'world':0.9, u'baby':0.7}
doc_2= {u'hello':0.2, u'world':0.3, u'baby':0.6}
...
doc_140={u'hello':0.1, u'world':0.5, u'baby':0.9}

So I need a CSV file, to open in Excel, that looks like this:

WORD,  ABS_FREQ, DOC_1_FREQ, DOC_2_FREQ, ..., DOC_140_FREQ
hello, 0.001,    0.8,        0.2,        ..., 0.1
world, 0.002,    0.9,        0.3,        ..., 0.5
baby,  0.005,    0.7,        0.6,        ..., 0.9

How can I do this with Python?

However you want to write out this data, you first need an ordered data structure, for example a 2D list:

docs = []
docs.append( {u'hello':0.001, u'world':0.002, u'baby':0.005} )
docs.append( {u'hello':0.8, u'world':0.9, u'baby':0.7} )
docs.append( {u'hello':0.2, u'world':0.3, u'baby':0.6} )
docs.append( {u'hello':0.1, u'world':0.5, u'baby':0.9} )
words = docs[0].keys()
result = [ [word] + [ doc[word] for doc in docs ] for word in words ]

You can then write it out with the built-in csv module.
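The writing step could look roughly like the sketch below. It rebuilds the same 2D list so that it runs on its own; the header labels and the file name `word_freqs.csv` are placeholders chosen for illustration, not part of the original answer.

```python
import csv

# Same shape as the docs list above: overall frequencies first,
# then one dict per document.
docs = [
    {'hello': 0.001, 'world': 0.002, 'baby': 0.005},
    {'hello': 0.8, 'world': 0.9, 'baby': 0.7},
    {'hello': 0.2, 'world': 0.3, 'baby': 0.6},
    {'hello': 0.1, 'world': 0.5, 'baby': 0.9},
]
words = list(docs[0])
result = [[word] + [doc[word] for doc in docs] for word in words]

# Hypothetical header labels; adjust them to the real document names.
header = ['WORD', 'ABS_FREQ'] + ['DOC_%d_FREQ' % n for n in range(1, len(docs))]

# On Python 3 the csv module expects files opened with newline=''.
with open('word_freqs.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)    # one header row
    writer.writerows(result)   # one row per word
```

Each inner list of `result` becomes one CSV row, so Excel shows one word per line and one column per dictionary.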

You can also convert it to a Pandas DataFrame and either save that as a CSV file or continue your analysis in a clean format.

import pandas as pd

absolut_freq = {u'hello': 0.001, u'world': 0.002, u'baby': 0.005}
doc_1 = {u'hello': 0.8, u'world': 0.9, u'baby': 0.7}
doc_2 = {u'hello': 0.2, u'world': 0.3, u'baby': 0.6}
doc_140 = {u'hello': 0.1, u'world': 0.5, u'baby': 0.9}

# renamed from `all` so the built-in all() is not shadowed
all_dicts = [absolut_freq, doc_1, doc_2, doc_140]

# if you have a bunch of docs, you could use enumerate and format the column name as you iterate
colnames = ['AbsoluteFreq', 'Doc1', 'Doc2', 'Doc140']

# one single-column frame per dict (words as the index), joined side by side
masterdf = pd.concat([pd.DataFrame([d]).T for d in all_dicts], axis=1)

# assign the column names
masterdf.columns = colnames

# get a glimpse of what the data frame looks like
masterdf.head()

# save to csv
masterdf.to_csv('docmatrix.csv', index=True)

# and to sort the dataframe by frequency
# (DataFrame.sort was removed from pandas; sort_values returns a new frame)
masterdf = masterdf.sort_values('AbsoluteFreq')
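With many documents, the column names can be generated instead of typed by hand, as the comment above suggests. The following is only a sketch: it assumes pandas is installed and that the per-document dicts are collected in a list, here called `doc_dicts` (a name introduced for illustration). The output file name is also a placeholder.

```python
import pandas as pd

absolut_freq = {'hello': 0.001, 'world': 0.002, 'baby': 0.005}

# Hypothetical list holding one dict per document, in document order.
doc_dicts = [
    {'hello': 0.8, 'world': 0.9, 'baby': 0.7},
    {'hello': 0.2, 'world': 0.3, 'baby': 0.6},
]

# Map each column name to its dict; enumerate supplies the document number.
columns = {'AbsoluteFreq': absolut_freq}
for n, doc in enumerate(doc_dicts, start=1):
    columns['Doc%d' % n] = doc

# A dict of dicts becomes a frame with the words as index
# and one column per outer key.
masterdf = pd.DataFrame(columns)
masterdf.index.name = 'Word'

# Sort by overall frequency, highest first, then save.
masterdf = masterdf.sort_values('AbsoluteFreq', ascending=False)
masterdf.to_csv('docmatrix_sorted.csv')
```

Sorting by a single document instead only requires passing that column's name, for example `'Doc1'`, to `sort_values`.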

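By first creating a table containing all the data, and then using the csv module to write a transposed version of it (columns swapped for rows) to the output file, you can make this a largely data-driven process that only needs the variable names of all the dictionaries.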

import csv

absolut_freq = {u'hello': 0.001, u'world': 0.002, u'baby': 0.005}
doc_1 = {u'hello': 0.8, u'world': 0.9, u'baby': 0.7}
doc_2 = {u'hello': 0.2, u'world': 0.3, u'baby': 0.6}
doc_140 = {u'hello': 0.1, u'world': 0.5, u'baby': 0.9}

dic_names = ('absolut_freq', 'doc_1', 'doc_2', 'doc_140')  # dict variable names

namespace = globals()
words = list(namespace[dic_names[0]])  # assume dicts all contain the same words
table = [['WORD'] + words]  # header row (becomes first column of output)

for dic_name in dic_names:  # add values from each dictionary given its name
    dic = namespace[dic_name]
    # look values up by word so columns stay aligned even if key order differs
    table.append([dic_name.upper() + '_FREQ'] + [dic[word] for word in words])

# Use open('merged_dicts.csv', 'wb') for Python 2.
with open('merged_dicts.csv', 'w', newline='') as csvfile:
    csv.writer(csvfile).writerows(zip(*table))

print('done')

Resulting CSV file:

WORD,ABSOLUT_FREQ_FREQ,DOC_1_FREQ,DOC_2_FREQ,DOC_140_FREQ
world,0.002,0.9,0.3,0.5
baby,0.005,0.7,0.6,0.9
hello,0.001,0.8,0.2,0.1
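The row order in such a file follows the key order of the first dictionary (insertion order on Python 3.7 and later, arbitrary on older versions). Since the question asks for rows sorted by frequency, one option is to sort the transposed rows before writing them. This is a sketch that reuses the `table` layout from the code above with a shortened data set; the output file name is a placeholder.

```python
import csv

# Same layout as `table` above: first row holds the words,
# each later row holds one dictionary's values.
table = [
    ['WORD', 'hello', 'world', 'baby'],
    ['ABSOLUT_FREQ_FREQ', 0.001, 0.002, 0.005],
    ['DOC_1_FREQ', 0.8, 0.9, 0.7],
    ['DOC_2_FREQ', 0.2, 0.3, 0.6],
]

rows = list(zip(*table))            # transpose: one tuple per word
header, body = rows[0], rows[1:]

# Pick the column to sort on by its header label.
sort_column = header.index('ABSOLUT_FREQ_FREQ')
body.sort(key=lambda row: row[sort_column], reverse=True)  # highest first

with open('merged_dicts_sorted.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)
    writer.writerows(body)
```

Changing the label passed to `header.index` sorts by a single document's frequency instead of the overall one.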

Take a look at csv.DictWriter.
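As a rough illustration of how csv.DictWriter might be applied here, the sketch below builds one row dict per word and lets the writer place each value under its column. The column labels and the file name are assumptions made for the example, and only three of the dictionaries are included.

```python
import csv

absolut_freq = {'hello': 0.001, 'world': 0.002, 'baby': 0.005}
doc_1 = {'hello': 0.8, 'world': 0.9, 'baby': 0.7}
doc_2 = {'hello': 0.2, 'world': 0.3, 'baby': 0.6}

# Hypothetical column labels mapped to their dictionaries.
columns = {'ABS_FREQ': absolut_freq, 'DOC_1_FREQ': doc_1, 'DOC_2_FREQ': doc_2}
fieldnames = ['WORD'] + list(columns)

with open('freqs_dictwriter.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for word in absolut_freq:
        row = {'WORD': word}
        # .get leaves the cell empty when a document lacks the word
        row.update((label, freqs.get(word, '')) for label, freqs in columns.items())
        writer.writerow(row)
```

Because each value is looked up by word, the dictionaries do not need to share the same key order, and a word missing from one document produces an empty cell rather than an error.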