Python: how to add values from a given dataset to an empty dictionary
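The code below is copyrighted by Datacamp and comes from an exercise in a Python course I am taking.
I was given a CSV file whose dataset contains Twitter data, and I have to iterate over the entries in one column to build a dictionary in which the keys are language names and the values are the number of tweets in each language. The resulting code is correct and works, but I do not fully understand how the if-else part of the code works.
The output of the code is: {'en': 97, 'et': 1, 'und': 2}
My question is: how do we get the output shown above? What exactly happens inside the for loop, and in the if-else?
Thanks, everyone!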
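I added some explanatory notes to the for / if / else parts, as requested, to make the code easier to understand.
To keep the explanation simple, I replaced the dataset with a minimal example.
As a general tip: pandas has a built-in method (value_counts) that does the same thing.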
# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
# df = pd.read_csv('tweets.csv')
df = pd.DataFrame(
    data=[
        'en',   # 1st row
        'en',   # 2nd row
        'und',  # 3rd row
        'et',   # 4th row
        'und'   # 5th row
    ],
    columns=['lang']
)

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

print('before the loop, langs_count is an empty dict')
print(langs_count, '\n')

# Iterate over lang column in DataFrame
for ii, entry in enumerate(col):
    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        print(f'{ii}\nif: the key "{col.iloc[ii]}" exists, so adds 1 to value')
        langs_count[entry] += 1
    # Else add the language to langs_count, set the value to 1
    else:
        print(f'{ii}\nelse: the key "{col.iloc[ii]}" does not exist, so create it with value 1')
        langs_count[entry] = 1
    # Show the dictionary after every iteration
    print(langs_count, '\n')

# Print the populated dictionary
# print(langs_count)
# with the full dataset this gives {'en': 97, 'et': 1, 'und': 2}

# the same result can be reached through value_counts,
# without the need of a loop or if / else
print('value_counts solution')
print(df['lang'].value_counts().to_dict())
Thank you for the clear explanation and the extra knowledge you added. It was very useful!
Output:
"""
before the loop, langs_count is an empty dict
{}
0
else: the key "en" does not exist, so create it with value 1
{'en': 1}
1
if: the key "en" exists, so adds 1 to value
{'en': 2}
2
else: the key "und" does not exist, so create it with value 1
{'en': 2, 'und': 1}
3
else: the key "et" does not exist, so create it with value 1
{'en': 2, 'und': 1, 'et': 1}
4
if: the key "und" exists, so adds 1 to value
{'en': 2, 'und': 2, 'et': 1}
value_counts solution
{'en': 2, 'et': 1, 'und': 2}
"""
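The if / else pattern traced above is a common counting idiom, and it can be written more compactly. The sketch below is only an illustration, not part of the Datacamp exercise: it uses a plain list in place of df['lang'], and the helper name count_with_get is made up for this example. It shows the same counting with dict.get (which returns a default when the key is missing) and with collections.Counter from the standard library.

```python
from collections import Counter

# Same five values as the minimal example; a plain list stands in for df['lang']
langs = ['en', 'en', 'und', 'et', 'und']


def count_with_get(values):
    """Count occurrences of each value (hypothetical helper for illustration)."""
    counts = {}
    for entry in values:
        # get() returns the current count, or 0 when the key is not there yet,
        # so this single line covers both the if branch and the else branch
        counts[entry] = counts.get(entry, 0) + 1
    return counts


manual = count_with_get(langs)

# Counter does the same bookkeeping internally; dict() turns it into a plain dict
with_counter = dict(Counter(langs))

print(manual)
print(with_counter)
```

Both variants should produce the same counts as the explicit if / else loop; which one to use is mostly a matter of readability.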