Python: How to add values from a given dataset to an empty dictionary

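The code below is copyrighted by DataCamp and comes from an exercise in a Python course I am taking.

  • I was given a CSV file containing Twitter data. I have to iterate over the entries in one column to build a dictionary in which the keys are language names and the values are the number of tweets in each language. The resulting code is correct and works; however, I don't fully understand how the if-else part of the code works.

  • The output of the code is:
    {'en':97,'et':1,'und':2}

  • My question is: how do we get the output shown above? What exactly happens inside the for loop and the if-else?


  • Thank you all!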

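    I added some explanatory print statements inside the for | if | else to make the code easier to follow, as requested.

    For ease of explanation, I changed the dataset to a minimal example.

    As a general hint: pandas has a built-in method (value_counts) that does the same thing.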

    # Import pandas
    import pandas as pd 
    
    # Import Twitter data as DataFrame: df
    # df = pd.read_csv('tweets.csv') 
    df = pd.DataFrame(
        data=[
              'en',  # 1st row
              'en',  # 2nd row
              'und', # 3rd row
              'et',  # 4th row
              'und'  # 5th row
              ],
        columns=['lang']
    )
    
    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    # Extract column from DataFrame: col
    col = df['lang']
    
    print('before the loop, langs_count is an empty dict')
    print(langs_count, '\n')
    # Iterate over lang column in DataFrame
    for ii, entry in enumerate(col):
    
        # If the language is in langs_count, add 1 
        if entry in langs_count.keys():
            print(f'{ii}\nif: the key "{col.iloc[ii]}" exists, so adds 1 to value')
            langs_count[entry] += 1
        # Else add the language to langs_count, set the value to 1
        else:
            print(f'{ii}\nelse: the key "{col.iloc[ii]}" does not exist, so create it with value 1')
            langs_count[entry] = 1
        print(langs_count, '\n')
    
    # Print the populated dictionary
    # print(langs_count)
    #{'en': 97, 'et': 1, 'und': 2}
    
    # The same result can be obtained with value_counts,
    # without the need for a loop or if / else
    print('value_counts solution')
    print(df['lang'].value_counts().to_dict())
    
    Output:

    before the loop, langs_count is an empty dict
    {}

    0
    else: the key "en" does not exist, so create it with value 1
    {'en': 1}

    1
    if: the key "en" exists, so adds 1 to value
    {'en': 2}

    2
    else: the key "und" does not exist, so create it with value 1
    {'en': 2, 'und': 1}

    3
    else: the key "et" does not exist, so create it with value 1
    {'en': 2, 'und': 1, 'et': 1}

    4
    if: the key "und" exists, so adds 1 to value
    {'en': 2, 'und': 2, 'et': 1}

    value_counts solution
    {'en': 2, 'et': 1, 'und': 2}

    Thank you for the clear explanation and the extra knowledge you added. It was very helpful!
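
    For comparison, the same counting logic can also be written without an explicit if / else. The following is only a minimal sketch, not part of the original exercise: it assumes the five values of the minimal example are available as a plain list named langs (a hypothetical name), and it uses dict.get and collections.Counter from the standard library.

```python
from collections import Counter

# Assumed sample data: the same five values as the 'lang' column
# in the minimal example (the list name `langs` is hypothetical)
langs = ['en', 'en', 'und', 'et', 'und']

# Variant 1: dict.get returns the current count, or 0 when the key
# is not in the dictionary yet, so both branches collapse into one line
langs_count = {}
for entry in langs:
    langs_count[entry] = langs_count.get(entry, 0) + 1

# Variant 2: Counter performs the same bookkeeping internally
counter_result = dict(Counter(langs))

print(langs_count)
print(counter_result)
```

    Both variants give the same counts as the if / else loop on this data. The explicit if / else version is still the easiest one to step through when learning how a dictionary gets populated.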