Python: adding multiple data points to a dict from a CSV file

python csv dictionary

I have a CSV file that looks like this:

CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0

I would like to analyse the file and derive some statistics from the data:

CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration
US, 4, 1.555, 54

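At the moment I have a dict defined like this:

CalledStatistics = {}

What is the best way to put the data into the dict as I read each line from the CSV?

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

Would adding a second US line overwrite the first one, or would the data be added based on the key 'CountryCode'?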

Each of these assignments:

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

overwrites the previous one.
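A quick check of both points, as a minimal sketch: braces holding bare values build a set rather than a dict, and assigning to a key that already exists replaces its value. The name `called_statistics` and the sample values are made up for illustration.

```python
# Braces with bare values (no key: value pairs) build a set, not a dict.
fields = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
print(type(fields).__name__)  # set

called_statistics = {}
called_statistics['US'] = {'CallDuration': 4}
called_statistics['US'] = {'CallDuration': 39}  # same key: the first value is replaced

print(called_statistics)       # {'US': {'CallDuration': 39}}
print(len(called_statistics))  # 1
```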

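To compute the totals you need, you can use a dict of dicts. For example, in a for loop where each row's data is held in the variables country_code, call_duration and call_price, and where the data is stored in collected_statistics: (Edit: added the first line to convert call_price to 0 when it is recorded as None in the data. This code is meant to handle consistent data, such as integers only; if other types of data may be present, they need to be converted to integers [or to numbers of one common type] before Python can sum them.)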
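A minimal sketch of such a loop, pieced together from the complete script further down. The in-memory `rows` list is an assumption that stands in for the rows coming out of the CSV reader:

```python
import decimal

# Stand-in for the parsed CSV rows (header already skipped).
rows = [
    ('BS', '+1234567', '0.20250', '29'),
    ('US', '+121234', '0.01250', '4'),
    ('US', '+18765678', 'None', '0'),
]

collected_statistics = {}

for country_code, number_called, call_price, call_duration in rows:
    # Treat a price recorded as the string 'None' as zero.
    call_price = call_price if call_price != 'None' else '0'

    if country_code not in collected_statistics:
        # First time this country code is seen: create its inner dict.
        collected_statistics[country_code] = {'CallDuration': [], 'CallPrice': []}

    collected_statistics[country_code]['CallDuration'].append(int(call_duration))
    collected_statistics[country_code]['CallPrice'].append(decimal.Decimal(call_price))

print(collected_statistics['US']['CallDuration'])  # [4, 0]
```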

After the loop, for each country code:

number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])

total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration'])
total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])
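If the individual values are not needed afterwards, the totals can also be accumulated directly while reading, which avoids storing every call. This is a variation on the approach above, not part of the original answer; `add_call`, `totals` and the key names are hypothetical:

```python
from decimal import Decimal

totals = {}

def add_call(country_code, call_price, call_duration):
    """Add one call to the running totals of its country code."""
    entry = totals.setdefault(
        country_code,
        {'NumberOfTimesCalled': 0, 'TotalPrice': Decimal('0'), 'TotalCallDuration': 0},
    )
    entry['NumberOfTimesCalled'] += 1
    entry['TotalPrice'] += Decimal(call_price)
    entry['TotalCallDuration'] += int(call_duration)

# The two BS rows from the question.
add_call('BS', '0.20250', '29')
add_call('BS', '0.20250', '1')
print(totals['BS'])
```

The trade-off is that per-call values are lost, so statistics such as the longest call can no longer be computed afterwards.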

OK, finally, here is a complete working script that handles the example you gave:

#!/usr/bin/env python3

import csv
import decimal

with open('CalledData', newline='') as csvfile:
    csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')

    # btw this creates a dict, not a set
    collected_statistics = {}

    for row in csv_r:

        [country_code, number_called, call_price, call_duration] = row

        # Only to avoid the first line, but would be better to have a list of available
        # (and correct) codes, and check if the country_code belongs to this list:
        if country_code != 'CountryCode':

            call_price = call_price if call_price != 'None' else 0

            if country_code not in collected_statistics:
                collected_statistics[country_code] = {'CallDuration' : [int(call_duration)],
                                                      'CallPrice' : [decimal.Decimal(call_price)]}
            else:
                collected_statistics[country_code]['CallDuration'] += [int(call_duration)]
                collected_statistics[country_code]['CallPrice'] += [decimal.Decimal(call_price)]


    for country_code in collected_statistics:
        print(str(country_code) + ":")
        print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
        print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
        print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))

Using CalledData as a file with exactly the content you provided, it outputs:

$ ./test_script
BS:
number of times called: 2
total price: 0.40500
total call duration: 30
US:
number of times called: 4
total price: 0.03750
total call duration: 54
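The same totals can be reached with `csv.DictReader`, which reads the header line itself so no comparison against 'CountryCode' is needed. A sketch under assumptions: `io.StringIO` stands in for the real file, and the names `sample`, `stats` and the inner keys are invented for this example.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

sample = """CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
"""

# skipinitialspace=True drops the blank after each comma in the header line.
reader = csv.DictReader(io.StringIO(sample), skipinitialspace=True)

stats = defaultdict(lambda: {'calls': 0, 'price': Decimal('0'), 'duration': 0})
for row in reader:
    entry = stats[row['CountryCode']]
    entry['calls'] += 1
    # A price recorded as 'None' is counted as zero, as in the script above.
    entry['price'] += Decimal('0') if row['CallPrice'] == 'None' else Decimal(row['CallPrice'])
    entry['duration'] += int(row['CallDuration'])

for country_code, entry in stats.items():
    print(country_code, entry['calls'], entry['price'], entry['duration'])
```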

A dictionary can contain lists of dictionaries, so you can build the structure you want like this:

CalledStatistics['CountryCode'] =[ {
    'CallDuration':cd_val, 
    'CallPrice':cp_val,
    'NumberOfTimesCalled':ntc_val } ]

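Then you can add values like this:

for line in lines:
    parts = line.strip().split(',')
    country_code = parts.pop(0)  # parts is now [NumberCalled, CallPrice, CallDuration]
    # setdefault creates the list the first time a country code is seen
    CalledStatistics.setdefault(country_code, []).append({
        'NumberCalled': parts[0],
        'CallPrice': parts[1],
        'CallDuration': parts[2] })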

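By making the value for each country code a list, you can add as many separate dicts as you like to each country code.

The pop(i) method returns the value and mutates the list, so what remains is the data you need for the dict values. That is why we pop index 0 and then add indices 0 to 2 to the dict.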
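One way to avoid checking whether a country code already has a list is `collections.defaultdict`. A minimal sketch, assuming `lines` holds the data lines without the header (the sample lines are taken from the question, and the key names follow the CSV columns):

```python
from collections import defaultdict

lines = [
    'BS,+1234567,0.20250,29',
    'US,+121234,0.01250,4',
    'US,+1543215,0.01250,39',
]

# A missing key is created on first access with an empty list as its value.
called_statistics = defaultdict(list)

for line in lines:
    country_code, number_called, call_price, call_duration = line.split(',')
    called_statistics[country_code].append({
        'NumberCalled': number_called,
        'CallPrice': call_price,
        'CallDuration': call_duration,
    })

print(len(called_statistics['US']))  # 2
```

Note that the values are still strings here; they need converting to numbers before they can be summed.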

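Your approach could be slightly different. Just read the file and turn it into a list of lists (readlines(), then strip("\n") and split(",") on each line).

Skip the first line and the last line (the last one is most likely empty; test it). Then you can build the dict as in the example @zezollo used, simply adding values under the keys of the dict you want to create. Once you have the list of lists, make sure all the values you want to add are of the same type.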
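The steps above can be sketched as follows. `io.StringIO` is an assumption that stands in for the opened file, and `table` is a name chosen for this example:

```python
import io

# io.StringIO stands in for an open file so the sketch runs on its own.
csvfile = io.StringIO(
    "CountryCode, NumberCalled, CallPrice, CallDuration\n"
    "BS,+1234567,0.20250,29\n"
    "US,+121234,0.01250,4\n"
    "\n"
)

# One inner list per line: strip the newline, then split on commas.
table = [line.strip("\n").split(",") for line in csvfile.readlines()]

# Drop the header, and the last line if it turned out to be empty.
table = table[1:]
if table and table[-1] == [""]:
    table.pop()

print(table)
```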

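Nothing beats doing the hard work yourself; you will remember it for a long time ;)

Test, test, and test again on mock examples. Read the Python help and documentation. It is excellent.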

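What is the question? You have a dictionary, and each time you read a CSV row the entry for its country code gets overwritten, so you end up with a dict with keys (BS, US) and values equal to the latest entry, i.e. the overwritten data.

Do you really want to assign a set to CalledStatistics['CountryCode']? In a dictionary the keys are unique, so yes, doing this overwrites the value. You are simply assigning a new value to an existing key (US).

This doesn't work, because the None value in the last line causes a TypeError. But it is a good idea.

Indeed. I think we can assume that a price of None can be treated as zero, so the data needs to be processed before it is used. I edited my post to reflect this.

Not as simple as you think :) We don't know all the details. The more complex the case, the more complex it gets! Did you test it? Does it work? Imagine that somewhere in the file someone wrote "five" instead of 5 ;)

Great, I knew I needed one or more dicts but wasn't sure how to compare or add based on the key 'CountryCode'. I added some logic to it to convert None to the int 0.

I added a complete script. The type of the data really does not matter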
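The concern raised in the comments about inconsistent values, such as a price of None or a duration spelled out as a word, can be handled by validating each field before summing. This is a sketch only: `parse_price` and `parse_duration` are hypothetical helpers, and counting None as zero is the assumption made earlier in the thread.

```python
from decimal import Decimal, InvalidOperation

def parse_price(text):
    """Return the price as a Decimal; the string 'None' counts as zero."""
    if text == 'None':
        return Decimal('0')
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a valid price: {text!r}")

def parse_duration(text):
    """Return the duration as an int, or raise ValueError with a clear message."""
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"not a valid duration: {text!r}")

print(parse_price('None'))   # 0
print(parse_duration('39'))  # 39
```

Calling `parse_duration('five')` raises a ValueError that names the offending value, which makes a bad row easy to find in a large file.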