Finding the standard deviation of a CSV file using Python


I have a CSV file named "salaries.csv" with the following contents:

City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Lawyers,800
Tokyo,Plumbers,400
Lawyers,Doctors,300
Lawyers,Lawyers,400
Lawyers,Plumbers,500
Hong Kong,Doctors,1800
Hong Kong,Lawyers,1100
Hong Kong,Plumbers,1000
Moscow,Doctors,300
Moscow,Lawyers,200
Moscow,Plumbers,100
Berlin,Doctors,800
Berlin,Plumbers,900
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400
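I need to print the standard deviation of the salaries for each profession.
This is an old version of Python, so I cannot use the statistics module or numpy.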

from __future__ import with_statement
import math
import csv
with open("salaries.csv") as f:
  def average(f): return sum(f) * 1.0 / len(f)
variance = map(lambda x: (x - avg)**2, f)
standard_deviation = math.sqrt(average(variance))
print standard_deviation
Can anyone help me? I am new to Python.

Error : TypeError('argument 2 to map() must support iteration',)
The output should be:

Plumbers 311 Lawyers 286 Doctors 448

Some notes:

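  • Python has built-in functions for the length, minimum, and maximum of a list of numbers (len, min, max).

  • If you are using Python >= 3.4.0, there is a module called statistics that can compute the mean and standard deviation of a list.

  • Create a stdev.py file next to salaries.csv: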

    from statistics import mean, stdev
    f = open("salaries.csv", 'r')
    
    # Remove the first line City,Job,Salary
    f.readline()
    
    # Create the list of salaries 
    salaries = []
    for line in f.readlines():
      # After splitting the line, take the last element, remove extra spaces and cast it to int.
      value = int(line.split(',')[-1].strip())
      # Add the value to the salaries list.
      salaries.append(value)
    # min and max return the minimum and the maximum value of the list.
    print(min(salaries), max(salaries))
    print(mean(salaries), stdev(salaries))
    f.close()
    
    For Python 2.x:

    from __future__ import with_statement
    from math import sqrt
    with open('salaries.csv') as f:
      f.readline()
      # Create the list of salaries 
      salaries = []
      for line in f.readlines():
        value = int(line.split(',')[-1].strip())
        salaries.append(value)
      print min(salaries), max(salaries)   
      n = float(len(salaries))
      mean = sum(salaries)/n
      stdev = 0
      for value in salaries:
        stdev += (value - mean)**2
      stdev = sqrt(stdev/(n))
      print mean, stdev
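
Note that the two snippets above do not compute the same quantity. `statistics.stdev` is the sample standard deviation (it divides by n - 1), whereas the hand-written loop divides by n, which is the population standard deviation and is what `statistics.pstdev` returns. The figures the question expects (311, 286, 448) correspond to the divide-by-n form. A minimal sketch for Python 3.4+, using the Plumbers salaries from the question; the helper name `population_stdev` is made up for illustration:

```python
import math
import statistics

# Plumbers' salaries taken from the question's salaries.csv
plumbers = [100, 300, 400, 500, 1000, 100, 900, 500]


def population_stdev(values):
    """Divide-by-n standard deviation (hypothetical helper name)."""
    n = float(len(values))
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


manual = population_stdev(plumbers)
print(round(manual))                        # 311, divides by n
print(round(statistics.pstdev(plumbers)))   # 311, same quantity from the stdlib
print(round(statistics.stdev(plumbers)))    # 333, divides by n - 1
```

If the goal is to match the expected output, `pstdev` (or the manual loop) is probably the one to use.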
    

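    You can create a dictionary for the file that maps each profession to its list of salaries, then compute the statistics at the end with your own functions or with numpy.mean and numpy.std: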

    >>> import csv
    >>> from collections import defaultdict
    >>> from numpy import std, mean
    >>>
    >>> profession_to_salaries = defaultdict(list)
    >>>
    >>> with open('salaries.csv', 'rb') as csvfile:
    ...   reader = csv.DictReader(csvfile)
    ...   for row in reader:
    ...     profession_to_salaries[row['Job']].append(float(row['Salary']))
    ...
    >>> for profession, salaries in profession_to_salaries.items():
    ...   print profession, min(salaries), max(salaries), mean(salaries), std(salaries)
    ...
    Plumbers 100.0 1000.0 475.0 311.24748995
    Lawyers 200.0 1100.0 628.571428571 286.427680797
    Dog catchers 400.0 400.0 400.0 0.0
    Doctors 300.0 1800.0 787.5 448.434777866
    

    For Python 2.4:

    To get the details for each profession, create a dictionary:

    from __future__ import with_statement
    import math
    
    def get_stats(profession, salaries):   
      n = float(len(salaries))
      mean = sum(salaries)/n
      stdev = 0
      for value in salaries:
        stdev += (value - mean)**2
      stdev = math.sqrt(stdev/(n))
      print profession, min(salaries), max(salaries), mean, stdev
    
    with open('salaries.csv') as f:
      f.readline()
      # Create the list of salaries 
      salaries = {} 
      for line in f.readlines():
        country, profession, value = line.split(',')
        value = int(value.strip())
        profession = profession.strip()
        if salaries.has_key(profession):
            salaries[profession].append(value)
        else:
            salaries[profession] = [value]
      for k,v in salaries.items():
        get_stats(k,v)  
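
For readers who are not restricted to Python 2.4, the same grouping idea can be sketched in Python 3 with the standard csv module. This is only an illustration under a few assumptions: the question's data is inlined through `io.StringIO` so the example runs without a file on disk, the function name `stdev_by_job` is hypothetical, and the divide-by-n (population) form is used because it matches the expected output.

```python
import csv
import io
import math
from collections import defaultdict

# The question's data, inlined so the sketch runs without salaries.csv on disk.
CSV_TEXT = """City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Lawyers,800
Tokyo,Plumbers,400
Lawyers,Doctors,300
Lawyers,Lawyers,400
Lawyers,Plumbers,500
Hong Kong,Doctors,1800
Hong Kong,Lawyers,1100
Hong Kong,Plumbers,1000
Moscow,Doctors,300
Moscow,Lawyers,200
Moscow,Plumbers,100
Berlin,Doctors,800
Berlin,Plumbers,900
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400
"""


def stdev_by_job(lines):
    """Return {job: population standard deviation of its salaries}."""
    by_job = defaultdict(list)
    for row in csv.DictReader(lines):
        by_job[row["Job"].strip()].append(float(row["Salary"]))

    result = {}
    for job, values in by_job.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result[job] = math.sqrt(variance)
    return result


for job, sd in sorted(stdev_by_job(io.StringIO(CSV_TEXT)).items()):
    print(job, round(sd))
```

To read the real file instead, pass an open file object, for example `stdev_by_job(open("salaries.csv", newline=""))`. The rounded values for Plumbers, Lawyers, and Doctors come out as 311, 286, and 448.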
    
    Code:

    from __future__ import with_statement
    import math
    import csv
    
    
    def std_dev(v):
        # float() avoids integer (floor) division under Python 2
        avg = sum([int(sal) for (city, job, sal) in v]) / float(len(v))
        var = sum(map(lambda x: (int(x[-1]) - avg)**2, v)) / float(len(v))
        return math.sqrt(var)
    
    tups = []
    with open("try.csv") as f:
        rdr = csv.reader(f, delimiter='\n')
        for line in rdr:
            tups.append(tuple(line[0].split(',')))
    tups = tups[1:]
    
    d = {}
    for (city, job, sal) in tups:
        d.setdefault(job, []).append((city, job, sal))
    
    for k, v in d.items():
        print k, std_dev(v)
    

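    Show us your code and the error you are getting.
    @grantmcconnaughey added.
    Don't indent `count+=1` and everything after it. Whitespace and indentation matter in Python, and you have too much of it.
    @grantmcconnaughey can you help me write shorter code for this program? I know it can definitely be done; it just isn't working for me.
    Why do you want your code to be shorter?
    I am using Python 2.4; it is an online compiler used for submitting code.
    Can you write the code for Python using `from __future__ import with_statement`, `from math import sqrt`, `import csv` … # instead of `print mean(salaries), stdev(salaries)`: `with open("salaries.csv") as f: n = float(len(f)) mean = sum(f)/n stdev = 0 for value in salaries: stdev += (value-mean)**2 stdev = sqrt(stdev/(n-1)) print mean, stdev`
    SyntaxError('invalid syntax', ('', 4, 1, '…\n'))
    The with statement does not work like that. `with` handles the file so that you do not have to worry about closing it afterwards, but it does not parse the file for you. You still need to parse all the lines in salaries.csv and build the list of salaries.
    `from math import sqrt import csv n = float(len('salaries.csv')) mean = sum('salaries.csv')/n stdev = 0 for value in salaries: stdev += (value-mean)**2 stdev = sqrt(stdev/(n-1)) print mean, stdev`
    IndentationError('unexpected indent', ('', 8, 3, "with open('salaries.csv', 'rb') as csvfile:\n")) What should I do?
    ImportError('No module named numpy',) It is an online compiler that uses an old version. Any help?
    If you cannot install numpy, provide your own mean and std dev functions. This might help:
    Edited the post. Can you check it now?
    NameError("global name 'sqrt' is not defined")
    I ran the code and it works fine. Please check the indentation and the import statements.
    Thanks, I removed sqrt and used `stdev=(stdev/(n))**0.5` instead.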
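
To illustrate the point made in the comments: `with` only guarantees that the file gets closed; each line still has to be split and converted to a number before `len` or `sum` can be applied. A minimal sketch in Python 3 follows. It writes a three-row excerpt of the question's data to a temporary file first so that it is self-contained; the temporary file and the function name `read_salaries` are assumptions made for illustration.

```python
import os
import tempfile


def read_salaries(path):
    """Parse the Salary column of a City,Job,Salary file into a list of ints."""
    salaries = []
    with open(path) as f:
        f.readline()                     # skip the header line
        for line in f:
            line = line.strip()
            if line:                     # ignore blank lines
                salaries.append(int(line.split(",")[-1]))
    return salaries, f.closed            # the with block has already closed f


# Write a small excerpt of the question's data to a temporary file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as tmp:
    tmp.write("City,Job,Salary\n"
              "Delhi,Doctors,500\n"
              "Delhi,Lawyers,400\n"
              "Delhi,Plumbers,100\n")

salaries, was_closed = read_salaries(path)
os.remove(path)

n = float(len(salaries))
mean = sum(salaries) / n
# ** 0.5 replaces sqrt, as in the final comment of the thread
stdev = (sum((s - mean) ** 2 for s in salaries) / n) ** 0.5
print(salaries, was_closed, mean, stdev)
```

Calling `len` or `sum` on the file object or on the file name string, as attempted in the comments, cannot work because neither is a list of numbers.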