Python: column statistics from a given input file?

python list matrix


I have been given a .txt file of data:

1,2,3,0,0
1,0,4,5,0
1,1,1,1,1
3,4,5,6,0
1,0,1,0,3
3,3,4,0,0

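My goal is to compute the minimum, maximum, mean, range, and median of each column of the given data and write them to an output.txt file.

My logic for approaching this problem is as follows:

Step 1) Read the data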

infile = open("Data.txt", "r")
tempLine = infile.readline()
while tempLine:
   print(tempLine.split(','))
   tempLine = infile.readline()

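Obviously this isn't perfect, but the idea is that the data can be read this way.

Step 2) Store the data in corresponding list variables? row1, row2, ... row6

Step 3) Combine all of the above lists into one list, giving a final list like this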

flist =[[1,2,3,0,0],[1,0,4,5,0],[1,1,1,1,1],[3,4,5,6,0],[1,0,1,0,3],[3,3,4,0,0]]

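Step 4) Using nested for loops, access the elements individually and store them in list variables

col1, col2, col3, ... col5

Step 5) Compute the min, max, etc. and write them to the output file

My question is: given my beginner-level knowledge of computer science and Python, is this logic inefficient, and is there perhaps a simpler, better way to solve this problem?

My main problem is probably steps 2 through 5. The rest I'm sure I know how to do.

Any suggestions would be helpful.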

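To get the data, I would do something like this:

from statistics import median

# Parse every non-empty line into a list of integers
with open("Data.txt", "r") as infile:
    rows = [[int(v) for v in line.split(',')] for line in infile if line.strip()]

# zip(*rows) transposes the rows, so each tuple is one column
for col in zip(*rows):
    minCol = min(col)
    maxCol = max(col)
    avgCol = sum(col) / len(col)
    rangeCol = maxCol - minCol
    medianCol = median(col)
    # then write the data to the output file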
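
The "write the data to the output file" step can be sketched as follows. This is only an illustration: the helper names `column_stats` and `write_stats` and the comma-separated output layout are assumptions, not something the question specifies. An in-memory buffer stands in for the real output file.

```python
import io
from statistics import mean, median

def column_stats(rows):
    """Return one (min, max, mean, range, median) tuple per column."""
    stats = []
    for col in zip(*rows):  # zip(*rows) yields the columns of the row list
        lo, hi = min(col), max(col)
        stats.append((lo, hi, mean(col), hi - lo, median(col)))
    return stats

def write_stats(stats, outfile):
    """Write a header line, then one line per column (layout is an arbitrary choice)."""
    outfile.write("min,max,mean,range,median\n")
    for row in stats:
        outfile.write(",".join(str(v) for v in row) + "\n")

rows = [[1, 2, 3, 0, 0], [1, 0, 4, 5, 0], [1, 1, 1, 1, 1],
        [3, 4, 5, 6, 0], [1, 0, 1, 0, 3], [3, 3, 4, 0, 0]]
stats = column_stats(rows)

# Stand-in for a real file; with a file on disk this would be
# `with open("output.txt", "w") as outfile: write_stats(stats, outfile)`
buf = io.StringIO()
write_stats(stats, buf)
```

Passing a file-like object instead of a path keeps the writing code usable with both real files and in-memory buffers.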


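Try numpy. The numpy library offers fast options for working with nested lists as arrays or matrices.

To use numpy, you must import numpy at the beginning of your code.

numpy.matrix('1,2,3,0,0;1,0,4,5,0;....;3,3,4,0,0')

(with the elided rows filled in) will give you a matrix equivalent to

flist = [[1,2,3,0,0],[1,0,4,5,0],[1,1,1,1,1],[3,4,5,6,0],[1,0,1,0,3],[3,3,4,0,0]]

In addition, you can work along an axis (in this case, the rows) and use

max([axis, out])    Return the maximum value along an axis.
mean([axis, dtype, out])    Returns the average of the matrix elements along the given axis.
min([axis, out])    Return the minimum value along an axis.

These descriptions come from the numpy documentation; read it for more details.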
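
As a rough sketch of how the axis argument gives per-column statistics, the example below uses a plain `numpy.array` rather than `numpy.matrix` (an assumption made here for simplicity; the methods take the same `axis` argument). The variable names are illustrative only.

```python
import numpy as np

data = np.array([[1, 2, 3, 0, 0],
                 [1, 0, 4, 5, 0],
                 [1, 1, 1, 1, 1],
                 [3, 4, 5, 6, 0],
                 [1, 0, 1, 0, 3],
                 [3, 3, 4, 0, 0]])

# axis=0 collapses the rows, leaving one value per column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
col_mean = data.mean(axis=0)
col_range = col_max - col_min          # np.ptp(data, axis=0) gives the same result
col_median = np.median(data, axis=0)
```

With `axis=1` instead, the same calls would return one value per row.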


You can use the pandas library for this.

The following code works for me:

import pandas as pd
df = pd.read_csv('data.txt',header=None)
somestats = df.describe()
somestats.to_csv('dataOut.txt')
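
Note that `describe()` reports count, mean, std, min, the quartiles (the 50% row is the median) and max, but not the range. A possible variation, assuming the same comma-separated data and using an in-memory buffer in place of data.txt, is to request the statistics explicitly and derive the range:

```python
import io
import pandas as pd

# Stand-in for the contents of data.txt
csv_text = io.StringIO(
    "1,2,3,0,0\n1,0,4,5,0\n1,1,1,1,1\n3,4,5,6,0\n1,0,1,0,3\n3,3,4,0,0\n"
)
df = pd.read_csv(csv_text, header=None)

# One row per statistic, one column per data column
stats = df.agg(["min", "max", "mean", "median"])
stats.loc["range"] = stats.loc["max"] - stats.loc["min"]
```

The result can then be written out with `stats.to_csv(...)` in the same way as above.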


In case anyone is curious, this is how I did it:

import numpy

infile = open("Data1.txt", "r")
outfile = open("ColStats.txt", "w")

# delimiter=',' matches the comma-separated data shown in the question
oMat = numpy.loadtxt(infile, delimiter=',')
tMat = numpy.transpose(oMat)  # columns of oMat become rows, and rows become columns

#print(tMat)

for tempM in tMat:  # each row of tMat is one column of the original data
    mn = min(tempM)
    mx = max(tempM)
    avg = sum(tempM) / len(tempM)
    rng = mx - mn
    median = numpy.median(tempM)

    out = "[{} {} {} {} {}]".format(mn, mx, avg, rng, median)
    outfile.write(out + '\n')

infile.close()
outfile.close()


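By the way, are you saying that I can access the columns if I use, say... y?

If flist is a numpy matrix, flist.max(0) searches column by column and gives you the maximum of every column in a single row, e.g. [3,4,5,6,3] for the example above. flist.max(1) searches row by row and gives you the maximum of every row in a single column, in the form [3],[5],[1],[6],[3],[4].

Can I create a numpy matrix by reading data.txt directly? For example numpy.matrix(readline… etc.)

Using numpy was perfect; I was able to transpose the matrix so that the columns became rows and then compute the statistics as before. Thanks!

I'm glad I could help!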
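
On the question of building the matrix straight from the file: one option is `numpy.loadtxt`, which accepts a filename or a file-like object. The sketch below assumes comma-separated data and uses an in-memory buffer in place of data.txt; it also shows the two `max` calls discussed in the comments.

```python
import io
import numpy as np

# Stand-in for data.txt; np.loadtxt("data.txt", delimiter=",") reads a real file
text = io.StringIO(
    "1,2,3,0,0\n1,0,4,5,0\n1,1,1,1,1\n3,4,5,6,0\n1,0,1,0,3\n3,3,4,0,0\n"
)
mat = np.loadtxt(text, delimiter=",")  # 2-D float array, one row per line

col_max = mat.max(axis=0)  # maximum of each column
row_max = mat.max(axis=1)  # maximum of each row
```

`loadtxt` returns a regular array, so `row_max` is a flat 1-D array here; with a `numpy.matrix` the same call would return a single-column matrix instead.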

If all rows have the same number of columns, you can compute all of the measures incrementally by reading the file one line at a time, without too much trouble. It's best not to create a separate variable from each row's data (row1, row2, … row5). You don't need to do that (nor do you even need the statistics module introduced in Python 3.4). Alternatively, if the file isn't too big, you can read it all into memory so that it doesn't have to be processed incrementally (which is easier).
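
A minimal sketch of the incremental approach, assuming comma-separated numeric lines with the same number of columns on every row. The function name `incremental_column_stats` is hypothetical. The median is left out of this sketch because an exact median generally requires keeping each column's values rather than a running summary.

```python
import io

def incremental_column_stats(lines):
    """Compute per-column min, max, mean and range in one pass over the lines."""
    mins = maxs = totals = None
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = [float(v) for v in line.split(",")]
        if mins is None:
            # The first data row initialises every running value
            mins, maxs, totals = values[:], values[:], values[:]
        else:
            mins = [min(a, b) for a, b in zip(mins, values)]
            maxs = [max(a, b) for a, b in zip(maxs, values)]
            totals = [a + b for a, b in zip(totals, values)]
        count += 1
    if count == 0:
        return []
    return [
        {"min": lo, "max": hi, "mean": total / count, "range": hi - lo}
        for lo, hi, total in zip(mins, maxs, totals)
    ]

# Stand-in for an open file; any iterable of lines works, including open("Data.txt")
sample = io.StringIO(
    "1,2,3,0,0\n1,0,4,5,0\n1,1,1,1,1\n3,4,5,6,0\n1,0,1,0,3\n3,3,4,0,0\n"
)
result = incremental_column_stats(sample)
```

Only one row is held in memory at a time, so this works the same way for files too large to load whole.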