Reading numbers from a text file with Python

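I have a large text file that, as you can see below, contains both strings and numbers. I only want to read the numbers, and I also want to drop the rows that have only 3 columns, then write the result to a matrix (m×n). Can anyone tell me the best way to handle this kind of file in Python?

My file looks like this: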

# Chunk-averaged data for fix Dens and group ave
# Timestep Number-of-chunks Total-count
# Chunk Coord1 Ncount density/number
4010000 14 1500
  1 4.323 138.758 0.00167105
  2 12.969 121.755 0.00146629
  3 21.615 127.7 0.00153788
  4 30.261 131.682 0.00158584
  5 38.907 127.525 0.00153578
  6 47.553 136.322 0.00164172
  7 56.199 118.014 0.00142124
  8 64.845 125.842 0.00151551
  9 73.491 120.684 0.00145339
  10 82.137 132.282 0.00159306
  11 90.783 121.567 0.00146402
  12 99.429 97.869 0.00117863
  13 108.075 0 0
  14 116.721 0 0......

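You haven't specified exactly what you mean by a matrix, so here is a solution that turns the text file into a 2D list, making every number individually accessible.

It checks whether the first item in a given line is a number and whether the line has 4 items; if so, it appends the line to the 2D list mat as 4 separate numbers. To access any number in mat, use mat[i][j].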

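with open("test.txt") as f:
    content = [line.strip() for line in f]

mat = []

for line in content:
    s = line.split()  # split on any run of whitespace
    # keep only rows with 4 fields whose first field is an integer index
    if len(s) == 4 and s[0].isdigit():
        mat.append([float(x) for x in s])  # store numbers, not strings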
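
If an actual m×n matrix is wanted rather than nested lists, the rows can be handed to NumPy. This is a minimal sketch, assuming numpy is installed; the name rows is a stand-in for mat and holds a few rows copied from the sample file.

```python
import numpy as np

# A few rows from the sample, as they look after splitting each line.
rows = [
    ["1", "4.323", "138.758", "0.00167105"],
    ["2", "12.969", "121.755", "0.00146629"],
    ["3", "21.615", "127.7", "0.00153788"],
]

# dtype=float converts numeric strings as well as numbers.
arr = np.array(rows, dtype=float)

print(arr.shape)   # number of rows and columns
print(arr[1, 2])   # same element as rows[1][2], as a float
```

Indexing with arr[i, j] then replaces mat[i][j], and whole columns are available as arr[:, j].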


For your task you need an iterator over the file, string.split() and a regular-expression check:

import re  # used to check that a line contains only numbers

matrix = []  # the rows of numbers end up here

with open('myfile.txt') as f:
    for line in f:  # iterate over the file line by line
        # skip lines containing anything other than digits, dots and whitespace
        if not re.fullmatch(r'[\s0-9.]+', line):
            continue

        list_of_items = line.split()  # split on whitespace into separate strings
        if len(list_of_items) <= 3:  # rows of 3 or fewer items are left out
            continue

        # convert the strings to floats and add them as a new row
        matrix.append([float(an_item) for an_item in list_of_items])
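
One detail worth checking in a regex filter like this: re.match only anchors at the start of the string, so a single character class tests just the first character of the line. The sketch below contrasts that with re.fullmatch, which tests the whole line. The helper names and the test strings are made up for illustration.

```python
import re

# Whole-line pattern: digits, dots and whitespace only.
NUMERIC_LINE = re.compile(r"[\s0-9.]+")

def first_char_ok(line):
    """True if the first character is a space, digit or dot."""
    return re.match(r"[ 0-9.]", line) is not None

def whole_line_ok(line):
    """True if every character is whitespace, a digit or a dot."""
    return NUMERIC_LINE.fullmatch(line) is not None

for text in ["# Chunk Coord1", "1 abc", "  1 4.323 138.758 0.00167105\n"]:
    print(repr(text), first_char_ok(text), whole_line_ok(text))
```

For the sample file both checks happen to agree, because every non-numeric line starts with #; the whole-line check only matters if text can appear after a leading digit.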

With the sample copy-pasted into txt:
In [350]: np.genfromtxt(txt.splitlines(), invalid_raise=False)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
    Line #2 (got 4 columns instead of 3)
    Line #3 (got 4 columns instead of 3)
  ....
  #!/usr/bin/python3
Out[350]: array([4.01e+06, 1.40e+01, 1.50e+03])

It read the first non-comment line and took its column count as the standard. Skipping that line, I can read all the rows:
In [351]: np.genfromtxt(txt.splitlines(), invalid_raise=False,skip_header=4)
Out[351]: 
array([[1.00000e+00, 4.32300e+00, 1.38758e+02, 1.67105e-03],
       [2.00000e+00, 1.29690e+01, 1.21755e+02, 1.46629e-03],
       [3.00000e+00, 2.16150e+01, 1.27700e+02, 1.53788e-03],
       [4.00000e+00, 3.02610e+01, 1.31682e+02, 1.58584e-03],
       [5.00000e+00, 3.89070e+01, 1.27525e+02, 1.53578e-03],
       [6.00000e+00, 4.75530e+01, 1.36322e+02, 1.64172e-03],
       [7.00000e+00, 5.61990e+01, 1.18014e+02, 1.42124e-03],
       [8.00000e+00, 6.48450e+01, 1.25842e+02, 1.51551e-03],
       [9.00000e+00, 7.34910e+01, 1.20684e+02, 1.45339e-03],
       [1.00000e+01, 8.21370e+01, 1.32282e+02, 1.59306e-03],
       [1.10000e+01, 9.07830e+01, 1.21567e+02, 1.46402e-03],
       [1.20000e+01, 9.94290e+01, 9.78690e+01, 1.17863e-03],
       [1.30000e+01, 1.08075e+02, 0.00000e+00, 0.00000e+00],
       [1.40000e+01, 1.16721e+02, 0.00000e+00, 0.00000e+00]])

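Actually, in this case all the remaining lines have the required 4 columns. If I truncate the last two lines, I get the warning, but it still reads the other lines.

Filtering the lines before passing them to genfromtxt is another option. genfromtxt accepts input from anything that feeds it lines: a file, a list of strings, or a function that reads and filters a file.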
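
The filtering option could look roughly like the sketch below. The helper four_column_lines and the sample string are assumptions for illustration (the second timestep block is invented to show a header recurring mid-file); only np.genfromtxt itself comes from the answer above.

```python
import numpy as np

def four_column_lines(lines, ncols=4):
    """Yield non-comment lines that have exactly `ncols` fields."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if len(stripped.split()) == ncols:
            yield stripped

# Made-up input: two timestep blocks, each with a 3-column header line.
sample = """
# Chunk-averaged data for fix Dens and group ave
# Timestep Number-of-chunks Total-count
# Chunk Coord1 Ncount density/number
4010000 14 1500
  1 4.323 138.758 0.00167105
  2 12.969 121.755 0.00146629
4020000 14 1500
  1 4.323 127.7 0.00153788
  2 12.969 131.682 0.00158584
"""

# Pass a list of strings, one of the input types mentioned above.
data = np.genfromtxt(list(four_column_lines(sample.splitlines())))
print(data.shape)
```

Because the 3-column lines never reach genfromtxt, no skip_header or invalid_raise argument is needed, and headers that recur later in the file are dropped as well. For a real file, the helper can be fed an open file object instead of sample.splitlines().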
Use regular expressions to extract the numbers!

Is it only the header line that has three numbers, or do lines like that recur? If the former, just open the file, skip the first four lines, and let numpy read the rest. If the latter, just have numpy read the whole thing with nan fill, then select the rows in which no column is nan.

Read line by line: if a line contains characters, skip it; if not, convert it to a list. If it has only 4 elements (3 columns and an index column), skip it; otherwise add it to a dataframe.

@ᴀʀᴍᴀɴ I think regex would be massive overkill! There are plenty of great methods in numpy :)