Python: storing each iteration of an array after a loop completes
I am very new to Python. I have searched extensively for a solution to my problem, but I am stuck. I generated a series of arrays with the following code:
fh = open(short_seq, 'r')
line_counter = 0
pos = [0]
array = [0.0 for x in range(101)]
for line in fh:
    line_counter += 1.0
    for i in line:
        score = ord(i) - 33.0
        array[pos] += score
        pos += 1
After printing inside the loop, I get a long series of arrays:
[1,2,3,4.....]
[2,3,4,5,6.....]
[3,4,5,6,7,8.....100]
...
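I want to use NumPy to run stats on each column. The arrays print out with a particular alignment, but once I am outside the loop I can only access the sum over the whole loop. I tried np.concatenate, but I was still left with the sum of the arrays. If I use NumPy inside the loop, I can only run stats on each column one iteration at a time, not over the whole series. My next idea was to append each iteration to a 2D matrix, but I don't know how to keep the alignment.
Any help would be greatly appreciated.
Edit: here is a sample of my data (in a text editor, each of the four strings sits directly beneath the previous one). I am trying to convert several thousand lines of ASCII into numeric values. Each line must go into an array 100 characters long, and then I need to run stats on each column.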
CCCFFFHHHHHHHHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIIIfGfgiiIHGHGHGHGHEHHFDFFFFFDDDDDBDDDDDDDEEDD
CCCFFFHHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
CCCFFFHHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<
BCCFFFDFHHHHHJJJJJJJJJJJIIJJJI@HGIIIJJJJJIJJIJIIJJJJJJJJJHHHHHHFFFDDDDDDDDDDDDDDDD?BDDDD@CDDDDDBDDDDD
array = [0.0 for x in range(101)]
is a list.
array = np.zeros((101,), float)
is an array of the same size.
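As a small sketch of that difference (the names array_list, array_np and shifted are illustrative only, not from the question): both objects hold 101 zeros, but only the NumPy array supports element-wise arithmetic directly.

```python
import numpy as np

# Plain Python list of 101 zeros, as in the question
array_list = [0.0 for x in range(101)]

# NumPy array with the same number of elements and a float dtype
array_np = np.zeros((101,), float)

# Adding a scalar works element-wise on the ndarray;
# the same expression on the list would raise a TypeError.
shifted = array_np + 1.0

print(type(array_list).__name__, len(array_list))
print(type(array_np).__name__, array_np.shape)
```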
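With
for line in fh:
you get a line, which is a string. I would expect for i in line to iterate over the characters of that string. Is that really what you want?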
for i in line:
    score = ord(i) - 33.0
    array[pos] += score
    pos += 1
Usually, when people read a text file, they expect columns of values separated by spaces or commas, e.g.
123, 345, 344, 233
343, 342, 343, 343
We use line.split(',') to split the string into substrings, and float or int to convert them to numbers, e.g.
data = [float(substring) for substring in line.split(',')]
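Show us some of your data file, or a simplified version. That will make it easier to help. A key question is whether the number of 'columns' is consistent across lines.
Usually, when we iterate over rows, we collect the row values in a list. If the number of elements in the sublists is consistent, we can convert that list to a 2d array.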
lines = []
for line in fh:
    data = [float(i) for i in line.split(',')]
    lines.append(data)
print(lines)
# A = np.array(lines)
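To try the conversion to a 2d array without a real file, here is a minimal sketch. The io.StringIO object is a hypothetical stand-in for the open file handle fh, and the two comma-separated rows are the illustrative values shown earlier, not the asker's data.

```python
import io
import numpy as np

# Hypothetical stand-in for the open file handle `fh`
fh = io.StringIO("123, 345, 344, 233\n343, 342, 343, 343\n")

lines = []
for line in fh:
    # float() ignores the surrounding spaces and the trailing newline
    data = [float(i) for i in line.split(',')]
    lines.append(data)

# This works because every row has the same number of values
A = np.array(lines)

print(A.shape)         # (2, 4)
print(A.mean(axis=0))  # mean of each column
```

If the rows had different lengths, np.array could not build a regular 2d array from them, which is why the consistency of the column count matters.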
===============================
With your sample lines I can do:
In [258]: with open('stack38175089.txt') as f:
   .....:     lines=f.readlines()
   .....:
In [259]: [len(l) for l in lines]
Out[259]: [102, 102, 102, 102]
In [260]: data=np.array([[ord(i) for i in l.strip()] for l in lines])
In [261]: data.shape
Out[261]: (4, 101)
In [262]: data
Out[262]:
array([[67, 67, 67, 70, 70, 70, 70, 70, 72, 72, 72, 72, 72, 73, 74, 74, 74,
74, 74, 74, 73, 74, 74, 74, 74, 74, 74, 74, 74, 73, 74, 74, 74, 73,
74, 74, 74, 74, 74, 74, 74, 73, 74, 74, 73, 74, 74, 71, 73, 73, 73,
72, 73, 73, 73, 70, 71, 73, 71, 70, 72, 70, 71, 73, 73, 73, 72, 73,
72, 72, 71, 69, 72, 72, 70, 68, 70, 70, 70, 70, 70, 68, 68, 68, 68,
68, 66, 68, 68, 68, 68, 68, 68, 68, 68, 69, 68, 69, 69, 68, 68],
...
[66, 67, 67, 70, 70, 70, 68, 70, 72, 72, 72, 72, 72, 74, 74, 74, 74,
74, 74, 74, 74, 74, 74, 74, 73, 73, 74, 74, 74, 73, 64, 72, 71, 73,
73, 73, 74, 74, 74, 74, 74, 73, 74, 74, 73, 74, 73, 73, 74, 74, 74,
74, 74, 74, 74, 74, 74, 72, 72, 72, 72, 72, 72, 70, 70, 70, 68, 68,
68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 68, 63, 66, 68,
68, 68, 68, 64, 67, 68, 68, 68, 68, 68, 66, 68, 68, 68, 68, 68]])
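With a 2d array like this, I can easily shift the values (-33) and apply statistical calculations to rows or columns.
I could read the lines individually and collect the values in a list of lists. But this sample, and I suspect your whole file, is small enough to use readlines.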
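A minimal sketch of the shift and the per-column statistics, assuming every line has the same length. The function name quality_matrix, its offset parameter, and the shortened sample strings are made up for illustration; the real lines are about 100 characters long.

```python
import numpy as np

def quality_matrix(lines, offset=33):
    """Convert equal-length strings into a 2d array of shifted character codes."""
    # strip() drops the trailing newline so it is not scored as a character
    codes = np.array([[ord(ch) for ch in l.strip()] for l in lines])
    return codes - offset

# Made-up, shortened sample lines
sample = ["CCCFF\n", "CCCFH\n", "BCCFF\n"]
scores = quality_matrix(sample)

print(scores.shape)         # (3, 5): one row per line, one column per position
print(scores.mean(axis=0))  # mean of each column
print(scores.std(axis=0))   # standard deviation of each column
print(scores.sum(axis=0))   # column totals, which the question's loop appears to accumulate
```

Because every row is kept in the 2d array, any statistic can be computed over axis=0 after the loop, rather than only the running sum.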
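Try numpy.sum(array, axis=0).
Thanks for the reply. The raw data (ASCII characters) is consistent across the lines of the file; however, when I convert the characters and start filling the array in the loop, it is skewed, but only at the beginning.
I loaded the sample into a 2d array.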