Python 2.7: how do I parse an array/matrix from a text file while preserving its structure?

python-2.7

I'm new to Python and am trying to write a script that parses the data contained between each pair of tags in the text file below:

<Keywords> 
GTO
</Keywords>
<NumberofNuclei>
2
</NumberofNuclei>
<Nuclear Charges> 
6.0
8.0
1.0
1.0
</Nuclear Charges>
<Primitive Exponents>
8.264000000000e+01  1.241000000000e+01  2.824000000000e+00   
8.989000000000e-02  2.292000000000e+00  2.292000000000e+00  
8.380000000000e-01  8.380000000000e-01  2.920000000000e-01 
</Primitive Exponents>

The output I want looks like this:

GTO
2.
6
8
1
1
8.2640000000E+01 1.2410000000E+01 2.824000000000e+00
8.98900000000E-02 2.2920000000000E+00 2.2920000000000E+00
8.380000000000e-01 8.380000000000e-01 2.92000000000E-01

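The code I tried, shown below, manages most of this. However, I'm struggling to parse the matrix under Primitive Exponents while preserving its 3x3 structure. I don't want it flattened into a list.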

import re
import numpy as np

with open('toysystem.txt','r') as f:
 data = f.read()
 nc = re.findall(r'<Nuclear Charges>(.*?)</Nuclear Charges>',data,re.DOTALL)
 nc1 = [elem.replace('\n',',').strip(',') for elem in nc]
 non = re.findall(r'<NumberofNuclei>(.*?)</NumberofNuclei>',data,re.DOTALL)
 non1 = int("".join(map(str, non)))
 kw = re.findall(r'<Keywords>(.*?)</Keywords>',data,re.DOTALL)
 kw1 = "".join(map(str, kw)).replace('\n','')
 pe = np.array(re.findall(r'<Primitive Exponents>(.*?)</Primitive Exponents>',data,re.DOTALL))

Any ideas on how to solve this, or how to modify my code so that it extracts the matrix as-is (as a 3x3 array)?

Thanks, everyone!

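You're running into this because findall() returns a list, in this case a list with a single element, and numpy.array() quite reasonably turns a one-element list into an array with a single element. You need to convert the string into a 3x3 matrix before asking numpy to turn it into an array: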

pe_match = re.search(r'<Primitive Exponents>(.*?)</Primitive Exponents>',data,re.DOTALL)
#  Extract the numbers from the enclosing tags and split into 3 lines
matrix = pe_match.group(1).strip("\n").split("\n")
#  Turn each line into a 3-list
matrix2 = [m.split() for m in matrix]
pe = np.array(matrix2)
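A minimal, stdlib-only sketch of the difference described above. The inline data string is an assumption standing in for the contents of toysystem.txt, and only the Primitive Exponents section is reproduced. It shows that findall() yields a single string for the whole section, while splitting that string first by lines and then by whitespace recovers the three rows of three fields:

```python
import re

# Assumed stand-in for toysystem.txt: only the <Primitive Exponents> section.
data = """<Primitive Exponents>
8.264000000000e+01  1.241000000000e+01  2.824000000000e+00
8.989000000000e-02  2.292000000000e+00  2.292000000000e+00
8.380000000000e-01  8.380000000000e-01  2.920000000000e-01
</Primitive Exponents>
"""

pattern = r'<Primitive Exponents>(.*?)</Primitive Exponents>'

# findall() returns one string per match; here there is exactly one match,
# so the whole 3x3 block arrives as a single string inside a list.
found = re.findall(pattern, data, re.DOTALL)
print(len(found))  # 1

# Splitting that string into lines, then each line into fields, keeps the rows.
block = re.search(pattern, data, re.DOTALL).group(1)
rows = [line.split() for line in block.strip().splitlines()]
```

Using strip() rather than strip("\n") also removes stray trailing spaces, which the sample file has at the end of some lines; split() with no argument ignores them either way.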

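Of course, pe still holds strings and needs further conversion before you can use it, but your question was about the matrix structure.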
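If the goal is to read every section in one pass, keeping each section's row structure and converting the numbers to floats, something like the following could work. It is a sketch under assumptions: parse_sections is a hypothetical helper name, every section is assumed to be a simple, non-nested <Tag>...</Tag> pair, and any field that float() accepts is treated as a number.

```python
import re


def parse_sections(text):
    """Hypothetical helper: map each <Tag>...</Tag> section to a list of rows.

    Each non-empty line of a section becomes one row. Fields that float()
    accepts are converted to float; anything else (e.g. GTO) stays a string.
    """
    def convert(field):
        try:
            return float(field)
        except ValueError:
            return field

    sections = {}
    # The backreference \1 pairs each opening tag with its own closing tag,
    # so tag names containing spaces (e.g. "Nuclear Charges") also work.
    for name, body in re.findall(r'<([^/>][^>]*)>(.*?)</\1>', text, re.DOTALL):
        rows = [[convert(field) for field in line.split()]
                for line in body.strip().splitlines() if line.strip()]
        sections[name.strip()] = rows
    return sections


# Sample input copied from the question.
data = """<Keywords>
GTO
</Keywords>
<NumberofNuclei>
2
</NumberofNuclei>
<Nuclear Charges>
6.0
8.0
1.0
1.0
</Nuclear Charges>
<Primitive Exponents>
8.264000000000e+01  1.241000000000e+01  2.824000000000e+00
8.989000000000e-02  2.292000000000e+00  2.292000000000e+00
8.380000000000e-01  8.380000000000e-01  2.920000000000e-01
</Primitive Exponents>
"""

sections = parse_sections(data)
exps = sections['Primitive Exponents']            # 3 rows of 3 floats
n_nuclei = int(sections['NumberofNuclei'][0][0])  # stored as [[2.0]]
charges = [row[0] for row in sections['Nuclear Charges']]
```

With NumPy available, np.array(exps) then gives a float array of shape (3, 3), so no separate string-to-number step is needed. Scalar sections such as NumberofNuclei come back as a one-by-one list of floats and have to be unwrapped and cast explicitly, as done for n_nuclei above.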