Importing a .dat file in Python without knowing its structure


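I am trying to load and view the contents of a data file that is available for download. After that I need to analyze it. I have looked into this, but I could not find any solution.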

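Now, I have looked at their label file. It says:

"Will write helpful Python-based letters to describe each object
// see for codes // the format will be comma separated, beginning with "RJW" as a key // {NAME}, {FORMAT}, {Number of Dim}, {Size Dim 1}, {Size Dim 2}, ... // where {FORMAT} is the Python code for that type, i.e. for uint32 // and there are as many Size Dims as the number of dimensions."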

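So I thought we could try Python. I do have a working knowledge of Python. I therefore started with this program (for simplicity, the Python file and the data file are in the same folder):

I get the error "UnicodeDecodeError: 'cp949' codec can't decode byte 0xff in position 65: illegal multibyte sequence".

If I change the code to:

the error message goes away, but what I get is:

I checked other answers on StackOverflow but did not get anywhere. My question may be very similar to the one posted

I need to see the contents of this dat file first and then export it to another format, such as .csv.


Any help is much appreciated.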

You need to open the file in binary mode:

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    while True:
        chunk = f.read(160036)  # record size as per the LBL file
        if not chunk:  # an empty bytes object means end of file
            break
        print(chunk)
        # The file is huge, so wait for Enter before showing the next chunk.
        # Use Ctrl+C to interrupt.
        input('Hit Enter...')
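The UnicodeDecodeError in the question is what text mode produces: Python tries to decode the raw bytes with the platform's default encoding (cp949 in this case), and binary records contain bytes that are not valid in it. A minimal sketch of the effect, using a made-up byte string rather than the real file:

```python
# Made-up bytes standing in for the start of a binary record:
# an ASCII timestamp-like field followed by raw binary values.
raw = b'2018-091T00:00:00.000' + bytes([0xFF, 0x00, 0x10])

try:
    raw.decode('cp949')  # roughly what text mode does behind the scenes
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print(f'text decoding failed: {exc.reason}')

# In binary mode nothing is decoded; the bytes come back untouched.
print(raw[:21])  # the ASCII field is still readable
print(raw[21])   # the byte that broke the text decoder, as an int
```

Picking a different text encoding would only hide the problem, since the numeric fields are not text at all.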
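Note that you can parse the LBL file, construct a format string for the struct module, and parse each chunk into meaningful fields. That is what the comment you quoted says.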
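To see the idea in isolation, here is a small sketch that needs no data file. The three descriptor lines and their names are hypothetical; they only imitate the "RJW, NAME, FORMAT, NDIM, DIM..." layout described in the label file. It assumes Python 3.8+ for math.prod.

```python
import struct
from math import prod

# Hypothetical descriptors in the "RJW, NAME, FORMAT, NDIM, DIM..." layout.
descriptors = [
    'RJW, UTC, c, 1, 21',
    'RJW, PACKETID, B, 1, 1',
    'RJW, COUNTS, f, 2, 2, 3',
]

parts = []
for line in descriptors:
    _, name, code, ndim, *dims = line.split(', ')
    count = prod(int(d) for d in dims)  # total number of items in the object
    code = 's' if code == 'c' else code  # read char arrays as one bytes object
    parts.append(f'{count}{code}')

# '=' means native byte order, standard sizes, no padding between fields.
fmt = '=' + ' '.join(parts)
record = struct.Struct(fmt)
print(fmt, record.size)

# Round trip with synthetic values to show what unpack returns.
blob = record.pack(b'2018-091T00:00:00.000', 7, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
values = record.unpack(blob)
print(values)
```

The unpacked result is one flat tuple: the 2x3 object arrives as six consecutive floats, so any 2D structure has to be rebuilt afterwards.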

"""Example of reading NASA JUNO JADE CALIBRATED SCIENCE DATA
https://pds-ppi.igpp.ucla.edu/search/view/?f=yes&id=pds://PPI/JNO-J_SW-JAD-3-CALIBRATED-V1.0/DATA/2018/2018091/ELECTRONS/JAD_L30_LRS_ELC_ANY_CNT_2018091_V03&o=1
https://stackoverflow.com/a/66687113/4046632
"""

import struct
from functools import reduce
from operator import mul
from collections import namedtuple

__author__ = "Boyan Kolev, https://stackoverflow.com/users/4046632/buran"

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.LBL') as f:
    rjws = [line.strip('\n/* ') for line in f if line.startswith('/* RJW')]

# create the format string for struct
rjws = rjws[2:] # exclude first 2 RJW comments related to file itself
names = []
FMT = '='
print(f'Number of objects: {len(rjws)}')
for idx, rjw in enumerate(rjws):
    _, name, fmt, num_dim, *dims = rjw.split(', ')
    fstr = f'{reduce(mul, map(int, dims))}{fmt}'
    FMT = f'{FMT} {fstr}'
    names.append(name)
    print(f'{idx}:{name}, {fstr}')
FMT = FMT.replace('c', 's') # for convenience treat 21c as s char[]
print(f"Format string: {repr(FMT)}")

# parse DAT file
s = struct.Struct(FMT)
print(f'Struct size:{s.size}')
with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    n = 0
    while True: # in python3.8+ this loop can be simplified with walrus operator
        chunk = f.read(s.size)
        if not chunk:
            break
        data = s.unpack_from(chunk)
        # process data further, e.g. split data in 2D containers where appropriate
        n += 1

print(f'Number of records: {n}')

# make a named tuple to represent first 10 fields
# for nice display. This basic use of namedtuple works only
# for first 23 objects, which have single item.
num_fields = 10
Record = namedtuple('Record', names[:num_fields])
record = Record(*data[:num_fields])
print('\n----------------------\n')
print(f'First {num_fields} fields of the last record.')
print(record)
Output:

Number of objects: 49
0:DIM0_UTC, 21c
1:PACKETID, 1B
2:DIM0_UTC_UPPER, 21c

--- omitted for sake of brevity ---

46:DIM2_AZIMUTH_DESPUN_LOWER, 3072f
47:MAG_VECTOR, 3f
48:ESENSOR, 1H
Format string: '= 21s 1B 21s 1b 21s 1b 1H 1B 1B 1B 1B 1h 1h 1f 1f 1f 1f 1f 1f 1f 1f 1f 1f 3f 3f 3f 1f 9f 9f 9f 1f 1I 1I 1H 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3f 1H'
Struct size:160036
Number of records: 1101

----------------------

First 10 fields of the last record.
Record(DIM0_UTC=b'2018-091T23:56:08.925', PACKETID=106, DIM0_UTC_UPPER=b'2018-092T00:01:08.925', PACKET_MODE=1, DIM0_UTC_LOWER=b'2018-091T23:51:08.925', PACKET_SPECIES=-1, ACCUMULATION_TIME=600, DATA_UNITS=2, SOURCE_BACKGROUND=3, SOURCE_DEAD_TIME=0)
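The "process data further" step in the code above could look roughly like the following sketch, which regroups the flat tuple into one value per object. The layout list and sample values are hypothetical and much smaller than the real record; row-major ordering of the 2D objects is an assumption, since the source does not state how the 64x48 containers are laid out. It assumes Python 3.8+ for math.prod.

```python
from math import prod

# Hypothetical layout: (name, struct code, dims) per object, as parsed from RJW lines.
layout = [
    ('UTC', 's', (21,)),
    ('PACKETID', 'B', (1,)),
    ('MAG_VECTOR', 'f', (3,)),
    ('DATA', 'f', (2, 3)),
]
# Flat tuple shaped like what struct.unpack would return for this layout.
flat = (b'2018-091T00:00:00.000', 7, 0.1, 0.2, 0.3, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)


def group_fields(layout, flat):
    """Map each object name to a scalar, a list, or a list of rows."""
    record = {}
    pos = 0
    for name, code, dims in layout:
        # An 's' field yields a single bytes object regardless of its length.
        n = 1 if code == 's' else prod(dims)
        values = flat[pos:pos + n]
        pos += n
        if n == 1:
            record[name] = values[0]
        elif len(dims) == 2:
            rows, cols = dims
            # Assumption: items are stored row by row.
            record[name] = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
        else:
            record[name] = list(values)
    return record


record = group_fields(layout, flat)
print(record)
```

With the values grouped like this, a namedtuple or dict per record works for every object, not only for the single-item ones.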


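You need to call .read() on what open(…) returns in order to read the file and see what is in the data file. What you are getting is a repr output telling you that data is a TextIOWrapper.

Thank you very much for this great code. I am still trying to understand it, but it seems to be working. As I mentioned in the question, is it possible to export it to a .csv file with the columns "DIM0_UTC, PACKETID, ..., MAG_VECTOR, ESENSOR" and rows with the corresponding values?

Since this is sort of related, you could also look at this question.

You can always save to CSV, but there are 40004 individual fields. According to the description there are also 2D data containers (all 3072 = 64*48), so you need to decide how to handle them. By the way, the NASA viewer you mentioned in the other question shows 39996 columns, but for some objects that are 1D with 3 items, like SC_POS_JUPITER_J2000XYZ, it shows only 1 value.

Thanks for the quick reply. Besides SC_POS_JUPITER_J2000XYZ, the other quantities with 3 components are SC_VEL_JUPITER_J2000XYZ, SC_VEL_ANGULAR_J2000XYZ and MAG_VECTOR. I think these are vectors of some kind. So if we count the remaining 8 columns (2 extra per vector), we get 40004 (= 39996 + 8). I checked the software mentioned in my previous question but could not find a way to export the dat values.

Yes, those 4 values explain the difference of 8 columns. I did not check how it displays DESPUN_SC_to_J2000, J2000_to_JSSXYZ and J2000_to_JSSRTP, all of which are 9-item 1D objects, but I guess it is fine. Also, I edited my answer to include the note that this basic use of namedtuple works only for the first 23 objects, which have a single item. And yes, I also think the NASA software is just a viewer, without an export option.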
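One possible way to handle the CSV export discussed in the comments is to keep scalar objects as single columns and expand multi-item objects into numbered columns. The sketch below does this with the standard csv module. The two sample records and the NAME_0, NAME_1 column naming are illustrative assumptions, not something the data set prescribes, and 2D containers would need to be flattened first (or written to separate files).

```python
import csv
import io

# Hypothetical records, already grouped into one value per object.
records = [
    {'DIM0_UTC': b'2018-091T00:00:00.000', 'PACKETID': 106, 'MAG_VECTOR': [0.1, 0.2, 0.3]},
    {'DIM0_UTC': b'2018-091T00:10:00.000', 'PACKETID': 106, 'MAG_VECTOR': [0.4, 0.5, 0.6]},
]


def flatten(record):
    """Turn one record into a flat dict of CSV column name -> value."""
    row = {}
    for name, value in record.items():
        if isinstance(value, bytes):
            row[name] = value.decode('ascii')  # timestamps are plain ASCII
        elif isinstance(value, (list, tuple)):
            for i, item in enumerate(value):   # one column per vector component
                row[f'{name}_{i}'] = item
        else:
            row[name] = value
    return row


rows = [flatten(r) for r in records]

# StringIO keeps the sketch self-contained; for a real file use
# open('out.csv', 'w', newline='') instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

For the real file this yields about 40000 columns per row, so writing the 2D containers separately may be more practical than one wide table.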