Python: convert a text file to JSON format, or extract its key/value pairs
I am trying to extract the keys and values from a txt file. I have two ideas in mind: 1. use a regex; 2. convert the txt to JSON. I am having trouble filtering the content and converting the format. Is there a good way to do this, or an existing API that can handle this kind of task?
txt file:
System: Host: ict-vm Kernel: 4.4.0-53-generic i686 (32 bit)
Desktop: MATE 1.16.1 Distro: Linux Mint 18.1 Serena
Machine: System: ASUSTeK (portable) product: N43SN v: 1.0
Mobo: ASUSTeK model: N43SN v: 1.0 serial: NB-1234567890
Bios: American Megatrends v: N43SN.412 date: 09/21/2011
CPU: Quad core Intel Core i7-2670QM (-HT-MCP-) cache: 6144 KB
clock speeds: max: 3100 MHz 1: 807 MHz 2: 814 MHz 3: 800 MHz
4: 1078 MHz 5: 990 MHz 6: 811 MHz 7: 801 MHz 8: 844 MHz
Graphics: Card-1: Intel 2nd Generation Core Processor Family Integrated Graphics Controller
Card-2: NVIDIA GF108M [GeForce GT 550M]
Display Server: X.org 1.18.4 drivers: intel (unloaded: fbdev,vesa) FAILED: nouveau
tty size: 80x24 Advanced Data: N/A for root
Audio: Card Intel 6 Series/C200 Series Family High Definition Audio Controller
driver: snd_hda_intel
Sound: Advanced Linux Sound Architecture v: k4.4.0-53-generic
Network: Card-1: Qualcomm Atheros AR9285 Wireless Network Adapter (PCI-Express)
driver: ath9k
IF: wlp3s0 state: up mac: 7x:2t:61:d4:72:8a
Card-2: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet driver: atl1c
IF: enp4s0 state: down mac: 14:da:e1:ay:72:b5
Drives: HDD Total Size: 507.9GB (0.4% used)
ID-1: /dev/sda model: ST95005620AS size: 500.1GB
ID-2: USB /dev/sdb model: DataTraveler_3.0 size: 7.8GB
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: System Temperatures: cpu: 48.0C mobo: N/A gpu: 43.0
Fan Speeds (in rpm): cpu: N/A
Info: Processes: 223 Uptime: 40 min Memory: 531.1/3948.3MB
Client: Shell (sudo) inxi: 2.2.35
JSON file:
{
System: {
Host:'ict-vm',
Kernel:'4.4.0-53-generic i686 (32 bit)',
.....
},
Machine:{
System: 'ASUSTeK (portable)'
}
}
Expected result: I can look up a value by its key
print(node['System'])
Output:
ASUSTeK (portable)
Thanks in advance.
If the file format is fixed and the dicts are not nested any deeper than one level, you can use:
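import re

x = """System: Host: ict-vm Kernel: 4.4.0-53-generic i686 (32 bit) Desktop: MATE 1.16.1 Distro: Linux Mint 18.1 Serena
Machine: System: ASUSTeK (portable) product: N43SN v: 1.0 Mobo: ASUSTeK model: N43SN v: 1.0 serial: NB-1234567890 Bios: American Megatrends v: N43SN.412 date: 09/21/2011"""
# Each match is a (key, value) pair; a section header such as "System:" has an empty value.
y = re.findall(r"(\S+):\s*(.*?)(?=\s*\S+:|$)", x)
d = {}
for i, j in y:
    if not j:
        # Empty value: start a new section and make it the current one.
        d[i] = {}
        k = d[i]
    else:
        # Non-empty value: store it in the current section (repeated keys overwrite).
        k[i] = j
print(d)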
Output:
{'System': {'Host': 'ict-vm', 'Kernel': '4.4.0-53-generic i686 (32 bit)', 'Desktop': 'MATE 1.16.1', 'Distro': 'Linux Mint 18.1 Serena'}, 'Machine': {'System': 'ASUSTeK (portable)', 'product': 'N43SN', 'v': 'N43SN.412', 'Mobo': 'ASUSTeK', 'model': 'N43SN', 'serial': 'NB-1234567890', 'Bios': 'American Megatrends', 'date': '09/21/2011'}}
You can then use d['Machine']['System']
Output: ASUSTeK (portable)
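Since the question also asks for JSON, here is a minimal Python 3 sketch of the same idea, wrapped in a function and serialized with json.dumps. The function name parse_sections, the "_top" fallback key, and the shortened sample are illustrative assumptions, not part of the original answer.

```python
import json
import re

# Same pattern as in the answer: a key is a run of non-space characters
# followed by ':', and its value runs lazily up to the next key or end of text.
PAIR = re.compile(r"(\S+):\s*(.*?)(?=\s*\S+:|$)")


def parse_sections(text):
    """Group key/value pairs under the most recent key that had no value."""
    result = {}
    current = None
    for key, value in PAIR.findall(text):
        if not value:
            # A key with an empty value opens a new section, e.g. "System:".
            current = result.setdefault(key, {})
        elif current is None:
            # Hypothetical fallback: a pair seen before any section header.
            result.setdefault("_top", {})[key] = value
        else:
            # Repeated keys inside one section overwrite earlier ones.
            current[key] = value
    return result


sample = (
    "System: Host: ict-vm Kernel: 4.4.0-53-generic i686 (32 bit) "
    "Desktop: MATE 1.16.1 Distro: Linux Mint 18.1 Serena\n"
    "Machine: System: ASUSTeK (portable) product: N43SN v: 1.0"
)

node = parse_sections(sample)
print(node["Machine"]["System"])   # ASUSTeK (portable)
print(json.dumps(node, indent=2))  # the nested dict as JSON text
```

The "_top" branch only guards against a NameError-style failure when the text does not start with a section header. Whether such pairs should be kept or dropped depends on the real file.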
Edit:
For the new input file, the regex needs to be adjusted:
\s*([^:\n]+):\s*((?:(?!: |\n)[\s\S])*)(?=\s+[^:\n]+:|$)
See the demo.
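As a rough check of the adjusted pattern, the sketch below runs it with re.findall over a few lines taken from the txt file in the question. The sample selection and the variable names are my own. The pattern itself is copied unchanged.

```python
import re

# The adjusted pattern from the edit above, unchanged.
PAIR = re.compile(r"\s*([^:\n]+):\s*((?:(?!: |\n)[\s\S])*)(?=\s+[^:\n]+:|$)")

# A few lines from the question's file, one record per line.
sample = (
    "System: Host: ict-vm Kernel: 4.4.0-53-generic i686 (32 bit)\n"
    "Desktop: MATE 1.16.1 Distro: Linux Mint 18.1 Serena\n"
    "Network: Card-1: Qualcomm Atheros AR9285 Wireless Network Adapter (PCI-Express)\n"
    "driver: ath9k\n"
    "IF: wlp3s0 state: up mac: 7x:2t:61:d4:72:8a"
)

# findall returns one (key, value) tuple per match; headers have an empty value.
pairs = PAIR.findall(sample)
for key, value in pairs:
    print(repr(key), "->", repr(value))
```

Because the value part only stops at a colon followed by a space, or at a newline, a MAC address such as 7x:2t:61:d4:72:8a stays in one piece. By my reading, lines such as "Drives: HDD Total Size: 507.9GB" still split into "Drives" -> "HDD Total" and "Size" -> "507.9GB (0.4% used)", so some post-processing may be needed for keys that contain spaces.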
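Is this really the format of the file, or did Stack Overflow eat some line breaks?
@Bemmu I just updated the txt format. This is the file I want to convert.
This is exactly what I need, but my output differs from yours. I just updated the txt file format above.
Thank you very much. This is exactly what I wanted.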