Python: unable to replace empty values with 0 in a list of tuples
Tags: python, pandas
I have data that looks like this:
data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326), ('D', 217.464281998), ('E', 206.329901299)], [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]]
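This is only a small part of the data I extracted. As you can see, K has no value available. So I thought maybe I could use pandas to solve this, and I did the following: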
import pandas as pd
import numpy as np
df = pd.DataFrame(data).fillna(0)
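Now df.fillna(0) has no effect, because there is no None in the data.
So I tried df.replace(r'^\s*$', np.nan, regex=True), which replaces any empty strings with NaN, but even that did not help.
So what can I do to fill in the missing data?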
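To illustrate why neither attempt changes anything, here is a minimal sketch, assuming the nested `data` from above is passed straight to `pd.DataFrame`. Each cell then holds a whole tuple, so the short tuple `('K',)` is an ordinary object rather than a missing value. The name `n_missing` is only for illustration.

```python
import pandas as pd

data = [
    [('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
     ('D', 217.464281998), ('E', 206.329901299)],
    [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
     ('I', 265.319671257), ('K',)],
]

df = pd.DataFrame(data)

# Each cell holds a whole tuple: 2 rows (inner lists) x 5 columns (tuples).
print(df.shape)

# The short tuple is stored as-is; it is not NaN or None.
print(df.iloc[1, 4])

# No cell counts as missing, so fillna(0) has nothing to replace.
n_missing = int(df.isna().sum().sum())
print(n_missing)
```

In other words, the gap is inside a tuple, not in a DataFrame cell, so the tuples need to be padded or unpacked into columns before `fillna` can help.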
Note: I will not always receive the data in this format. I may also receive it like this:
data = [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
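What I am looking for is a general solution in pandas to fill in the missing values.

If I understand your question correctly, you can add None with the following list comprehension: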
data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326), ('D', 217.464281998), ('E', 206.329901299)], [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]]
new_data = [[t if len(t) == 2 else (*t, None) for t in l] for l in data]
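As a rough sketch of how the padded list could then be turned into a filled DataFrame: the flattening step and the column names `key` and `value` are assumptions added here for illustration, not part of the answer.

```python
import pandas as pd

data = [
    [('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
     ('D', 217.464281998), ('E', 206.329901299)],
    [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
     ('I', 265.319671257), ('K',)],
]

# Pad one-element tuples with None so every tuple has two fields.
new_data = [[t if len(t) == 2 else (*t, None) for t in l] for l in data]

# Flatten the list of lists into one list of (key, value) tuples.
flat = [t for row in new_data for t in row]

# The None in the second field is now a real missing value,
# which fillna(0) can replace.
df = pd.DataFrame(flat, columns=["key", "value"]).fillna(0)
print(df)
```

Once each tuple is spread over two columns, the missing second field is something `fillna` can see.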
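An alternative. Since your edit: why not flatten the tuples where needed (see the nested case below, using itertools.chain) and then call
pd.DataFrame(data).fillna(0)
which gives you: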
In [299]: data = [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
In [300]: pd.DataFrame(data).fillna(0).to_records(index=False).tolist()
Out[300]:
[('F', 210.297625953),
('G', 228.117692718),
('H', 4.0),
('I', 265.319671257),
('K', 0.0)]
For the nested-list case:
In [308]: data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326), ('D', 217.464281998), ('E',
...: 206.329901299)], [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
...: ]
In [309]: from itertools import chain
In [310]: pd.DataFrame(chain.from_iterable(data)).fillna(0).to_records(index=False).tolist()
Out[310]:
[('A', 204.593564568),
('B', 217.421341061),
('C', 237.296250326),
('D', 217.464281998),
('E', 206.329901299),
('F', 210.297625953),
('G', 228.117692718),
('H', 4.0),
('I', 265.319671257),
('K', 0.0)]
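One side effect visible in the output above is that the integer 4 comes back as 4.0, because the value column is stored as float. If only a padded list of tuples is needed, a plain-Python sketch avoids the round trip through pandas and keeps the original types. `pad_tuples` and its `fill` parameter are hypothetical names, not part of the answer.

```python
from itertools import chain


def pad_tuples(data, fill=0):
    """Hypothetical helper: return a flat list of 2-tuples, padding short ones with `fill`."""
    # Flatten one level when the input is a list of lists.
    if data and isinstance(data[0], list):
        data = chain.from_iterable(data)
    # Keep complete tuples unchanged; append `fill` to one-element tuples.
    return [t if len(t) == 2 else (*t, fill) for t in data]


data1 = [
    [('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
     ('D', 217.464281998), ('E', 206.329901299)],
    [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
     ('I', 265.319671257), ('K',)],
]
data2 = [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
         ('I', 265.319671257), ('K',)]

print(pad_tuples(data2))
print(pad_tuples(data1)[-1])  # ('K', 0)
```

This assumes every tuple has either one or two elements, as in the sample data.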
(Comment on the list-comprehension answer) Beat me to it. If you want to apply this to a DataFrame, just do:
df = df.apply(lambda s: [x if len(x) == 2 else (*x, None) for x in s])

IIUC, you may have either a list or a list of lists; if so, try a function:
data1=[[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
('D', 217.464281998), ('E', 206.329901299)], [('F', 210.297625953),
('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]]
data2 = [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
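import itertools
import pandas as pd

def myfunc(x):
    # A list of lists is flattened one level before building the frame.
    if isinstance(x[0], list):
        return pd.DataFrame(itertools.chain.from_iterable(x)).fillna(0)
    # A flat list of tuples can be passed to DataFrame directly.
    return pd.DataFrame(x).fillna(0)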
print(myfunc(data1))
   0           1
0  A  204.593565
1  B  217.421341
2  C  237.296250
3  D  217.464282
4  E  206.329901
5  F  210.297626
6  G  228.117693
7  H    4.000000
8  I  265.319671
9  K    0.000000
print(myfunc(data2))
   0           1
0  F  210.297626
1  G  228.117693
2  H    4.000000
3  I  265.319671
4  K    0.000000
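The function above indexes `x[0]`, so it raises an IndexError on an empty list. A slightly more defensive variant might look like the sketch below. `to_filled_frame` and its `fill` parameter are hypothetical names, and returning an empty two-column frame for empty input is an assumption about the desired behaviour.

```python
import pandas as pd
from itertools import chain


def to_filled_frame(data, fill=0):
    """Hypothetical variant of myfunc that tolerates empty input and takes a fill value."""
    # Nothing to unpack: return an empty frame with the same two column labels.
    if not data:
        return pd.DataFrame(columns=[0, 1])
    # A list of lists is flattened one level into a single list of tuples.
    if isinstance(data[0], list):
        data = list(chain.from_iterable(data))
    # Short tuples leave a missing value in the second column, which fillna replaces.
    return pd.DataFrame(data).fillna(fill)


data1 = [
    [('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
     ('D', 217.464281998), ('E', 206.329901299)],
    [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
     ('I', 265.319671257), ('K',)],
]
data2 = [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
         ('I', 265.319671257), ('K',)]

print(to_filled_frame(data1))
print(to_filled_frame(data2))
print(to_filled_frame([]).empty)
```

Like the original function, this only looks at the first element to decide whether the input is nested, so mixed inputs are not handled.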