How to generate a list from a pandas DataFrame, skipping NaN values


I have a pandas DataFrame that looks roughly like this:

    foo   foo2   foo3  foo4
a   NY    WA     AZ    NaN
b   DC    NaN    NaN   NaN
c   MA    CA     NaN   NaN
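I want to build a nested list from the rows of this DataFrame while omitting the NaN values, so that I end up with something like [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']].

There is a pattern in this DataFrame, in case it makes a difference: if fooX is empty, every later fooY column is empty as well.

I initially had code like the following. I am sure there is a better way to do this: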

A = [[i] for i in subset_label['label'].tolist()]
B = [i for i in subset_label['label2'].tolist()]
C = [i for i in subset_label['label3'].tolist()]
D = [i for i in subset_label['label4'].tolist()]
out_list = []
for index, row in subset_label.iterrows():
    out_list.append([row.label, row.label2, row.label3, row.label4])
out_list
Try this:

In [77]: df.T.apply(lambda x: x.dropna().tolist()).tolist()
Out[77]: [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
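
For comparison, the same result can be sketched with a plain list comprehension over the rows, without transposing the frame. This is only a minimal sketch: the sample frame below is rebuilt from the table in the question, and the name `nested` is an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the table in the question (rebuilt by hand).
df = pd.DataFrame(
    {
        "foo": ["NY", "DC", "MA"],
        "foo2": ["WA", np.nan, "CA"],
        "foo3": ["AZ", np.nan, np.nan],
        "foo4": [np.nan, np.nan, np.nan],
    },
    index=["a", "b", "c"],
)

# Walk the rows as plain tuples and keep only the non-missing cells.
nested = [
    [value for value in row if pd.notna(value)]
    for row in df.itertuples(index=False, name=None)
]
print(nested)
```

Unlike the `apply`-based one-liner, this always returns a list of lists, even when every row happens to keep the same number of values.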

Option 1
pd.DataFrame.stack drops NaN values by default.

df.stack().groupby(level=0).apply(list).tolist()

[['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
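
The stack approach relies on `stack` discarding missing values. As far as I know, newer pandas releases offer a different `stack` implementation (`future_stack=True`) that keeps them, so a hedged variant is to drop them explicitly. The sketch below assumes a sample frame rebuilt from the question; `stacked` and `nested` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the table in the question (rebuilt by hand).
df = pd.DataFrame(
    {
        "foo": ["NY", "DC", "MA"],
        "foo2": ["WA", np.nan, "CA"],
        "foo3": ["AZ", np.nan, np.nan],
        "foo4": [np.nan, np.nan, np.nan],
    },
    index=["a", "b", "c"],
)

# stack() turns the frame into one long Series indexed by (row, column).
# The explicit dropna() is a no-op when stack() already dropped the NaN
# values and does the work when it did not.
stacked = df.stack().dropna()

# Group by the original row label (index level 0) and collect each group.
nested = stacked.groupby(level=0).agg(list).tolist()
print(nested)
```

One caveat of this design: a row that consists only of NaN values disappears from the result entirely instead of producing an empty list.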

Option 2
A playful alternative, because I think summing lists inside pandas objects is fun.

df.applymap(lambda x: [x] if pd.notnull(x) else []).sum(1).tolist()

[['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]

Option 3
An experiment with numpy

nn = df.notnull().values
sliced = df.values.ravel()[nn.ravel()]
splits = nn.sum(1)[:-1].cumsum()
[s.tolist() for s in np.split(sliced, splits)]

[['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
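
To make the numpy steps easier to follow, here is a commented, self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the variable names (`mask`, `kept`, `counts`, `offsets`) are my own labels rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the table in the question (rebuilt by hand).
df = pd.DataFrame(
    {
        "foo": ["NY", "DC", "MA"],
        "foo2": ["WA", np.nan, "CA"],
        "foo3": ["AZ", np.nan, np.nan],
        "foo4": [np.nan, np.nan, np.nan],
    },
    index=["a", "b", "c"],
)

values = df.to_numpy(dtype=object)   # 2-D array of the raw cells
mask = df.notna().to_numpy()         # True where a cell holds a value

# Boolean indexing with a 2-D mask returns a flat array in row-major
# order, so the surviving cells stay grouped by their original row.
kept = values[mask]

# Number of surviving cells per row, then the positions at which the
# flat array has to be cut (the final cumulative sum is the end itself).
counts = mask.sum(axis=1)
offsets = np.cumsum(counts)[:-1]

nested = [chunk.tolist() for chunk in np.split(kept, offsets)]
print(nested)
```

Because the split points come from per-row counts, a row with no values yields an empty list rather than being dropped.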

Here is a vectorized version:

import numpy as np
import pandas as pd

original = pd.DataFrame(data={
    'foo': ['NY', 'DC', 'MA'],
    'foo2': ['WA', np.nan, 'CA'],
    'foo3': ['AZ', np.nan, np.nan],
    'foo4': [np.nan] * 3,
})

out = original.copy().fillna('NAN')

# Build up mapping such that each non-nan entry is mapped to [entry]
#   and nan entries are mapped to []
unique_entries = np.unique(out.values)
mapping = {e: [e] for e in unique_entries}
mapping['NAN'] = []

# Apply mapping
for c in original.columns:
    out[c] = out[c].map(mapping)

# Concatenate the lists along axis 1
out.sum(axis=1)

You should get something like this:

0    [NY, WA, AZ]
1            [DC]
2        [MA, CA]
dtype: object