Python: replace NaN values with a list for specific columns

python python-3.x pandas numpy

I have a DataFrame with two rows:

import pandas as pd

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

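After appending 8 rows that contain only the group value (everything else is null), it looks like this (column order may vary with the pandas version):

group, size = 'c', 8  # values implied by the output below
df = pd.concat([df, pd.DataFrame({'group': group}, index=[0] * size)])  # DataFrame.append was removed in pandas 2.0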

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

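What I want

Replace the NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with my own lists.

Notes:

This data has only 2 sequence columns, but there could be more.
Lists that are already present in a column must not be replaced, only the NaNs.

I could not find a solution that replaces NaN with user-supplied list values taken from a dictionary.

Pseudocode: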

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

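I tried the code below. It works for the first column I pass, but not for the second, which is strange.

df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The above works for seq_col, but running it again on seq_col_2 gives strange results.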

Expected output, given this input:

my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}

# after running code that implements the pseudocode above, it should look like
  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]

For array input, you can use:

If you need list values inside the Series, you can convert explicitly to a Series before assigning:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
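
To cover the dictionary-driven case from the question (several seq_col_n columns, each with its own fill list), the same idea can be wrapped in a loop. The following is only a minimal sketch, not the answerer's code: the helper name fill_nan_with_lists is hypothetical, and it assumes the dictionary keys are exactly the column names. It builds each column element by element instead of assigning through .loc, which sidesteps index alignment on the duplicated index labels.

```python
import pandas as pd

# Rebuild the frame from the question: 2 populated rows plus 8 rows that only carry 'group'.
df = pd.DataFrame({'group': ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1, 2, 3, 4, 5]] * 2,
                   'seq_col_2': [[1, 2, 3, 4, 5]] * 2,
                   'grp_count': [2] * 2})
df = pd.concat([df, pd.DataFrame({'group': 'c'}, index=[0] * 8)])


def fill_nan_with_lists(frame, fill_values):
    """Hypothetical helper: replace NaN cells with a list, column by column.

    fill_values maps a column name to the list that should replace NaN there.
    Cells that already hold a list are left untouched.
    """
    out = frame.copy()
    for col, fill in fill_values.items():
        if col not in out.columns:
            continue  # ignore dictionary keys that have no matching column
        missing = out[col].isna()
        # list(fill) gives every row its own copy of the fill list
        values = [list(fill) if is_na else v for v, is_na in zip(out[col], missing)]
        # The new Series reuses the frame's index, so no realignment is needed
        out[col] = pd.Series(values, index=out.index, dtype=object)
    return out


my_dict = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}
result = fill_nan_with_lists(df, my_dict)
print(result[['seq_col', 'seq_col_2']])
```

Copying the fill list per row is a design choice: if the same list object were shared by every row, mutating it in one row would change all of them. The numeric columns (grp_count, num_col_2, num_column) are not in the dictionary, so their NaN values stay as they are.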

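Can you show the expected output? Also, what result does your code give?

Nice, finally someone who posts at least an executable code example! Unfortunately I can't help you, but I'll upvote your question for that. As Harv mentioned, though, an expected output would help a lot.

Do you basically want to turn the 10 values in those two lists into 10 separate values, one per row, in those columns? If so, what do you want to do with the columns that have no list?

This link might help.

Is this what you want?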

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))