Pandas: loop through a DataFrame and replace certain strings with numbers


pandas dataframe replace


I have a DataFrame, sample_df, that looks like this:

     bar   foo
0    rejected unidentified
1    clear    caution
2    caution    NaN

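Note that this is just a made-up df. The real one has many other columns, including some with non-text data types. bar and foo may also contain many empty cells, which are NaN.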

The actual df looks like this (the one above is only a sample):

|      |   Unnamed: 0 | user_id                          | result   | face_comparison_result   | created_at          | facial_image_integrity_result   | visual_authenticity_result   | properties      | attempt_id                       |
|-----:|-------------:|:---------------------------------|:---------|:-------------------------|:--------------------|:--------------------------------|:-----------------------------|:----------------|:---------------------------------|
|    0 |           58 | ecee468d4a124a8eafeec61271cd0da1 | clear    | clear                    | 2017-06-20 17:50:43 | clear                           | clear                        | {}              | 9e4277fc1ddf4a059da3dd2db35f6c76 |
|    1 |           76 | 1895d2b1782740bb8503b9bf3edf1ead | clear    | clear                    | 2017-06-20 13:28:00 | clear                           | clear                        | {}              | ab259d3cb33b4711b0a5174e4de1d72c |
|    2 |          217 | e71b27ea145249878b10f5b3f1fb4317 | clear    | clear                    | 2017-06-18 21:18:31 | clear                           | clear                        | {}              | 2b7f1c6f3fc5416286d9f1c97b15e8f9 |
|    3 |          221 | f512dc74bd1b4c109d9bd2981518a9f8 | clear    | clear                    | 2017-06-18 22:17:29 | clear                           | clear                        | {}              | ab5989375b514968b2ff2b21095ed1ef |
|    4 |          251 | 0685c7945d1349b7a954e1a0869bae4b | clear    | clear                    | 2017-06-18 19:54:21 | caution                         | clear                        | {}              | dd1b0b2dbe234f4cb747cc054de2fdd3 |
|    5 |          253 | 1a1a994f540147ab913fcd61b7a859d9 | clear    | clear                    | 2017-06-18 20:05:05 | clear                           | clear                        | {}              | 1475037353a848318a32324539a6947e |
|    6 |          334 | 26e89e4a60f1451285e70ca8dc5bc90e | clear    | clear                    | 2017-06-17 20:21:54 | suspected                       | clear                        | {}              | 244fa3e7cfdb48afb44844f064134fec |
|    7 |          340 | 41afdea02a9c42098a15d94a05e8452b | NaN      | clear                    | 2017-06-17 20:42:53 | clear                           | clear                        | {}              | b066a4043122437bafae3ddcf6c2ab07 |
|    8 |          424 | 6cf6eb05a3cc4aabb69c19956a055eb9 | rejected    | NaN                    | 2017-06-16 20:00:26 |

I want to replace any of these strings, wherever I find them, with numbers according to the mapping below:

def no_strings(df):
  columns=list(df)
  for column in columns:
    df[column] = df[column].map(result_map)
#We will need a mapping of strings to numbers to be able to analyse later.
result_map = {'unidentified':0,"clear": 1, 'suspected': 2,"caution" : 3, 'rejected':4}

So the output might look like this:

  bar  foo
0    4    0
1    1    3
2    3    NaN

For some reason, I get an error when I run no_strings(sample_df).

What am I doing wrong?

Let's try stacking, mapping the dict, and then unstacking:

df.stack().to_frame()[0].map(result_map).unstack()



    bar  foo
0    4    0
1    1    3
2    3    2
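As a self-contained sketch of the stack/map/unstack idea, the block below rebuilds the three-row sample from the question (so foo still has a NaN in row 2) and applies the same mapping. The frame construction is an assumption based on the sample shown in the question, and the sketch omits the to_frame()[0] round trip, which should yield the same Series.

```python
import numpy as np
import pandas as pd

result_map = {'unidentified': 0, 'clear': 1, 'suspected': 2, 'caution': 3, 'rejected': 4}

# Sample frame reconstructed from the question (an assumption for illustration).
sample_df = pd.DataFrame({
    'bar': ['rejected', 'clear', 'caution'],
    'foo': ['unidentified', 'caution', np.nan],
})

# stack() turns the frame into one long Series, map() looks each value up
# in result_map, and unstack() restores the original row/column layout.
mapped = sample_df.stack().map(result_map).unstack()
print(mapped)
```

Any value that is not a key of result_map comes back as NaN, so running this over a whole frame that also holds IDs or timestamps will blank those columns out.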

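However, if you want to be safe (say a value has no key in result_map and you do not want to see NaN), fall back to a default value with dict.get.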

So for this df:

    bar           foo
0   rejected      unidentified
1   clear         caution
2   caution       suspected
3   sdgdg         0000

the result would be:

   bar              foo
0   4                0
1   1                3
2   3                2
3   not found        not found
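One way to produce a result like the one above is sketched here. It combines the stack/unstack approach with result_map.get and a default; this particular combination is an assumption, not necessarily the exact snippet the answer used.

```python
import pandas as pd

result_map = {'unidentified': 0, 'clear': 1, 'suspected': 2, 'caution': 3, 'rejected': 4}

# Four-row example frame, including two values that are not in result_map.
df = pd.DataFrame({
    'bar': ['rejected', 'clear', 'caution', 'sdgdg'],
    'foo': ['unidentified', 'caution', 'suspected', '0000'],
})

# dict.get returns the default ('not found') for any value missing from the map,
# so unmatched cells become 'not found' instead of NaN.
safe = df.stack().map(lambda x: result_map.get(x, 'not found')).unstack()
print(safe)
```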

For efficiency, map only the columns that need it:

cols = ['foo','bar','other_columns']
for c in cols:
   df[c] = df[c].map(lambda x: result_map.get(x, 'not found'))

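Can you show us a sample df and the expected output?

Sure, edited. Sorry, I am running sample_df = sample_df.stack().to_frame()[0].map(result_map).unstack() and then display(sample_df). All I see is a bunch of NaNs and 0s.

OK, does this help? df.drop(columns=cols).join(df.filter(items=cols).apply(lambda x: x.map(result_map))) where cols is the list of columns the mapping should be applied to.

Thanks! Could you edit it for the case where you just want to loop over the columns without knowing that they are called foo or bar? So you just need a function that edits every cell containing clear, unidentified, etc., and leaves the other cells unchanged?

@DhruvGhulati I added it to the post. See under "For efficiency" ;)

This works! Only one thing: it does not leave existing cells unchanged. If it finds a NaN it replaces it with not found, but if the cell contains something like 8190909e566647a5b6afeee9b4ec6c6a, it also turns that into "not found" for some reason. not found should only apply to blanks converted via np.nan or to existing NaN.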
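The last comment asks for unmatched cells to stay as they are. A minimal sketch of one way to do that is to pass the cell itself as the default to dict.get, so only values that are keys of result_map change. The helper name map_known_strings and the example frame are hypothetical, and the sketch assumes every cell is hashable (a cell holding an actual dict would raise a TypeError in dict.get).

```python
import numpy as np
import pandas as pd

result_map = {'unidentified': 0, 'clear': 1, 'suspected': 2, 'caution': 3, 'rejected': 4}

# Hypothetical frame mixing result columns with an ID column and a NaN.
df = pd.DataFrame({
    'user_id': ['8190909e566647a5b6afeee9b4ec6c6a', 'ecee468d4a124a8eafeec61271cd0da1'],
    'result': ['clear', np.nan],
    'facial_image_integrity_result': ['suspected', 'caution'],
})

def map_known_strings(frame, mapping):
    """Return a copy where cells found in `mapping` are replaced; others are kept."""
    out = frame.copy()
    for col in out.columns:
        # mapping.get(x, x) falls back to the original cell, so IDs and NaN survive.
        out[col] = out[col].map(lambda x: mapping.get(x, x))
    return out

mapped = map_known_strings(df, result_map)
print(mapped)
```

Because every column is visited, no column names need to be known in advance; columns without any matching values simply come back unchanged.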
