Python: Why do incorrect NaN values appear when creating a new column in a dataframe?


In Python 3 and pandas, I have this dataframe:

eleitos_d_doadores.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 47490 entries, 0 to 47489
Data columns (total 21 columns):
uf_x                          47490 non-null object
partido_eleicao_x             47490 non-null object
cargo_x                       47490 non-null object
nome_completo_x               47490 non-null object
cpf                           47490 non-null object
cpf_cnpj_doador               47490 non-null object
nome_doador                   47490 non-null object
valor                         47490 non-null object
tipo_receita                  47490 non-null object
fonte_recurso                 47490 non-null object
especie_recurso               47490 non-null object
cpf_cnpj_doador_originario    47490 non-null object
nome_doador_originario        47490 non-null object
tipo_doador_originario        47490 non-null object
Unnamed: 0                    47490 non-null int64
uf_y                          47490 non-null object
cargo_y                       47490 non-null object
nome_completo_y               47490 non-null object
nome_urna                     47490 non-null object
partido_eleicao_y             47490 non-null object
situacao                      47490 non-null object
dtypes: int64(1), object(20)
memory usage: 8.0+ MB
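
I created new columns that hold the first eight characters of cpf_cnpj_doador and cpf_cnpj_doador_originario. This truncated many rows correctly: 01888360712 became 01888360.

But many rows were not truncated correctly. Instead, the expected value was replaced by NaN. For example, 50844182000155 became NaN, where the correct value is 50844182.

Does anyone know where the NaN values come from?

Below are the commands I wrote to create the columns, followed by a selection of rows that shows both failures and correct results.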

eleitos_d_doadores['cnpj_raiz_doador'] = eleitos_d_doadores.cpf_cnpj_doador.str[:8]

eleitos_d_doadores['cnpj_raiz_doador_originario'] = eleitos_d_doadores.cpf_cnpj_doador_originario.str[:8]

eleitos_d_doadores.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 47490 entries, 0 to 47489
Data columns (total 23 columns):
uf_x                           47490 non-null object
partido_eleicao_x              47490 non-null object
cargo_x                        47490 non-null object
nome_completo_x                47490 non-null object
cpf                            47490 non-null object
cpf_cnpj_doador                47490 non-null object
nome_doador                    47490 non-null object
valor                          47490 non-null object
tipo_receita                   47490 non-null object
fonte_recurso                  47490 non-null object
especie_recurso                47490 non-null object
cpf_cnpj_doador_originario     47490 non-null object
nome_doador_originario         47490 non-null object
tipo_doador_originario         47490 non-null object
Unnamed: 0                     47490 non-null int64
uf_y                           47490 non-null object
cargo_y                        47490 non-null object
nome_completo_y                47490 non-null object
nome_urna                      47490 non-null object
partido_eleicao_y              47490 non-null object
situacao                       47490 non-null object
cnpj_raiz_doador               3488 non-null object
cnpj_raiz_doador_originario    47490 non-null object
dtypes: int64(1), object(22)
memory usage: 8.7+ MB

nome = eleitos_d_doadores[(eleitos_d_doadores['nome_completo_x'] == 'JULIO CESAR DELGADO')]

nome.loc[:, ['cpf_cnpj_doador', 'cnpj_raiz_doador']]

    cpf_cnpj_doador     cnpj_raiz_doador
7390    1421697000137   NaN
7391    1421697000137   NaN
7392    1421697000137   NaN
7393    1421697000137   NaN
7394    56993900000131  NaN
7395    26198515000484  NaN
7396    26198515000484  NaN
7397    20574428000155  NaN
7398    12082605000158  NaN
7399    60892403000114  NaN
7400    17469701000177  NaN
7401    66460080000176  NaN
7402    21561725000129  NaN
7403    50844182000155  NaN
7404    3940864000181   NaN
7405    3940864000181   NaN
7406    3940864000181   NaN
7407    3940864000181   NaN
7408    3940864000181   NaN
7409    3940864000181   NaN
7410    3940864000181   NaN
7411    00697656691     00697656
7412    03776208660     03776208
7413    16760808649     NaN
7414    17153081000162  NaN
7415    20573722000142  NaN
7416    20573722000142  NaN
7417    20573722000142  NaN
7418    20573722000142  NaN
7419    20592604000181  NaN
7420    20573722000142  NaN
7421    15102288000182  NaN
7422    33131541000108  NaN
7423    20575279000149  NaN
7424    20575492000150  NaN

nome.loc[:, ['cpf_cnpj_doador_originario', 'cnpj_raiz_doador_originario']]

    cpf_cnpj_doador_originario  cnpj_raiz_doador_originario
7390    17262213000194  17262213
7391    90400888000142  90400888
7392    16639904000100  16639904
7393    00447821000170  00447821
7394    #NULO   #NULO
7395    #NULO   #NULO
7396    #NULO   #NULO
7397    38105195100     38105195
7398    #NULO   #NULO
7399    #NULO   #NULO
7400    #NULO   #NULO
7401    #NULO   #NULO
7402    #NULO   #NULO
7403    #NULO   #NULO
7404    61186888000193  61186888
7405    15102288000182  15102288
7406    92693118000160  92693118
7407    92693118000160  92693118
7408    02125403000192  02125403
7409    33000092000169  33000092
7410    07052569000140  07052569
7411    #NULO   #NULO
7412    #NULO   #NULO
7413    #NULO   #NULO
7414    #NULO   #NULO
7415    03349915000103  03349915
7416    17463456000190  17463456
7417    71077747000196  71077747
7418    03349915000103  03349915
7419    04899037000154  04899037
7420    06142647000134  06142647
7421    #NULO   #NULO
7422    #NULO   #NULO
7423    04641376000136  04641376
7424    08250286634     08250286

You can avoid the NaN values with the pandas.DataFrame.dropna method:


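Thank you very much. Actually, my question was not precise. The NaN values that show up in the new column I created are wrong, and I don't understand why.

Can you provide more examples of values that lead to NaN?

Yes, thanks: 3331541000108 / 15102288001822 / 21561725000129. Some problem rows produce NaN. I noticed that the rows that work have codes with fewer characters: for example, 00697656691 gives 00697656 in the new column, and 03776208660 gives 03776208.

OK, there is something else going on here. Can you create a dataframe with 10 rows (5 good, 5 bad) and post it here? I don't see an error in your logic, and I was able to produce the desired result from the additional data you gave in the comment above. See this small example:

import pandas as pd
df = pd.DataFrame({'Cat': ['A', 'A', 'B', 'B'], 'String1': ['ABCDEFG', 'HIJKLMNO', 'QRSTUVW', 'XYZ1234']})
df['Substring'] = df.loc[df.Cat == 'A', 'String1'].str[:3]
print(df)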
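
One possible cause that fits the listing above, though nobody in the thread confirms it: the column may hold a mix of Python strings and integers. An `object` dtype does not reveal this, and the `.str` accessor returns NaN for any element that is not a string. The rows that work all keep their leading zeros, which suggests they are strings, while the failing ones look like numbers. Below is a minimal sketch under that assumption. The four sample values are taken from the question, but the choice of which ones are `int` is hypothetical.

```python
import pandas as pd

# Hypothetical sample: an object column mixing str and int values
# (assumed for illustration; the real column's contents are not confirmed).
df = pd.DataFrame({
    "cpf_cnpj_doador": ["00697656691", "03776208660", 16760808649, 50844182000155],
})

# Count the Python type of each element; the object dtype alone hides this.
type_counts = df["cpf_cnpj_doador"].map(type).value_counts()
print(type_counts)

# .str[:8] slices only real strings; non-string elements come back as NaN.
sliced = df["cpf_cnpj_doador"].str[:8]
print(sliced.tolist())

# Converting every element to str first lets the slice apply to all rows.
# Caveat: an int that already lost a leading zero stays one digit short.
fixed = df["cpf_cnpj_doador"].astype(str).str[:8]
print(fixed.tolist())
```

If the type count on the real column shows only `str`, this explanation does not apply and the cause lies elsewhere.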
df.dropna(subset=['ColumnToCheck'], how='all', inplace=True)  # df: your DataFrame; 'ColumnToCheck': the column to test for NaN
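
For illustration, here is a minimal sketch of what a `dropna` call with `subset` does on a made-up three-row frame. The sample values and the name `cleaned` are hypothetical. Note that `dropna` removes the affected rows; it does not recover the values that turned into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one row has NaN in the derived column.
df = pd.DataFrame({
    "cpf_cnpj_doador": ["00697656691", "03776208660", "16760808649"],
    "cnpj_raiz_doador": ["00697656", "03776208", np.nan],
})

# Drop rows whose 'cnpj_raiz_doador' is NaN; other columns are not checked.
cleaned = df.dropna(subset=["cnpj_raiz_doador"])
print(cleaned)
```

Returning a new frame instead of passing `inplace=True` leaves the original data available for checking which rows were dropped.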