Python: why do incorrect NaN values appear when creating a new column in a dataframe?
In Python 3 and pandas, I have this dataframe:
eleitos_d_doadores.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 47490 entries, 0 to 47489
Data columns (total 21 columns):
uf_x 47490 non-null object
partido_eleicao_x 47490 non-null object
cargo_x 47490 non-null object
nome_completo_x 47490 non-null object
cpf 47490 non-null object
cpf_cnpj_doador 47490 non-null object
nome_doador 47490 non-null object
valor 47490 non-null object
tipo_receita 47490 non-null object
fonte_recurso 47490 non-null object
especie_recurso 47490 non-null object
cpf_cnpj_doador_originario 47490 non-null object
nome_doador_originario 47490 non-null object
tipo_doador_originario 47490 non-null object
Unnamed: 0 47490 non-null int64
uf_y 47490 non-null object
cargo_y 47490 non-null object
nome_completo_y 47490 non-null object
nome_urna 47490 non-null object
partido_eleicao_y 47490 non-null object
situacao 47490 non-null object
dtypes: int64(1), object(20)
memory usage: 8.0+ MB
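Taking the first eight characters of the code truncates many rows correctly: 01888360712 becomes 01888360.
However, many rows are not truncated correctly; instead, the expected value is replaced by NaN. For example, 50844182000155 becomes NaN, when the correct value is 50844182.
Does anyone know where the NaN values come from?
Below are the commands I wrote to create the columns, followed by a selection of the data that shows both failures and successes.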
eleitos_d_doadores['cnpj_raiz_doador'] = eleitos_d_doadores.cpf_cnpj_doador.str[:8]
eleitos_d_doadores['cnpj_raiz_doador_originario'] = eleitos_d_doadores.cpf_cnpj_doador_originario.str[:8]
eleitos_d_doadores.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 47490 entries, 0 to 47489
Data columns (total 23 columns):
uf_x 47490 non-null object
partido_eleicao_x 47490 non-null object
cargo_x 47490 non-null object
nome_completo_x 47490 non-null object
cpf 47490 non-null object
cpf_cnpj_doador 47490 non-null object
nome_doador 47490 non-null object
valor 47490 non-null object
tipo_receita 47490 non-null object
fonte_recurso 47490 non-null object
especie_recurso 47490 non-null object
cpf_cnpj_doador_originario 47490 non-null object
nome_doador_originario 47490 non-null object
tipo_doador_originario 47490 non-null object
Unnamed: 0 47490 non-null int64
uf_y 47490 non-null object
cargo_y 47490 non-null object
nome_completo_y 47490 non-null object
nome_urna 47490 non-null object
partido_eleicao_y 47490 non-null object
situacao 47490 non-null object
cnpj_raiz_doador 3488 non-null object
cnpj_raiz_doador_originario 47490 non-null object
dtypes: int64(1), object(22)
memory usage: 8.7+ MB
nome = eleitos_d_doadores[(eleitos_d_doadores['nome_completo_x'] == 'JULIO CESAR DELGADO')]
nome.loc[:, ['cpf_cnpj_doador', 'cnpj_raiz_doador']]
cpf_cnpj_doador cnpj_raiz_doador
7390 1421697000137 NaN
7391 1421697000137 NaN
7392 1421697000137 NaN
7393 1421697000137 NaN
7394 56993900000131 NaN
7395 26198515000484 NaN
7396 26198515000484 NaN
7397 20574428000155 NaN
7398 12082605000158 NaN
7399 60892403000114 NaN
7400 17469701000177 NaN
7401 66460080000176 NaN
7402 21561725000129 NaN
7403 50844182000155 NaN
7404 3940864000181 NaN
7405 3940864000181 NaN
7406 3940864000181 NaN
7407 3940864000181 NaN
7408 3940864000181 NaN
7409 3940864000181 NaN
7410 3940864000181 NaN
7411 00697656691 00697656
7412 03776208660 03776208
7413 16760808649 NaN
7414 17153081000162 NaN
7415 20573722000142 NaN
7416 20573722000142 NaN
7417 20573722000142 NaN
7418 20573722000142 NaN
7419 20592604000181 NaN
7420 20573722000142 NaN
7421 15102288000182 NaN
7422 33131541000108 NaN
7423 20575279000149 NaN
7424 20575492000150 NaN
nome.loc[:, ['cpf_cnpj_doador_originario', 'cnpj_raiz_doador_originario']]
cpf_cnpj_doador_originario cnpj_raiz_doador_originario
7390 17262213000194 17262213
7391 90400888000142 90400888
7392 16639904000100 16639904
7393 00447821000170 00447821
7394 #NULO #NULO
7395 #NULO #NULO
7396 #NULO #NULO
7397 38105195100 38105195
7398 #NULO #NULO
7399 #NULO #NULO
7400 #NULO #NULO
7401 #NULO #NULO
7402 #NULO #NULO
7403 #NULO #NULO
7404 61186888000193 61186888
7405 15102288000182 15102288
7406 92693118000160 92693118
7407 92693118000160 92693118
7408 02125403000192 02125403
7409 33000092000169 33000092
7410 07052569000140 07052569
7411 #NULO #NULO
7412 #NULO #NULO
7413 #NULO #NULO
7414 #NULO #NULO
7415 03349915000103 03349915
7416 17463456000190 17463456
7417 71077747000196 71077747
7418 03349915000103 03349915
7419 04899037000154 04899037
7420 06142647000134 06142647
7421 #NULO #NULO
7422 #NULO #NULO
7423 04641376000136 04641376
7424 08250286634 08250286
You can avoid the NaN values with the pandas.DataFrame.dropna method:
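Thank you very much. Actually, my question was not precise. The NaN values that appear in the new column I created are incorrect, and I do not understand why.
Can you provide more examples of values that result in NaN?
Yes, thank you: 3331541000108/15102288001822/21561725000129. There are some problem rows that generate NaN. I noticed that the correct rows have codes with fewer characters: for example, 00697656691 produces 00697656 in the new column, and 03776208660 produces 03776208.
OK, something else is going on here. Can you create a dataframe with 10 rows (5 good, 5 bad) and post it here? I don't see an error in your logic, and I was able to produce the desired result from the additional data you provided in your comment above. See this small example:
df = pd.DataFrame({'Cat': ['A', 'A', 'B', 'B'], 'String1': ['ABCDEFG', 'HIJKLMNO', 'QRSTUVW', 'XYZ1234']})
df['Substring'] = df.loc[df.Cat == 'A', 'String1'].str[:3]
print(df)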
DataFrame.dropna(subset=['ColumnToCheck'], how='all', inplace=True)
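Note that dropna only discards the affected rows; it does not explain them. One likely cause, consistent with the output in the question, is that cpf_cnpj_doador is an object column holding a mix of Python str and int values. The rows that worked have leading zeros, so they must be strings. The NaN rows print like plain integers. The .str accessor returns NaN for any element that is not a string. The sketch below reproduces that behaviour on a small hypothetical dataframe (the four values are taken from the question, but the str/int mix is an assumption) and shows one possible fix, casting to str before slicing.

```python
import pandas as pd

# Hypothetical miniature of the column: object dtype holding both str and int.
# The mix of types is an assumption about the original data.
df = pd.DataFrame({
    "cpf_cnpj_doador": ["00697656691", "03776208660", 16760808649, 50844182000155],
})

# Count which Python types are actually stored in the column.
type_counts = df["cpf_cnpj_doador"].map(type).value_counts()
print(type_counts)

# .str only operates on str elements; non-string elements come back as NaN.
raw_slice = df["cpf_cnpj_doador"].str[:8]
print(raw_slice.tolist())

# Casting every element to str first lets the slice work on all rows.
df["cnpj_raiz_doador"] = df["cpf_cnpj_doador"].astype(str).str[:8]
print(df)
```

If the integer values lost leading zeros when the file was loaded (1421697000137 has 13 digits, for instance), the first eight characters after casting will not be the true root. In that case, reading the source file with dtype=str in pd.read_csv, or padding with .str.zfill to the expected length before slicing, may be needed. Both are suggestions under the same assumption, not a confirmed diagnosis.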