Pandas: Location of a row with error (Pandas) - Fatal编程技术网


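I am pretty new to Pandas and I am trying to find out where my code breaks. For example, I am doing a type conversion:

df['x']=df['x'].astype('int')

...and I get the error "ValueError: invalid literal for long() with base 10: '1.0692e+06'"

In general, if I have 1000 entries in the DataFrame, how can I find out which entry causes the break? Is there anything in ipdb that outputs the current location (i.e. where the code breaks)? Basically, I am trying to pinpoint which value cannot be converted to int.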

The error you are seeing is probably caused by the values in the x column being strings:

In [15]: df = pd.DataFrame({'x':['1.0692e+06']})
In [16]: df['x'].astype('int')
ValueError: invalid literal for long() with base 10: '1.0692e+06'

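Ideally, the problem can be avoided by making sure the values stored in the DataFrame are already ints, not strings, at the time the DataFrame is built. How to do that of course depends on how you are building the DataFrame.

After the fact, the DataFrame can be fixed using applymap (deprecated since pandas 2.1 in favor of DataFrame.map):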

import ast
df = df.applymap(ast.literal_eval).astype('int')

but calling ast.literal_eval on each value in the DataFrame can be slow, which is why fixing the problem at the source is the best option.
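As a possibly faster alternative to calling ast.literal_eval on every value, pandas' own vectorized parser can do the conversion. The sketch below is not part of the original answer: it swaps in pd.to_numeric, assumes every value in the column is a numeric string, and uses made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({'x': ['1.0692e+06', '42']})

# pd.to_numeric parses all strings in one vectorized call (to floats here,
# because of the scientific notation); astype then converts to integers.
df['x'] = pd.to_numeric(df['x']).astype('int64')

print(df['x'].tolist())
```

If some values are not numeric at all, pd.to_numeric raises by default, so this only helps once the data is known to be clean.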

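Usually you could inspect the problematic value of row in a debugger.

However, in this case the exception occurs inside the call to astype, which is a thin wrapper around C-compiled code. The C-compiled code does the looping through the values in df['x'], so the Python debugger is not helpful here: it does not let you introspect which value the exception is being raised for inside the C-compiled code.

Many important parts of Pandas and NumPy are written in C, C++, Cython or Fortran, and the Python debugger will not take you inside those non-Python pieces of code where the fast loops run. So instead I would fall back on a low-brow solution: iterate through the values in a Python loop and use try...except to catch the failing values: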

df = pd.DataFrame({'x':['1.0692e+06']})
for i, item in enumerate(df['x']):
   try:
      int(item)
   except ValueError:
      print('ERROR at index {}: {!r}'.format(i, item))

yields

ERROR at index 0: '1.0692e+06'
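For large columns, a vectorized check may be a quicker way to list every value that cannot be parsed as a number at all. This is a different technique from the loop above, sketched under the assumption that "not parseable as a float" is the failure of interest; a string such as '1.0692e+06' does parse as a float, so it is not flagged here. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({'x': ['1', '1.0692e+06', 'abc', None, '7']})

# errors='coerce' turns anything unparseable into NaN instead of raising.
parsed = pd.to_numeric(df['x'], errors='coerce')

# Flag values that became NaN but were not missing to begin with.
bad = parsed.isna() & df['x'].notna()

# Show the index labels and raw values of the offending rows.
print(df.loc[bad, 'x'])
```

The mask `bad` can also be used to drop or repair the offending rows before calling astype.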


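To report all rows that fail to map because of any exception:

df.apply(my_function, axis=1)  # throws various exceptions at unknown rows

# print the index label, row content, and exception for each failing row
# (iterating over df directly would yield column labels, not rows)
for i, row in df.iterrows():
    try:
        my_function(row)
    except Exception as e:
        print('Error at index {}: {!r}'.format(i, row))
        print(e)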
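The same idea can be wrapped in a small helper that collects the failures instead of only printing them. This is a minimal sketch: find_failing_rows and parse_x are hypothetical names introduced for illustration, the sample data is made up, and rows are visited with DataFrame.iterrows.

```python
import pandas as pd

def find_failing_rows(df, func):
    """Return a list of (index label, exception) for rows where func raises."""
    failures = []
    for label, row in df.iterrows():
        try:
            func(row)
        except Exception as exc:
            failures.append((label, exc))
    return failures

# Hypothetical row function, used only for this demonstration.
def parse_x(row):
    return int(row['x'])

df = pd.DataFrame({'x': ['1', '1.0692e+06', 'abc']})
failures = find_failing_rows(df, parse_x)

for label, exc in failures:
    print('Error at index {}: {}'.format(label, exc))
```

Returning the exceptions rather than printing them makes it possible to group the failures by exception type afterwards.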


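I ran into the same problem, and because I have a big input file (3 million rows), enumerating all rows would take a long time. So I wrote a binary search to locate the offending row.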

import pandas as pd
import sys

def binarySearch(df, l, r, func):
    # Search rows l..r (inclusive) for a row on which func reports an error
    while l <= r:
        mid = l + (r - l) // 2

        # Check whether the single row at mid raises
        result = func(df, mid, mid+1)
        if result:
            return mid, result

        result = func(df, l, mid)
        if result is None:
            # No exception in rows l..mid-1, so ignore the left half
            l = mid + 1
        else:
            r = mid - 1

    # If we reach here, no failing row was found; return a pair so the
    # caller can always unpack the result
    return -1, None

def check(df, start, end):
    result = None

    try:
        # In my case, I want to find out which row causes this failure
        df.iloc[start:end].uid.astype(int)
    except Exception as e:
        result = str(e)

    return result

df = pd.read_csv(sys.argv[1])

index, result = binarySearch(df, 0, len(df) - 1, check)
print("index: {}".format(index))
print(result)

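If you are in IPython you can turn on pdb and start debugging: %pdb, execute the command, then %debug. You will be able to walk the stack and display values.

@EdChum's answer is the best. You can also loop over the values and wrap the conversion in try/except.

Thanks. That solved my problem. I also tried converting everything to 'float' and it works too, even with the string as a value. A general question, though: is there a way to step into the error to pinpoint which value (or the current index) is breaking? Tnx

I've added a suggestion for how to find the problematic value (and index).

Is there a more systematic debug mode that lets pandas report which rows failed on any exception?

@FredericBazin, you can arrange for your code to drop into a debugger. Alternatively, with IPython, you can have the debugger start whenever an uncaught exception occurs. Once in the debugger, you can print the value of the current row.

@FredericBazin: Note, however, that this only works if the value of row can be introspected from the frame where the exception occurred. If you are calling a NumPy or Pandas method that runs Cython/C/C++/Fortran code and loops over the rows there, the Python debugger will not let you introspect the state of variables in that external code. That is why, in the code above, I did a crude emulation of astype in Python, so that the value of the row could be found from Python.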



