Python: Deleting rows from a DataFrame with a non-numeric index

python pandas dataframe indexing

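I have been using pandas to do some interesting filtering on a CSV file, but I have hit a roadblock. I am trying to check my index column for garbled text (non-integer) data and delete those rows. I tried dropping them from the dataframe with a condition at import time, and I tried iterating them out after import, but neither worked. Here is an example: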

import pandas as pd

df = pd.read_csv(file, encoding='cp1252').set_index("numbers")
results = df[df["columnA"].str.contains("search_data") & ~df["columnB"].isin(search_list)]
# I need to extend the filter above to check the "numbers" column, which I have set as the index,
# so that the expected garbled text is caught and filtered out. Because the column is supposed
# to hold integers, I can't use str.contains, isdigit or isalnum. I have tried
# len(df["columns"]) < 20, df.index < 20, and a few other options as well.
# After loading the data, I also tried iterating through it:
for index, row in results.iterrows():
    if not isinstance(row["numbers"], int):
        print(str(row["numbers"]))
        # append whole row to new dataframe
# This also didn't work

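As a side note, I had to change the encoding passed to the pd function so that all the good data in the file could still be read when some non-UTF-8 data is present; otherwise it throws an error on import.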

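You can use pd.to_numeric to convert the numbers column to a numeric type. With errors='coerce', every non-numeric entry is turned into NaN, and you can then drop those rows:
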
df = pd.read_csv(file, encoding='cp1252')
df['numbers'] = pd.to_numeric(df['numbers'], errors='coerce')

df = df.dropna(subset=['numbers']).set_index('numbers')
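
To make the steps above easy to try without the original file, here is a minimal sketch on made-up data. The CSV text, the garbled value and the final cast back to integers are assumptions added for illustration; only the pd.to_numeric / dropna / set_index sequence comes from the answer itself.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file (values are made up).
csv_text = """numbers,columnA,columnB
1,search_data one,x
2,other,y
??garbled??,search_data two,z
4,search_data three,w
"""

df = pd.read_csv(io.StringIO(csv_text))

# Entries that cannot be parsed as numbers become NaN instead of raising an error.
df["numbers"] = pd.to_numeric(df["numbers"], errors="coerce")

# Drop the rows whose "numbers" value could not be parsed.
df = df.dropna(subset=["numbers"])

# Coercion leaves the column as float because of the NaN values;
# casting back to int is an optional extra step, not part of the original answer.
df["numbers"] = df["numbers"].astype(int)
df = df.set_index("numbers")

print(df.index.tolist())  # [1, 2, 4]
```

The conversion has to happen before set_index, because dropna(subset=...) looks at columns rather than at the index.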
