
Ignoring empty DataFrames when reading a file with pandas in Python


I have a txt file like this:

`Empty DataFrame 
 Columns: [0, 1, 2, 3, 4]
 Index: []
 Empty DataFrame
 Columns: [0, 1, 2, 3, 4]
 Index: []
                       0                         1                           2  \
46   RNA/4v6p.csv,46AA/U/551    RNA/4v6p.csv,46AA/A/33         RNA/4v6p.csv,46WW_cis   
47   RNA/4v6p.csv,46AA/G/550    RNA/4v6p.csv,46AA/C/34         RNA/4v6p.csv,46WW_cis   
48   RNA/4v6p.csv,46AA/A/553    RNA/4v6p.csv,46AA/U/30         RNA/4v6p.csv,46WW_cis   
49   RNA/4v6p.csv,46AA/U/552    RNA/4v6p.csv,46AA/A/33         RNA/4v6p.csv,46WW_cis   
50   RNA/4v6p.csv,46AA/U/1199   RNA/4v6p.csv,46AA/G/1058       RNA/4v6p.csv,46WW_cis   

     3   4  
46 NaN NaN  
47 NaN NaN  
48 NaN NaN  
49 NaN NaN  
50 NaN NaN`
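I want to read it into an array with 3 columns. Currently I am trying `pd.read_csv(self.filename, delim_whitespace=True)`, but this gives me a lot of errors when it tries to read the `Empty DataFrame` parts. How can I make the program ignore those parts?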

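Edit: The best solution would be for my file not to contain any empty DataFrames at all. The file is the result of searching through many files, some of which are empty. I thought I had filtered out the empty files by catching an exception, so that the outcome of searching an empty file would not be stored in the results. I guess I am doing it wrong. Can someone correct me?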

import pandas as pd
from numpy import mean as nm

def find_same_direction_chain(self, results):
    separation = lambda x: pd.Series([i for i in x.split('/')])
    left_chain = self.data[0].apply(separation)
    right_chain = self.data[1].apply(separation)
    i = 1
    try:
        while i < len(self.data[:]) - 5:
            if nm(left_chain[2][i:i+3]) >= nm(left_chain[2][i+2:i+5]) and nm(right_chain[2][i:i+3]) >= nm(right_chain[2][i+2:i+5]) and len(self.data[:]) > 0:
                if nm(left_chain[2][i+2:i+5]) >= nm(left_chain[2][i+4:i+7]) and nm(right_chain[2][i+2:i+5]) >= nm(right_chain[2][i+4:i+7]):
                    results.chains.append(str(self.filename+", "+str(i)+self.data[0:3][i:i+5]))
            else:
                pass
            i += 1
    except ValueError:
        results.bin.append(self.filename)
    except TypeError:
        results.data_structure_error.append(self.filename)
You can use:

import pandas as pd
import io

temp=u"""Empty DataFrame 
 Columns: [0, 1, 2, 3, 4]
 Index: []
 Empty DataFrame
 Columns: [0, 1, 2, 3, 4]
 Index: []
                       0                         1                           2  \
46   RNA/4v6p.csv,46AA/U/551    RNA/4v6p.csv,46AA/A/33         RNA/4v6p.csv,46WW_cis   
47   RNA/4v6p.csv,46AA/G/550    RNA/4v6p.csv,46AA/C/34         RNA/4v6p.csv,46WW_cis   
48   RNA/4v6p.csv,46AA/A/553    RNA/4v6p.csv,46AA/U/30         RNA/4v6p.csv,46WW_cis   
49   RNA/4v6p.csv,46AA/U/552    RNA/4v6p.csv,46AA/A/33         RNA/4v6p.csv,46WW_cis   
50   RNA/4v6p.csv,46AA/U/1199   RNA/4v6p.csv,46AA/G/1058       RNA/4v6p.csv,46WW_cis   

     3   4  
46 NaN NaN  
47 NaN NaN  
48 NaN NaN  
49 NaN NaN  
50 NaN NaN"""
Or a solution that uses `skiprows`:

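Edit:

You can try changing the line that appends to `results.chains` so that it is guarded by a length check (I have no sample data, so this is untested).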


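I don't think I can use `skiprows`, because the empty DataFrame parts are placed irregularly in my file.

OK, then try the first solution, the one without `skiprows`. But it would be better to filter out the empty `DataFrames` before writing to the file, e.g. `print([df for df in dfs if len(df) > 0])` (`dfs` is a `list` of `DataFrames`).

That may be what I need, although when a condition is met for some elements of the DataFrame I save them to a list, like `results.chains.append(str(self.filename+", "+str(i)+self.data[0:3][i:i+5]))`, and then I save this list to a file with `with open("chains.txt", "a+") as f: f.write("\n".join(self.results.chains))`. So I wonder why there are empty DataFrames in my file. How did they get there?

I think it is really hard to help you, because this is unfinished code: `nm` and the test data are missing. But if `results` is a `list` of `DataFrames`, try checking the code around `append` and add a check for an empty `df`: `if len(self.data[0:3][i:i+5]) > 0: results.chains.append(str(self.filename+", "+str(i)+self.data[0:3][i:i+5]))`. Without data I cannot test it, though.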
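
The suggestion from the comments, dropping empty DataFrames before anything is written to the file, could look roughly like the sketch below. The names `dfs`, `non_empty` and `text`, as well as the sample frames, are hypothetical; the only assumption is that the search results are available as a list of DataFrames.

```python
import pandas as pd

# Hypothetical search results: two empty frames and one frame with data.
dfs = [
    pd.DataFrame(columns=range(5)),
    pd.DataFrame(columns=range(5)),
    pd.DataFrame({0: ["A/U/551", "A/G/550"], 1: ["A/A/33", "A/C/34"]}),
]

# Keep only the frames that have at least one row.
non_empty = [df for df in dfs if not df.empty]

# Build the output text from the surviving frames only, so that the
# "Empty DataFrame" representation never reaches the file.
text = "\n".join(df.to_string() for df in non_empty)
print(text)
```

Writing `text` instead of the string form of every frame would keep the `Empty DataFrame` blocks out of the result file in the first place, so nothing has to be skipped when reading it back.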
# first solution (without skiprows)
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), delim_whitespace=True, names=range(7))

# remove rows with NaN in columns 0 - 3
df = df.dropna(subset=[0,1,2,3])

# remove rows where the first column contains the text 'Columns'
df = df[~df.iloc[:,0].str.contains('Columns')] 

# shift the first row
df.iloc[0,:] = df.iloc[0,:].shift(-3)

# set the first column as the index
df = df.set_index(df.iloc[:,0])
# remove unnecessary columns
df = df.drop([0,4,5,6], axis=1)
print(df)
                           1                         2                      3
0                                                                            
46   RNA/4v6p.csv,46AA/U/551    RNA/4v6p.csv,46AA/A/33  RNA/4v6p.csv,46WW_cis
47   RNA/4v6p.csv,46AA/G/550    RNA/4v6p.csv,46AA/C/34  RNA/4v6p.csv,46WW_cis
48   RNA/4v6p.csv,46AA/A/553    RNA/4v6p.csv,46AA/U/30  RNA/4v6p.csv,46WW_cis
49   RNA/4v6p.csv,46AA/U/552    RNA/4v6p.csv,46AA/A/33  RNA/4v6p.csv,46WW_cis
50  RNA/4v6p.csv,46AA/U/1199  RNA/4v6p.csv,46AA/G/1058  RNA/4v6p.csv,46WW_cis
# second solution (with skiprows)
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), delim_whitespace=True, names=range(7), skiprows=6)

# remove rows with NaN in columns 0 - 3
df = df.dropna(subset=[0,1,2,3])

# shift the first row
df.iloc[0,:] = df.iloc[0,:].shift(-3)

# set the first column as the index
df = df.set_index(df.iloc[:,0])
# remove unnecessary columns
df = df.drop([0,4,5,6], axis=1)
print(df)
                           1                         2                      3
0                                                                            
46   RNA/4v6p.csv,46AA/U/551    RNA/4v6p.csv,46AA/A/33  RNA/4v6p.csv,46WW_cis
47   RNA/4v6p.csv,46AA/G/550    RNA/4v6p.csv,46AA/C/34  RNA/4v6p.csv,46WW_cis
48   RNA/4v6p.csv,46AA/A/553    RNA/4v6p.csv,46AA/U/30  RNA/4v6p.csv,46WW_cis
49   RNA/4v6p.csv,46AA/U/552    RNA/4v6p.csv,46AA/A/33  RNA/4v6p.csv,46WW_cis
50  RNA/4v6p.csv,46AA/U/1199  RNA/4v6p.csv,46AA/G/1058  RNA/4v6p.csv,46WW_cis
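
When the `Empty DataFrame` blocks appear at irregular positions, as described in the comments, a fixed `skiprows` count does not apply. One possible workaround is to drop those lines from the text before parsing it. The sketch below is only an illustration: the function name `drop_empty_frame_blocks` and the shortened sample are hypothetical, and it assumes that no real data row starts with `Empty DataFrame`, `Columns:` or `Index:`.

```python
def drop_empty_frame_blocks(text):
    """Return text without the lines of printed 'Empty DataFrame' blocks."""
    markers = ("Empty DataFrame", "Columns:", "Index:")
    kept = []
    for line in text.splitlines():
        # Leading spaces vary in the printed output, so compare stripped text.
        if line.strip().startswith(markers):
            continue
        kept.append(line)
    return "\n".join(kept)

# Shortened, hypothetical sample with empty blocks at irregular positions.
sample = """Empty DataFrame
 Columns: [0, 1, 2, 3, 4]
 Index: []
46   a/U/551    a/A/33
 Empty DataFrame
 Columns: [0, 1, 2, 3, 4]
 Index: []
47   a/G/550    a/C/34"""

cleaned = drop_empty_frame_blocks(sample)
print(cleaned)
```

The cleaned text can then be wrapped in `io.StringIO` and passed to `pd.read_csv` in place of the filename, in the same way `temp` is used in the solutions above.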
# change this line:
results.chains.append(str(self.filename+", "+str(i)+self.data[0:3][i:i+5]))
# to:
if len(self.data[0:3][i:i+5]) > 0:
    results.chains.append(str(self.filename+", "+str(i)+self.data[0:3][i:i+5]))