Python 3.x: Read and plot data from a messy file using the first two strings and the last string of each line

Tags: python-3.x, matplotlib, graph, read-data

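If there is a similar question with an answer, please point to it in a comment. So far, after browsing, I have only found similar questions for Java, not for Python.

I am trying to take data from a messy file (no header), read it, and plot it. The important columns are #6 (for the X axis / name), #19 (for the Y axis / seconds), and #23 (for the labels).

The seconds column needs to be divided by 1000.

The data file is mixed with a bunch of other comments, but I am trying to use a pattern in the data to pick out the lines to plot. The columns are separated by spaces. A line I want starts with read seq and ends with one of the letters a, b, c, or d. Any other line is not one I want to plot.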
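The filtering rule described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the column positions follow the 0-based indexes 6, 19, and 22 used in the answer further down, and the names `parse_lines` and `make_line` as well as the sample lines are made up for the example.

```python
def parse_lines(lines):
    """Return (x, seconds, label) tuples for lines that match the pattern."""
    wanted_endings = {"a", "b", "c", "d"}
    rows = []
    for line in lines:
        fields = line.split()  # columns are separated by whitespace
        if len(fields) < 23:
            continue  # too short to hold columns 6, 19 and 22
        if fields[0] != "read" or fields[1] != "seq":
            continue  # must start with "read seq"
        if fields[-1][-1].lower() not in wanted_endings:
            continue  # last field must end in a, b, c or d
        try:
            seconds = float(fields[19]) / 1000  # divide the Y column by 1000
        except ValueError:
            continue  # Y column is not numeric, treat the line as noise
        rows.append((fields[6], seconds, fields[-1]))
    return rows


def make_line(x, value, label):
    """Build a hypothetical 23-column data line for the demonstration."""
    fields = ["read", "seq"] + [f"c{i}" for i in range(2, 23)]
    fields[6], fields[19], fields[22] = str(x), str(value), label
    return " ".join(fields)


sample = [
    "# some unrelated comment",
    make_line(17, 3193.22, "1G-b"),
    "noise that mentions read seq in the middle",
    make_line(5, 1192.15, "10G-a"),
    make_line(1, 10, "1G-z"),  # ends in z, so it is skipped
]
rows = parse_lines(sample)
for row in rows:
    print(row)
```

The resulting tuples can be handed to any plotting library, grouped by the label in the third position.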

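A sample is shown below:

Please note that the remaining columns have no pattern. I used c2.a, c3.z, etc. as placeholders so that the columns are easy to compare when reading.

A sample graph is shown below: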

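So far I have the following:

import argparse

import pandas as pd

parser = argparse.ArgumentParser()
# The positional argument is named 'file' so that it is available as args.file
parser.add_argument('file', help="Name of the file to graph (at least one file is required)")

args = parser.parse_args()

file = args.file
# header=None: the file has no header row, so the columns are numbered 0, 1, 2, ...
file_1 = pd.read_csv(file, sep=" ", header=None)

Thanks for your help.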


Edit 1: My code is below, followed by the error it produces:

import pandas as pd
import seaborn as sns

df_dict = pd.read_csv('RESULTS-20190520')

df = pd.DataFrame(df_dict)
# Note that the 'read' and 'seq' values were imported as separate columns. 

# .loc selects rows where the first and second columns are 'read' and 'seq' respectively
# and where the final column has a string pattern ending with a|b|c|d. Note you can change the case argument if desired.
# Finally, we return only columns 6, 19, and 22 since that's all we care about.
df = df.loc[(df[0] == 'read') & (df[1] == 'seq') & df[22].str.match(pat=r'^.*a$|^.*b$|^.*c$|^.*d$', case=False), [6,19,22]]

# Rename vars and manipulate per edits
df['x'] = df[6]
# Divide y-var by 1000
df['y'] = df[19] / 1000 
# Use pandas' str.replace regex functionality to clean string column
df['cat'] = df[22].str.replace(pat=r'(\d+)(\D+)-(.*)', repl=r'\1-\3')

# This should be a lineplot, but as you didn't provide enough sample data, a scatterplot shows the concept. 
sns.lineplot(data=df, x='x', y='y', hue='cat', markers=True)
Error:

Traceback (most recent call last):
  File "C:\...\Python\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\...\TEST1.py", line 12, in <module>
    df = df.iloc[(df[0] == 'read') & (df[1] == 'seq') & df[22].str.match(pat=r'^.*a$|^.*b$|^.*c$|^.*d$', case=False), [6,19,22]]
  File "C:\...\Python\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\...\Python\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
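
One possible reading of the KeyError: 0 above, offered as an assumption rather than a confirmed diagnosis: pd.read_csv was called with its defaults, so the first line of the file is used as the header and the comma is used as the separator, which leaves no column labelled 0. The sketch below reproduces that behaviour on a made-up, shortened two-line sample (five columns instead of 23) and shows the effect of passing sep and header=None.

```python
import io

import pandas as pd

# Hypothetical, shortened sample: two whitespace-separated lines, no header.
sample = "read seq 17 3193.22 1G-b\nread seq 15 864.8 1G-c\n"

# Defaults: comma separator, first line taken as the header row.
broken = pd.read_csv(io.StringIO(sample))
print(list(broken.columns))  # a single string label, not 0, 1, 2, ...

try:
    broken[0]
    raised = False
except KeyError:
    raised = True  # same kind of failure as in the traceback
print("KeyError raised:", raised)

# Whitespace separator and no header: columns are labelled 0, 1, 2, ...
fixed = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
print(list(fixed.columns))
```

If the real file also contains comment lines with a different number of fields, read_csv may still fail while parsing, so the lines may need to be filtered before they are handed to pandas.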

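Starting from sample data that was read in with pd.read_clipboard(sep='\s', header=None) and saved with df.to_dict(), this seems (if I understand correctly) to be a fairly straightforward application of .loc with boolean conditions, followed by plotting (seaborn is a good option here because it provides a convenient hue parameter).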

Setup

import pandas as pd
import seaborn as sns

df_dict={0:{0:'read',1:'read',2:'read',3:'read',4:'read',5:'read'},
1:{0:'seq',1:'seq',2:'seq',3:'seq',4:'seq',5:'seq'},
2:{0:'c2',1:'c2',2:'c2',3:'c2',4:'c2',5:'c2'},
3:{0:'c3',1:'c3',2:'c3',3:'c3',4:'c3',5:'c3'},
4:{0:'c4',1:'c4',2:'c4',3:'c4',4:'c4',5:'c4'},
5:{0:'c5',1:'c5',2:'c5',3:'c5',4:'c5',5:'c5'},
6: {0: 17, 1: 15, 2: 1, 3: 5, 4: 18, 5: 11},
7:{0:'c7',1:'c7',2:'c7',3:'c7',4:'c7',5:'c7'},
8:{0:'c8',1:'c8',2:'c8',3:'c8',4:'c8',5:'c8'},
9:{0:'c9',1:'c9',2:'c9',3:'c9',4:'c9',5:'c9'},
10:{0:'c10',1:'c10',2:'c10',3:'c10',4:'c10',5:'c10'},
11:{0:'c11',1:'c11',2:'c11',3:'c11',4:'c11',5:'c11'},
12:{0:'c12',1:'c12',2:'c12',3:'c12',4:'c12',5:'c12'},
13:{0:'c13',1:'c13',2:'c13',3:'c13',4:'c13',5:'c13'},
14:{0:'c14',1:'c14',2:'c14',3:'c14',4:'c14',5:'c14'},
15:{0:'c15',1:'c15',2:'c15',3:'c15',4:'c15',5:'c15'},
16:{0:'c16',1:'c16',2:'c16',3:'c16',4:'c16',5:'c16'},
17:{0:'c17',1:'c17',2:'c17',3:'c17',4:'c17',5:'c17'},
18:{0:'c18',1:'c18',2:'c18',3:'c18',4:'c18',5:'c18'},
19: {0: 3193.22, 1: 864.8, 2: 1214.42, 3: 1192.15, 4: 1866.22, 5: 2822.11},
20:{0:'c20',1:'c20',2:'c20',3:'c20',4:'c20',5:'c20'},
21:{0:'c21',1:'c21',2:'c21',3:'c21',4:'c21',5:'c21'},
22:{0:'1G-b',1:'1G-c',2:'1G-d',3:'10G-a',4:'1G-a',5:'10G-c'}}
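
df = pd.DataFrame(df_dict)
# Note that the 'read' and 'seq' values were imported as separate columns.

Filter the records with .loc and .str.match(), then plot

# .loc selects rows where the first and second columns are 'read' and 'seq' respectively
# and where the final column has a string pattern ending with a|b|c|d. Note you can change the case argument if desired.
# Finally, we return only columns 6, 19, and 22 since that's all we care about.
df = df.loc[(df[0] == 'read') & (df[1] == 'seq')
            & df[22].str.match(pat=r'^.*a$|^.*b$|^.*c$|^.*d$', case=False),
            [6, 19, 22]]

# Rename vars and manipulate per edits
df['x'] = df[6]
# Divide y-var by 1000
df['y'] = df[19] / 1000
# Use pandas' str.replace regex functionality to clean string column
# (regex=True is needed on pandas 2.0+, where the default is a literal replacement)
df['cat'] = df[22].str.replace(pat=r'(\d+)(\D+)-(.*)', repl=r'\1-\3', regex=True)

# This should be a lineplot, but as you didn't provide enough sample data, a scatterplot shows the concept.
sns.scatterplot(data=df, x='x', y='y', hue='cat')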
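
To see what the two regular expressions in the answer do, they can be tried in isolation with the standard re module. This is a small sketch on made-up labels; the combined pattern `.*[abcd]$` is a shorter equivalent of the four alternatives joined with `|`, not the form the answer uses.

```python
import re

# Hypothetical labels; the last column of the real file is assumed to look similar.
labels = ["1G-b", "10G-a", "1G-z", "10G-C"]

# Keep labels that end in a, b, c or d, ignoring case (like case=False in str.match).
ends_in_abcd = re.compile(r".*[abcd]$", re.IGNORECASE)
kept = [label for label in labels if ends_in_abcd.match(label)]
print(kept)  # ['1G-b', '10G-a', '10G-C']

# Drop the non-digit unit between the leading number and the dash.
cleaned = [re.sub(r"(\d+)(\D+)-(.*)", r"\1-\3", label) for label in kept]
print(cleaned)  # ['1-b', '10-a', '10-C']
```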

Hi @Brendan. Thank you for your answer. I just realized