Python 3.x: Read and plot data from a messy file using the first two strings and the last string of each line


python-3.x matplotlib graph


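If there are similar questions and answers, please point me to them in a comment. So far, after browsing, I have found similar questions for Java but not for Python.

I am trying to take data from a messy file (no header), read it, and plot it. The important columns are #6 (x-axis / names), #19 (y-axis / seconds), and #23 (labels). The seconds column needs to be divided by 1000.

The data file is mixed with a bunch of other comments. However, I am trying to use the pattern in the data to plot the graph. The columns are separated by spaces. A data line starts with "read seq" and ends with the letter a, b, c, or d. Otherwise, the line is not one I want to plot.

An example is shown below. Note that the data in the remaining columns follows no pattern. I use c2.a, c3.z, and so on as placeholders so that the columns are easy to compare when reading.

The sample graph looks like this:

So far, I have the following: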

import argparse

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('File', help="Enter the file name to graph it | At least one file is required to graph")
args = parser.parse_args()

# The positional argument is named 'File', so the attribute is args.File (not args.file)
file = args.File
file_1 = pd.read_csv(file, sep=" ", header=None)
Thanks for your help.
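The filtering rule described above (keep only lines that start with "read seq" and end in a, b, c, or d) can also be applied line by line before any DataFrame is built, which sidesteps parser errors on comment lines with a different number of fields. The following is a minimal sketch, not the asker's code. It assumes whitespace-separated fields and 0-based positions 6 and 19 for x and y with the label in the last field (the indexing used in the answer further down). `parse_lines`, `make_line`, and `sample_lines` are hypothetical names introduced for illustration.

```python
import pandas as pd


def parse_lines(lines):
    """Keep 'read seq ... [a-d]' lines and extract x, y (in seconds / 1000), label."""
    rows = []
    for line in lines:
        parts = line.split()
        # Assumption: a valid data line has at least 23 whitespace-separated fields
        if len(parts) < 23:
            continue
        if parts[0] != "read" or parts[1] != "seq":
            continue
        # The last field must end with a, b, c or d (case-insensitive)
        if parts[-1][-1].lower() not in "abcd":
            continue
        try:
            y = float(parts[19]) / 1000
        except ValueError:
            continue  # y field is not numeric, so treat the line as noise
        rows.append({"x": parts[6], "y": y, "label": parts[-1]})
    return pd.DataFrame(rows, columns=["x", "y", "label"])


def make_line(first, x, y, label):
    """Build a made-up 23-field line for the demonstration below."""
    fields = [first, "seq"] + [f"c{i}" for i in range(2, 22)] + [label]
    fields[6] = str(x)
    fields[19] = str(y)
    return " ".join(fields)


sample_lines = [
    "# a comment line mixed into the file",
    make_line("read", 17, 3193.22, "1G-b"),   # kept
    make_line("write", 15, 864.8, "1G-c"),    # dropped: does not start with 'read seq'
    make_line("read", 1, 1214.42, "1G-z"),    # dropped: does not end in a, b, c or d
]

df = parse_lines(sample_lines)
print(df)
```

With a real file, the same function could be fed an open file handle, for example `parse_lines(open(path))`. Note that x is kept as a string here, so it would need converting before being used as a numeric axis.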

Edit 1: My code is below, but it raises the following error:

import pandas as pd
import seaborn as sns

df_dict = pd.read_csv('RESULTS-20190520')
df = pd.DataFrame(df_dict)

# Note that the 'read' and 'seq' values were imported as separate columns.

# .loc selects rows where the first and second columns are 'read' and 'seq' respectively
# and where the final column has a string pattern ending with a|b|c|d. Note you can change the case argument if desired.
# Finally, we return only columns 6, 19, and 22 since that's all we care about.
df = df.loc[(df[0] == 'read') & (df[1] == 'seq')
            & df[22].str.match(pat=r'^.*a$|^.*b$|^.*c$|^.*d$', case=False),
            [6,19,22]]

# Rename vars and manipulate per edits
df['x'] = df[6]
# Divide y-var by 1000
df['y'] = df[19] / 1000
# Use pandas' str.replace regex functionality to clean string column
df['cat'] = df[22].str.replace(pat=r'(\d+)(\D+)-(.*)', repl=r'\1-\3')

# This should be a lineplot, but as you didn't provide enough sample data, a scatterplot shows the concept.
sns.lineplot(data=df, x='x', y='y', hue='cat', markers=True)
Error:

Traceback (most recent call last):
  File "C:\...\Python\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\...\TEST1.py", line 12, in <module>
    df = df.iloc[(df[0] == 'read') & (df[1] == 'seq') & df[22].str.match(pat=r'^.*a$|^.*b$|^.*c$|^.*d$', case=False), [6,19,22]]
  File "C:\...\Python\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\...\Python\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
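A KeyError: 0 raised from df[0] means the DataFrame has no column labelled 0. One plausible cause, offered as an assumption because the file itself is not shown, is that pd.read_csv was called with its defaults: the first line is then consumed as a header and the separator is a comma, so the columns are not the integers 0, 1, 2, and so on. The sketch below reproduces this on a made-up two-line sample (`sample`, `df_default`, and `df_fixed` are illustrative names) and shows that passing header=None with a whitespace separator yields integer column labels.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file (assumption: space-separated, no header)
sample = "read seq c2 17\nread seq c2 15\n"

# Defaults: first line becomes the header and the separator is a comma,
# so the frame has a single column named after the whole first line.
df_default = pd.read_csv(io.StringIO(sample))
print(list(df_default.columns))

try:
    df_default[0]
except KeyError as exc:
    print("KeyError:", exc)

# header=None gives integer column labels; sep=r"\s+" splits on runs of whitespace.
df_fixed = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
print(list(df_fixed.columns))
print(df_fixed[0].tolist())
```

If the real file also contains comment lines with a different number of fields, read_csv may still fail with a tokenizing error, in which case the lines would need to be filtered before parsing.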

Starting from sample data that I read in with pd.read_clipboard(sep='\s', header=None) and saved with df.to_dict(), this seems (if I understand correctly) to be a fairly simple application of .loc with Boolean conditions, followed by plotting (seaborn is a good option here because it provides a convenient hue parameter).

Setup

import pandas as pd
import seaborn as sns

df_dict = {0: {0: 'read', 1: 'read', 2: 'read', 3: 'read', 4: 'read', 5: 'read'},
           1: {0: 'seq', 1: 'seq', 2: 'seq', 3: 'seq', 4: 'seq', 5: 'seq'},
           2: {0: 'c2', 1: 'c2', 2: 'c2', 3: 'c2', 4: 'c2', 5: 'c2'},
           3: {0: 'c3', 1: 'c3', 2: 'c3', 3: 'c3', 4: 'c3', 5: 'c3'},
           4: {0: 'c4', 1: 'c4', 2: 'c4', 3: 'c4', 4: 'c4', 5: 'c4'},
           5: {0: 'c5', 1: 'c5', 2: 'c5', 3: 'c5', 4: 'c5', 5: 'c5'},
           6: {0: 17, 1: 15, 2: 1, 3: 5, 4: 18, 5: 11},
           7: {0: 'c7', 1: 'c7', 2: 'c7', 3: 'c7', 4: 'c7', 5: 'c7'},
           8: {0: 'c8', 1: 'c8', 2: 'c8', 3: 'c8', 4: 'c8', 5: 'c8'},
           9: {0: 'c9', 1: 'c9', 2: 'c9', 3: 'c9', 4: 'c9', 5: 'c9'},
           10: {0: 'c10', 1: 'c10', 2: 'c10', 3: 'c10', 4: 'c10', 5: 'c10'},
           11: {0: 'c11', 1: 'c11', 2: 'c11', 3: 'c11', 4: 'c11', 5: 'c11'},
           12: {0: 'c12', 1: 'c12', 2: 'c12', 3: 'c12', 4: 'c12', 5: 'c12'},
           13: {0: 'c13', 1: 'c13', 2: 'c13', 3: 'c13', 4: 'c13', 5: 'c13'},
           14: {0: 'c14', 1: 'c14', 2: 'c14', 3: 'c14', 4: 'c14', 5: 'c14'},
           15: {0: 'c15', 1: 'c15', 2: 'c15', 3: 'c15', 4: 'c15', 5: 'c15'},
           16: {0: 'c16', 1: 'c16', 2: 'c16', 3: 'c16', 4: 'c16', 5: 'c16'},
           17: {0: 'c17', 1: 'c17', 2: 'c17', 3: 'c17', 4: 'c17', 5: 'c17'},
           18: {0: 'c18', 1: 'c18', 2: 'c18', 3: 'c18', 4: 'c18', 5: 'c18'},
           19: {0: 3193.22, 1: 864.8, 2: 1214.42, 3: 1192.15, 4: 1866.22, 5: 2822.11},
           20: {0: 'c20', 1: 'c20', 2: 'c20', 3: 'c20', 4: 'c20', 5: 'c20'},
           21: {0: 'c21', 1: 'c21', 2: 'c21', 3: 'c21', 4: 'c21', 5: 'c21'},
           22: {0: '1G-b', 1: '1G-c', 2: '1G-d', 3: '10G-a', 4: '1G-a', 5: '10G-c'}}

df = pd.DataFrame(df_dict)
# Note that the 'read' and 'seq' values were imported as separate columns.

Use .loc and .str.match() to filter records, then plot

# .loc selects rows where the first and second columns are 'read' and 'seq' respectively
# and where the final column has a string pattern ending with a|b|c|d. Note you can change the case argument if desired.
# Finally, we return only columns 6, 19, and 22 since that's all we care about.
df = df.loc[(df[0] == 'read') & (df[1] == 'seq')
            & df[22].str.match(pat=r'^.*a$|^.*b$|^.*c$|^.*d$', case=False),
            [6, 19, 22]]

# Rename vars and manipulate per edits
df['x'] = df[6]
# Divide y-var by 1000
df['y'] = df[19] / 1000
# Use pandas' str.replace regex functionality to clean the string column
# (regex=True is needed on pandas 2.0+, where the default became a literal match)
df['cat'] = df[22].str.replace(pat=r'(\d+)(\D+)-(.*)', repl=r'\1-\3', regex=True)

# This should be a lineplot, but as you didn't provide enough sample data, a scatterplot shows the concept.
sns.scatterplot(data=df, x='x', y='y', hue='cat')

Comment:
Hi @Brendan. Thanks for your answer. I just realized
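Because the question is tagged matplotlib, the per-label plot can also be approximated without seaborn. The sketch below is an alternative to the answer's seaborn call, not a restatement of it. It assumes the filtered columns are named x, y, and cat, and it hard-codes the six rows from the answer's sample data with y already divided by 1000 and the labels already cleaned. One line is drawn per label with groupby.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Rows taken from the answer's sample data after filtering and cleaning
df = pd.DataFrame({
    "x": [17, 15, 1, 5, 18, 11],
    "y": [3.19322, 0.8648, 1.21442, 1.19215, 1.86622, 2.82211],
    "cat": ["1-b", "1-c", "1-d", "10-a", "1-a", "10-c"],
})

fig, ax = plt.subplots()
# One line (with point markers) per label, sorted by x so lines run left to right
for name, group in df.sort_values("x").groupby("cat"):
    ax.plot(group["x"], group["y"], marker="o", label=name)

ax.set_xlabel("x (column 6)")
ax.set_ylabel("y (column 19 / 1000)")
ax.legend(title="cat")
# Call plt.show() or fig.savefig(...) here to view or save the figure
```

With this sample each label occurs only once, so every "line" is a single marker. With more rows per label the markers would be joined into lines.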