Python script for protein databases works on OSX and Raspbian but not on Ubuntu

Tags: python, python-3.x, pandas, ubuntu

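For some reason, my Python script works on both Mac OSX and Raspbian Buster (yes, I tried it on a Raspberry Pi in a moment of desperation), but it does not work on Ubuntu 18, which is what I run on my main PC. I even tried a fresh install of Ubuntu MATE 20 on another PC, and it still does not work.

Here is the script: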

import sys
import csv
from http.client import IncompleteRead
import pandas as pd
from Bio import Entrez
Entrez.email = ""

    

# From the WP accessions, get the corresponding assembly, NC IDs and strain names.
# Write a csv table with all of these as the final data table,
# plus a table with WP and Assembly IDs for input into FLAG.

list_of_accession = []
with open (sys.argv[1], 'r') as csvfile:
    efetchin=csv.reader(csvfile, delimiter = ',')
    for row in efetchin:
        list_of_accession.append(str(row[0]))
        
with open('efetch_output.txt', mode = 'w') as efetch_output:
    efetch_output = csv.writer(efetch_output, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    efetch_output.writerow(['ID','Source', 'Nucleotide Accession', 'Start', 'Stop', 'Strand', 'Protein', 'Protein Name', 'Organism', ' Strain', 'Assembly'])

input_handle = Entrez.efetch(db="protein", id= list_of_accession, rettype="ipg", retmode="tsv")
for line in input_handle:
    print(line, file=open('efetch_output.txt','a'))
input_handle.close()
#process file in pandas
file_name = "efetch_output.txt"
file_name_output = "final_output.tsv"
df = pd.read_csv(file_name, sep="\t", low_memory=False)
# Get names of indexes for which rows have to be dropped
indexNames = df[ df['Source'] == 'INSDC'].index
# Delete these row indexes from dataFrame
df.drop(indexNames , inplace=True)
#rearrange table columns
df = df[['ID', 'Source', 'Nucleotide Accession', 'Protein', 'Protein Name', 'Start', 'Stop', 'Strand', 'Organism',' Strain', 'Assembly']]
#Sort table on Assembly number ignoring GCF_
df['sort'] = df['Assembly'].str.extract('(\d+)', expand=False).astype(str)
df.sort_values('sort',inplace=True, ascending=True)
df = df.drop('sort', axis=1)
#drop all duplicates that're similar in indicated subset fields
df3=df.drop_duplicates(subset=['Start', 'Stop', 'Strand', 'Organism',' Strain', 'Assembly'],keep='first')
#sorts dataframe alphabetically by Organism and writes to csv
df3.sort_values(by = "Organism", axis=0, ascending=True, inplace=False).to_csv("final_parsed_output.tsv", "\t", index=False)
#get WP_X and GFC_X IDs in a tsv to input in FLAGs
new_dataframe1 = df3[['Assembly', 'Protein']]
new_dataframe2 = df3[['Organism',' Strain', 'Assembly', 'Protein']]
new_dataframe1.sort_values(by = "Protein", axis=0, ascending=True, inplace=False).to_csv('flags_input.tsv', '\t', header=False, columns = ['Assembly', 'Protein'])
new_dataframe2.sort_values(by = "Organism", axis=0, ascending=True, inplace=False).to_csv('flags_input_wstrains.tsv', '\t', header=False, columns = ['Organism',' Strain', 'Assembly', 'Protein'])





print ('program finished')
I don't know whether I can upload a CSV here as an example for you to use, but the input files are basically lists of proteins in a CSV, like this:

WP_047566605.1 WP_043586512.1 WP_086526429.1 WP_043669791.1 WP_086513259.1 WP_086518190.1 WP_053774664.1 WP_012298127.1 WP_063071144.1 WP_012038522.1 WP_066595335.1 WP_088456184.1 WP_058743206.1 WP_042537210.1 WP_058724426.1

The error I get on Ubuntu MATE 20 is:

jj@p4:~/Documents/Bioinformatica/Bioinformatic/August/Codes/Etna$ python3 etna.py JJTEST.csv 
/usr/local/lib/python3.8/dist-packages/pandas/core/computation/expressions.py:68: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return op(a, b)
Traceback (most recent call last):
  File "etna.py", line 44, in <module>
    df['sort'] = df['Assembly'].str.extract('(\d+)', expand=False).astype(str)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 5126, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/strings.py", line 2100, in __init__
    self._inferred_dtype = self._validate(data)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/strings.py", line 2157, in _validate
    raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!

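I don't fully understand what the problem was, but I changed the output file from txt to csv and changed the str conversion in the TSV processing to float. Now it works.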
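The traceback itself only says that the 'Assembly' column did not contain strings when line 44 ran. The sketch below shows one way that can happen. It rests on an assumption that the thread does not confirm: that on the Ubuntu install the efetch handle yields bytes rather than str. The three-column header and the record values are placeholders made up for illustration; only the pandas and standard-library calls are real.

```python
import io
import pandas as pd

header = "ID\tSource\tAssembly\n"
# Hypothetical record. ASSUMPTION: the handle yields bytes on this machine.
raw_line = b"1\tRefSeq\tGCF_000000001.1\n"

# print() on a bytes object writes its repr (b'...'), in which each tab is
# the two characters backslash and t, so the whole row becomes one field.
broken = io.StringIO()
broken.write(header)
print(raw_line, file=broken)
broken.seek(0)
df_broken = pd.read_csv(broken, sep="\t")

# Every value in 'Assembly' is missing, so pandas infers float64 and the
# .str accessor refuses to operate on the column.
try:
    df_broken["Assembly"].str.extract(r"(\d+)", expand=False)
    str_accessor_failed = False
except AttributeError:
    str_accessor_failed = True

# Decoding before writing keeps the real tab characters.
fixed = io.StringIO()
fixed.write(header)
fixed.write(raw_line.decode("utf-8"))
fixed.seek(0)
df_fixed = pd.read_csv(fixed, sep="\t")
digits = df_fixed["Assembly"].str.extract(r"(\d+)", expand=False)

print(df_broken.dtypes["Assembly"], str_accessor_failed)
print(digits.tolist())
```

Whatever the real cause was, printing df.dtypes right after read_csv on each machine is a quick way to see whether the columns were parsed the same way.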

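Comments:

Does this answer your question?

I tried changing line 44 to
df['sort'] = df['Assembly'].astype(str).str.extract('(\d+)', expand=False).astype(float)
and the new output is:
/usr/local/lib/python3.8/dist-packages/pandas/core/computation/expressions.py:68: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return op(a, b)
program finished
I get the same output if I do this instead:
df['sort'] = df['Assembly'].astype(str).str.extract('(\d+)', expand=False).astype(str)

Glad you solved the problem. Next time, please extract a minimal reproducible example first and make it part of your question. As a new user here, it is also worth reading the guidance on asking questions.

@UlrichEckhardt I thought that including the list of WP_ numbers was enough for a minimal reproducible example. I don't know how to upload a CSV file. The code is also the minimum possible. Sorry if I didn't ask the question properly; I tried to follow all the rules. If it is not appropriate, I can delete the post.

You can inline the data in the Python code; there is no need for a second file. Also, any lines after the one that throws the error are irrelevant to the example. You need to check whether any other code can be removed or simplified.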
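As a rough sketch of what that last comment describes, the example below inlines a few rows with io.StringIO and stops at the line that failed. The accession WP_047566605.1 comes from the question; the column subset and the assembly identifiers are placeholders invented for illustration, not real efetch output.

```python
import io
import pandas as pd

# Placeholder rows shaped like the script's efetch_output.txt.
# Only the WP accession is taken from the question; the rest is made up.
INLINE_TSV = (
    "ID\tSource\tProtein\tAssembly\n"
    "1\tRefSeq\tWP_047566605.1\tGCF_000000002.1\n"
    "2\tINSDC\tWP_047566605.1\tGCA_000000002.1\n"
    "3\tRefSeq\tWP_047566605.1\tGCF_000000001.1\n"
)

df = pd.read_csv(io.StringIO(INLINE_TSV), sep="\t")
print(df.dtypes)  # the first thing to compare between machines

# Drop the INSDC rows, then sort on the digits of the assembly ID.
df = df[df["Source"] != "INSDC"].copy()
df["sort"] = df["Assembly"].str.extract(r"(\d+)", expand=False)
df = df.sort_values("sort").drop(columns="sort")
assemblies = df["Assembly"].tolist()
print(assemblies)
```

An example in this form needs no network access, no Biopython and no input file, so anyone can paste it and run it; if it works on every machine, the difference is more likely in how the file is written than in the pandas step.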