Python: strange behavior of the pandas sub function
I ran into a strange problem with the df.sub function in pandas. I wrote a simple piece of code that subtracts a reference column from the columns of a df:
import pandas as pd

def normalize(df,col):
    '''Enter the column value in "col".'''
    return df.sub(df[col], axis=0)

df = pd.read_csv('norm_debug.txt', sep='\t', index_col=0); print(df.head(3))
new = normalize(df,'A'); print(new.head(3))
As expected, the output of this code is:
df:
A B C D E
target_id
one 10.0 3 20 10 1
two 10.0 4 30 10 1
three 6.7 5 40 10 1
A B C D E
target_id
one 0.0 -7.0 10.0 0.0 -9.0
two 0.0 -6.0 20.0 0.0 -9.0
three 0.0 -1.7 33.3 3.3 -5.7
However, when I put it into an executable script that uses argparse, I get all NaN:
import argparse
import platform
import os
import pandas as pd

def normalize(df,col):
    '''Normalize the log table with desired column,
    Enter the column value in "col".'''
    return df.sub(df[col], axis=0)

parser = argparse.ArgumentParser(description='''Manipulate tables ''',
    usage='python3 %(prog)s -e input.tsv [-nm col_name] -op output.tsv',
    epilog='''Short prog. desc:\
Pass the expression matrix to filter, log2(val) etc.,''')
parser.add_argument("-e","--expr", metavar='', required=True, help="tab-delimited expression matrix file")
parser.add_argument("-op","--outprefix", metavar='', required=True, help="output file prefix")
parser.add_argument("-nm","--norm", metavar='', required=True, nargs=1, type=str, help="Normalize table based on column chosen")
args=parser.parse_args()
print(args)

if (os.path.isfile(args.expr)):
    df = pd.read_csv(args.expr, sep='\t', index_col=0); print(df.head(3))
    if(args.norm):
        norm_df = normalize(df,args.norm); print(norm_df.head(3))
        outfile = args.outprefix + ".normalized.tsv"
        norm_df.to_csv(outfile, sep='\t'); print("Normalized table written to ", outfile)
    else:
        print("Provide valid option...")
else:
    print("Please provide proper input..")
The output of this run is:
python norm_debug.py -e norm_debug.txt -nm A -op norm_debug
A B C D E
target_id
one 10.0 3 20 10 1
two 10.0 4 30 10 1
three 6.7 5 40 10 1
A B C D E
target_id
one 0.0 NaN NaN NaN NaN
two 0.0 NaN NaN NaN NaN
three 0.0 NaN NaN NaN NaN
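I am using Python version 3.6.7 and pandas version 1.1.2. The first (hard-coded) version was run in a Jupyter notebook, while the argparse version was run in a standard terminal. What is the problem here?
Thanks in advance.

The problem is that you are subtracting a one-column DataFrame, like this: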
new = normalize(df,['A'])
print (new)
A B C D E
target_id
one 0.0 NaN NaN NaN NaN
two 0.0 NaN NaN NaN NaN
three 0.0 NaN NaN NaN NaN
print (df.sub(df[['A']], axis=0))
A B C D E
target_id
one 0.0 NaN NaN NaN NaN
two 0.0 NaN NaN NaN NaN
three 0.0 NaN NaN NaN NaN
This happens because the parameter col_name arrives as a one-element list like [col_name], not as a string like col_name.
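
To make the cause of the NaN columns concrete: when the key is a list, df[col] returns a one-column DataFrame, and arithmetic between two DataFrames aligns on column labels as well as on the index, so every column other than A has no partner and becomes NaN. The following is only a minimal sketch; the small hand-built frame is an assumption standing in for norm_debug.txt, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of norm_debug.txt
df = pd.DataFrame(
    {"A": [10.0, 10.0, 6.7], "B": [3, 4, 5], "C": [20, 30, 40]},
    index=pd.Index(["one", "two", "three"], name="target_id"),
)

as_series = df["A"]    # string key  -> Series
as_frame = df[["A"]]   # list key    -> one-column DataFrame

# Series operand: broadcast down the rows, every column gets a value
by_series = df.sub(as_series, axis=0)

# DataFrame operand: aligned on column labels too, so only "A" matches
by_frame = df.sub(as_frame, axis=0)

print(type(as_series).__name__, type(as_frame).__name__)
print(by_series)
print(by_frame)
```

Printing both results side by side shows the same pattern as in the question: the Series version fills every column, while the DataFrame version keeps values only in A.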
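If that cannot be changed, you can change the function to call squeeze() on the selected column, as in the version at the end of this post.
Or use the solution from @

args.norm has been parsed as the list ['A'], not the scalar 'A' (because of the option nargs=1). Remove that option.

This doesn't explain why the code works without argparse. The problem is elsewhere.

argparse, the way the OP configured it, parses -nm A as ['A'], not 'A'. But in the original version it was 'A'.

See my answer. I explain the problem, and you give the argparse solution. But the problem is not elsewhere.
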
def normalize(df,col):
    '''Enter the column value in "col".'''
    return df.sub(df[col].squeeze(), axis=0)

# df = pd.read_csv('norm_debug.txt', sep='\t', index_col=0); print(df.head(3))
new = normalize(df,['A'])
print (new)
A B C D E
target_id
one 0.0 -7.0 10.0 0.0 -9.0
two 0.0 -6.0 20.0 0.0 -9.0
three 0.0 -1.7 33.3 3.3 -5.7
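
The argparse side of the discussion can be checked in isolation. The sketch below builds two throwaway parsers (the names with_nargs and without_nargs are made up for illustration) that differ only in nargs=1, and parses the same -nm A arguments with each, passed as an explicit list instead of being read from the command line.

```python
import argparse

# With nargs=1 the parsed value is always wrapped in a one-element list
with_nargs = argparse.ArgumentParser()
with_nargs.add_argument("-nm", "--norm", nargs=1, type=str)

# Without nargs the same option yields a plain string
without_nargs = argparse.ArgumentParser()
without_nargs.add_argument("-nm", "--norm", type=str)

list_value = with_nargs.parse_args(["-nm", "A"]).norm
str_value = without_nargs.parse_args(["-nm", "A"]).norm

print(repr(list_value))  # ['A']
print(repr(str_value))   # 'A'
```

In the script from the question, either dropping nargs=1 or passing args.norm[0] to normalize should therefore hand the function a plain column name.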