Python: strange behavior of the pandas sub function

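I have run into a strange problem with the df.sub function in pandas. I wrote a simple piece of code that subtracts a reference column from every column of a DataFrame: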

import pandas as pd
def normalize(df,col):
    '''Enter the column value in "col".'''
    return df.sub(df[col], axis=0)
df = pd.read_csv('norm_debug.txt', sep='\t', index_col=0); print(df.head(3))
new = normalize(df,'A'); print(new.head(3))
As expected, the output of this code is:

df:
             A  B   C   D  E
target_id                    
one        10.0  3  20  10  1
two        10.0  4  30  10  1
three       6.7  5  40  10  1

             A    B     C    D    E
target_id                          
one        0.0 -7.0  10.0  0.0 -9.0
two        0.0 -6.0  20.0  0.0 -9.0
three      0.0 -1.7  33.3  3.3 -5.7
However, when I put it into an executable script that uses argparse, every value except column A comes back as NaN:

import argparse
import platform
import os
import pandas as pd

def normalize(df,col):
    '''Normalize the log table with desired column, 
    Enter the column value in "col".'''
    return df.sub(df[col], axis=0)

parser = argparse.ArgumentParser(description='''Manipulate tables ''',
    usage='python3 %(prog)s -e input.tsv [-nm col_name] -op output.tsv',
    epilog='''Short prog. desc:\
    Pass the expression matrix to filter, log2(val) etc.,''')

parser.add_argument("-e","--expr", metavar='', required=True, help="tab-delimited expression matrix file")
parser.add_argument("-op","--outprefix", metavar='', required=True, help="output file prefix")
parser.add_argument("-nm","--norm", metavar='', required=True, nargs=1, type=str, help="Normalize table based on column chosen")

args=parser.parse_args()
print(args)
if (os.path.isfile(args.expr)):
    df = pd.read_csv(args.expr, sep='\t', index_col=0); print(df.head(3))
    if(args.norm):
        norm_df = normalize(df,args.norm); print(norm_df.head(3))
        outfile = args.outprefix + ".normalized.tsv"
        norm_df.to_csv(outfile, sep='\t'); print("Normalized table written to ", outfile)
    else:
        print("Provide valid option...")
else:
    print("Please provide proper input..")
The output of this run is:

python norm_debug.py -e norm_debug.txt -nm A -op norm_debug

             A  B   C   D  E
target_id                    
one        10.0  3  20  10  1
two        10.0  4  30  10  1
three       6.7  5  40  10  1

             A   B   C   D   E
target_id                     
one        0.0 NaN NaN NaN NaN
two        0.0 NaN NaN NaN NaN
three      0.0 NaN NaN NaN NaN
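I am using Python 3.6.7 and pandas 1.1.2. The first (hard-coded) version was run in a Jupyter notebook, while the argparse version was run in a standard terminal. What is the problem here?

Thanks in advance.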

The problem is that you are subtracting a one-column DataFrame, like this:

new = normalize(df,['A'])

print (new)
             A   B   C   D   E
target_id                     
one        0.0 NaN NaN NaN NaN
two        0.0 NaN NaN NaN NaN
three      0.0 NaN NaN NaN NaN


print (df.sub(df[['A']], axis=0))
             A   B   C   D   E
target_id                     
one        0.0 NaN NaN NaN NaN
two        0.0 NaN NaN NaN NaN
three      0.0 NaN NaN NaN NaN
This happens because the col_name argument is a one-element list like [col_name] rather than a string like col_name.
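To make the difference concrete, here is a minimal sketch that rebuilds the table from the question in memory (an assumption, so that no input file is needed) and compares the two ways of selecting the column. Indexing with a string returns a Series, while indexing with a list returns a one-column DataFrame, and sub treats the two differently.

```python
import pandas as pd

# Small frame rebuilt from the table in the question (no file needed).
df = pd.DataFrame(
    {"A": [10.0, 10.0, 6.7], "B": [3, 4, 5], "C": [20, 30, 40],
     "D": [10, 10, 10], "E": [1, 1, 1]},
    index=pd.Index(["one", "two", "three"], name="target_id"),
)

as_series = df["A"]    # string key -> Series
as_frame = df[["A"]]   # list key   -> one-column DataFrame

# Series: aligned on the row index (axis=0) and subtracted from every column.
ok = df.sub(as_series, axis=0)

# DataFrame: aligned on row AND column labels, so only "A" finds a partner;
# B..E have no matching column in the right operand and become NaN.
bad = df.sub(as_frame, axis=0)

print(type(as_series).__name__, type(as_frame).__name__)
print(ok)
print(bad)
```

The second result reproduces the all-NaN table from the terminal run, which suggests the script was passing a list rather than a string.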

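If changing the argument is not possible, you can change the function so that it squeezes the one-column DataFrame into a Series (the modified normalize is shown at the end of this page).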


Or use the solution from the other answer.

args.norm has been parsed as the list ['A'] rather than the scalar 'A' (because of the option nargs=1). Remove that option.
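The effect of nargs=1 can be checked without pandas at all. The sketch below uses a hypothetical helper, parse_norm, that builds a parser with only the -nm/--norm option from the question and parses an explicit argument list instead of the real command line.

```python
import argparse

def parse_norm(argv, **option_kwargs):
    """Parse only the -nm/--norm option and return its value (illustrative helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-nm", "--norm", type=str, **option_kwargs)
    return parser.parse_args(argv).norm

with_nargs = parse_norm(["-nm", "A"], nargs=1)  # one-item list
without_nargs = parse_norm(["-nm", "A"])        # plain string

print(repr(with_nargs), repr(without_nargs))
```

If nargs=1 has to stay for some other reason, passing args.norm[0] to normalize should have the same effect as removing the option.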

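This doesn't explain why the code works without argparse. The problem is elsewhere.

argparse, the way the OP configured it, parses -nm A as ['A'] rather than 'A'. In the original version it was 'A'.

See my answer. I explain the problem, you provide the argparse solution. But the problem is not elsewhere.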
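The changed function from the first answer, which squeezes the one-column DataFrame into a Series before subtracting:

def normalize(df, col):
    '''Enter the column value in "col".'''
    # squeeze() turns a one-column DataFrame into a Series, so sub() aligns on the row index
    return df.sub(df[col].squeeze(), axis=0)

# df = pd.read_csv('norm_debug.txt', sep='\t', index_col=0); print(df.head(3))
new = normalize(df, ['A'])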

print (new)
             A    B     C    D    E
target_id                          
one        0.0 -7.0  10.0  0.0 -9.0
two        0.0 -6.0  20.0  0.0 -9.0
three      0.0 -1.7  33.3  3.3 -5.7
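As a quick check, the squeeze version appears to accept both call styles. The sketch below uses a reduced two-column frame taken from the question's data (an assumption, for brevity) and compares the result for 'A' with the result for ['A'].

```python
import pandas as pd

def normalize(df, col):
    """Subtract column `col` from every column; `col` may be 'A' or ['A']."""
    # squeeze() leaves a multi-row Series unchanged and collapses a
    # one-column DataFrame into a Series.
    return df.sub(df[col].squeeze(), axis=0)

# Reduced frame based on the question's data.
df = pd.DataFrame(
    {"A": [10.0, 10.0, 6.7], "B": [3, 4, 5]},
    index=["one", "two", "three"],
)

from_str = normalize(df, "A")
from_list = normalize(df, ["A"])

print(from_str.equals(from_list))
```

Note that squeeze only removes axes of length one, so a list with several column names would still produce a DataFrame and the label alignment described above.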