Python: strange behavior of the pandas sub function


python pandas


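I ran into a strange issue with the df.sub function in pandas. I wrote a simple piece of code that subtracts a reference column from every column of the DataFrame: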

import pandas as pd
def normalize(df,col):
    '''Enter the column value in "col".'''
    return df.sub(df[col], axis=0)
df = pd.read_csv('norm_debug.txt', sep='\t', index_col=0); print(df.head(3))
new = normalize(df,'A'); print(new.head(3))

As expected, the output of this code looks like this:

df:
             A  B   C   D  E
target_id                    
one        10.0  3  20  10  1
two        10.0  4  30  10  1
three       6.7  5  40  10  1

             A    B     C    D    E
target_id                          
one        0.0 -7.0  10.0  0.0 -9.0
two        0.0 -6.0  20.0  0.0 -9.0
three      0.0 -1.7  33.3  3.3 -5.7
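For anyone who wants to reproduce this without norm_debug.txt, here is a minimal self-contained sketch. It assumes the three rows printed above are representative and builds them in memory (the DataFrame literal is a hypothetical stand-in for the file). With a string label, df[col] is a Series, so the row-wise subtraction behaves as expected:

```python
import pandas as pd

# Hypothetical in-memory copy of the three rows printed above,
# used instead of reading norm_debug.txt from disk.
df = pd.DataFrame(
    {"A": [10.0, 10.0, 6.7], "B": [3, 4, 5], "C": [20, 30, 40],
     "D": [10, 10, 10], "E": [1, 1, 1]},
    index=pd.Index(["one", "two", "three"], name="target_id"),
)

def normalize(df, col):
    """Subtract the reference column `col` from every column, row by row."""
    # With a string label, df[col] is a Series indexed like df's rows,
    # so axis=0 matches it against the row index and broadcasts across columns.
    return df.sub(df[col], axis=0)

new = normalize(df, "A")
print(new.round(1))  # round only to hide float noise such as -1.7000000000000002
```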

However, when I put the same function into an executable script that uses argparse, I get NaN in every column except A:

import argparse
import platform
import os
import pandas as pd

def normalize(df,col):
    '''Normalize the log table with desired column, 
    Enter the column value in "col".'''
    return df.sub(df[col], axis=0)

parser = argparse.ArgumentParser(description='''Manipulate tables ''',
    usage='python3 %(prog)s -e input.tsv [-nm col_name] -op output.tsv',
    epilog='''Short prog. desc:\
    Pass the expression matrix to filter, log2(val) etc.,''')

parser.add_argument("-e","--expr", metavar='', required=True, help="tab-delimited expression matrix file")
parser.add_argument("-op","--outprefix", metavar='', required=True, help="output file prefix")
parser.add_argument("-nm","--norm", metavar='', required=True, nargs=1, type=str, help="Normalize table based on column chosen")

args=parser.parse_args()
print(args)
if (os.path.isfile(args.expr)):
    df = pd.read_csv(args.expr, sep='\t', index_col=0); print(df.head(3))
    if(args.norm):
        norm_df = normalize(df,args.norm); print(norm_df.head(3))
        outfile = args.outprefix + ".normalized.tsv"
        norm_df.to_csv(outfile, sep='\t'); print("Normalized table written to ", outfile)
    else:
        print("Provide valid option...")
else:
    print("Please provide proper input..")

The output of this run is:

python norm_debug.py -e norm_debug.txt -nm A -op norm_debug

             A  B   C   D  E
target_id                    
one        10.0  3  20  10  1
two        10.0  4  30  10  1
three       6.7  5  40  10  1

             A   B   C   D   E
target_id                     
one        0.0 NaN NaN NaN NaN
two        0.0 NaN NaN NaN NaN
three      0.0 NaN NaN NaN NaN

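I am using Python 3.6.7 and pandas 1.1.2. The first (hard-coded) version was run in a Jupyter notebook, while the argparse version was run in a standard terminal. What is the problem here?

Thanks in advance.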

The problem is that you are subtracting a one-column DataFrame, like this:

new = normalize(df,['A'])

print (new)
             A   B   C   D   E
target_id                     
one        0.0 NaN NaN NaN NaN
two        0.0 NaN NaN NaN NaN
three      0.0 NaN NaN NaN NaN


print (df.sub(df[['A']], axis=0))
             A   B   C   D   E
target_id                     
one        0.0 NaN NaN NaN NaN
two        0.0 NaN NaN NaN NaN
three      0.0 NaN NaN NaN NaN

This happens because the parameter col_name is a one-element list such as [col_name], not a string such as col_name.
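To see why only column A survives, it helps to compare what the two selections return. This minimal sketch uses an in-memory copy of columns A to C (a hypothetical stand-in for the input file). The axis argument of sub tells pandas how to match a Series; when the other operand is a DataFrame, sub aligns on both row and column labels, so columns that exist only in df have no partner and become NaN:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10.0, 10.0, 6.7], "B": [3, 4, 5], "C": [20, 30, 40]},
    index=["one", "two", "three"],
)

series_ref = df["A"]    # string label -> Series
frame_ref = df[["A"]]   # list of labels -> one-column DataFrame

print(type(series_ref).__name__)  # Series
print(type(frame_ref).__name__)   # DataFrame

# DataFrame minus DataFrame aligns on row AND column labels.
# frame_ref only has column "A", so "B" and "C" have nothing to subtract and become NaN.
bad = df.sub(frame_ref, axis=0)
print(bad)
```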

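If that is not possible, you can change the function instead so that it squeezes the selection into a Series (the modified normalize is shown at the end below).

Or use the argparse solution from @… in the comments.

args.norm has been parsed as a list ['A'] rather than a scalar 'A' (because of the option nargs=1). Remove that option.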
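The nargs=1 point is easy to check in isolation. The sketch below builds two throwaway parsers containing only the -nm/--norm option from the question (build_parser is a hypothetical helper, not part of the original script) and parses the same command-line fragment with each:

```python
import argparse

def build_parser(use_nargs_one):
    # Hypothetical helper: a parser with only the -nm/--norm option from the question.
    parser = argparse.ArgumentParser()
    if use_nargs_one:
        parser.add_argument("-nm", "--norm", nargs=1, type=str)
    else:
        parser.add_argument("-nm", "--norm", type=str)
    return parser

with_nargs = build_parser(True).parse_args(["-nm", "A"])
without_nargs = build_parser(False).parse_args(["-nm", "A"])

print(with_nargs.norm)     # ['A']  -> df[['A']] selects a one-column DataFrame
print(without_nargs.norm)  # A      -> df['A'] selects a Series
```

Keeping nargs=1 and passing args.norm[0] to normalize would also work; dropping nargs is simpler, because a plain optional argument already stores a single string.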

This doesn't explain why the code works without argparse. The problem is somewhere else.

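argparse, the way the OP configured it, parses -nm A as ['A'] rather than 'A'. In the original hard-coded version, it is 'A'.

See my answer. I explain the problem, and you have the argparse solution. But the problem is not somewhere else.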

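def normalize(df,col):
    '''Enter the column value in "col".'''
    # squeeze() turns a one-column DataFrame such as df[['A']] into a Series,
    # so sub() broadcasts row-wise instead of aligning on column labels
    return df.sub(df[col].squeeze(), axis=0)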
# df = pd.read_csv('norm_debug.txt', sep='\t', index_col=0); print(df.head(3))
new = normalize(df,['A'])

print (new)
             A    B     C    D    E
target_id                          
one        0.0 -7.0  10.0  0.0 -9.0
two        0.0 -6.0  20.0  0.0 -9.0
three      0.0 -1.7  33.3  3.3 -5.7
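As an alternative to squeeze(), a sketch that unwraps the list explicitly might look like this. It is a swapped-in variant, not the answer's code: it accepts either a column label or a one-element list (such as the value produced by nargs=1) and raises an error if more than one column is passed. With squeeze(), a multi-column selection would stay a DataFrame and the NaN problem would silently come back.

```python
import pandas as pd

def normalize(df, col):
    """Subtract reference column `col` from every column.

    Accepts a column label ('A') or a one-element list (['A']),
    e.g. the value argparse stores when nargs=1 is used.
    """
    if isinstance(col, (list, tuple)):
        if len(col) != 1:
            raise ValueError(f"expected exactly one reference column, got {col!r}")
        col = col[0]  # unwrap ['A'] -> 'A' so df[col] is a Series
    return df.sub(df[col], axis=0)

# Hypothetical in-memory data standing in for norm_debug.txt
df = pd.DataFrame({"A": [10.0, 10.0, 6.7], "B": [3, 4, 5]}, index=["one", "two", "three"])

from_label = normalize(df, "A")
from_list = normalize(df, ["A"])
print(from_label.equals(from_list))  # True
```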