Excel's INDEX/MATCH function in Python pandas

python python-3.x pandas

In Excel there is the INDEX/MATCH combination, which I use to match elements in the required columns:

=iferror(INDEX($B$2:$F$8,MATCH($J4,$B$2:$B$8,0),MATCH(K$3,$B$1:$F$1,0)),0)

This is the formula I am currently using. It gives me good results, but I want to implement it in Python.
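For reference, a minimal sketch of what one INDEX/MATCH lookup with the IFERROR fallback corresponds to in pandas. It assumes the brand names are moved into the index so that rows and columns can both be addressed by label; the helper name index_match is hypothetical and not part of pandas:

```python
import pandas as pd

# Sample data from the question (the "standard" table)
df = pd.DataFrame(
    {"brand": ["Honor", "Tecno"], "N": [63, 0], "Z": [96, 695], "None": [190, 763]}
)


def index_match(table, row_label, col_label, default=0):
    """Rough analogue of IFERROR(INDEX(..., MATCH(row), MATCH(col)), default).

    Hypothetical helper; `table` must use the row labels as its index.
    """
    try:
        # .at looks up a single cell by row label and column label
        return table.at[row_label, col_label]
    except KeyError:
        # Row or column label not found: behave like IFERROR(..., 0)
        return default


lookup_table = df.set_index("brand")
print(index_match(lookup_table, "Honor", "N"))
print(index_match(lookup_table, "Honor", "L"))
```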

brand   N   Z   None
Honor   63  96  190     
Tecno   0   695 763

From this table I want:

  brand L   N   Z
  Honor 0   63  96
  Tecno 0   0   695

It should compare the column headers and the row labels (index) and return the corresponding value.

I tried the lookup function in Python, but it gives me:

ValueError: Row labels must have same size as column labels
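For context: DataFrame.lookup took two sequences of equal length and returned one value per (row label, column label) pair, so passing lists of different lengths produces exactly this error. It was deprecated in pandas 1.2 and removed in 2.0. A sketch of a pairwise replacement using Index.get_indexer, with a hypothetical pairwise_lookup helper that also falls back to a default for missing labels (it assumes unique row and column labels, which get_indexer requires):

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    {"N": [63, 0], "Z": [96, 695], "None": [190, 763]},
    index=pd.Index(["Honor", "Tecno"], name="brand"),
)


def pairwise_lookup(frame, row_labels, col_labels, default=0):
    """One value per (row_label, col_label) pair, like the old DataFrame.lookup.

    Hypothetical helper; labels that are not found yield `default`.
    """
    if len(row_labels) != len(col_labels):
        raise ValueError("row_labels and col_labels must have the same length")
    rows = frame.index.get_indexer(row_labels)    # -1 where the label is missing
    cols = frame.columns.get_indexer(col_labels)  # -1 where the label is missing
    values = frame.to_numpy()
    found = (rows >= 0) & (cols >= 0)
    # Pick a dtype that can hold both the table values and the default
    out_dtype = np.result_type(values.dtype, np.asarray(default).dtype)
    out = np.full(len(rows), default, dtype=out_dtype)
    out[found] = values[rows[found], cols[found]]
    return out


print(pairwise_lookup(table, ["Honor", "Tecno", "Honor"], ["N", "Z", "L"]))
```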

Do you need to use pandas for this? You can also do it with plain Python: read from a text file and print out the matched and processed fields.

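Basic file reading in Python looks like the following, where datafile.csv is your file. It reads all lines from the file and prints the result. First, you need to save the file in .csv format so that the fields are separated by ",".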

import csv # use csv
print('brand L N Z') # print new header
with open('datafile.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    next(spamreader, None) # skip old header
    for row in spamreader:
        # You need to add Excel Match etc... logic here.
        print(row[0], 0, row[1], row[2]) # print output

Input file:

brand,N,Z,None
Honor,63,96,190
Tecno,0,695,763

Prints:

brand L N Z
Honor 0 63 96
Tecno 0 0 695

(I'm not familiar with the Excel MATCH function, so you may need to add some logic to the Python script above so that it handles all of your data.)
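One way to fill in the missing matching logic, sketched under the assumption that the first column is always brand and the wanted output columns are known in advance: csv.DictReader keys every row by its header, so each wanted column can be fetched by name, and a header that does not exist falls back to "0". An in-memory io.StringIO stands in for datafile.csv so the snippet runs on its own:

```python
import csv
import io

# Stand-in for datafile.csv so the example is self-contained
# (use open('datafile.csv', newline='') for a real file).
data = io.StringIO("brand,N,Z,None\nHonor,63,96,190\nTecno,0,695,763\n")

wanted = ["L", "N", "Z"]  # output columns, in the desired order
lines = ["brand " + " ".join(wanted)]
for record in csv.DictReader(data):
    # Look each wanted column up by header name; missing headers become "0",
    # which mimics the IFERROR(..., 0) part of the Excel formula.
    values = [record.get(col, "0") for col in wanted]
    lines.append(" ".join([record["brand"]] + values))

print("\n".join(lines))
```

Columns that exist in the file but are not wanted (here None) are simply never asked for.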

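What you are basically doing with the Excel formula is creating something similar to a pivot table, which you can also do with pandas. For example, like this: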

import numpy as np
import pandas as pd

# Define the columns and brands you want in the result table;
# together with the DataFrame in variable df, they are the only input
columns_query=['L', 'N', 'Z']
brands_query=['Honor', 'Tecno', 'Bar']

# now begin processing by selecting the columns
# which should be shown and are actually present;
# add the brand, even if it was not selected
columns_present= {col for col in set(columns_query) if col in df.columns}
columns_present.add('brand')
# select the brands in question and take the
# info in the columns we identified for these brands;
# from this, generate a "flat", list-like data
# structure using melt.
# It contains records of the form
# (brand, column name, cell value).
# list(...) is needed because recent pandas versions
# do not accept a set as an indexer
flat= df.loc[df['brand'].isin(brands_query), list(columns_present)].melt(id_vars='brand')

# if you also want to see the columns and brands
# for which you have no data in your original df,
# you can use the following lines (if you don't
# need them, just skip the following lines until
# the next comment).
# The code just generates data points for the
# columns and rows which would otherwise not be
# displayed and fills them with NaN (the pandas
# equivalent of None)
columns_missing= set(columns_query).difference(columns_present)
brands_missing=  set(brands_query).difference(df['brand'].unique())
num_dummies= max(len(brands_missing), len(columns_missing))
dummy_records= {
    'brand': list(brands_missing) +     [brands_query[0]]  * (num_dummies - len(brands_missing)),
    'variable': list(columns_missing) + [columns_query[0]] * (num_dummies - len(columns_missing)),
    'value': [np.nan] * num_dummies
}
dummy_records= pd.DataFrame(dummy_records)
flat= pd.concat([flat, dummy_records], axis='index', ignore_index=True)

# we get the result with the following line:
result= flat.set_index(['brand', 'variable']).unstack(level=-1)
print(result)

For my test data, this outputs:

         value             
variable     L     N      Z
brand                      
Bar        NaN   NaN    NaN
Honor      NaN  63.0   96.0
Tecno      NaN   0.0  695.0

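The test data is as follows (note that in the output above we do not see column None or row Foo, but we do see row Bar and column L, which are not actually in the test data but were "queried"):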

You can generate this test data with:

import pandas as pd
import numpy as np
import io

raw=\
"""brand   N   Z   None
Honor   63  96  190
Tecno   0   695 763
Foo     8   111 231"""

df= pd.read_csv(io.StringIO(raw), sep=r'\s+')
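A shorter route, not part of the answer above and offered only as an alternative sketch, is DataFrame.reindex: it keeps the requested rows and columns in the requested order and inserts labels that do not exist, filled with fill_value (0 here, mirroring IFERROR(..., 0); leave out fill_value to get NaN instead). It assumes the brand values are unique, because reindex raises an error on duplicate index labels:

```python
import io

import pandas as pd

raw = """brand   N   Z   None
Honor   63  96  190
Tecno   0   695 763
Foo     8   111 231"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

columns_query = ["L", "N", "Z"]
brands_query = ["Honor", "Tecno", "Bar"]

# Row labels come from the brand column; reindex then selects the queried
# brands/columns and creates the missing ones (row "Bar", column "L").
result = df.set_index("brand").reindex(
    index=brands_query, columns=columns_query, fill_value=0
)
print(result)
```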

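Note: the result shown in the output is a regular DataFrame, so there should be no problem if you plan to write the data back to an Excel sheet (pandas provides methods for reading and writing DataFrames from and to Excel files).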
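The unstacked result has a two-level column header ("value" on top of L/N/Z). A hedged sketch of tidying it before writing it to Excel, assuming you want a single header row and 0 instead of NaN; the frame is rebuilt by hand here only so that the snippet runs on its own, and to_excel needs an Excel engine such as openpyxl installed:

```python
import numpy as np
import pandas as pd

# A frame shaped like the unstack() output: a two-level column index
# ("value", <column>) with NaN where no data exists.
columns = pd.MultiIndex.from_product([["value"], ["L", "N", "Z"]], names=[None, "variable"])
pivoted = pd.DataFrame(
    [[np.nan, np.nan, np.nan], [np.nan, 63.0, 96.0], [np.nan, 0.0, 695.0]],
    index=pd.Index(["Bar", "Honor", "Tecno"], name="brand"),
    columns=columns,
)

# Drop the outer "value" level so the sheet gets a single header row,
# and replace NaN with 0 to match the Excel IFERROR(..., 0) fallback.
flat_result = pivoted.droplevel(0, axis=1).fillna(0)
print(flat_result)

# Writing the sheet (requires an engine such as openpyxl):
# flat_result.to_excel("result.xlsx")
```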

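This is easy in pandas, but to show how, I need more information. On which columns do you want to match the two tables? Which columns do you want to compare? E.g., do you want columns N and Z in table 1 to be matched with the same columns of the row for the same brand in table 2? What would you expect if a brand appears twice in table 2? In that case, Excel only takes the first match. If you want the same behavior in pandas, it gets a bit harder; if you don't need it, it becomes easier. A suggestion: could you add what you are doing with the Excel code? Note that not everyone knows the Excel function names, and unfortunately, in non-English Excel versions (at least some of) the function names are translated into the language the Excel version was built for. From looking it up, I guess you are fetching a value from a two-dimensional table based on an "index row"/"index column" on the table border that matches some value. Maybe a picture, or, if you show part of the table as text, that would help understanding.

The main problem here is that the generated data may or may not contain all the headers from the standard table. I want to compare the column and row headers and return the appropriate value; otherwise it should just print NaN or zero, which I can change later.

Hi, maybe you are right that pandas is not required, but I don't think the question is about how to open a file, so if you have a solution for how to get the cells, it would be very useful if you could add it. By the way, if you do this by hand, wouldn't it be easier to use Python's csv module?

I don't know how MATCH works in Excel; the logic would have to be added.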
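On the duplicate-brand question raised in the comments: Excel's MATCH(..., 0) stops at the first exact match. One way to get similar behavior in pandas, sketched here with hypothetical data in which "Honor" appears twice, is to keep only the first row per brand before doing the lookup; without that step, reindex would raise an error because of the duplicate index labels:

```python
import pandas as pd

# Hypothetical data where "Honor" appears twice
df = pd.DataFrame(
    {"brand": ["Honor", "Tecno", "Honor"], "N": [63, 0, 5], "Z": [96, 695, 7]}
)

# MATCH(..., 0) returns the first exact match, so keep only the first
# row per brand before building the lookup table.
first_rows = df.drop_duplicates(subset="brand", keep="first").set_index("brand")
result = first_rows.reindex(
    index=["Honor", "Tecno"], columns=["L", "N", "Z"], fill_value=0
)
print(result)
```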