Python DataFrame - Conditional Column Creation


python python-3.x pandas dataframe


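I'm trying to create a new column based on conditional logic applied to another column. I've searched, but haven't found anything that solves my problem.

I've imported a CSV into a pandas DataFrame; its structure is shown below. I've edited some of the descriptions for this post, but otherwise everything is the same: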

#code used to load dataframe:
import pandas as pd
df = pd.read_csv(r"C:\filepath\filename.csv")

#output from print(type(df)):
#<class 'pandas.core.frame.DataFrame'>

#output from print(df.columns.values):
#['Type' 'Trans Date' 'Post Date' 'Description' 'Amount'] 

#output from print(df.columns):
#Index(['Type', 'Trans Date', 'Post Date', 'Description', 'Amount'], dtype='object')
#output from print

   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60
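
To reproduce the problem without the original CSV, the five sample rows above can be rebuilt in memory. This is a minimal sketch; it assumes the date columns stay as plain strings, which is what pd.read_csv produces unless parse_dates is passed.

```python
import pandas as pd

# Rebuild the question's five sample rows so the examples run without the CSV.
# Dates are kept as strings, matching read_csv's default (no parse_dates).
df = pd.DataFrame({
    "Type": ["Sale"] * 5,
    "Trans Date": ["01/25/2018", "01/25/2018", "01/24/2018", "01/24/2018", "01/24/2018"],
    "Post Date": ["01/25/2018", "01/26/2018", "01/25/2018", "01/25/2018", "01/26/2018"],
    "Description": ["DESC1", "AMAZON MKTPLACE PMTS", "SUMMIT BISTRO", "DESC3", "DYNAMIC VENDING INC"],
    "Amount": [-13.95, -6.99, -5.85, -9.13, -1.60],
})

print(df.columns.tolist())  # ['Type', 'Trans Date', 'Post Date', 'Description', 'Amount']
```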

Then I wrote the following code:

def criteria(row):
    if row.Description.find('SUMMIT BISTRO')>0:
        return 'Lunch'
    elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
        return 'Amazon'
    elif row.Description.find('Aldi')>0:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df.apply(criteria, axis=0)

Error:

Traceback (most recent call last):
  File "C:\Users\Test_BankReconcile2.py", line 44, in <module>
    df['Category'] = df.apply(criteria, axis=0)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "C:\Users\OneDrive\Documents\finance\Test_BankReconcile2.py", line 35, in criteria
    if row.Description.find('SUMMIT BISTRO')>0:
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3081, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: ("'Series' object has no attribute 'Description'", 'occurred at index Type')

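I was able to run the same kind of command successfully on a very similar CSV file from another bank (this one is from my credit card), so I don't know what's going on. Maybe I need to define the DataFrame in some way that I haven't? Or is it something obvious that I'm not seeing? Thanks in advance for helping me solve this.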

Yes, your problem is that you need to pass axis=1 to .apply:
In [52]: df
Out[52]:
   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

In [53]: def criteria(row):
    ...:     if row.Description.find('SUMMIT BISTRO')>0:
    ...:         return 'Lunch'
    ...:     elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
    ...:         return 'Amazon'
    ...:     elif row.Description.find('Aldi')>0:
    ...:         return 'Groceries'
    ...:     else:
    ...:         return 'NotWorking'
    ...:

In [54]: df.apply(criteria, axis=1)
Out[54]:
0    NotWorking
1    NotWorking
2    NotWorking
3    NotWorking
4    NotWorking
dtype: object

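The second problem is a logic error: instead of .find(x) > 0 you want .find(x) >= 0, because str.find returns 0 when the substring starts at the very first character (as 'SUMMIT BISTRO' does above) and -1 when it is not found at all. Better still, use some_string in other_string.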
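
A minimal sketch of both fixes together, assuming only the Description column matters: it checks what str.find actually returns, then applies a row-wise criteria function with axis=1 and uses in instead of .find. The three-row DataFrame is a cut-down stand-in for the question's data.

```python
import pandas as pd

# str.find returns the start index of the match, or -1 when there is none
start = "SUMMIT BISTRO".find("SUMMIT BISTRO")  # 0, so "> 0" is False
missing = "DESC3".find("Aldi")                 # -1


def criteria(row):
    # Row-wise version: `in` tests for a substring without index arithmetic
    if "SUMMIT BISTRO" in row.Description:
        return "Lunch"
    elif "AMAZON MKTPLACE PMTS" in row.Description:
        return "Amazon"
    elif "Aldi" in row.Description:
        return "Groceries"
    return "NotWorking"


df = pd.DataFrame({"Description": ["DESC1", "AMAZON MKTPLACE PMTS", "SUMMIT BISTRO"]})
df["Category"] = df.apply(criteria, axis=1)  # axis=1 passes one row at a time
print(df["Category"].tolist())  # ['NotWorking', 'Amazon', 'Lunch']
```

Keeping the row-wise form means other columns (for example row.Amount) remain reachable inside criteria.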

For a more general solution, drop Description inside the function and apply it to df['Description'] instead. To check for a substring in a string, use in:
def criteria(description):
    # Called once per value of df['Description'], so it receives a string, not a row
    if 'SUMMIT BISTRO' in description:
        return 'Lunch'
    elif 'AMAZON MKTPLACE PMTS' in description:
        return 'Amazon'
    elif 'Aldi' in description:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df['Description'].apply(criteria)
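
If the list of keywords grows, a vectorized alternative is Series.str.contains combined with numpy.select. This swaps in a different technique from the apply-based answers; the rules dictionary is a hypothetical lookup table, and, as with the if/elif chain, the first matching keyword wins.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Description": ["DESC1", "AMAZON MKTPLACE PMTS", "SUMMIT BISTRO", "DESC3", "DYNAMIC VENDING INC"],
})

# Hypothetical keyword -> category table; dict order sets the priority
rules = {
    "SUMMIT BISTRO": "Lunch",
    "AMAZON MKTPLACE PMTS": "Amazon",
    "Aldi": "Groceries",
}

# One boolean mask per keyword; regex=False matches the keyword literally,
# na=False treats missing descriptions as non-matches
conditions = [df["Description"].str.contains(k, regex=False, na=False) for k in rules]

# np.select picks the category of the first True mask, else the default
df["Category"] = np.select(conditions, list(rules.values()), default="NotWorking")
print(df["Category"].tolist())
```

str.contains is case-sensitive by default, so 'Aldi' will not match an all-caps ALDI description; passing case=False changes that.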

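df.apply(criteria, axis=1)

df.apply(func, axis=0) (axis=0 is the default) applies the function func to each column of df (the columns of a DataFrame are Series). So the function criteria(row) actually receives a column, not a row. Changing to axis=1 should fix the problem.

This worked, thanks. I'd like to keep the Description.find logic, because it seems I can also use "and row.Amount

@Brian, you can still do that. Don't use .find unless you actually need the index.

@juanpa.arrivillaga - feel free to change your solution or add in to it, I have no problem with that (+1)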
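
To see the axis=0 versus axis=1 difference described above, the sketch below records which labels each call to the function receives. The two-row DataFrame is a made-up stand-in for the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["Sale", "Sale"],
    "Description": ["SUMMIT BISTRO", "DESC3"],
})

# axis=0 (the default): one call per column; each Series is indexed by row labels
col_view = df.apply(lambda s: ",".join(map(str, s.index)), axis=0)

# axis=1: one call per row; each Series is indexed by column names
row_view = df.apply(lambda r: ",".join(map(str, r.index)), axis=1)

# Attribute access by column name therefore only works with axis=1
descriptions = df.apply(lambda r: r.Description, axis=1)

try:
    # The first Series passed is the Type column, which has no 'Description' label
    df.apply(lambda s: s.Description, axis=0)
    raised = False
except AttributeError:
    raised = True

print(col_view.to_dict())  # {'Type': '0,1', 'Description': '0,1'}
print(row_view.to_dict())  # {0: 'Type,Description', 1: 'Type,Description'}
```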