Using pandas with Python to write to a dataframe

Tags: python, regex, excel, pandas, string-search

Code that builds the two sample dataframes, df2 and df5:

import pandas as pd
import re
import numpy as np

data= [['Empty','CMI-General Liability | 05-9362','Empty','Empty'],['Empty','Central Operations','Empty','Empty'],['Empty','Alarm Central 05-8642','Empty','Empty'],['Empty','Market 466','Empty','Empty'],['Empty','Talent, Experience','Empty','Empty'],['Empty','Food Division','Empty','Empty'],['Empty','Quality WMCC','Empty','Empty'],['Empty','Modular Execution Team | 01-9700','Empty','Empty'],['Empty','US Central Operations','Empty','Empty'],['Empty','CE - Engineering - US','Empty','Empty'],['Empty','Fresh, Freezer & Cooler - 18-8110','Empty','Empty'],['Empty','9701','Empty','Empty'],['Empty','Contact Center','Empty','Empty'],['Empty','Central Operations','Empty','Empty'],['Empty','US Central Operations','Empty','Empty'],['Empty','Private Brands GM - 01-8683','Empty','Empty']]
df2=pd.DataFrame(data,columns=['JobTitle','Department','TrueDepartment','Dept_Function'])
data5 = [[1,'TRUCKING, MARCY, NY','Empty','Empty'],[2,'TRUCKING-GREENVILLE,TN','Empty','Empty'],[3,'DC 40, HOPE MILLS, NC','Empty','Empty'],[4,'TRUCKING, SHARON SPRINGS','Empty','Empty'],[5,'DISP PAULS VALLEY OK FDC','Empty','Empty'],[6,'COLDWATER, MI','Empty','Empty'],[7,'AMERICOLD LOGISTICS','Empty','Empty'],[8,'DFW3N FORT WORTH FC WHS.COM','Empty','Empty'],[9,'PCCC CURRENTLY BEING REVIEWED','Empty','Empty'],[466,'Springfield, MO','Empty','Empty'],[8110,'Fresh Dept','Empty','Empty'],[8642,'Security','Security & Compliance','Empty'],[8683,'General Merchandise','Empty','Empty'],[9362,'General Liability','Empty','Empty'],[9700,'Execution Team','Empty','Empty'],[9701,'Produce TN','Empty','Empty']]

df5=pd.DataFrame(data5,columns=['Dept_Nbr','Dept_Desc_good','Dept_Desc_better','Dept_Abrv'])
df5 is dataframe 5. This is dataframe 2 (df2):

JobTitle    Department                   TrueDepartment    Dept_Function

            CMI-General Liability | 05-9362     
            Central Operations      
            Alarm Central 05-8642       
            Market 466      
            Talent, Experience      
            Food Division       
            Quality WMCC        
            Modular Execution Team | 01-9700        
            US Central Operations       
            CE - Engineering - US       
            Fresh, Freezer & Cooler - 18-8110       
            9701        
            Contact Center      
            Central Operations      
            US Central Operations       
            Private Brands GM - 01-8683          
df5 contains:

Dept_Nbr    Dept_Desc_good                Dept_Desc_better     Dept_Abrv
1           TRUCKING, MARCY, NY                     
2           TRUCKING-GREENVILLE,TN              
3           DC 40, HOPE MILLS, NC                   
4           TRUCKING, SHARON SPRINGS            
5           DISP PAULS VALLEY OK FDC            
6           COLDWATER, MI                       
7           AMERICOLD LOGISTICS           
8           DFW3N FORT WORTH FC - WHS.COM       
9           PCCC CURRENTLY BEING REVIEWED       
466         Springfield, MO     
8110        Fresh Dept      
8642        Security                      Security & Compliance 
8683        General Merchandise                                    
9362        General Liability       
9700        Execution Team      
9701        Produce TN      
Expected results after the code is run (df2 with TrueDepartment filled in):

JobTitle Department                         TrueDepartment  

         CMI-General Liability | 05-9362    General Liability   
         Central Operations     
         Alarm Central 05-8642              Security & Compliance   
         Market 466     
         Talent, Experience     
         Food Division      
         Quality WMCC       
         Modular Execution Team | 01-9700   Execution Team  
         US Central Operations      
         CE - Engineering - US      
         Fresh, Freezer & Cooler - 18-8110  Fresh Dept  
         9701                               Produce TN  
         Contact Center     
         Central Operations     
         US Central Operations      
         Private Brands GM - 01-8683        General Merchandise     
Running my current code gives this error: TypeError: 'in <string>' requires string as left operand, not int
I think I should try changing n to a string type.
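A minimal sketch of why that error appears and what converting n to a string changes. The variable names and values here are illustrative only, taken from the first sample row rather than from the full data.

```python
# An int on the left of `in` with a str on the right raises TypeError.
department = "CMI-General Liability | 05-9362"
n = 9362

try:
    n in department
except TypeError as exc:
    message = str(exc)  # the same message as in the question

# Converting n to str makes the substring test legal.
found = str(n) in department

# Caveat: a plain substring test also matches shorter numbers such as 9,
# because "9" occurs inside "05-9362".
loose = str(9) in department
```

So str(n) removes the TypeError, but a bare substring test can over-match once Dept_Nbr values of different lengths are involved.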

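Also, I have to figure out how to find the substring in the "Department" column of df2 that either follows a hyphen or is the only number in the cell (i.e. 9701). I will probably need regular expressions (re) for this. For the first department in df2, it would find the string "9362", match it to Dept_Nbr in df5, and write "General Liability" into TrueDepartment. df5 actually has Dept_Nbr values running consecutively from 1 to over 10000.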
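A rough sketch of the extraction described above, using only the re module. The pattern and the helper name extract_dept_nbr are assumptions made for illustration. The sketch reads "follows a hyphen" as "follows the last hyphen, at the end of the cell", which is what the sample rows suggest.

```python
import re

# Hypothetical pattern: digits that end the cell and are preceded either by a
# hyphen or by the start of the cell (i.e. the cell is nothing but the number).
DEPT_NBR = re.compile(r"(?:-|^)\s*(\d+)\s*$")

def extract_dept_nbr(department):
    """Return the department number as an int, or None if the cell has none."""
    match = DEPT_NBR.search(department)
    return int(match.group(1)) if match else None

samples = [
    "CMI-General Liability | 05-9362",
    "Market 466",
    "CE - Engineering - US",
    "Fresh, Freezer & Cooler - 18-8110",
    "9701",
]
extracted = {s: extract_dept_nbr(s) for s in samples}
```

"Market 466" yields None here because 466 neither follows a hyphen nor stands alone, which agrees with the expected results, where that row has no TrueDepartment.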

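The latest error, after changing my code as Mr. Armstrong suggested, occurs only when the code is run on the actual full dataframes, not on the sample dataframes I gave. My current code, which raises the TypeError: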

import pandas as pd
import re


numbers = df5['Dept_Nbr'].tolist()
df5['Dept_Nbr'] = [int(i) for i in df5['Dept_Nbr']]
df5.set_index('Dept_Nbr')
for n in numbers:
    for i in df5.index:
        if n in df2.loc[i, 'Department']:
            if df5.at[int(n), 'Dept_Desc_better']: #if values exists
                df2.at[i, 'TrueDepartment'] = df5.at(int(n), 'Dept_Desc_better')
            else:
                df2.at[i, 'TrueDepartment'] = df5.at(int(n), 'Dept_Desc_good')
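To start: the dataframes above don't match the dataframe structure. It took me a long time to figure out why 9362 != 9362 :-)

Here are some things to consider, marked as comments in the revised code further down. The traceback reported on the full dataframes is: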

KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in 
get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in 
pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in 
pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Dept_Nbr'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-14-89dd44993593> in <module>()
----> 1 numbers = df5['Dept_Nbr'].tolist()
      2 df5['Dept_Nbr'] = [int(i) for i in df5['Dept_Nbr']]
      3 df5 = df5.set_index('Dept_Nbr')  #<-- need to actually set df5 to the new index
      4 
      5 for n in numbers:

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in 
__getitem__(self, key)
   2683             return self._getitem_multilevel(key)
   2684         else:
-> 2685             return self._getitem_column(key)
   2686 
   2687     def _getitem_column(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in 
_getitem_column(self, key)
   2690         # get column
   2691         if self.columns.is_unique:
-> 2692             return self._get_item_cache(key)
   2693 
   2694         # duplicate columns & possible reduce dimensionality

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in 
_get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3063                 return self._engine.get_loc(key)
   3064             except KeyError:
-> 3065                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3066 
   3067         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in 
pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in 
pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Dept_Nbr'
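
One possible cause of this KeyError, offered as an assumption because the thread does not confirm it: the traceback fails on the very first line, while looking up the column 'Dept_Nbr'. That happens if the cell is run a second time after df5 = df5.set_index('Dept_Nbr') has already moved the column into the index. A differently named column in the real data would give the same message. A small sketch with a made-up two-row frame:

```python
import pandas as pd

df5_demo = pd.DataFrame({
    "Dept_Nbr": [9362, 9701],
    "Dept_Desc_good": ["General Liability", "Produce TN"],
})

# First run: set_index (with the default drop=True) moves the column into the index.
df5_demo = df5_demo.set_index("Dept_Nbr")

# Second run of the same cell: the column lookup now fails.
try:
    df5_demo["Dept_Nbr"]
    rerun_failed = False
except KeyError:
    rerun_failed = True

# A guard that makes the cell safe to re-run: only set the index if the
# column is still present, then read the numbers from the index.
if "Dept_Nbr" in df5_demo.columns:
    df5_demo = df5_demo.set_index("Dept_Nbr")
numbers = df5_demo.index.tolist()
```

Printing df5.columns on the full dataframe just before the failing line would show whether this is what is happening.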

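By the way, the code does work, and there is no error at index row 0; that is just where the dataframes differ (05-9632 in my copy versus 05-9362 above). I also kept as much of your code as possible, but I expect there are better ways to iterate.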

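Comments:

Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. The minimal, complete, verifiable example (MCVE) guideline applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. (1) Hard-code the dataframes into your code; (2) provide the full error message, so that we don't have to guess the location and traceback details. From your text, I would guess that you got the error on the "if n in" line; that "in" test is illegal as written. Without your debug trace and your own MCVE, we can't be sure of the problem.

However, what is the logic behind which rows get a TrueDepartment and which do not? Most are empty after joining the two dataframes.

A TrueDepartment value is written only if the Department column of df2 contains a department number that is also in df5, in the column named 'Dept_Nbr'.

Full error message here:
in ()
      5 for n in numbers:
      6     for i in df5.index:
----> 7         if n in df2.loc[i, 'Dept']:
      8             if df5.at[int(n), 'Dept_Desc_AD']: #if values exists
      9                 df2.at[i, 'TrueDepartment'] = df5.at(int(n), 'Dept_Desc_AD')
TypeError: 'in <string>' requires string as left operand, not int

Thank you so much for your help. I really appreciate it! Sorry, the hard-coding of the first dept, 9362, was incorrect.

This code works fine on my sample dataframes. The actual dataframes are much longer. When I apply this code to the actual dataframes, I get the following error:

This code works perfectly on my sample dataframes.

The revised code from the answer: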
numbers = df5['Dept_Nbr'].tolist()
df5['Dept_Nbr'] = [int(i) for i in df5['Dept_Nbr']]
df5 = df5.set_index('Dept_Nbr')  #<-- need to actually set df5 to the new index

for n in numbers:
    for i in range(len(df2.index)):  #<-- iterate over row positions, not the elements themselves; use df2's length, since df2 is what .loc[i] indexes below
        if str(n) == df2.loc[i, 'Department'][-4:]: #<-- convert n to str and slice df2 string for the last 4 chars
            if df5.loc[n, 'Dept_Desc_better'] != "Empty":  #<-- you're actually checking against a string, not a NaN
                df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_better']  #<-- use .loc not .at
            else:
                df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_good']

df2 = df2.replace(to_replace="Empty", value="")   #<-- your desired output has '' rather than 'Empty' - so replaced.
df2

   JobTitle                         Department         TrueDepartment  Dept_Function
0              CMI-General Liability | 05-9632
1                           Central Operations
2                        Alarm Central 05-8642  Security & Compliance
3                                   Market 466
4                           Talent, Experience
5                                Food Division
6                                 Quality WMCC
7             Modular Execution Team | 01-9700         Execution Team
8                        US Central Operations
9                        CE - Engineering - US
10           Fresh, Freezer & Cooler - 18-8110             Fresh Dept
11                                        9701             Produce TN
12                              Contact Center
13                          Central Operations
14                       US Central Operations
15                 Private Brands GM - 01-8683    General Merchandise
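
On the "better ways to iterate" remark, one possible loop-free version is sketched below. It is not the answer's method: the function name fill_true_department and the regular expression are assumptions for illustration, and it expects Dept_Nbr to still be an ordinary column of df5, not the index.

```python
import pandas as pd

def fill_true_department(df2, df5):
    """Return a copy of df2 with TrueDepartment looked up from df5 by number."""
    # Prefer Dept_Desc_better; fall back to Dept_Desc_good where the former
    # holds the "Empty" placeholder used in the sample data.
    better = df5["Dept_Desc_better"]
    best = better.where(better != "Empty", df5["Dept_Desc_good"])
    lookup = dict(zip(df5["Dept_Nbr"].astype(int), best))

    # Digits after a hyphen at the end of the cell, or a cell that is only digits.
    nbr = df2["Department"].str.extract(r"(?:-|^)\s*(\d+)\s*$", expand=False)

    out = df2.copy()
    # Rows with no number, or a number missing from df5, are left untouched.
    matched = nbr.dropna().astype(int).map(lookup).dropna()
    out.loc[matched.index, "TrueDepartment"] = matched
    return out

df2_demo = pd.DataFrame({
    "Department": ["CMI-General Liability | 05-9362", "Alarm Central 05-8642",
                   "Market 466", "9701"],
    "TrueDepartment": ["Empty"] * 4,
})
df5_demo = pd.DataFrame({
    "Dept_Nbr": [466, 8642, 9362, 9701],
    "Dept_Desc_good": ["Springfield, MO", "Security", "General Liability", "Produce TN"],
    "Dept_Desc_better": ["Empty", "Security & Compliance", "Empty", "Empty"],
})
result = fill_true_department(df2_demo, df5_demo).replace("Empty", "")
```

This does one dictionary lookup per row of df2 instead of a nested loop over every Dept_Nbr, which should matter with more than 10000 department numbers. It is also not tied to four-digit numbers the way the [-4:] slice is.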