Python: preparing a DataFrame for writing to Excel

I have started editing the result of read_excel; see the table below:

| descr     | serial | ref                              | type   | val | qty | uom |
|-----------|--------|----------------------------------|--------|-----|-----|-----|
| Product 1 | NaN    | 12345                            | type 1 | NaN | 6   | PCS |
| Product 2 | NaN    | 23456                            | NaN    | NaN | 4   | PCS |
| Product 3 | NaN    | 66778 MAKER: MANUFACTURER 1 ...  | type 2 | NaN | 4   | PCS |
| Product 4 | NaN    | 88776 MAKER: MANUFACTURER 2 ...  | NaN    | NaN | 2   | PCS |
| Product 5 | 500283 | 99117 MAKER: MANUFACTURER 1 ...  | NaN    | NaN | 12  | PCS |
| Product 6 | 500283 | 00116 MAKER: MANUFACTURER 1 ...  | NaN    | NaN | 12  | PCS |
| Product 7 | 900078 | 307128 MAKER: MANUFACTURER 3 ... | NaN    | NaN | 12  | PCS |
| Product 8 | 900078 | 411354 MAKER: MANUFACTURER 3 ... | NaN    | NaN | 2   | PCS |

I now have two problems:

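  • If column ["ref"] contains a string alongside the int, I need to split them and put the string into a new column (ref2). I had some luck with .split(" ", 1)[0] and .split(" ", 1)[1].
  • Q: How do I loop over the rows, find out whether a given column contains a string in addition to the int (the standard case), and split it into two different columns?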

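  • My output should be:

| Ref (internal) | Description           | Qty |
|----------------|-----------------------|-----|
| 12345          | Product 1             | 6   |
|                | type 1                |     |
| 23456          | Product 2             | 4   |
| 66778          | Product 3             | 4   |
|                | MAKER: MANUFACTURER 1 |     |
|                | type 2                |     |
| 88776          | Product 4             | 2   |
|                | MAKER: MANUFACTURER 2 |     |
| 99117          | Product 5             | 12  |
|                | Serial No.: 500283    |     |
|                | MAKER: MANUFACTURER 1 |     |
| 00116          | Product 6             | 12  |
|                | Serial No.: 500283    |     |
|                | MAKER: MANUFACTURER 1 |     |
| 307128         | Product 7             | 12  |
|                | Serial No.: 900078    |     |
|                | MAKER: MANUFACTURER 3 |     |

I just don't know how to achieve the above output in a pandas DataFrame.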
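For the first question, a row loop is not strictly needed: pandas string methods work on the whole column at once. A minimal sketch, assuming the number always comes first in ref; the regex and the ref2 column name are illustrative choices. str.extract captures the leading digits and whatever text follows them, and a ref without a leading number would come back as NaN in both columns.

```python
import pandas as pd

# ref values taken from the sample CSV in this thread.
df = pd.DataFrame({"ref": ["12345", "23456",
                           "66778 MAKER: MANUFACTURER 1",
                           "88776 MAKER: MANUFACTURER 2"]})

# Group 1: the leading digits; group 2: the text after them (empty if none).
parts = df["ref"].str.extract(r"^(\d+)\s*(.*)$")
df["ref"] = parts[0]
df["ref2"] = parts[1]  # hypothetical name for the "extra text" column

# True only for rows that actually carried text next to the number.
has_text = df["ref2"].str.len() > 0
print(df[has_text])
```

The same pattern can be repeated for any other column that mixes a number with trailing text.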

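Question 2: Assuming I manage to split a cell in two when the condition is met, how do I arrange the pieces according to the example output above? (The int from the old column goes to ref (int), and MAKER: XXX goes to column_ref2 and is combined into column B for output in Excel. The same applies to type, and possibly other columns.)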
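One way to get this layout is an explicit loop that emits one "main" row per product followed by extra rows that only fill the description column. A minimal sketch, assuming ref has already been split into ref and ref2 (hypothetical column name), and assuming "Serial No.: " as the serial label; the order descr, serial, maker, type follows the desired output above.

```python
import pandas as pd

# Input assumed to be split already; values come from the question's table.
df = pd.DataFrame({
    "descr": ["Product 1", "Product 2", "Product 3", "Product 5"],
    "serial": [None, None, None, "500283"],  # kept as text so it prints without ".0"
    "ref": ["12345", "23456", "66778", "99117"],
    "ref2": [None, None, "MAKER: MANUFACTURER 1", "MAKER: MANUFACTURER 1"],
    "type": ["type 1", None, "type 2", None],
    "qty": [6, 4, 4, 12],
})

rows = []
for rec in df.itertuples(index=False):
    # First line of each product: ref number, description and quantity.
    rows.append({"ref (int)": rec.ref, "desc": rec.descr, "qty": rec.qty})
    # Follow-up lines only fill the description column; ref and qty stay blank.
    extras = []
    if pd.notna(rec.serial):
        extras.append(f"Serial No.: {rec.serial}")  # label text is an assumption
    if pd.notna(rec.ref2):
        extras.append(rec.ref2)
    if pd.notna(rec.type):
        extras.append(rec.type)
    for text in extras:
        rows.append({"ref (int)": "", "desc": text, "qty": ""})

out = pd.DataFrame(rows, columns=["ref (int)", "desc", "qty"])
print(out)
```

A loop like this is easy to follow and extend with more columns, at the cost of being slower than vectorized reshaping on very large sheets.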

Thanks for any hints!

Here is my attempt:

Example of the CSV I'm loading:

    descr,serial,ref,type,val,qty,uom
    Product 1,,12345,type 1,,6,PCS
    Product 2,,23456,,,4,PCS
    Product 3,,66778 MAKER: MANUFACTURER 1,type 2,,4,PCS
    Product 4,,88776 MAKER: MANUFACTURER 2,,,2,
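To try the steps without a data.csv file on disk, the sample can be read from a string with io.StringIO. A minimal sketch; passing dtype={"ref": str} is an extra assumption, not part of the original code. It keeps ref as text even if every value in the column happens to be numeric, so the .str calls keep working and leading zeros such as 00116 are preserved.

```python
import io

import pandas as pd

# The sample CSV from this thread, inlined so the example runs without data.csv.
CSV_TEXT = """descr,serial,ref,type,val,qty,uom
Product 1,,12345,type 1,,6,PCS
Product 2,,23456,,,4,PCS
Product 3,,66778 MAKER: MANUFACTURER 1,type 2,,4,PCS
Product 4,,88776 MAKER: MANUFACTURER 2,,,2,
"""

# Force ref to text; without this an all-numeric ref column would be parsed as int.
raw = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"ref": str})
print(raw.dtypes)
```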
    
Load the data and create a new DataFrame called `cleaned`, which will be manipulated and massaged into the desired output:

    import pandas as pd
    import numpy as np  # used in the next step for np.nan
    
    raw = pd.read_csv("data.csv")  # read in the example file
    cleaned = pd.DataFrame()  # new DataFrame that will be reshaped into the desired output
    cleaned['ref (int)'] = raw['ref'].str.split(' ').str[0]  # ref (int): just the first part of the ref column
    
    # move the rest of the data over
    cleaned['description'] = raw['descr']
    cleaned['ref_maker'] = raw['ref'].str.split(' ').str[1:].apply(' '.join)  # text after the integer in ref, or '' if there is none
    cleaned['type_full'] = raw['type']
    cleaned['qty'] = raw['qty']
    print(cleaned)
    
Now we have a DataFrame (`cleaned`) that looks like this:

        ref (int) description              ref_maker type_full  qty
    0     12345   Product 1                           type 1    6
    1     23456   Product 2                              NaN    4
    2     66778   Product 3  MAKER: MANUFACTURER 1    type 2    4
    3     88776   Product 4  MAKER: MANUFACTURER 2       NaN    2
    
    
Now we need to clean it up:

    cleaned.replace('', np.nan, inplace=True)  # turn empty strings (refs with no maker text) into NaN so they count as missing
    cleaned.set_index(['ref (int)', 'qty'], inplace=True)  # keep ref and qty as the index so they repeat on every stacked row (this gives the multi-line layout you wanted)
    cleaned = cleaned.stack().to_frame().reset_index()  # stack the remaining columns into one, then turn the result back into a DataFrame
    
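For reference, `.stack()` moves each remaining column (description, ref_maker, type_full) onto its own row under the same ref (int) and qty pair, which is almost what you want.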
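To see what `.stack()` produces on its own, here is a minimal sketch on two rows shaped like `cleaned` at this point (values from the sample CSV). The explicit dropna() is there because newer pandas releases keep NaN cells when stacking, while older ones drop them by default; with it, both behave the same.

```python
import numpy as np
import pandas as pd

# Two rows shaped like `cleaned` just before stacking.
wide = pd.DataFrame({
    "ref (int)": ["12345", "66778"],
    "description": ["Product 1", "Product 3"],
    "ref_maker": ["", "MAKER: MANUFACTURER 1"],
    "type_full": ["type 1", "type 2"],
    "qty": [6, 4],
})

wide = wide.replace("", np.nan)               # empty maker text becomes a real missing value
wide = wide.set_index(["ref (int)", "qty"])   # these two repeat on every stacked row
long = wide.stack().to_frame().reset_index()  # one row per remaining cell
long = long.dropna()                          # remove the NaN cells if this pandas version kept them
print(long)
```

The unnamed third index level becomes the level_2 column after reset_index(), and the stacked values land in a column named 0, which is why the next step deletes level_2 and renames the columns.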

Now a bit more cleanup:

    del cleaned['level_2']  # remove the stack remnant: level_2 holds the old column names, which you don't want in the final output
    cleaned = cleaned.dropna()  # delete rows that have no value (dropna() returns a new frame, so assign it back)
    cleaned.columns = ['ref', 'qty', 'desc']  # rename the columns for clarity
    
Now it looks like this:

        ref (int)  qty                   desc
    0     12345    6              Product 1
    1     12345    6                 type 1
    2     23456    4              Product 2
    3     66778    4              Product 3
    4     66778    4  MAKER: MANUFACTURER 1
    5     66778    4                 type 2
    6     88776    2              Product 4
    7     88776    2  MAKER: MANUFACTURER 2
    
The last step is to replace the repeated values with empty strings so that it matches the desired output:

    cleaned = cleaned.astype({'qty': object})  # let the numeric qty column also hold '' (newer pandas refuses to upcast on assignment)
    clear_mask = cleaned.duplicated(['ref', 'qty'], keep='first')  # True for rows whose (ref, qty) pair already appeared above; the first occurrence keeps its values
    cleaned.loc[clear_mask, 'qty'] = ''  # blank out the repeated values
    cleaned.loc[clear_mask, 'ref'] = ''
    cols = cleaned.columns.tolist()  # reorder the columns so that qty is last
    cols.append(cols.pop(cols.index('qty')))
    cleaned = cleaned[cols]
    print(cleaned)
    
Here is the final output:

     ref (int)                   desc qty
    0     12345              Product 1   6
    1                           type 1    
    2     23456              Product 2   4
    3     66778              Product 3   4
    4            MAKER: MANUFACTURER 1    
    5                           type 2    
    6     88776              Product 4   2
    7            MAKER: MANUFACTURER 2   
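The duplicated() mask flags any earlier row with the same (ref, qty) pair, not only the row directly above, so two different products that happen to share a ref and qty would also lose their values. If only consecutive repeats should be blanked, comparing each row with shift() is one alternative; a minimal sketch on a few rows of the stacked data (names follow the answer's renamed columns):

```python
import pandas as pd

# A few rows as produced by the stacking steps above.
long = pd.DataFrame({
    "ref": ["12345", "12345", "23456", "66778", "66778", "66778"],
    "qty": [6, 6, 4, 4, 4, 4],
    "desc": ["Product 1", "type 1", "Product 2", "Product 3",
             "MAKER: MANUFACTURER 1", "type 2"],
})

key = long[["ref", "qty"]]
# True where (ref, qty) equals the row directly above; the first row compares with NaN.
repeat = key.eq(key.shift()).all(axis=1)

out = long.astype({"qty": object})       # allow '' in the numeric qty column
out.loc[repeat, ["ref", "qty"]] = ""     # blank only the consecutive repeats
out = out[["ref", "desc", "qty"]]        # qty as the last column
print(out)
```

Once the frame looks right, `out.to_excel("output.xlsx", index=False)` writes it to a workbook; pandas needs an Excel writer engine such as openpyxl installed for .xlsx files.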
    

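Please post the sample data as text rather than as an image.

Thank you, sir! Just one thing: cleaned.set_index(['ref (int)'... Could you tell me how (and where in the code) to achieve the following? * The ref (int) rows should stay as they are. I found out how to change "MAKER:" to "MKR:", for example. If the other columns are no longer than 30 characters, I'd like to stitch them together. I don't want to cut the columns by concatenating. Thanks anyway.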
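One possible reading of the follow-up request: shorten "MAKER:" to "MKR:", then put the extra description lines (maker, type, and so on) on a single line when the joined text is at most 30 characters, and keep one per line otherwise. A minimal sketch under that assumption; extra_lines and MAX_LEN are hypothetical names, not code from this thread.

```python
import pandas as pd

MAX_LEN = 30  # hypothetical: the 30-character limit mentioned in the comment


def extra_lines(parts, max_len=MAX_LEN):
    """Return the non-empty parts joined on one line if that fits, else one part per line."""
    parts = [p for p in parts if isinstance(p, str) and p]  # drop NaN/None/empty values
    joined = " ".join(parts)
    return [joined] if joined and len(joined) <= max_len else parts


# Shorten the maker prefix before measuring the length.
makers = pd.Series(["MAKER: MANUFACTURER 1", "MAKER: MANUFACTURER 2"])
short = makers.str.replace("MAKER:", "MKR:", regex=False)

print(extra_lines([short.iloc[0], "type 2"]))
print(extra_lines([short.iloc[1], None]))
```

The returned list can feed the "extra rows" of a layout like the one above, leaving the ref (int) rows untouched.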