Python: Preparing a DataFrame for writing to Excel


python pandas


I have started editing the result of `read_excel`; it looks like the table below:

| descr     | serial | ref                              | type   | val | qty | uom |
|-----------|--------|----------------------------------|--------|-----|-----|-----|
| Product 1 | NaN    | 12345                            | type 1 | NaN | 6   | PCS |
| Product 2 | NaN    | 23456                            | NaN    | NaN | 4   | PCS |
| Product 3 | NaN    | 66778 MAKER: MANUFACTURER 1 ...  | type 2 | NaN | 4   | PCS |
| Product 4 | NaN    | 88776 MAKER: MANUFACTURER 2 ...  | NaN    | NaN | 2   | PCS |
| Product 5 | 500283 | 99117 MAKER: MANUFACTURER 1 ...  | NaN    | NaN | 12  | PCS |
| Product 6 | 500283 | 00116 MAKER: MANUFACTURER 1 ...  | NaN    | NaN | 12  | PCS |
| Product 7 | 900078 | 307128 MAKER: MANUFACTURER 3 ... | NaN    | NaN | 12  | PCS |
| Product 8 | 900078 | 411354 MAKER: MANUFACTURER 3 ... | NaN    | NaN | 2   | PCS |

I now have two problems.

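Problem 1: If column `ref` contains a string next to the int, I need to split them and put the string into a new column (`ref2`). I've had some luck with `.split(" ", 1)[0]` and `.split(" ", 1)[1]`.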

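Q: How do I loop over the rows, detect whether a column contains a string in addition to the int (the normal case), and split it into two separate columns?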

My output should look like this:

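| ref (int) | description           | qty |
|-----------|-----------------------|-----|
| 12345     | Product 1             | 6   |
|           | type 1                |     |
| 23456     | Product 2             | 4   |
| 66778     | Product 3             | 4   |
|           | MAKER: MANUFACTURER 1 |     |
|           | type 2                |     |
| 88776     | Product 4             | 2   |
|           | MAKER: MANUFACTURER 2 |     |
| 99117     | Product 5             | 12  |
|           | Serial no.: 500283    |     |
|           | MAKER: MANUFACTURER 1 |     |
| 00116     | Product 6             | 12  |
|           | Serial no.: 500283    |     |
|           | MAKER: MANUFACTURER 1 |     |
| 307128    | Product 7             | 12  |
|           | Serial no.: 900078    |     |
|           | MAKER: MANUFACTURER 3 |     |

I just don't know how to produce the output above with a pandas DataFrame.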
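For Problem 1, looping over the rows is probably not necessary. A minimal vectorized sketch, assuming every `ref` value starts with the number, optionally followed by whitespace and free text: `Series.str.extract` with a regular expression splits each cell into the leading digits and the remainder, and cells with no text get an empty string. The column name `ref2` is taken from the question; the sample values are copied from the table above.

```python
import pandas as pd

# Sample "ref" values copied from the question's table
df = pd.DataFrame({"ref": ["12345", "23456",
                           "66778 MAKER: MANUFACTURER 1",
                           "00116 MAKER: MANUFACTURER 1"]})

# Group 1: the leading digits; group 2: everything after them (may be empty)
parts = df["ref"].astype(str).str.extract(r"^(\d+)\s*(.*)$")
df["ref (int)"] = parts[0]  # kept as text so leading zeros such as "00116" survive
df["ref2"] = parts[1]       # "" when the cell held only the number

# Which rows had text next to the number
has_text = df["ref2"] != ""
print(df)
print(has_text.tolist())
```

Keeping the number as text rather than converting it to `int` is a deliberate choice here: `00116` would otherwise become `116`.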

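Problem 2: Assuming I manage to split a cell in two when the condition is met, how do I arrange the pieces as in the sample output above? (The int from column_old goes to `ref (int)`, and `MAKER: XXX` goes to column_ref2 and is combined into column B for the Excel output. The same applies to `type`, and possibly other columns.)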

Thanks for any hints!
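For Problem 2, one possible approach is to build one column per kind of detail line (description, serial, maker, type), reshape them into a single column with `DataFrame.melt`, and then order the lines product by product. This is a sketch under assumptions, not the answer's method: it assumes `serial` was read as text (for example via `read_excel(..., dtype={"serial": str})`), and the `Serial no.: ` label and the helper column names `serial_txt`, `maker`, `detail`, `row` and `order` are hypothetical choices for illustration.

```python
import pandas as pd

# Three rows shaped like the question's table (None = empty cell)
raw = pd.DataFrame({
    "descr": ["Product 1", "Product 3", "Product 5"],
    "serial": [None, None, "500283"],
    "ref": ["12345", "66778 MAKER: MANUFACTURER 1", "99117 MAKER: MANUFACTURER 1"],
    "type": ["type 1", "type 2", None],
    "qty": [6, 4, 12],
})

# Split ref into the leading digits and the optional text after them
parts = raw["ref"].str.extract(r"^(\d+)\s*(.*)$")

# One column per kind of detail line; NaN means "no line for this product"
wide = pd.DataFrame({
    "ref (int)": parts[0],
    "qty": raw["qty"],
    "descr": raw["descr"],
    # hypothetical label for the serial line
    "serial_txt": ("Serial no.: " + raw["serial"].fillna("")).where(raw["serial"].notna()),
    "maker": parts[1].where(parts[1] != ""),
    "type": raw["type"],
})

detail_cols = ["descr", "serial_txt", "maker", "type"]  # line order within a product

long = (
    wide.melt(id_vars=["ref (int)", "qty"], value_vars=detail_cols,
              var_name="detail", value_name="desc", ignore_index=False)
    .dropna(subset=["desc"])                         # skip empty details
    .assign(row=lambda d: d.index,                   # original product position
            order=lambda d: d["detail"].map(detail_cols.index))
    .sort_values(["row", "order"])                   # product by product, details in order
    .reset_index(drop=True)
)

# Show ref and qty only on the first line of each product
first = ~long.duplicated("row")
long["ref (int)"] = long["ref (int)"].where(first, "")
long["qty"] = long["qty"].astype(object).where(first, "")

result = long[["ref (int)", "desc", "qty"]]
print(result.to_string(index=False))
```

Sorting on the original row position (rather than on `ref` and `qty`) keeps two products apart even if they happen to share the same ref and quantity.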

Here is my attempt:

The example CSV I'm loading (saved as `data.csv`):

```csv
descr,serial,ref,type,val,qty,uom
Product 1,,12345,type 1,,6,PCS
Product 2,,23456,,,4,PCS
Product 3,,66778 MAKER: MANUFACTURER 1,type 2,,4,PCS
Product 4,,88776 MAKER: MANUFACTURER 2,,,2,
```
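To try the steps below without creating `data.csv` on disk, the same sample can be read from an in-memory string, with `io.StringIO` standing in for the file. Passing `dtype={"ref": str}` is an assumption added here so that `ref` is always text (the `.str` accessor then always works, and values with leading zeros such as `00116` keep them).

```python
import io

import pandas as pd

# The sample CSV from above, pasted as a string instead of a file
csv_text = """descr,serial,ref,type,val,qty,uom
Product 1,,12345,type 1,,6,PCS
Product 2,,23456,,,4,PCS
Product 3,,66778 MAKER: MANUFACTURER 1,type 2,,4,PCS
Product 4,,88776 MAKER: MANUFACTURER 2,,,2,
"""

# Read "ref" as text; empty cells in the other columns become NaN
raw = pd.read_csv(io.StringIO(csv_text), dtype={"ref": str})
print(raw)
```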

Load the data and create a new DataFrame called `cleaned`, which will be manipulated and massaged into the desired output:

```python
import pandas as pd
import numpy as np

raw = pd.read_csv("data.csv", dtype={'ref': str})  # read in the example file; keep ref as text so .str always works
cleaned = pd.DataFrame()  # new DataFrame that will be shaped into the desired output

# "ref (int)": just the first part of the ref column (the number)
cleaned['ref (int)'] = raw['ref'].str.split(' ').str[0]

# move the rest of the data over
cleaned['description'] = raw['descr']
# "ref_maker": whatever text follows the number in ref ('' if there is none)
cleaned['ref_maker'] = raw['ref'].str.split(' ').str[1:].apply(' '.join)
cleaned['type_full'] = raw['type']
cleaned['qty'] = raw['qty']
print(cleaned)
```

Now we have a DataFrame (`cleaned`) that looks like this:

    ref (int) description              ref_maker type_full  qty
0     12345   Product 1                           type 1    6
1     23456   Product 2                              NaN    4
2     66778   Product 3  MAKER: MANUFACTURER 1    type 2    4
3     88776   Product 4  MAKER: MANUFACTURER 2       NaN    2


Now we need to clean it up:

```python
cleaned = cleaned.replace('', np.nan)  # replace empty strings with NaN (np.NaN no longer exists in NumPy 2)
cleaned = cleaned.set_index(['ref (int)', 'qty'])  # keep ref and qty fixed as row labels while stacking
cleaned = cleaned.stack().to_frame().reset_index()  # stack the text columns (this creates the multi-line layout), then turn the result back into a DataFrame
```

(For reference, the `.stack()` call on its own already gets you almost all the way to the output you want.)
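To see where the `level_2` column that gets deleted next comes from, here is a tiny standalone example with two made-up rows shaped like `cleaned`: after `set_index`, `stack()` turns the remaining column names into an unnamed third index level; `to_frame()` puts the values in a column named `0`, and `reset_index()` exposes the unnamed level as `level_2`. Newer pandas versions may keep NaN entries when stacking, so this sketch drops them explicitly.

```python
import numpy as np
import pandas as pd

# Two made-up rows with the same columns as `cleaned` after the first step
df = pd.DataFrame({
    "ref (int)": ["12345", "23456"],
    "description": ["Product 1", "Product 2"],
    "ref_maker": [np.nan, np.nan],
    "type_full": ["type 1", np.nan],
    "qty": [6, 4],
})

stacked = (
    df.set_index(["ref (int)", "qty"])  # these two columns stay as row labels
    .stack()                             # remaining columns become an unnamed third index level
    .to_frame()                          # the values land in a column named 0
    .reset_index()                       # the unnamed level shows up as "level_2"
    .dropna()                            # drop empty cells in case stack() kept them
)
print(stacked)
```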

Now a little more cleaning:

```python
del cleaned['level_2']  # remove the leftover from the stack (level_2 holds the old column names, which you don't want in the output)
cleaned = cleaned.dropna().reset_index(drop=True)  # delete rows that have no value (dropna returns a new frame, so assign it back)
cleaned.columns = ['ref (int)', 'qty', 'desc']  # rename the columns for clarity
```

Now it looks like this:

    ref (int)  qty                   desc
0     12345    6              Product 1
1     12345    6                 type 1
2     23456    4              Product 2
3     66778    4              Product 3
4     66778    4  MAKER: MANUFACTURER 1
5     66778    4                 type 2
6     88776    2              Product 4
7     88776    2  MAKER: MANUFACTURER 2
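A detail worth keeping in mind for the cleanup step: like most pandas methods, `DataFrame.dropna()` returns a new DataFrame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed), so a bare `df.dropna()` statement has no effect. A tiny check with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"desc": ["Product 1", np.nan, "type 1"]})

df.dropna()            # the result is discarded: df still has 3 rows
unchanged = len(df)

df = df.dropna()       # assigning the result back actually drops the NaN row
print(unchanged, len(df))
```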

The last step is to replace the repeated values with empty strings so that the result matches the desired output:

```python
import pandas as pd
import numpy as np

raw = pd.read_csv("data.csv", dtype={'ref': str})  # read in the example file; keep ref as text

# step 1: split ref into the leading number and the trailing text
cleaned = pd.DataFrame()
cleaned['ref (int)'] = raw['ref'].str.split(' ').str[0]
cleaned['description'] = raw['descr']
cleaned['ref_maker'] = raw['ref'].str.split(' ').str[1:].apply(' '.join)
cleaned['type_full'] = raw['type']
cleaned['qty'] = raw['qty']

# step 2: stack the text columns into a single column
cleaned = cleaned.replace('', np.nan)
cleaned = cleaned.set_index(['ref (int)', 'qty'])
cleaned = cleaned.stack().to_frame().reset_index()
del cleaned['level_2']
cleaned = cleaned.dropna().reset_index(drop=True)
cleaned.columns = ['ref (int)', 'qty', 'desc']

# step 3: find rows whose ref and qty repeat the row above and blank them out
clear_mask = cleaned.duplicated(['ref (int)', 'qty'], keep='first')
cleaned['qty'] = cleaned['qty'].astype(object)  # let the column hold both numbers and ''
cleaned.loc[clear_mask, 'qty'] = ''
cleaned.loc[clear_mask, 'ref (int)'] = ''
cols = cleaned.columns.tolist()  # rearrange the columns so that qty is at the end
cols.append(cols.pop(cols.index('qty')))
cleaned = cleaned[cols]
print(cleaned)
```

Here is the final output:

     ref (int)                   desc qty
    0     12345              Product 1   6
    1                           type 1    
    2     23456              Product 2   4
    3     66778              Product 3   4
    4            MAKER: MANUFACTURER 1    
    5                           type 2    
    6     88776              Product 4   2
    7            MAKER: MANUFACTURER 2
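The blanking step can also be written with `Series.where`, which keeps a value where the condition is true and substitutes `''` elsewhere. A minimal sketch on made-up rows shaped like the stacked frame; `qty` is converted to `object` first because pandas 2.1 and later warn when an empty string is assigned into an integer column.

```python
import pandas as pd

# Made-up rows shaped like the stacked frame (one row per line of text)
df = pd.DataFrame({
    "ref (int)": ["12345", "12345", "66778", "66778", "66778"],
    "qty": [6, 6, 4, 4, 4],
    "desc": ["Product 1", "type 1", "Product 3", "MAKER: MANUFACTURER 1", "type 2"],
})

# True only for the first line of each (ref, qty) group
first = ~df.duplicated(["ref (int)", "qty"], keep="first")

df["ref (int)"] = df["ref (int)"].where(first, "")
df["qty"] = df["qty"].astype(object).where(first, "")  # object column can hold ints and ""
df = df[["ref (int)", "desc", "qty"]]                   # qty last, as in the desired output
print(df)
```

Keying on `(ref, qty)` assumes no two consecutive products share both values; keying on the original row position avoids that edge case.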

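Please post the sample data as text rather than as an image.

Thank you, sir! Just one thing regarding `cleaned.set_index(['ref (int)')`: could you tell me how (and where in the code) to achieve the following?

* The ref (int) lines should stay as they are. I found out how to change "MAKER:" to "MKR:", for example. If the other columns are no longer than 30 characters, I'd like to stitch them together. I don't want the columns to be cut off by the concatenation.

Thanks anyway.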
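One possible reading of that follow-up request: within a product, join consecutive detail lines while the combined line stays within 30 characters, and give any piece that would not fit its own line, so nothing is truncated. The helper `pack_lines`, its `width` and `sep` parameters, and the `", "` separator are all hypothetical; the `MAKER:` to `MKR:` rename uses `Series.str.replace`.

```python
import pandas as pd


def pack_lines(parts, width=30, sep=", "):
    """Greedily join text pieces into lines of at most `width` characters.

    Hypothetical helper: a piece that does not fit on the current line starts
    a new line, and a piece longer than `width` is kept whole, never cut.
    """
    lines = []
    for part in parts:
        if lines and len(lines[-1]) + len(sep) + len(part) <= width:
            lines[-1] = lines[-1] + sep + part  # fits: append to the current line
        else:
            lines.append(part)                  # does not fit: start a new line
    return lines


details = pd.Series(["Product 3", "MAKER: MANUFACTURER 1", "type 2"])
details = details.str.replace("MAKER:", "MKR:", regex=False)  # shorten the label
print(pack_lines(details.tolist()))
```

Applied per product (for example after grouping the stacked rows by their original row), each resulting string would become one line of the description column.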