Python: duplicate/replicate rows by index (pandas, NumPy)
Tags: python, pandas, numpy
I am trying to duplicate a few rows of a DataFrame by their index, but nothing I have tried comes close to working. Given this DataFrame:
Source_ID | SecondaryName | PrimaryName | Address | City | State | Full_Postal_Code | Postal_Code | Country | Telephone | Brand0 | Brand1 | Brand2
123456 | JACK SCHMITT CADILLAC, INC. | JACK SCHMITT CADILLAC, INC. | 915 W HWY 50 | O FALLON | IL | 62269 | 62269 | USA | 6186321001 | Cadillac | GMC | Buick
987654 | JAMES E. BLACK CADILLAC | JAMES E. BLACK CADILLAC | 3929 ADMIRAL PEARY HWY | EBENSBURG | PA | 15931 | 15931 | USA | 8144729553 | Cadillac | NaN | GMC
753951 | COLE-VALLEY MOTOR COMPANY | COLE-VALLEY MOTOR COMPANY | 4111 ELM ROAD NE | WARREN | OH | 44483 | 44483 | USA | 3303721668 | Cadillac | Buick | NaN
159357 | MCDONALD GMC-CADILLAC, INC. | MCDONALD GMC-CADILLAC, INC. | 5155 STATE ST | SAGINAW | MI | 48603 | 48603 | USA | 9897905154 | Cadillac | Buick | NaN
456987 | DAVID BRUCE AUTO CENTER, INC. | DAVID BRUCE AUTO CENTER, INC. | 555 LATHAM DR | BOURBONNAIS | IL | 60914 | 60914 | USA | 8159337709 | Cadillac | Chevrolet | GMC
321456 | JACK WOLF CADILLAC-GMC TRUCK, INC. | JACK WOLF CADILLAC-GMC TRUCK, INC. | 1855 N STATE ST | BELVIDERE | IL | 61008 | 61008 | USA | 8155443403 | Cadillac | NaN | NaN
My code is as follows:
def duplicateDealers(self, data):
    pd.options.display.width = 0
    counter = 0
    for index, row in data.iterrows():
        brandColumn = 'Brand' + str(counter)
        # print(index, row[brandColumn])
        if str(row[brandColumn]) == 'Cadillac':
            newData = pd.DataFrame(np.repeat(data.loc[int(index)], 1))
            newData['Repeated'] = 'Yes'
            print(newData.columns)
            print(type(pd.DataFrame(np.repeat(data.loc[int(index)], 1))))
            print(newData)
If I use the following code instead:
newData = pd.DataFrame(np.repeat(data.loc[int(index)], 1, axis=0))
I get this error:
ValueError: the 'axis' parameter is not supported in the pandas implementation of repeat()
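A likely cause of this error: `data.loc[int(index)]` with a single label returns a Series, so `np.repeat` is handed over to the Series' own `repeat` method, which rejects `axis`. Below is a minimal sketch of duplicating one row by its index label while keeping it as a DataFrame row. The two-row frame, the labels 10 and 20, and the names `one_row` and `copies` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data, with non-default index labels.
data = pd.DataFrame(
    {"Source_ID": [123456, 987654], "Brand0": ["Cadillac", "Cadillac"]},
    index=[10, 20],
)

idx = 10

# Double brackets return a one-row DataFrame instead of a Series,
# so the original column layout is preserved.
one_row = data.loc[[idx]]

# Passing the same label twice to .loc yields two copies of that row.
copies = data.loc[[idx] * 2].copy()
copies["Repeated"] = "Yes"
print(copies)
```

Each copy keeps the original columns, so individual cells can still be changed afterwards, for example through `copies.iloc[1]`.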
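What I want to achieve with this code
I iterate over the rows and columns to find the word "Cadillac" in the column "Brand0". If the condition is true, I want to duplicate the whole row by its index, keeping the row's original format, and then modify the new row's data as I need.
The output looks like this (the column name "4108" is a random index; the DataFrame has a large number of records, more than 5k):
The result I want is:
What am I doing wrong?
Regards, and thank you.
Edit:
Here is some sample data:
Edit 2:
Here are more details on what I am trying to achieve. Each row may have several BrandX columns. Depending on their content, I will duplicate the row and append the brand name and other content to the Source_ID, so that I end up with the correct number of records per dealer brand.
DataFrame: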
Source_ID | SecondaryName | PrimaryName | Address | City | State | Full_Postal_Code | Postal_Code | Country | Telephone | Brand0 | Brand1 | Brand2
123456 | JACK SCHMITT CADILLAC, INC. | JACK SCHMITT CADILLAC, INC. | 915 W HWY 50 | O FALLON | IL | 62269 | 62269 | USA | 6186321001 | Cadillac | GMC | Buick
987654 | JAMES E. BLACK CADILLAC | JAMES E. BLACK CADILLAC | 3929 ADMIRAL PEARY HWY | EBENSBURG | PA | 15931 | 15931 | USA | 8144729553 | Cadillac | NaN | GMC
753951 | COLE-VALLEY MOTOR COMPANY | COLE-VALLEY MOTOR COMPANY | 4111 ELM ROAD NE | WARREN | OH | 44483 | 44483 | USA | 3303721668 | Cadillac | Buick | NaN
159357 | MCDONALD GMC-CADILLAC, INC. | MCDONALD GMC-CADILLAC, INC. | 5155 STATE ST | SAGINAW | MI | 48603 | 48603 | USA | 9897905154 | Cadillac | Buick | NaN
456987 | DAVID BRUCE AUTO CENTER, INC. | DAVID BRUCE AUTO CENTER, INC. | 555 LATHAM DR | BOURBONNAIS | IL | 60914 | 60914 | USA | 8159337709 | Cadillac | Chevrolet | GMC
321456 | JACK WOLF CADILLAC-GMC TRUCK, INC. | JACK WOLF CADILLAC-GMC TRUCK, INC. | 1855 N STATE ST | BELVIDERE | IL | 61008 | 61008 | USA | 8155443403 | Cadillac | NaN | NaN
Expected output:
Source_ID | SecondaryName | PrimaryName | Address | City | State | Full_Postal_Code | Postal_Code | Country | Telephone | Brand0 | Brand1 | Brand2
123456_Cadillac | JACK SCHMITT CADILLAC, INC. | JACK SCHMITT CADILLAC, INC. | 915 W HWY 50 | O FALLON | IL | 62269 | 62269 | USA | 6186321001 | Cadillac | GMC | Buick
123456_GMC | JACK SCHMITT CADILLAC, INC. | JACK SCHMITT CADILLAC, INC. | 915 W HWY 50 | O FALLON | IL | 62269 | 62269 | USA | 6186321001 | Cadillac | GMC | Buick
123456_Buick | JACK SCHMITT CADILLAC, INC. | JACK SCHMITT CADILLAC, INC. | 915 W HWY 50 | O FALLON | IL | 62269 | 62269 | USA | 6186321001 | Cadillac | GMC | Buick
987654_Cadillac | JAMES E. BLACK CADILLAC | JAMES E. BLACK CADILLAC | 3929 ADMIRAL PEARY HWY | EBENSBURG | PA | 15931 | 15931 | USA | 8144729553 | Cadillac | NaN | GMC
987654_GMC | JAMES E. BLACK CADILLAC | JAMES E. BLACK CADILLAC | 3929 ADMIRAL PEARY HWY | EBENSBURG | PA | 15931 | 15931 | USA | 8144729553 | Cadillac | NaN | GMC
753951_Cadillac | COLE-VALLEY MOTOR COMPANY | COLE-VALLEY MOTOR COMPANY | 4111 ELM ROAD NE | WARREN | OH | 44483 | 44483 | USA | 3303721668 | Cadillac | Buick | NaN
753951_GMC | COLE-VALLEY MOTOR COMPANY | COLE-VALLEY MOTOR COMPANY | 4111 ELM ROAD NE | WARREN | OH | 44483 | 44483 | USA | 3303721668 | Cadillac | Buick | NaN
159357_Cadillac | MCDONALD GMC-CADILLAC, INC. | MCDONALD GMC-CADILLAC, INC. | 5155 STATE ST | SAGINAW | MI | 48603 | 48603 | USA | 9897905154 | Cadillac | Buick | NaN
159357_Buick | MCDONALD GMC-CADILLAC, INC. | MCDONALD GMC-CADILLAC, INC. | 5155 STATE ST | SAGINAW | MI | 48603 | 48603 | USA | 9897905154 | Cadillac | Buick | NaN
456987_Cadillac | DAVID BRUCE AUTO CENTER, INC. | DAVID BRUCE AUTO CENTER, INC. | 555 LATHAM DR | BOURBONNAIS | IL | 60914 | 60914 | USA | 8159337709 | Cadillac | Chevrolet | GMC
456987_Chevrolet | DAVID BRUCE AUTO CENTER, INC. | DAVID BRUCE AUTO CENTER, INC. | 555 LATHAM DR | BOURBONNAIS | IL | 60914 | 60914 | USA | 8159337709 | Cadillac | Chevrolet | GMC
456987_GMC | DAVID BRUCE AUTO CENTER, INC. | DAVID BRUCE AUTO CENTER, INC. | 555 LATHAM DR | BOURBONNAIS | IL | 60914 | 60914 | USA | 8159337709 | Cadillac | Chevrolet | GMC
321456_Cadillac | JACK WOLF CADILLAC-GMC TRUCK, INC. | JACK WOLF CADILLAC-GMC TRUCK, INC. | 1855 N STATE ST | BELVIDERE | IL | 61008 | 61008 | USA | 8155443403 | Cadillac | NaN | NaN
The following code does this.
Modules
import numpy as np
import pandas as pd
Sample data
df = pd.DataFrame({'Source':[111347,115742,100007], 'Brand0':['Cadillac', 'Cadillac', 'Alternative']})
Solution using np.repeat
df.loc[np.repeat(df.index.values, list(df['Brand0'].isin(['Cadillac'])+1))]
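To make the one-liner above easier to follow, here is the same idea split into steps on the answer's sample data. The intermediate names `counts` and `out` and the `Repeated` flag column are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Source': [111347, 115742, 100007],
                   'Brand0': ['Cadillac', 'Cadillac', 'Alternative']})

# Rows matching 'Cadillac' get 2 copies, all other rows get 1.
counts = df['Brand0'].isin(['Cadillac']).astype(int) + 1

# Repeat each index label by its count, then select those labels.
out = df.loc[np.repeat(df.index.values, counts.to_numpy())]

# Flag the second copy of each repeated row so it can be edited separately.
out = out.assign(Repeated=out.index.duplicated())
print(out)
```

Because the copies share an index label, `out.index.duplicated()` is one way to tell the added rows apart from the originals.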
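Comments:
Can you share the data instead of a screenshot? That would help to reproduce your problem and to solve it.
Thank you for your answer, I have updated the post. This does work, but it duplicates all rows at once, and I lose track of each duplicated row, which I need in order to change some of its cells. Say I need to change "Source" depending on the brand detected in the condition (it could be Cadillac, Buick, Chevrolet, etc.) and then make some changes, for example: Row1 = Source['111347_GMC']; Row2 = Source['111347_Cadillac']. That is why I need to duplicate the data row by row, so that I can control each duplicated row and make the necessary changes to it.
I am not sure I understand your request. After duplicating all the data at once, wouldn't it be easier to split the data row by row using df.iloc[0:2,:]?
@RuthgerRighart, maybe add df['Repeated'] = df.duplicated()
Your answer meets the OP's requirements.
@RuthgerRighart Sorry if I was not clear enough; I have updated the post again.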
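For the requirement described in Edit 2 and in the comments (one copy of the row per non-null brand, with the brand appended to Source_ID), one possible sketch follows. It is not from the thread: the cut-down three-row frame and the names `brand_cols`, `n_brands`, `brands`, and `out` are assumptions, and it relies on every brand column name starting with "Brand".

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the question's frame (three dealers only).
df = pd.DataFrame({
    "Source_ID": [123456, 987654, 321456],
    "Brand0": ["Cadillac", "Cadillac", "Cadillac"],
    "Brand1": ["GMC", np.nan, np.nan],
    "Brand2": ["Buick", "GMC", np.nan],
})

# Assumption: brand columns are exactly those whose name starts with "Brand".
brand_cols = [c for c in df.columns if c.startswith("Brand")]

# Number of non-null brands per row = number of copies that row needs.
n_brands = df[brand_cols].notna().sum(axis=1)

# Repeat each row once per brand it carries.
out = df.loc[df.index.repeat(n_brands.to_numpy())].copy()

# Flatten the non-null brands row by row, in column order, so they line up
# one-to-one with the repeated rows.
brands = [b for row in df[brand_cols].to_numpy() for b in row if pd.notna(b)]

# Append the matching brand to each copy's Source_ID.
out["Source_ID"] = [f"{sid}_{b}" for sid, b in zip(out["Source_ID"], brands)]
print(out["Source_ID"].tolist())
```

This keeps the other columns of every copy unchanged, in the same way as the expected output in the question, and avoids iterating with `iterrows()` over more than 5k records.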