Python: extract values from a column into new columns
I want to extract the contents of one column into multiple columns. This is the raw data after importing it into a DataFrame:
import pandas as pd

data = {'ID': ['A0001', 'A0002', 'A0003', 'A0004', 'A0005'],
        'Name': ['John', 'Micheal', 'Angle', 'Jim', 'Rome'],
        'Details': ['Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#', 'Type:\n#\nVector:\n\n\nMission:\nFailed\nNote user:\n#', 'Type:\nCar\nVector:\nSquare\nMission:\nCompleted\nNote user:\n', 'Type:\n#\nVector:\n#\nMission:\nCompleted without award\n\nNote user:\nNo end', 'Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.']
}
df = pd.DataFrame(data, columns=['ID', 'Name', 'Details'])
df
ID Name Details
A0001 John Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#
A0002 Micheal Type:\n#\nVector:\n\n\nMission:\nFailed\nNote user:\n#
A0003 Angle Type:\nCar\nVector:\nSquare\nMission:\nCompleted\nNote user:\n
A0004 Jim Type:\n#\nVector:\n#\nMission:\nCompleted without award\n\nNote user:\nNo end
A0005 Rome Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.
I want to extract the values in the Details column, but I don't know how to do it.
My expected data looks like this:
data = {'ID': ['A0001', 'A0002', 'A0003', 'A0004', 'A0005'],
        'Name': ['John', 'Micheal', 'Angle', 'Jim', 'Rome'],
        'Type': ['House', '#', 'Car', '#', '#'],
        'Vector': ['Triangle', '', 'Square', '#', '#'],
        'Mission': ['Completed,lv5', 'Failed', 'Completed', 'Completed without award', ''],
        'Note user': ['#', '#', '', 'No end', 'There are many mistake.I cant choose.I cant buy.']
}
df = pd.DataFrame(data, columns=['ID', 'Name', 'Type', 'Vector', 'Mission', 'Note user'])
df
ID     Name     Type   Vector    Mission                  Note user
A0001  John     House  Triangle  Completed,lv5            #
A0002  Micheal  #                Failed                   #
A0003  Angle    Car    Square    Completed
A0004  Jim      #      #         Completed without award  No end
A0005  Rome     #      #                                  There are many mistake.I cant choose.I cant buy.
Here is what I tried:
The first value in Details is:
'Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#'
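To see where the hard-coded indices in this answer come from, here is a small check of what the replace-then-split step does to that first value. Headers land at the even positions 0, 2, 4, 6 and their values at the odd positions; the variable names are only illustrative.

```python
# The first value of the Details column, copied from the question.
text = 'Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#'

# Collapse each blank line ("\n\n") into a single newline, then split on newlines.
array = text.replace("\n\n", "\n").split("\n")
print(array)
# ['Type:', 'House', 'Vector:', 'Triangle', 'Mission:', 'Completed,lv5', 'Note user:', '#']
```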
I wrote this function to extract the details into a dict. I hard-coded the array indices, but you can avoid that if you prefer:
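def extract_details(text):
    # Collapse blank lines, then split into alternating "Header:" / value lines.
    array = text.replace("\n\n", "\n").split("\n")
    return {
        array[0].replace(":", ""): array[1],
        array[2].replace(":", ""): array[3],
        array[4].replace(":", ""): array[5],
        # Join every line after "Note user:" so multi-line notes are not truncated.
        array[6].replace(":", ""): "".join(array[7:]),
    }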
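Since the answer notes that the indices don't have to be hard-coded, here is a minimal index-free sketch. It assumes the only headers are the four labels seen in the question (Type, Vector, Mission, Note user), each on its own line and ending in a colon; the names extract_details_by_header and HEADERS are hypothetical. Every other line, blank or not, is appended to the most recent header, so stray blank lines and multi-line notes cannot shift anything.

```python
# Assumption: these four labels are the only headers that can appear.
HEADERS = ("Type", "Vector", "Mission", "Note user")


def extract_details_by_header(text):
    """Parse one Details string into a dict without relying on fixed positions."""
    result = {}
    current = None
    for line in text.split("\n"):
        label = line[:-1] if line.endswith(":") else None
        if label in HEADERS:
            # A header line starts a new, initially empty field.
            current = label
            result[current] = ""
        elif current is not None:
            # Any other line (blank lines included) belongs to the current field;
            # lines are concatenated without a separator, as in the expected output.
            result[current] += line
    return result


row5 = 'Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.'
print(extract_details_by_header(row5))
# {'Type': '#', 'Vector': '#', 'Mission': '', 'Note user': 'There are many mistake.I cant choose.I cant buy.'}
```

It can be used in place of extract_details in the apply and concat steps.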
Apply extract_details to the whole column:
df['Details'].apply(extract_details)
Concatenate the resulting columns with the original DataFrame:
pd.concat([
df,
pd.DataFrame(df['Details'].apply(extract_details).apply(pd.Series))
], axis=1)
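A sketch of a variant of the concat step above, assuming you also want to drop Details so the result matches the expected table. Instead of .apply(pd.Series), which builds one Series per row, it passes the list of dicts straight to the DataFrame constructor. Passing df.index keeps the rows aligned. The function is repeated so the snippet runs on its own, with the note lines joined rather than cut off.

```python
import pandas as pd


def extract_details(text):
    # Collapse blank lines, then split into alternating "Header:" / value lines.
    array = text.replace("\n\n", "\n").split("\n")
    return {
        array[0].replace(":", ""): array[1],
        array[2].replace(":", ""): array[3],
        array[4].replace(":", ""): array[5],
        # Keep every line after "Note user:" so multi-line notes are joined, not cut off.
        array[6].replace(":", ""): "".join(array[7:]),
    }


df = pd.DataFrame({
    "ID": ["A0001", "A0002", "A0003", "A0004", "A0005"],
    "Name": ["John", "Micheal", "Angle", "Jim", "Rome"],
    "Details": [
        "Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#",
        "Type:\n#\nVector:\n\n\nMission:\nFailed\nNote user:\n#",
        "Type:\nCar\nVector:\nSquare\nMission:\nCompleted\nNote user:\n",
        "Type:\n#\nVector:\n#\nMission:\nCompleted without award\n\nNote user:\nNo end",
        "Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.",
    ],
})

# One dict per row becomes one DataFrame row; reusing df.index keeps rows aligned for concat.
details = pd.DataFrame(df["Details"].map(extract_details).tolist(), index=df.index)
result = pd.concat([df.drop(columns="Details"), details], axis=1)
print(result)
```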
You can use this to get the answer. A link to the documentation is attached.
First, I replace every \n with '' (an empty string). This removes all line breaks from the Details column.
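Then I grab all of the text between two keywords.
For Type, the data sits between 'Type:' and 'Vector:'. The same goes for Vector and Mission. For Note, I take everything after 'Note user:'. Once the data has been extracted from the Details column, you can drop that column.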
import pandas as pd

data = {'ID': ['A0001', 'A0002', 'A0003', 'A0004', 'A0005'],
        'Name': ['John', 'Micheal', 'Angle', 'Jim', 'Rome'],
        'Details': ['Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#', 'Type:\n#\nVector:\n\n\nMission:\nFailed\nNote user:\n#', 'Type:\nCar\nVector:\nSquare\nMission:\nCompleted\nNote user:\n', 'Type:\n#\nVector:\n#\nMission:\nCompleted without award\n\nNote user:\nNo end', 'Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.']
}
df = pd.DataFrame(data, columns=['ID', 'Name', 'Details'])
# Remove every newline; a literal replacement is enough, no regex needed.
df['Details'] = df['Details'].str.replace('\n', '', regex=False)
# Raw strings avoid invalid-escape warnings, and ":" needs no escaping in a regex.
# expand=False returns a Series, which is what a single new column needs.
df['Type'] = df['Details'].str.extract(r'Type:(.*)Vector', expand=False)
df['Vector'] = df['Details'].str.extract(r'Vector:(.*)Mission', expand=False)
df['Mission'] = df['Details'].str.extract(r'Mission:(.*)Note', expand=False)
df['Note'] = df['Details'].str.extract(r'Note user:(.*)', expand=False)
# The Details column is no longer needed.
df = df.drop(columns='Details')
print(df[['ID', 'Name', 'Type', 'Vector']])
print(df[['Mission', 'Note']])
The output is:
      ID     Name   Type    Vector
0  A0001     John  House  Triangle
1  A0002  Micheal      #
2  A0003    Angle    Car    Square
3  A0004      Jim      #         #
4  A0005     Rome      #         #
                   Mission                                              Note
0            Completed,lv5                                                 #
1                   Failed                                                 #
2                Completed
3  Completed without award                                            No end
4                           There are many mistake.I cant choose.I cant buy.
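The four separate str.extract calls in this answer can also be folded into one pattern with named groups. str.extract then names each output column after its group. This is a sketch, assuming the labels never occur inside the values themselves, because the greedy (.*) groups depend on each label appearing exactly once. Group names must be valid identifiers, so "Note user" becomes Note. The names flat, pattern, fields, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A0001", "A0002", "A0003", "A0004", "A0005"],
    "Name": ["John", "Micheal", "Angle", "Jim", "Rome"],
    "Details": [
        "Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#",
        "Type:\n#\nVector:\n\n\nMission:\nFailed\nNote user:\n#",
        "Type:\nCar\nVector:\nSquare\nMission:\nCompleted\nNote user:\n",
        "Type:\n#\nVector:\n#\nMission:\nCompleted without award\n\nNote user:\nNo end",
        "Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.",
    ],
})

# Strip the newlines first, as in the answer, so the Note lines are joined together.
flat = df["Details"].str.replace("\n", "", regex=False)

# One named group per field; each group becomes a column named after it.
pattern = (
    r"Type:(?P<Type>.*)"
    r"Vector:(?P<Vector>.*)"
    r"Mission:(?P<Mission>.*)"
    r"Note user:(?P<Note>.*)"
)
fields = flat.str.extract(pattern)
result = pd.concat([df[["ID", "Name"]], fields], axis=1)
print(result)
```

A group that matches nothing, such as Vector for A0002, comes back as an empty string rather than NaN, which matches the expected table.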
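It looks like you want to split the column on \n and store each value in its own column. Some cells contain runs of newlines such as \n\n\n, and the headers are words followed by a colon (Type:, Vector:, Mission:, Note user:).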
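Here is one way to act on that observation, as a sketch. It assumes a header is a whole line consisting of one of the four known labels plus a colon. Splitting on that header pattern with Python's re.split, which keeps captured groups in its output, produces alternating labels and values. Removing the newlines from each value then also takes care of the \n\n\n runs. HEADER_RE and split_details are hypothetical names.

```python
import re

import pandas as pd

# Hypothetical helper: a header is a whole line made of a known label plus a colon.
HEADER_RE = re.compile(r"^(Type|Vector|Mission|Note user):$", re.MULTILINE)


def split_details(text):
    # With a capturing group, re.split keeps the labels:
    # ['', 'Type', '\nHouse\n', 'Vector', '\nTriangle\n\n', ...]
    parts = HEADER_RE.split(text)
    # Pair each label with the text that follows it; dropping "\n" also
    # removes blank-line runs such as "\n\n\n".
    return {key: value.replace("\n", "") for key, value in zip(parts[1::2], parts[2::2])}


df = pd.DataFrame({
    "ID": ["A0001", "A0002", "A0003", "A0004", "A0005"],
    "Name": ["John", "Micheal", "Angle", "Jim", "Rome"],
    "Details": [
        "Type:\nHouse\nVector:\nTriangle\n\nMission:\nCompleted,lv5\n\nNote user:\n#",
        "Type:\n#\nVector:\n\n\nMission:\nFailed\nNote user:\n#",
        "Type:\nCar\nVector:\nSquare\nMission:\nCompleted\nNote user:\n",
        "Type:\n#\nVector:\n#\nMission:\nCompleted without award\n\nNote user:\nNo end",
        "Type:\n#\nVector:\n#\nMission:\n\n\nNote user:\nThere are many mistake.\nI cant choose.\nI cant buy.",
    ],
})

expanded = pd.DataFrame(df["Details"].map(split_details).tolist(), index=df.index)
result = df.drop(columns="Details").join(expanded)
print(result)
```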