Python: extracting numbers from a column
I have a dataset with many columns. I want to search the following column for any numbers:
Column_to_look_at
10 days ago I was ...
How old are you?
I am 24 years old
I do not know. Maybe 23.12?
I could21n ....
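I need to create two columns: one that extracts the numbers contained in that column, and another that holds only a boolean value (whether or not the row contains a number).
My expected output: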
Column_to_look_at                       Numbers          Bool
10 days ago I was ...                   [10]             1
How old are you?                        []               0
I am 24 years old                       [24]             1
I do not know. Maybe 23.12 or 23.14?    [23.12, 23.14]   1
I could21n ....                         [21]             1
The code I used to select the numbers is the following:
df[df.applymap(np.isreal).all(1)]
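But this does not actually give me the expected output (at least not for selecting the numbers).
Any suggestions on how to extract the numbers from that column would be appreciated. Thanks.

This should do it: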
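import re

def checknum(x):
    # Collect every signed integer or decimal found in the text column
    num_list = re.findall(r"[+-]?\d+(?:\.\d+)?", x['Column_to_look_at'])
    return num_list

df['Numbers'] = df.apply(checknum, axis=1)
# 1 if the row produced at least one match, otherwise 0
df['Bool'] = df.apply(lambda x: 1 if len(x['Numbers']) > 0 else 0, axis=1)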
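As a possible alternative to the row-wise apply, the same regular expression can be passed to the pandas string accessor. This is only a sketch: the DataFrame below is rebuilt from the sample rows in the question, and the matches come back as strings rather than numeric values.

```python
import pandas as pd

# Sample rows copied from the question's expected-output table
df = pd.DataFrame({
    "Column_to_look_at": [
        "10 days ago I was ...",
        "How old are you?",
        "I am 24 years old",
        "I do not know. Maybe 23.12 or 23.14?",
        "I could21n ....",
    ]
})

# Same pattern as above: optional sign, digits, optional decimal part
pattern = r"[+-]?\d+(?:\.\d+)?"

# str.findall returns a list of matched strings for every row
df["Numbers"] = df["Column_to_look_at"].str.findall(pattern)

# 1 if at least one match was found, otherwise 0
df["Bool"] = df["Numbers"].str.len().gt(0).astype(int)

print(df)
```

If real numbers are needed instead of strings, each list can be converted afterwards, for example with float().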
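You need a regular-expression pattern match to get the numeric data out of each row.
Thanks. Something like df.Column_to_look_at.str.extract(r"(\d+)")? And how do I assign the boolean?
This would be helpful, but it only extracts positive integers and ignores floats and even negative integers.
Now it will extract positive, negative, and floating-point numbers.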
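To illustrate the point raised in the comments, here is a small comparison of the two patterns using only the standard re module. The sample string is made up for the illustration and is not taken from the question.

```python
import re

# Hypothetical sample text containing a negative integer and a decimal
text = "Maybe -3 or 23.12"

# Digits only: the sign is lost and the decimal is split in two
digits_only = re.findall(r"\d+", text)

# Optional sign, digits, optional decimal part
signed_decimal = re.findall(r"[+-]?\d+(?:\.\d+)?", text)

# Convert the matched strings to numeric values
values = [float(s) for s in signed_decimal]

print(digits_only)     # ['3', '23', '12']
print(signed_decimal)  # ['-3', '23.12']
print(values)          # [-3.0, 23.12]
```

Note also that str.extract returns only the first match in each row, so a row with several numbers needs str.findall or str.extractall instead.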