Python: extracting the number after certain strings

Tags: python, regex, pandas, data-manipulation

I have a DataFrame like this:

import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb3333.ccc.de12345.dddd.cccc','ccc2222.ddd.aaa.ho16589.ddd','ddd16893.aaa.de59875','aaa15875.ccc.ddd.ho13532']
df = pd.DataFrame({'page':page,'URL':URL})
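I want to create a column that extracts the number following "de" or "ho". Note that the numbers can differ in length, and the position of "de" or "ho" can vary as well.

My code looks like this: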

import re
def extract_number(df,url):
    for url in df:
        if df[url].str.contains('de', na = False) == True:
            match = re.search('de:P(\d+)')
        elif df[url].str.contains('ho', na = False) == True:
            match = re.search('ho:P(\d+)')
        else:
            match = 'not found'
        print(match)

out = extract_number(df, 'URL')
It fails with the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

The desired output should look like this:

import pandas as pd
page = ['A','B','C','D']
URL = ['aaa.bbb.ccc.de12345.dddd.cccc','ccc.ddd.aaa.ho16589.ddd','ddd.aaa.de59875','aaa.ccc.ddd.ho13532']
ID = ['12345','16589','59875','13532']
df = pd.DataFrame({'page':page,'URL':URL,'ID':ID})

Many thanks!

Use str.extract with a positive lookbehind:

df["num"] = df["URL"].str.extract(r"(?<=de|ho)(\d+)")

print (df)

#
  page                                URL    num
0    A  aaa.bbb3333.ccc.de12345.dddd.cccc  12345
1    B        ccc2222.ddd.aaa.ho16589.ddd  16589
2    C               ddd16893.aaa.de59875  59875
3    D           aaa15875.ccc.ddd.ho13532  13532
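
For comparison, here is a rough per-string sketch of what the function in the question seems to be aiming for. It is not part of the original answer. It assumes the intent was to return the digits after the first "de" or "ho" that is followed by digits, and to fall back to 'not found' otherwise. The name PATTERN and the default parameter are made up for illustration. Note that re.search needs both a pattern and the string to search, and that it works on one string at a time rather than on a whole Series.

```python
import re

# "de" or "ho" immediately followed by digits; the digits are captured as group 1.
# PATTERN is an illustrative name, not something from the question.
PATTERN = re.compile(r"(?:de|ho)(\d+)")

def extract_number(url, default="not found"):
    """Return the digits after the first 'de'/'ho' followed by digits, else default."""
    match = PATTERN.search(url)
    return match.group(1) if match else default

urls = [
    'aaa.bbb3333.ccc.de12345.dddd.cccc',
    'ccc2222.ddd.aaa.ho16589.ddd',
    'ddd16893.aaa.de59875',
    'aaa15875.ccc.ddd.ho13532',
]

ids = [extract_number(u) for u in urls]
print(ids)
```

With the DataFrame from the question, the same function could be applied row by row with df["URL"].apply(extract_number), although the vectorized str.extract call above is usually the simpler choice.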

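Comments:

Don't you also need to remove the digits from the URL?

What do you mean by removing the digits?

The resulting DataFrame should have URLs with no digits other than the id (I guess)... so, instead of aaa.bbb3333.ccc.de12345.dddd.cccc, this: aaa.bbb.ccc.de12345.dddd.cccc

Yes, you're right... but his/her expected result doesn't have those digits... anyway, nice work :)

So sorry, guys. That was a mistake. I forgot to type the digits in the URL. Thanks!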