Python: using BeautifulSoup and regex to get a date from a string, currently returns None
Tags: python, regex, date, beautifulsoup

When I type the text in myself, I can capture the date in the following format:

import re

text = "The event takes place from May 14-June 11, 2018"
match = re.search(r'(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2}\-(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', text).group()
print(match)
'May 14-June 11, 2018'
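But what I really want is to use BeautifulSoup and regex to extract the date from anywhere in an HTML page, and I can't seem to reproduce the success above, even though the text is definitely present in the HTML. I'm a beginner, so I may be missing something obvious.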

open_page = driver.get(url)
html_source = driver.page_source
soup = BeautifulSoup(html_source, 'html.parser')
date = soup.find(text=re.compile(r'(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2}\-(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}'))
print(date)
'None'
I also tried:

html_source = driver.page_source
soup = BeautifulSoup(html_source, 'html.parser')
text = soup.get_text().strip()
match = re.search(r'(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2}\-(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}', text)
print(match)
'None'
The HTML is:

<div class="event--date">
       The event takes place from May 14-June 11, 2018        </div>

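Does the document really contain the date? If so, please include the relevant HTML snippet in your question.

Edited with the HTML.

Can you print the value of text? I just want to verify that the HTML is being parsed correctly.

I tried re.search and soup.find with your HTML snippet and got the expected date.

Ah, I got it working. The problem was that the page uses a long dash rather than a short dash. Thanks for your help!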
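The difference between a short and a long dash is easy to miss by eye, so it can help to list the code points of any non-ASCII characters. The following is a minimal sketch: `describe_non_ascii` is a hypothetical helper name, and the two sample strings are assumptions modeled on the text in this question, one typed by hand and one as it appears in the page.

```python
import unicodedata

# Two strings that look almost identical when printed.
typed = "May 14-June 11, 2018"         # ASCII hyphen-minus, as typed by hand
scraped = "May 14\u2013June 11, 2018"  # long dash (U+2013), as in the page text

def describe_non_ascii(s):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(s) if ord(ch) > 127]

print(describe_non_ascii(typed))    # empty list: nothing outside ASCII
print(describe_non_ascii(scraped))  # reports the dash and its Unicode name
```

Running the same helper on the result of `soup.get_text()` should show whether the separator in the scraped text is the character the regex expects.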
soup = BeautifulSoup(html_source, 'html.parser')
text = soup.get_text().strip()
print(text)
'@charset "UTF-8";[ng\:cloak],[ng-cloak],[data-ng-cloak],[x-ng-cloak],.ng-cloak,.x-ng-cloak,.ng-hide:not(.ng-hide-animate){display:none !important;}ng\:form{display:block;}.ng-animate-shim{visibility:hidden;}.ng-anchor{position:absolute;}


The event takes place from May 14–June 11, 2018    '
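The printed text contains a long dash (U+2013) between the two dates, while the pattern only allows `\-`, which explains the `None` result. One possible fix, sketched below, is to accept several dash characters in a character class. The names `MONTH`, `DASH` and `DATE_RANGE` are illustrative, and `page_text` is an assumed stand-in for the string returned by `soup.get_text()`; the month groups are also made non-capturing here, which is a stylistic choice and not part of the original pattern.

```python
import re

MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
         r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

# Accept an ASCII hyphen-minus, U+2013 or U+2014 between the two dates.
DASH = "[-\u2013\u2014]"

DATE_RANGE = re.compile(rf"{MONTH}\s+\d{{1,2}}{DASH}{MONTH}\s+\d{{1,2}},\s+\d{{4}}")

# Stand-in for soup.get_text(); the separator here is U+2013, as on the page.
page_text = "The event takes place from May 14\u2013June 11, 2018    "

match = DATE_RANGE.search(page_text)
print(match.group() if match else None)

# The hand-typed variant with a plain hyphen still matches.
print(DATE_RANGE.search("from May 14-June 11, 2018").group())
```

The same compiled pattern can be passed to `soup.find` in place of the original one. An alternative is to replace the long dashes in the text with a plain hyphen before searching and keep the original pattern unchanged.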