My Java regular expression doesn't work properly


I wrote the regular expression below to extract dates from strings:

(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)(\*){0,2}\s+\d{1,2}\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{4}
Before converting it to a Java regex, I tested it with an online regex tester.

The results looked fine and matched what I wanted.

The "el" object is an ArrayList of Strings:

holiday: New Year's Day Wednesday 1 January 2014
holiday: Chinese New Year Friday 31 January 2014 Saturday 1 February 2014
holiday: Good Friday Friday 18 April 2014
holiday: Labour Day Thursday 1 May 2014
holiday: Vesak Day Tuesday 13 May 2014
holiday: Hari Raya Puasa Monday 28 July 2014
holiday: National Day  Saturday 9 August 2014
holiday: Hari Raya Haji  Sunday* 5 October 2014
holiday: Deepavali  Thursday** 23 October 2014
holiday: Christmas Day Thursday 25 December 2014
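The problem is that in Java some of the dates are missed while others match. I also tested it with an online tester and saw the same failure.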

Update:

  Found: Wednesday 1 January 2014
holiday: New Year's Day Wednesday 1 January 2014
 Found: Saturday 1 February 2014
holiday: Chinese New Year Friday 31 January 2014 Saturday 1 February 2014
holiday: Good Friday Friday 18 April 2014
 Found: Thursday 1 May 2014
holiday: Labour Day Thursday 1 May 2014
holiday: Vesak Day Tuesday 13 May 2014
holiday: Hari Raya Puasa Monday 28 July 2014
holiday: National Day  Saturday 9 August 2014
 Found: Sunday* 5 October 2014
holiday: Hari Raya Haji  Sunday* 5 October 2014
holiday: Deepavali  Thursday** 23 October 2014
 Found: Thursday 25 December 2014
holiday: Christmas Day Thursday 25 December 2014
holiday:  
holiday:  
The full version of my code:

import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class Tester {

    /**
     * @param args
     * @throws IOException 
     */
    public static void main(String[] args) throws IOException {

        updateSingaporeHolidayCalendar();
    }
    
    public static void updateSingaporeHolidayCalendar() throws IOException{
        
        String url = "http://www.mom.gov.sg/employment-practices/leave-and-holidays/Pages/public-holidays-2014.aspx";
        Document document = Jsoup.connect(url).get();

        Elements holidays = document.select("#contentarea table tr");
        // System.out.println("12312312");
        //System.out.println("web page context: " + question);
        List<String> el = new ArrayList<String>();
        for(int i = 2; i < holidays.size() + 1; i++){
            if((i&1) == 1) continue;
            Elements threeGroup = holidays.get(i-2).getElementsByTag("td");
            
            int j = 2;
            for(Element e : threeGroup){
                if(j-- != 0) continue;
                j = 2;
                el.add(e.text());
            }
        }
        
        
        Pattern pattern = Pattern.compile("(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)(\\*){0,2}\\s+\\d{1,2}\\s+(January|February|March|April|May|June|July|August|September|October|November|December)\\s+\\d{4}");
      
        // Print every match, followed by the line it was searched in
        for(int k = 0; k < el.size(); k++){
            
            Matcher matcher = pattern.matcher(el.get(k));
            // Check all occurrences
            while (matcher.find()) {
                //System.out.print("Start index: " + matcher.start());
                //System.out.print(" End index: " + matcher.end());
                System.out.println(" Found: " + matcher.group());
            }
            System.out.println("holiday: " + el.get(k));
        }

    }

}
Solved

holiday: New Year's Day Wednesday 1 January 2014
holiday: Chinese New Year Friday 31 January 2014 Saturday 1 February 2014
holiday: Good Friday Friday 18 April 2014
holiday: Labour Day Thursday 1 May 2014
holiday: Vesak Day Tuesday 13 May 2014
holiday: Hari Raya Puasa Monday 28 July 2014
holiday: National Day  Saturday 9 August 2014
holiday: Hari Raya Haji  Sunday* 5 October 2014
holiday: Deepavali  Thursday** 23 October 2014
holiday: Christmas Day Thursday 25 December 2014
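As @Pshemo said: "The data you get from the site also contains non-breaking spaces, which can be written in HTML as &nbsp; and which apparently do not belong to the \s class. To fix this, replace each \s with [\s\u00A0] so that this character (written using its Unicode identifier) is also included."

So I changed the expression to: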

 Pattern pattern = Pattern
        .compile("(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)(\\*){0,2}[\\s\u00A0]+\\d{1,2}[\\s\u00A0]+(January|February|March|April|May|June|July|August|September|October|November|December)[\\s\u00A0]+\\d{4}");

This solved the problem.
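
The effect can be reproduced without JSoup. The following is only a minimal sketch: the class name NbspRegexDemo, the shortened patterns, and the position of the non-breaking space in the sample string are assumptions made for illustration, not taken from the scraped page.

```java
import java.util.regex.Pattern;

public class NbspRegexDemo {
    // Shortened stand-in for the original pattern: by default, \s covers only
    // space, tab, newline, vertical tab, form feed and carriage return.
    static final Pattern PLAIN =
            Pattern.compile("(Friday)\\s+\\d{1,2}\\s+(April)\\s+\\d{4}");

    // Same pattern with U+00A0 (non-breaking space) added to each separator class.
    static final Pattern WITH_NBSP =
            Pattern.compile("(Friday)[\\s\u00A0]+\\d{1,2}[\\s\u00A0]+(April)[\\s\u00A0]+\\d{4}");

    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String ordinary = "Good Friday Friday 18 April 2014";
        // Hypothetical scraped text: a non-breaking space follows the weekday.
        String scraped = "Good Friday Friday\u00A018 April 2014";

        System.out.println(found(PLAIN, ordinary));     // true
        System.out.println(found(PLAIN, scraped));      // false
        System.out.println(found(WITH_NBSP, scraped));  // true
    }
}
```

The two sample strings look identical when printed, which is why the failure is easy to miss in console output.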

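I can say for sure that you will miss the last two entries, because your loop stops two elements short. Change:

k < el.size() - 2
to k < el.size(). Apart from that, your regex looks fine.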

Your for(int k = 0; ...) loop is limited to el.size() - 2.

Try removing the - 2 so that the loop visits every element in the list.

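The data you get from the site also contains non-breaking spaces, which can be written in HTML as &nbsp; and which apparently do not belong to the \\s class. To fix this, replace each \\s with [\\s\u00A0] so that this character (written using its Unicode identifier) is also included.

So your regex could look like this: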

Pattern pattern = Pattern
        .compile("(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)(\\*){0,2}[\\s\u00A0]+\\d{1,2}[\\s\u00A0]+(January|February|March|April|May|June|July|August|September|October|November|December)[\\s\u00A0]+\\d{4}");
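
When two strings print identically but only one of them matches, it can help to make the hidden characters visible. The following is a rough sketch, assuming Java 8 or later; the class HiddenCharDump and the method reveal are hypothetical names introduced for illustration, and the sample string is made up rather than taken from the site.

```java
public class HiddenCharDump {
    // Keeps printable ASCII as-is and renders every other code point
    // (control characters, non-breaking spaces, and so on) as <U+XXXX>.
    static String reveal(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp < 0x7F) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("<U+%04X>", cp));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical scraped cell with a non-breaking space after the asterisks.
        System.out.println(reveal("Thursday**\u00A023 October 2014"));
        // prints: Thursday**<U+00A0>23 October 2014
    }
}
```

Running each entry of the list through such a helper before matching shows which separators are ordinary spaces and which are not.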

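Comments:

Sorry, could you say more about the problem you are facing? For me ... without the last two lines, but since you used size - 2, I assume that is correct.

It looks fine to me too. What output does your program print to the console?

@JasonC Did you try my minimal code? Does it work?

No, I don't have JSoup.

The reason I drop the last two is that I don't need them, and I don't think that is the problem. When I test in Java, "holiday: Good Friday Friday 18 April 2014", "holiday: Vesak Day Tuesday 13 May 2014", "holiday: Hari Raya Puasa Monday 28 July 2014", "holiday: National Day Saturday 9 August 2014" and "holiday: Deepavali Thursday** 23 October 2014" are missed. All the others match.

@user1837485 I just tested your code and it seems to work fine. See here.

@Pshemo Your code works. Wait a moment, I will paste the complete code; I actually get the list of strings from a website.

@Pshemo Is there a way to import an external library into IDEONE?

@Pshemo I pasted the code there, but JSoup.jar is missing.

+1 Very interesting that, as far as regex is concerned, a non-breaking space is not "whitespace".

@Bohemian Yes, it is interesting, but it makes sense: unlike newlines or tabs, it is not used to separate words. It is a space that keeps two words from being separated, so it behaves more like a "letter/character" than like whitespace :)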