Java: extract only the values from a string of tags

I am trying to extract the values from a given string that may contain many tags following this pattern, for example:

<element1>content1</element1><element2>content2</element2><element3>content3</element3>... and so on.
Instead of only the contents, this is what I receive:

Item: 
Item: content1
Item:
Item: content2
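I would like to get rid of these empty elements with a single one-liner "magic" regexp. By that I mean: I apply the expression to the given string in one line and receive my expected values in an array, without any further processing or grouping.
Is that even possible?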

You can use a capturing group, a backreference, and a lazy quantifier to pick up all the contents dynamically:

<(element\d+)>(.*?)<\/\1>

You can adapt my regex to Java code.
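As a rough sketch of what that adaptation might look like: the class `TagValues` and the helper `extract` below are hypothetical names chosen for illustration, not part of the original answer. Group 1 captures the tag name, `\1` forces the closing tag to repeat it, and group 2 holds the content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagValues {
    // <(element\d+)> opens a tag, (.*?) lazily captures the content,
    // and </\1> requires the closing tag to repeat the name from group 1.
    private static final Pattern TAG = Pattern.compile("<(element\\d+)>(.*?)</\\1>");

    // Hypothetical helper: returns the content of every matched tag pair, in order.
    public static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            values.add(m.group(2)); // group 2 is the text between the tags
        }
        return values;
    }

    public static void main(String[] args) {
        String s = "<element1>content1</element1><element2>content2</element2>";
        for (String v : extract(s)) {
            System.out.println("Item: " + v);
        }
    }
}
```

Because the pattern matches whole tag pairs instead of splitting on tags, no empty strings are produced in the first place.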



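Disclaimer: regex is definitely the wrong tool for this and you should look into XPath, but if you do not mind tripping over edge cases, this is a quick and dirty solution.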

Keeping your existing code snippet, you can get there by adding a regex check on each item. Take a look at the code below.

Import the regex utilities:

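import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractValues {
    public static void main(String[] args) {
        // Only items made up entirely of word characters count as content
        Pattern r = Pattern.compile("\\w+");
        String tempString =
            "<element1>content1</element1><element2>content2</element2>";
        // Splitting on opening and closing tags leaves empty strings between adjacent tags
        String[] tempArray = tempString.split("(<\\w+>)|(</\\w+>)");
        for (String item : tempArray) {
            Matcher matcher = r.matcher(item);
            // Empty strings do not match \w+, so they are skipped
            if (matcher.matches()) {
                System.out.println("Item: " + item);
            }
        }
    }
}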
Hope this helps.

Thanks.

If you can use streams, you can keep your regex and simply filter out the empty strings:

import java.util.Arrays;
import java.util.regex.Pattern;

String tempString = "<element1>content1</element1><element2>content2</element2>";
String[] tempArray = Pattern.compile("(<\\w+>)|(</\\w+>)").splitAsStream(tempString)
                            .filter(s -> !s.isEmpty()).toArray(String[]::new);
System.out.println(Arrays.toString(tempArray));

Another solution is to use XPath:

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Extract {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
        // Q 57876359

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        String xml = new String("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" + 
                "<elements>\r\n" + 
                "   <element1>content1</element1>\r\n" + 
                "   <element2>content2</element2>\r\n" + 
                "   <element3>content3</element3>\r\n" + 
                "</elements>");
        InputSource is = new InputSource(new StringReader(xml));
        Document doc = builder.parse(is);   
        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();

        NodeList nodeList = doc.getChildNodes();
        // Get the <elements> root node
        Node firstNode = nodeList.item(0);

        // Get its children: element1...elementN, interleaved with whitespace text nodes
        NodeList elementNodes = firstNode.getChildNodes();

        // The very last child is a whitespace text node, so the last element sits at length - 2
        Node lastInnerNode = elementNodes.item(elementNodes.getLength()-2);

        // Extract the numeric suffix of the last tag name (also handles multi-digit indexes)
        String lastInnerNodeName = lastInnerNode.getNodeName();
        int lastNodeIndex = Integer.parseInt(lastInnerNodeName.replaceAll("\\D+", ""));

        XPathExpression xpathexpression;

        //To extract every content
        for (int i = 1; i <= lastNodeIndex; i++) {
            xpathexpression = xpath.compile("//element"+i+"/text()");
            Object result = xpathexpression.evaluate(doc, XPathConstants.STRING);
            String texto = (String) result;
            System.out.println("Item: "+texto);
        }
    }
}
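A shorter variant of the same XPath idea, offered only as a sketch: instead of working out the last index from the tag name and assuming the elements are numbered consecutively, the expression `/elements/*` selects every element child of the root in one go. The class `ExtractAll` and the method `contents` are hypothetical names, and the `<elements>` wrapper is the same assumed root used in the example above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExtractAll {
    // Hypothetical helper: returns the text of every element directly under <elements>.
    public static List<String> contents(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches element nodes only, so whitespace text nodes are ignored
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/elements/*", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Parsing or XPath failure: rethrow unchecked to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<elements><element1>content1</element1>"
                + "<element2>content2</element2></elements>";
        for (String item : contents(xml)) {
            System.out.println("Item: " + item);
        }
    }
}
```

Since the original input string has no single root element, it would need to be wrapped in one (here `<elements>`) before parsing.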
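I recommend reading this topic:

I don't think a regex is needed here; simply checking each string for non-emptiness would do.

Yes, that works too. But you mentioned regex, so I used a regex in the example.

Hmm, is it possible to apply the "matcher" functionality inside the given regex itself? I would like to use only an expression that returns the expected values, without extra lines of code (matcher, grouping).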
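On the last point: a regex by itself cannot return an array in Java, so some matcher call is unavoidable, but on Java 9 or later it can be chained into a single expression with `Matcher.results()`. The sketch below assumes Java 9+; `OneExpression` and `extract` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class OneExpression {
    // Hypothetical helper: a single chained expression from input string to array.
    public static String[] extract(String input) {
        return Pattern.compile("<(\\w+)>(.*?)</\\1>")
                .matcher(input)
                .results()                 // Stream<MatchResult>, available since Java 9
                .map(r -> r.group(2))      // keep only the text between the tags
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "<element1>content1</element1><element2>content2</element2>";
        System.out.println(Arrays.toString(extract(s)));
    }
}
```

Matching tag pairs directly means there are no empty strings to filter out afterwards.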