Java: extracting only the values from tags in a string


java regex string


I am trying to extract values from a given string that may contain many tags following this pattern, for example:

<element1>content1</element1><element2>content2</element2><element3>content3</element3>... and so on.

What I receive is this:

Item: 
Item: content1
Item:
Item: content2

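I would like to get rid of these empty elements using just a one-liner, one magic regexp. That is, I apply the expression to the given string in a single line and magically receive my expected values in an array, without any further processing or grouping.

Is that even possible?

You can use a capturing group, a backreference, and a lazy quantifier to grab all the content: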

<(element\d+)>(.*?)<\/\1>

You can adapt this regex to Java code.
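As a minimal sketch of what that adaptation might look like: the class name `BackreferenceExtract` and the helper `extract` are made up for illustration, and the only change to the regex is the doubled backslashes required inside a Java string literal (the `/` needs no escaping in Java).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceExtract {
    // Group 1 captures the tag name, \1 requires the same name in the closing tag,
    // and the lazy group 2 captures the content in between.
    private static final Pattern TAG = Pattern.compile("<(element\\d+)>(.*?)</\\1>");

    // Hypothetical helper: returns the content of every matched element, in order.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "<element1>content1</element1>"
                + "<element2>content2</element2>"
                + "<element3>content3</element3>";
        for (String value : extract(input)) {
            System.out.println("Item: " + value);
        }
    }
}
```

Because only the content group is collected, no empty items appear in the result.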

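Disclaimer: regex is definitely the wrong tool for this and you should look into XPath instead, but if you don't mind being tripped up by edge cases, this is a quick and dirty solution.

Using your existing snippet, you can achieve it by applying a bit of regex. Take a look at the code below.

Import the regex utilities: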
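To make the "edge cases" concrete, here is a small sketch that counts how many elements the backreference regex finds in a few inputs. The class name `RegexEdgeCases` and the helper `countMatches` are invented for this illustration; the inputs are hypothetical variations on the question's data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEdgeCases {
    private static final Pattern TAG = Pattern.compile("<(element\\d+)>(.*?)</\\1>");

    // Hypothetical helper: counts how many elements the regex finds.
    static int countMatches(String input) {
        Matcher m = TAG.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Plain tag: found.
        System.out.println(countMatches("<element1>a</element1>"));          // 1
        // An attribute in the opening tag: the regex no longer matches.
        System.out.println(countMatches("<element1 id=\"x\">a</element1>")); // 0
        // A line break inside the content: '.' does not match line
        // terminators unless Pattern.DOTALL is set.
        System.out.println(countMatches("<element1>a\nb</element1>"));       // 0
    }
}
```

An XML parser handles both of the failing inputs, which is the point of the disclaimer.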

import java.util.regex.Matcher;
import java.util.regex.Pattern;

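// An item is printed only if it consists entirely of word characters,
// which filters out the empty strings produced by split().
String pattern = "\\w+";
Pattern r = Pattern.compile(pattern);
String tempString =
    "<element1>content1</element1><element2>content2</element2>";
// Split on every opening or closing tag.
String[] tempArray = tempString.split("(<\\w+>)|(</\\w+>)");
for (String item : tempArray) {
    Matcher matcher = r.matcher(item);
    // Check whether the whole item matches the pattern.
    if (matcher.matches()) {
        System.out.println("Item: " + item);
    }
}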


Hope this helps.

Thanks.

If you can use streams, you can keep your regex and simply filter out the empty strings:

String tempString = "<element1>content1</element1><element2>content2</element2>";
String[] tempArray = Pattern.compile("(<\\w+>)|(</\\w+>)").splitAsStream(tempString)
                            .filter(s -> !s.isEmpty()).toArray(String[]::new);
System.out.println(Arrays.toString(tempArray));


Another solution is to use XPath:

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class Extract {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
        // Q 57876359

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        String xml = new String("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" + 
                "<elements>\r\n" + 
                "   <element1>content1</element1>\r\n" + 
                "   <element2>content2</element2>\r\n" + 
                "   <element3>content3</element3>\r\n" + 
                "</elements>");
        InputSource is = new InputSource(new StringReader(xml));
        Document doc = builder.parse(is);   
        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();

        int nodes = doc.getChildNodes().getLength();    

        NodeList nodeList = doc.getChildNodes();    
        //To get <elements> root node
        Node firstNode = nodeList.item(0);

        //To get childs element0...elementN
        NodeList elementNodes = firstNode.getChildNodes();

        //Last node is a text node
        Node lastInnerNode = elementNodes.item(elementNodes.getLength()-2);

        //To extract index of last tag
        String lastInnerNodeName = lastInnerNode.getNodeName(); 
        int lastNodeIndex =  Integer.parseInt(lastInnerNodeName.substring(lastInnerNodeName.length()-1, lastInnerNodeName.length()));

        XPathExpression xpathexpression;

        //To extract every content
        for (int i = 1; i <= lastNodeIndex; i++) {
            xpathexpression = xpath.compile("//element"+i+"/text()");
            Object result = xpathexpression.evaluate(doc, XPathConstants.STRING);
            String texto = (String) result;
            System.out.println("Item: "+texto);
        }
    }
}
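The code above derives the element count from the last digit of the last tag name, which stops working once there are ten or more elements. As a hedged alternative sketch (not part of the original answer), a single wildcard XPath expression can select every child of the root regardless of its name. The class name `ExtractAllChildren` and the helper `extract` are made up here, and the input is assumed to be wrapped in an `<elements>` root as in the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExtractAllChildren {
    // Hypothetical helper: returns the text of every element directly under <elements>.
    static List<String> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "/elements/*" selects every element child of the root, whatever its name.
            NodeList nodes = (NodeList) xpath.evaluate("/elements/*", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<elements><element1>content1</element1>"
                + "<element2>content2</element2>"
                + "<element3>content3</element3></elements>";
        for (String value : extract(xml)) {
            System.out.println("Item: " + value);
        }
    }
}
```

Selecting a node set also skips the whitespace text nodes between elements, so no index arithmetic is needed.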

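I suggest reading this topic.

I don't think a regex is needed; applying a check for non-empty strings would be enough here.

Yes, that works too. But you mentioned regex, so I used a regex in the example.

Hmm, is it possible to build the "matcher" functionality into the given regex? I would like to use only an expression that returns the expected values, without extra lines of code (matcher, grouping).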
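A regex on its own cannot return an array; some API call has to run it. The closest thing to a single expression is probably `Matcher.results()`, available from Java 9, which streams the matches so the content group can be mapped straight into an array. The following is a sketch under that assumption; the class name `OneExpression` and the helper `extract` are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class OneExpression {
    // Hypothetical helper: one chained expression, no explicit Matcher loop.
    static String[] extract(String input) {
        return Pattern.compile("<(\\w+)>(.*?)</\\1>")
                .matcher(input)
                .results()                 // Stream<MatchResult>, requires Java 9+
                .map(r -> r.group(2))      // keep only the content between the tags
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String tempString = "<element1>content1</element1><element2>content2</element2>";
        System.out.println(Arrays.toString(extract(tempString))); // [content1, content2]
    }
}
```

Grouping is still involved, but it is confined to the single `map` step.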