Java: extract only the values from a string of tags
Tags: java, regex, string
I am trying to extract the values from a given string that may contain many tags following this pattern, for example:
<element1>content1</element1><element2>content2</element2><element3>content3</element3>... and so on.
Instead of just the values, this is what I get:
Item:
Item: content1
Item:
Item: content2
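I just want a one-liner, a bit of regex magic that gets rid of these empty elements. What I mean is: I apply the expression to the given string in a single line and receive my expected values in an array, with no further processing or grouping.
Is that even achievable?
You can use a capture group, a backreference and a lazy quantifier to grab all the content dynamically: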
<(element\d+)>(.*?)<\/\1>
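You can adapt my regex to Java code.
Disclaimer: regex is definitely the wrong tool for this and you should take a close look at XPath, but if you don't mind tripping over edge cases, this is a quick and dirty solution.
Starting from your existing snippet, you can get there by applying one more regex. Take a look at the code below.
// Import the regex utilities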
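As a rough sketch of that adaptation (the class and method names here are made up for illustration, not taken from the answer), the pattern can be compiled once and the second capture group collected with Matcher.find(). In a Java string literal the backslashes are doubled, and the forward slash needs no escaping:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
final class TagValueExtractor {
    // Group 1 is the tag name, \1 forces the closing tag to match it,
    // group 2 lazily captures the content in between.
    private static final Pattern TAG = Pattern.compile("<(element\\d+)>(.*?)</\\1>");

    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "<element1>content1</element1><element2>content2</element2>";
        for (String value : extract(input)) {
            System.out.println("Item: " + value);
        }
        // prints:
        // Item: content1
        // Item: content2
    }
}
```

Note that `.` does not match line breaks unless Pattern.DOTALL is set, and an empty element such as `<element1></element1>` yields an empty string rather than being skipped.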
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String pattern = "\\w+";
Pattern r = Pattern.compile(pattern);
String tempString =
"<element1>content1</element1><element2>content2</element2>";
String[] tempArray = tempString.split ("(<\\w+>)|(</\\w+>)");
for (String item:tempArray)
{
Matcher matcher = r.matcher(item);
//check if the pattern matches
if(matcher.matches()){
System.out.println ("Item: " + item);
}
}
Hope this helps. Thanks.
If you can use streams, you can keep your regex and simply filter out the empty strings:
String tempString = "<element1>content1</element1><element2>content2</element2>";
String[] tempArray = Pattern.compile("(<\\w+>)|(</\\w+>)").splitAsStream(tempString)
.filter(s -> !s.isEmpty()).toArray(String[]::new);
System.out.println(Arrays.toString(tempArray));
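If Java 9 or later is available, a similar single expression can go the other way and match the contents directly instead of splitting on the tags. The sketch below assumes Matcher.results() (added in Java 9) and generalizes the tag name to `\w+` to mirror the split pattern above; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; requires Java 9+ for Matcher.results().
final class TagValuesOneLiner {
    static String[] values(String input) {
        return Pattern.compile("<(\\w+)>(.*?)</\\1>")
                .matcher(input)
                .results()                 // stream of MatchResult, one per tag pair
                .map(r -> r.group(2))      // keep only the content between the tags
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String tempString = "<element1>content1</element1><element2>content2</element2>";
        System.out.println(Arrays.toString(values(tempString)));
        // prints: [content1, content2]
    }
}
```

Unlike the split-and-filter version, this keeps empty elements as empty strings, so add a filter if those should be dropped as well.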
Another solution is to use XPath:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Extract {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
// Q 57876359
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
String xml = new String("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" +
"<elements>\r\n" +
" <element1>content1</element1>\r\n" +
" <element2>content2</element2>\r\n" +
" <element3>content3</element3>\r\n" +
"</elements>");
InputSource is = new InputSource(new StringReader(xml));
Document doc = builder.parse(is);
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
int nodes = doc.getChildNodes().getLength();
NodeList nodeList = doc.getChildNodes();
//To get <elements> root node
Node firstNode = nodeList.item(0);
//To get the child elements element1...elementN
NodeList elementNodes = firstNode.getChildNodes();
//Last node is a text node
Node lastInnerNode = elementNodes.item(elementNodes.getLength()-2);
//To extract index of last tag
String lastInnerNodeName = lastInnerNode.getNodeName();
//Strip the non-digits so that two-digit indexes such as element10 also work
int lastNodeIndex = Integer.parseInt(lastInnerNodeName.replaceAll("\\D", ""));
XPathExpression xpathexpression;
//To extract every content
for (int i = 1; i <= lastNodeIndex; i++) {
xpathexpression = xpath.compile("//element"+i+"/text()");
Object result = xpathexpression.evaluate(doc, XPathConstants.STRING);
String texto = (String) result;
System.out.println("Item: "+texto);
}
}
}
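The index arithmetic above depends on the whitespace text nodes and on the elements being numbered consecutively. A shorter variant, sketched here under the assumption that every value is a direct child of the `<elements>` root as in the sample, asks XPath for all child elements at once with `/elements/*` and reads their text. The class and method names are invented for the example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes the values are direct children of <elements>.
final class XPathChildren {
    static List<String> childTexts(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // "/elements/*" selects every element child of the root, whatever its name.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/elements/*",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on the input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<elements><element1>content1</element1>"
                + "<element2>content2</element2><element3>content3</element3></elements>";
        for (String text : childTexts(xml)) {
            System.out.println("Item: " + text);
        }
    }
}
```

Because the tags in the question have no common root, the input would first need to be wrapped in one (for example `"<elements>" + input + "</elements>"`) before it can be parsed as XML.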
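I suggest reading this topic.
I don't think a regex is needed; checking for non-empty strings would be enough here.
Yes, that works too. But you mentioned regex, so I used one in my example.
Hmm, can the "matcher" functionality be applied within the given regex itself? I would like to use just an expression that returns the expected values, without extra lines of code (matcher, grouping).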