How do I get only the data between <html> and </html> from the internet using Java?


java html


I use the following code to retrieve data from the internet, but I also get the HTTP headers, which are useless to me:

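import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL url = new URL(webURL);
URLConnection conn = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String inputLine;

// Print the response line by line
while ((inputLine = in.readLine()) != null)
    System.out.println(inputLine);
in.close();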

How can I get only the HTML data, without any headers or anything else?

Regards

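Do you want to convert the HTML into text? If so, you can use org.htmlparser.*. Have a look at it.

You can also parse the complete data, search it for strings, and accept only the data between the html tags.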
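A rough sketch of that last suggestion, assuming the page has already been read into a String: a case-insensitive regular expression keeps whatever lies between the opening <html ...> tag and the last </html>. The class name HtmlSlice and the regex are illustrative choices, not part of any library. This is plain string matching, so comments or malformed markup can fool it, which is why the other answers recommend a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlSlice {
    // <html optionally followed by attributes, then everything up to the LAST </html>.
    // DOTALL lets "." cross line breaks; CASE_INSENSITIVE also accepts <HTML>.
    private static final Pattern HTML_ELEMENT = Pattern.compile(
            "<html\\b[^>]*>(.*)</html\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text between the html tags, or null if they are not found. */
    static String between(String page) {
        Matcher m = HTML_ELEMENT.matcher(page);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html>\n<html lang=\"en\"><body>Hi</body></html>\n";
        System.out.println(between(page)); // prints <body>Hi</body>
    }
}
```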

Retrieve and parse the document with the following approach.

Imports for the TagSoup and SAX2DOM packages:

import org.ccil.cowan.tagsoup.Parser;
import org.apache.xalan.xsltc.trax.SAX2DOM;

Write the content to System.out:

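import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// doc: the DOM node produced by parsing the page with TagSoup and SAX2DOM (that step is not shown)
TransformerFactory tFact = TransformerFactory.newInstance();
Transformer transformer = tFact.newTransformer();
Source source = new DOMSource(doc);
Result result = new StreamResult(System.out);
transformer.transform(source, result);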

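These classes come from javax.xml.transform and its subpackages (DOMSource is in javax.xml.transform.dom, StreamResult in javax.xml.transform.stream).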
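The Transformer step above needs an existing DOM `doc`. Here is a self-contained sketch of the same step in which a hand-built document (the hypothetical sampleDocument()) stands in for the TagSoup/SAX2DOM result. Output goes to a StringWriter instead of System.out so it can be inspected. Forcing METHOD to "xml" and omitting the XML declaration are my own choices; the answer used the defaults.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomPrinter {
    /** Serializes a DOM node with an identity Transformer (no stylesheet). */
    static String serialize(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Plain XML serialization, without the <?xml ...?> declaration.
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical stand-in for the DOM that TagSoup + SAX2DOM would produce. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            body.setTextContent("Hello");
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

To print straight to the console as in the answer, pass new StreamResult(System.out) instead of the StringWriter.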

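You are retrieving the correct data with URLConnection. However, if you want to read or access specific HTML tags, you have to use an HTML parser. I suggest using jsoup.

For example: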

org.jsoup.nodes.Document doc = org.jsoup.Jsoup.connect("http://your_url/").get();
org.jsoup.nodes.Element head=doc.head(); // <head> tag content
org.jsoup.nodes.Element body=doc.body(); // <body> tag content

System.out.println(doc.text()); // text of the whole document, with all tags stripped


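Can you show the output? You shouldn't be getting headers with this. Headers are retrieved separately, through methods such as conn.getHeaderField(); the InputStream should contain only the response body. The body itself may contain something that looks like headers.

Use an HTML parser, such as jsoup.

If you answer a question, please use complete sentences.

Sorry, I'm new to this site. I'll improve my answering skills.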
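To illustrate the comment's point that URLConnection keeps headers and body apart, here is a sketch using only the JDK. It starts a throwaway local server (com.sun.net.httpserver, from the jdk.httpserver module) so no external site is needed. The class HeadersVsBody, the record Fetched and the sample page are hypothetical, and decoding the body as UTF-8 is an assumption; a real client should honour the charset in Content-Type. It needs Java 16+ because of the record.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class HeadersVsBody {
    /** One response: a header value and the body text, kept separate. */
    record Fetched(String contentType, String body) {}

    /** Fetches a URL; the header comes from a header accessor, the body from the stream. */
    static Fetched fetch(String url) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            // Header lookup; this also opens the connection if it is not open yet.
            String contentType = conn.getHeaderField("Content-Type");
            // getInputStream() yields the response body only: no status line, no headers.
            try (InputStream in = conn.getInputStream()) {
                String body = new String(in.readAllBytes(), StandardCharsets.UTF_8); // assumes UTF-8
                return new Fetched(contentType, body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a throwaway server on 127.0.0.1 (ephemeral port) serving one fixed page. */
    static HttpServer startServer(String page) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] bytes = page.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(200, bytes.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(bytes);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer("<html><body>Hi</body></html>");
        try {
            Fetched r = fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println("Content-Type header: " + r.contentType());
            System.out.println("Body: " + r.body());
        } finally {
            server.stop(0);
        }
    }
}
```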
