Android: extracting part of a web page


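I am building an app on Android.

I have the content of a web page (all of its HTML) in a string, and I need to extract all the text inside the paragraphs (p elements) with class="content".

For example: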

<p class="content">La la la</p>
<p class="another">Le le le</p>
<p class="content">Li li li</p>

What is the best way to do this?

Are regular expressions the best option?

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Test {

    // Reads the page from the server and prints it line by line.
    void readScreen() {
        try {
            // Open the URL
            URL url = new URL("http://somewebsite.com");

            // Note: a more portable URL:
            //url = new URL(getCodeBase().toString() + "/ToDoList/ToDoList.txt");

            URLConnection urlConn = url.openConnection();
            urlConn.setDoInput(true);
            urlConn.setUseCaches(false);

            // BufferedReader replaces the deprecated DataInputStream.readLine().
            // UTF-8 is an assumption here; the page may declare another charset.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConn.getInputStream(), StandardCharsets.UTF_8))) {
                String s;
                while ((s = reader.readLine()) != null) {
                    System.out.println(s); // each line of the page arrives here
                }
            }
        } catch (MalformedURLException mue) {
            System.err.println("Bad URL: " + mue.getMessage());
        } catch (IOException ioe) {
            System.err.println("Read failed: " + ioe.getMessage());
        }
    }

    public static void main(String[] args) {
        Test thisTest = new Test();
        thisTest.readScreen();
    }
}

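First of all, thanks for your help :) I have already done this; my problem is that I don't know how to extract only certain parts of the page (in my case, all paragraphs with class="content"). I know I could search through all the lines manually, but there must be a better way.

It might be better to download the HTML file and then parse the text in it. You could use some XML utilities to find the tags you need. That is about as much as I have done with the web and Java. Sorry I can't be of more help.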
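For the extraction step itself, here is a minimal sketch using only java.util.regex, assuming the markup is as simple as in the question: flat p elements, double-quoted attributes, and a class attribute that is exactly "content". The class name ContentExtractor and the method extractContent are made up for illustration. Regular expressions are fragile on real-world HTML (nested tags, multiple classes, unquoted attributes), so a proper HTML parser such as jsoup is likely the more robust choice if adding a library is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class ContentExtractor {

    // Matches <p ... class="content" ...>inner</p>; group 1 is the inner markup.
    // Assumption: the class attribute is double-quoted and exactly "content".
    private static final Pattern CONTENT_P = Pattern.compile(
            "<p\\b[^>]*\\bclass\\s*=\\s*\"content\"[^>]*>(.*?)</p\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractContent(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = CONTENT_P.matcher(html);
        while (m.find()) {
            // Strip any inline tags left inside the paragraph, then trim.
            result.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        System.out.println(extractContent(html)); // prints [La la la, Li li li]
    }
}
```

To use it with the download code, collect the lines into a StringBuilder instead of printing them, then pass the resulting string to extractContent.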
