Android: How to extract text and images from HTML?


android


I use this to extract the SPAN tags and report how many there are:

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.widget.TextView;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.TagFindingVisitor;

public class HtmlparserExampleActivity extends Activity {
    private static final String TAG = "TVGuide";
    private static final boolean D = true;

    String outputtext;
    TagFindingVisitor visitor;
    Parser parser = null;
    TextView outputTextView;

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        outputTextView = (TextView) findViewById(R.id.outputTextView);

        if (D) Log.e(TAG, "+++ ON CREATE +++");

        try {
            // Note: this does network I/O on the UI thread, as in the original post.
            // On Android 3.0+ that throws NetworkOnMainThreadException, so move it
            // to a background thread and only call setText() on the UI thread.
            parser = new Parser("http://www.johandegraeve.net/android");
            String[] tags = { "SPAN" };
            visitor = new TagFindingVisitor(tags);

            try {
                parser.visitAllNodesWith(visitor);

                // getTags(0) returns the nodes found for tags[0], i.e. "SPAN".
                Node[] spans = visitor.getTags(0);
                StringBuilder sb = new StringBuilder("there are " + spans.length + " SPAN nodes.\n");
                for (Node span : spans) {
                    sb.append(span.toHtml());
                }
                outputtext = sb.toString();
                outputTextView.setText(outputtext);
            } catch (ParserException e) {
                if (D) Log.e(TAG, "Exception in +++ ON CREATE +++ \n"
                        + "parser.visitAllNodesWith (visitor) failed\n" + e.toString());
            }
        } catch (ParserException e1) {
            if (D) Log.e(TAG, "Exception in +++ ON CREATE +++ \n"
                    + "creation of parser failed\n" + e1.toString());
        }
    }
}

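I tried this, but it doesn't work. Maybe I'm doing something wrong. No text is shown in the TextView, although I can see some of the page's tags while debugging.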

You can try the JSoup parser.

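Use the JSoup parser and parse the elements by tag. JSoup is effective and simple for small jobs like this.
Edit: I don't know your exact situation, but I would try this: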
// Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
//        org.jsoup.select.Elements, android.util.Log
Document doc = Jsoup.connect("someurl").get();
Log.i("DOC", doc.toString());
// Specify the HTML tag that contains the text you want.
Elements elementsHtml = doc.getElementsByTag("tr");
// Size the array to the number of matches so it cannot overflow.
String[] temp1 = new String[elementsHtml.size()];
int i = 0;
for (Element element : elementsHtml) {
    temp1[i] = element.text();
    i++;
}
// After you have collected all the elements, set the TextView.

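Is this the text you want to parse?
08-11 21:08:02.095: INFO/PARSED ELEMENTS(200): It's the end of an era: Harry Potter and the Deathly Hallows - Part 2 opens this week, concluding the epic film series that spanned eight films and a decade. To mark the occasion, we decided to take another look at the series' wonderful characters and rank the top 25 once again. You'll notice some tweaks and changes since we first ran this list a few years ago, as we reviewed and re-evaluated all the characters we've seen. Before we reveal our picks, a quick word on the selection process…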
I think there is a .text() method. Check it: as long as you specify the URL and the HTML response contains the tag you passed to .getElementsByTag(tag), the code will produce results. Of course, once you have extracted the text you can use it in your TextView.
I think you should try to get the main article, which is identified by the id on its HTML tag. In that case you can use doc.getElementById(id) where id = "main-article-content". Another good way is doc.getElementsByAttributeValue(key, value) where key = "id" and value = "main-article-content".
Yes, to put it bluntly :D .getElementById() [and other similar methods] only accept String arguments. temp1 is a String array, so you need to log all the elements of the array and then set the one you want, e.g. myTextView.setText(temp1[0]) [assuming the content you want is in the first position of temp1]. Remember, there are other ways to do this; I only gave you one example.
See my edit... I hope this is what you want.
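// Needs: java.io.IOException, org.jsoup.Jsoup, org.jsoup.nodes.Document,
//        org.jsoup.nodes.Element, org.jsoup.select.Elements, android.util.Log
try {
    // Network I/O: on Android 3.0+ this call must not run on the UI thread.
    Document doc = Jsoup.connect("http://movies.ign.com/articles/100/1002569p1.html").get();

    // Select the element(s) whose id attribute is "main-article-content".
    Elements elementsHtml = doc.getElementsByAttributeValue("id", "main-article-content");

    for (Element element : elementsHtml) {
        // element.text() is already plain text, so it is logged as-is;
        // passing it through URLDecoder could fail on a literal '%'.
        Log.i("PARSED ELEMENTS", element.text());
        // setText() must be called on the UI thread.
        outputTextView.setText(element.text());
    }
} catch (IOException e) {
    e.printStackTrace();
}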
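
The question also asks about images, which the snippets above do not cover. Here is a minimal sketch of how that could look with JSoup, parsing an inline HTML string so it can be tried without a network connection. The class name JsoupExtractSketch, the sample HTML, and the example.com base URL are made up for illustration and are not from the original posts. It assumes the images you want are ordinary img tags with a src attribute.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
public class JsoupExtractSketch {

    /** Returns the visible text of the element with the given id, or "" if there is none. */
    public static String textById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element element = doc.getElementById(id);
        return element == null ? "" : element.text();
    }

    /** Returns the absolute URLs of all img elements that carry a src attribute. */
    public static List<String> imageUrls(String html, String baseUri) {
        // The base URI lets JSoup resolve relative src values.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<String>();
        for (Element img : doc.select("img[src]")) {
            urls.add(img.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption); the real page will look different.
        String html = "<div id=\"main-article-content\">"
                + "<p>Hello <b>world</b></p>"
                + "<img src=\"pics/a.png\">"
                + "</div>";
        System.out.println(textById(html, "main-article-content"));
        System.out.println(imageUrls(html, "http://example.com/articles/"));
    }
}
```

In an app you would probably get the Document from Jsoup.connect(url).get() on a background thread instead of parsing a string, then load the returned URLs into an ImageView with whatever image loading you already use.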