Extracting Content from HTML Files in Java: XPath, String, Regular Expression, and JSON Examples

The following Java examples extract content from HTML using XPath, string operations, regular expressions, and JSON:

Extracting content from HTML with XPath:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlParser {
    public static void main(String[] args) {
        String html = "<html><body><div><h1>Title</h1><p>Content</p></div></body></html>";

        // Parse the HTML string into a DOM tree.
        Document doc = Jsoup.parse(html);

        // selectXpath evaluates an XPath expression and requires jsoup 1.14.3 or later.
        // (select("div"), by contrast, takes a CSS selector rather than XPath.)
        String title = doc.selectXpath("//div/h1").text();
        String content = doc.selectXpath("//div/p").text();

        System.out.println("Title: " + title);
        System.out.println("Content: " + content);
    }
}
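
When a third-party parser is not available, the same XPath queries can be run with the JDK's built-in javax.xml.xpath package. The sketch below assumes the markup is well-formed XML (every tag closed and properly nested), which real-world HTML often is not. The class name XPathSketch and the helper evaluate are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; assumes well-formed markup.
final class XPathSketch {
    // Evaluates an XPath expression against the markup and returns the
    // string value of the first matching node ("" when nothing matches).
    static String evaluate(String markup, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(markup)));
        } catch (XPathExpressionException e) {
            // Raised for an invalid expression or markup that cannot be parsed as XML.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><div><h1>Title</h1><p>Content</p></div></body></html>";
        System.out.println("Title: " + evaluate(html, "//div/h1"));
        System.out.println("Content: " + evaluate(html, "//div/p"));
    }
}
```

For HTML that is not well-formed, an HTML-aware parser such as jsoup is likely the safer choice, since the JDK XML parser rejects unclosed tags.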

Extracting content from HTML with string operations:

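public class HtmlParser {
    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Content</p></body></html>";

        // Start just past each opening tag (4 characters for "<h1>", 3 for "<p>")
        // and stop at the matching closing tag.
        // indexOf returns -1 when a tag is absent, so this assumes both tags are present.
        String title = html.substring(html.indexOf("<h1>") + 4, html.indexOf("</h1>"));
        String content = html.substring(html.indexOf("<p>") + 3, html.indexOf("</p>"));

        System.out.println("Title: " + title);
        System.out.println("Content: " + content);
    }
}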
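
The substring approach above fails with an exception or a wrong result when a tag is missing, because indexOf returns -1. A minimal sketch of a more defensive variant follows; the class SubstringSketch and the helper between are hypothetical names, and the markers are still matched as literal text, so an opening tag with attributes would not be found.

```java
import java.util.Optional;

// Hypothetical helper class for illustration.
final class SubstringSketch {
    // Returns the text between the first occurrence of open and the next
    // occurrence of close, or Optional.empty() when either marker is missing.
    static Optional<String> between(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return Optional.empty();
        }
        start += open.length();
        // Search for the closing marker only after the opening one.
        int end = text.indexOf(close, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(start, end));
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Content</p></body></html>";
        System.out.println("Title: " + between(html, "<h1>", "</h1>").orElse("(not found)"));
        System.out.println("Content: " + between(html, "<p>", "</p>").orElse("(not found)"));
    }
}
```

Returning an Optional makes the "tag not found" case explicit at the call site instead of surfacing as a StringIndexOutOfBoundsException.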

Extracting content from HTML with regular expressions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlParser {
    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Content</p></body></html>";

        // The reluctant quantifier (.*?) stops at the first closing tag.
        // The pattern matches only a bare <h1> tag with no attributes.
        Pattern pattern = Pattern.compile("<h1>(.*?)</h1>");
        Matcher matcher = pattern.matcher(html);

        // group(1) holds the text captured between the tags.
        if (matcher.find()) {
            String title = matcher.group(1);
            System.out.println("Title: " + title);
        }

        // Repeat the same steps for the first <p> element.
        pattern = Pattern.compile("<p>(.*?)</p>");
        matcher = pattern.matcher(html);

        if (matcher.find()) {
            String content = matcher.group(1);
            System.out.println("Content: " + content);
        }
    }
}
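
The patterns above match only a bare tag, a single occurrence, and text on one line. The following sketch, under the assumption that the elements of interest are not nested inside elements of the same name, extends the idea to tags with attributes, mixed-case tag names, multi-line text, and repeated elements. The class RegexSketch and the helper textOfAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class RegexSketch {
    // Collects the inner text of every <tag ...>...</tag> pair in the input.
    static List<String> textOfAll(String html, String tag) {
        String name = Pattern.quote(tag);
        Pattern pattern = Pattern.compile(
                // \b keeps "p" from matching "<pre>"; [^>]* skips any attributes.
                "<" + name + "\\b[^>]*>(.*?)</" + name + "\\s*>",
                // DOTALL lets . span line breaks; CASE_INSENSITIVE accepts <P> as well as <p>.
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<body><p class=\"a\">One</p>\n<P>Two\nlines</P><pre>skip</pre></body>";
        for (String text : textOfAll(html, "p")) {
            System.out.println("Paragraph: " + text);
        }
    }
}
```

Regular expressions cannot track nesting, so for anything beyond simple, predictable markup an HTML parser is usually the more reliable tool.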

Extracting content from HTML by converting it to JSON:

import org.json.JSONObject;
import org.json.XML;

public class HtmlParser {
    public static void main(String[] args) {
        // XML.toJSONObject expects XML, so the markup must be well-formed.
        String xml = "<html><body><h1>Title</h1><p>Content</p></body></html>";
        JSONObject jsonObject = XML.toJSONObject(xml);

        // Each element becomes a key under its parent: html -> body -> h1 / p.
        // getString assumes each tag occurs once and contains only text.
        JSONObject body = jsonObject.getJSONObject("html").getJSONObject("body");
        String title = body.getString("h1");
        String content = body.getString("p");

        System.out.println("Title: " + title);
        System.out.println("Content: " + content);
    }
}

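These examples show four ways to extract content from HTML: XPath, string operations, regular expressions, and JSON conversion. Choose the one that fits your needs. An HTML parser such as jsoup tolerates malformed markup, whereas the string, regular expression, and JSON approaches assume the input has exactly the expected shape.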
