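Java Novel Crawler: Scraping Novel Content from a Website and Storing It in a Database

The following example uses Java to scrape novel content and store it in a database: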

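Implementing the Novel Content Crawler

First, we use the Jsoup library to fetch the novel content. The following simple method takes the URL of a chapter and returns that chapter's text: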

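import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public String getChapterContent(String chapterUrl) throws Exception {
    // Download and parse the chapter page.
    Document doc = Jsoup.connect(chapterUrl).get();
    // The chapter text is expected inside <div id="content">; adjust the selector for the target site.
    Element contentElement = doc.selectFirst("div#content");
    if (contentElement == null) {
        throw new IllegalStateException("No div#content element found at " + chapterUrl);
    }
    return contentElement.text();
}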

This method returns the text of the given chapter, which we can then save to a database.
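Network requests like the one in getChapterContent can fail intermittently. The following is a minimal sketch of one way to retry such a call, using only the standard library. The Retry class, its withRetry method, and the attempt count and delay are hypothetical names and values chosen for illustration; they are not part of the original example.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not from the original article): retries a flaky call a few times. */
final class Retry {
    private Retry() {}

    /**
     * Runs the task up to maxAttempts times, sleeping delayMillis between failed attempts.
     * Rethrows the last exception if every attempt fails.
     */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

Under these assumptions, a call such as Retry.withRetry(() -> getChapterContent(chapterUrl), 3, 1000) would try the download up to three times with a one-second pause between attempts.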

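Implementing Database Storage

In this example, we store the chapter content in a MySQL database. This requires a JDBC driver for MySQL, such as MySQL Connector/J.

The following simple method saves a chapter to the database: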

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public void saveChapterContent(String chapterTitle, String chapterContent) throws Exception {
    // Example connection settings; replace them with your own.
    String jdbcUrl = "jdbc:mysql://localhost:3306/novel";
    String username = "root";
    String password = "password";
    String sql = "INSERT INTO chapters (title, content) VALUES (?, ?)";

    // try-with-resources closes the statement and the connection even if the insert fails.
    try (Connection conn = DriverManager.getConnection(jdbcUrl, username, password);
         PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setString(1, chapterTitle);
        stmt.setString(2, chapterContent);
        stmt.executeUpdate();
    }
}

This method saves the chapter title and content to a table named "chapters".
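The example does not show how the "chapters" table is defined. The following is a minimal sketch of one possible definition, created through JDBC. Only the table name and the title and content columns come from the INSERT statement above; the id column, the column types, the character set, and the ChapterSchema class itself are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical schema helper; column names match the INSERT statement, the types are assumed. */
final class ChapterSchema {
    private ChapterSchema() {}

    // Assumed layout: MEDIUMTEXT leaves room for long chapters, utf8mb4 covers all Unicode text.
    static final String CREATE_TABLE_SQL =
            "CREATE TABLE IF NOT EXISTS chapters ("
            + "id BIGINT AUTO_INCREMENT PRIMARY KEY, "
            + "title VARCHAR(255) NOT NULL, "
            + "content MEDIUMTEXT NOT NULL"
            + ") DEFAULT CHARSET=utf8mb4";

    /** Creates the table if it does not exist yet, using an already opened connection. */
    static void createTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(CREATE_TABLE_SQL);
        }
    }
}
```

Calling ChapterSchema.createTable(conn) once before the first insert would be enough under these assumptions; a real project would more likely manage the schema with a migration tool.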

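Implementing the Interfaces

To make the program more extensible, we can use interfaces to define the operations for fetching novel content and for saving it to the database.

First, we define a NovelCrawler interface with a method that fetches the content of a chapter: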

public interface NovelCrawler {
    public String getChapterContent(String chapterUrl) throws Exception;
}

Next, we define a NovelDatabase interface with a method that saves the content of a chapter to the database:

public interface NovelDatabase {
    public void saveChapterContent(String chapterTitle, String chapterContent) throws Exception;
}

Finally, we implement these interfaces and combine them to scrape the novel from the website and save it to the database:

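public class NovelCrawlerImpl implements NovelCrawler {
    public String getChapterContent(String chapterUrl) throws Exception {
        // Same imports and logic as the standalone getChapterContent method shown earlier.
        Element content = Jsoup.connect(chapterUrl).get().selectFirst("div#content");
        if (content == null) throw new IllegalStateException("No div#content at " + chapterUrl);
        return content.text();
    }
}

public class NovelDatabaseImpl implements NovelDatabase {
    public void saveChapterContent(String chapterTitle, String chapterContent) throws Exception {
        // Same imports and logic as the standalone saveChapterContent method shown earlier.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/novel", "root", "password");
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO chapters (title, content) VALUES (?, ?)")) {
            stmt.setString(1, chapterTitle);
            stmt.setString(2, chapterContent);
            stmt.executeUpdate();
        }
    }
}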

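import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NovelScraper {
    private final NovelCrawler crawler;
    private final NovelDatabase database;

    public NovelScraper(NovelCrawler crawler, NovelDatabase database) {
        this.crawler = crawler;
        this.database = database;
    }

    public void scrape(String novelUrl) throws Exception {
        // Assumption: novelUrl is the table-of-contents page and its chapter links
        // match "div#list a[href]". Adjust the selector for the target site.
        Document toc = Jsoup.connect(novelUrl).get();
        for (Element link : toc.select("div#list a[href]")) {
            String chapterUrl = link.absUrl("href"); // resolve relative links
            String chapterTitle = link.text();       // link text serves as the title
            String chapterContent = crawler.getChapterContent(chapterUrl);
            database.saveChapterContent(chapterTitle, chapterContent);
        }
    }
}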

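This program scrapes the novel's chapters from the website and saves them to the database. Because the concrete implementations are passed in through the constructor, they can be swapped without changing NovelScraper.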
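To illustrate that flexibility, here is a minimal sketch that wires in stand-in implementations which need neither a network nor a MySQL server. The two interfaces are redeclared only so the sketch compiles on its own. InMemoryNovelDatabase, FakeNovelCrawler, and InMemoryDemo are hypothetical names, and copyChapters is a simplified stand-in for the loop in NovelScraper.scrape with the chapter titles supplied by the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Redeclared here only so the sketch is self-contained; they mirror the interfaces above.
interface NovelCrawler {
    String getChapterContent(String chapterUrl) throws Exception;
}

interface NovelDatabase {
    void saveChapterContent(String chapterTitle, String chapterContent) throws Exception;
}

/** Hypothetical stand-in: keeps chapters in a map instead of MySQL. */
final class InMemoryNovelDatabase implements NovelDatabase {
    private final Map<String, String> chapters = new LinkedHashMap<>();

    @Override
    public void saveChapterContent(String chapterTitle, String chapterContent) {
        chapters.put(chapterTitle, chapterContent);
    }

    Map<String, String> chapters() {
        return chapters;
    }
}

/** Hypothetical stand-in: returns canned text instead of downloading a page. */
final class FakeNovelCrawler implements NovelCrawler {
    @Override
    public String getChapterContent(String chapterUrl) {
        return "content of " + chapterUrl;
    }
}

final class InMemoryDemo {
    /** Simplified stand-in for the loop in NovelScraper.scrape; maps chapter URL to title. */
    static void copyChapters(NovelCrawler crawler, NovelDatabase database,
                             Map<String, String> titlesByUrl) throws Exception {
        for (Map.Entry<String, String> entry : titlesByUrl.entrySet()) {
            String content = crawler.getChapterContent(entry.getKey());
            database.saveChapterContent(entry.getValue(), content);
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> titlesByUrl = new LinkedHashMap<>();
        titlesByUrl.put("https://example.com/novel/1", "Chapter 1");
        titlesByUrl.put("https://example.com/novel/2", "Chapter 2");

        InMemoryNovelDatabase database = new InMemoryNovelDatabase();
        copyChapters(new FakeNovelCrawler(), database, titlesByUrl);
        System.out.println(database.chapters());
    }
}
```

Stand-ins like these make it possible to exercise the scraping loop in a unit test, while the Jsoup and MySQL implementations stay untouched.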