You can use the BeautifulSoup library in Python to parse the HTML file and extract the required information. Here's an example code snippet that fulfills your requirement:

```python
from bs4 import BeautifulSoup

def get_heading_and_body(html_file):
    # Read and parse the file (utf-8 is an assumption about the input encoding)
    with open(html_file, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')

    # Find the first heading tag (h1-h6); find() returns None if there is none
    heading_tag = soup.find(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
    heading = heading_tag.get_text().strip() if heading_tag else None

    # Collect the text of every <p> tag in the document
    body_text = [p_tag.get_text().strip() for p_tag in soup.find_all('p')]

    return heading, body_text

# Call the function with the HTML file path
heading, body_text = get_heading_and_body('index.html')

# Print the results
print(f"Heading: {heading}")
print("Body Text:")
for text in body_text:
    print(text)
```

Make sure to replace `'index.html'` with the actual path to your HTML file. The code reads the HTML file and parses its content with BeautifulSoup. It takes the first `<h1>` to `<h6>` tag it finds as the heading, then collects the text of every `<p>` tag in the document, whether or not that paragraph appears under the heading. Finally, it returns the heading text and a list of paragraph texts.
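
If you only want the paragraphs that belong to the first heading, one possible approach is to walk the heading's following siblings and stop at the next heading. The sketch below is a minimal illustration, not part of the original answer: the function name `get_section_paragraphs`, the `HEADING_TAGS` constant, and the sample HTML are hypothetical, and it assumes the paragraphs are direct siblings of the heading rather than nested inside other elements.

```python
from bs4 import BeautifulSoup

# Hypothetical helper constant: the tag names treated as headings
HEADING_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']

def get_section_paragraphs(html):
    """Return (heading, paragraphs) for the first heading in the HTML string.

    Assumption: the paragraphs are siblings of the heading tag, and the
    section ends at the next heading of any level.
    """
    soup = BeautifulSoup(html, 'html.parser')
    heading_tag = soup.find(HEADING_TAGS)
    if heading_tag is None:
        return None, []

    paragraphs = []
    # Passing True matches every following sibling tag, in document order
    for sibling in heading_tag.find_next_siblings(True):
        if sibling.name in HEADING_TAGS:
            break  # the next heading starts a new section
        if sibling.name == 'p':
            paragraphs.append(sibling.get_text().strip())

    return heading_tag.get_text().strip(), paragraphs

# Made-up sample input for illustration
sample = """
<h1>Title</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<h2>Other section</h2>
<p>Unrelated paragraph.</p>
"""

heading, paragraphs = get_section_paragraphs(sample)
print(heading)
print(paragraphs)
```

For pages where the paragraphs are wrapped in a container such as a `<div>` or `<section>`, the sibling walk would need to start from that container instead, so treat this as a starting point.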
Python BeautifulSoup: Extract Heading and Body Text from HTML

Original source: https://www.cveoy.top/t/topic/qc8R. Copyright belongs to the author. Please do not repost or scrape.
