Python: Extract Paragraphs by Heading in Markdown HTML

To achieve this, use the 'markdown' library to convert the Markdown file to HTML, then parse the resulting HTML with the 'BeautifulSoup' library. Here's example code that retrieves the paragraph following each heading in a local Markdown file:

import markdown
from bs4 import BeautifulSoup

# Read the Markdown file
with open('example.md', 'r', encoding='utf-8') as file:
    markdown_text = file.read()

# Convert Markdown to HTML
html_text = markdown.markdown(markdown_text)

# Parse the HTML
soup = BeautifulSoup(html_text, 'html.parser')

# For each heading, take the first <p> sibling that comes after it
headings = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
paragraphs = [heading.find_next_sibling('p') for heading in headings]

# Get the text of each paragraph
paragraph_texts = [paragraph.get_text() if paragraph else '' for paragraph in paragraphs]

# Print the paragraph texts
for text in paragraph_texts:
    print(text)

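Make sure to replace 'example.md' with the path to your local Markdown file. This code returns only the first paragraph found after each heading, so it assumes that every heading is directly followed by its own paragraph. If a heading has no paragraph before the next heading, find_next_sibling('p') skips ahead and returns a paragraph that belongs to a later heading; if no paragraph follows at all, an empty string is printed.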
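
If a section can contain several paragraphs, or none at all, one possible approach is to walk the siblings after each heading and stop at the next heading. The following is a minimal sketch under that assumption, not a definitive implementation; the names `paragraphs_by_heading` and `HEADING_TAGS` are made up for illustration, and the function takes the already-converted HTML string (for example, the `html_text` produced by `markdown.markdown` above).

```python
from bs4 import BeautifulSoup

# Illustrative constant: the tags treated as section headings
HEADING_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']


def paragraphs_by_heading(html_text):
    """Return a list of (heading text, [paragraph texts]) pairs."""
    soup = BeautifulSoup(html_text, 'html.parser')
    sections = []
    for heading in soup.find_all(HEADING_TAGS):
        texts = []
        # True matches every tag; text nodes between tags are skipped
        for sibling in heading.find_next_siblings(True):
            if sibling.name in HEADING_TAGS:
                break  # the next heading starts a new section
            if sibling.name == 'p':
                texts.append(sibling.get_text())
        sections.append((heading.get_text(), texts))
    return sections


if __name__ == '__main__':
    sample = '<h1>Intro</h1><p>First.</p><p>Second.</p><h2>Empty</h2><h2>Last</h2><p>Third.</p>'
    for title, paragraphs in paragraphs_by_heading(sample):
        print(title, paragraphs)
```

A list of pairs is used instead of a dictionary so that two headings with the same text do not overwrite each other. Like the original code, this only looks at siblings, so paragraphs nested inside other elements (such as blockquotes or list items) are not collected.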
