What is the HTML parser in Python?


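The HTML parser is a structured markup processing tool in Python's standard library. The html.parser module defines a class called HTMLParser, which is used to parse HTML (and XHTML) text. It comes in handy for web crawling and scraping. The most commonly used methods of the class are: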


  • HTMLParser.feed(data): used to input data to the HTML parser.

  • HTMLParser.handle_starttag(tag, attrs): used to handle the start tags in the HTML. The parameter tag contains the name of the opening tag in lowercase, and the attrs parameter contains the attributes of that tag as a list of (name, value) pairs.

  • HTMLParser.handle_endtag(tag): used to handle the end tags in the HTML. The parameter tag contains the name of the closing tag in lowercase. Unlike handle_starttag, it takes no attrs parameter.

  • HTMLParser.handle_data(data): used to handle the data contained between the HTML tags.

  • HTMLParser.handle_comment(data): used to handle HTML comments.
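
Since handle_starttag receives the attributes as a list of (name, value) pairs, a short sketch may help show how they can be read. The class name LinkCollector and the sample markup are hypothetical and chosen only for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, e.g. [("href", "/about")].
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p>See <a href="/about" class="nav">About</a> and '
               '<a href="https://example.com">Example</a>.</p>')
print(collector.links)  # ['/about', 'https://example.com']
```

Attributes that are not of interest, such as class here, are simply skipped by the loop.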


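By default, the handler methods of HTMLParser do nothing, so they are overridden in a subclass to provide the desired functionality. In the example below, the class Parser inherits from HTMLParser, and each handler appends what it receives to a list.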

from html.parser import HTMLParser

class Parser(HTMLParser):
  # Append each start tag to the list start_tags.
  def handle_starttag(self, tag, attrs):
    start_tags.append(tag)

  # Append each end tag to the list end_tags.
  def handle_endtag(self, tag):
    end_tags.append(tag)

  # Append the data between the tags to the list all_data.
  def handle_data(self, data):
    all_data.append(data)

  # Append each comment to the list comments.
  def handle_comment(self, data):
    comments.append(data)

start_tags = []
end_tags = []
all_data = []
comments = []

# Creating an instance of our class.
parser = Parser()

# Providing the input. The opening tags are assumed from the closing tags below.
parser.feed('<html><body><p>'
            'I am a fan of frozen yoghurt.</p><'
            '/body><!--My first webpage--></html>')
print("start tags:", start_tags)
print("end tags:", end_tags)
print("data:", all_data)
print("comments:", comments)
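
As a variation on the example above, the collected values can be kept on the parser instance instead of in module-level lists, and feed() can be called more than once when the HTML arrives in pieces. The sketch below is only an illustration: the class name TextCollector and the input are hypothetical, and it relies on the parser buffering an incomplete tag until more data is fed.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical parser that stores its results on the instance."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


collector = TextCollector()
# Feed the document in two chunks; the first chunk ends in the middle of a tag.
collector.feed('<p>Hello</p><p cla')
collector.feed('ss="x">World</p>')
# close() forces processing of any data still held in the buffer.
collector.close()

print(collector.start_tags)
print("".join(collector.text))
```

Keeping state on the instance means that several parsers can run in the same program without sharing lists.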

To learn more, refer to the official documentation.
