Project 2: Parsing data from an HTML file with Python and REGEX

In this project, we use REGEX to extract values from an HTML file.

We want to extract tabular information from an HTML file (see below). Our goal is to extract the text found between <td> and </td> in each row, except for the first cell, which holds the numerical index (1..6).

Consider the data.html file below:

<html>
<head>
<style>
table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
}
th, td {
    padding: 5px;
}
th {
    text-align: left;
}
</style>
</head>
<body>
<table style="width:100%">
<tr align="center"><td>1</td> <td>England</td> <td>English</td></tr>
<tr align="center"><td>2</td> <td>Japan</td> <td>Japanese</td></tr>
<tr align="center"><td>3</td> <td>China</td> <td>Chinese</td></tr>
<tr align="center"><td>4</td> <td>Middle-east</td> <td>Arabic</td></tr>
<tr align="center"><td>5</td> <td>India</td> <td>Hindi</td></tr>
<tr align="center"><td>6</td> <td>Thailand</td> <td>Thai</td></tr>
</table>
</body>
</html>
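
One possible way to approach the extraction is sketched below. It is a minimal illustration, not necessarily the solution this project arrives at: the pattern, the variable names (`html`, `row_pattern`, `rows`), and the choice to inline the table rows instead of reading data.html from disk are all assumptions made so that the example runs on its own. The pattern matches the numeric index cell without capturing it, then captures the two cells that follow.

```python
import re

# Table rows copied from data.html and inlined here (an assumption for
# self-containment; in practice the text would be read from the file).
html = """
<table style="width:100%">
<tr align="center"><td>1</td> <td>England</td> <td>English</td></tr>
<tr align="center"><td>2</td> <td>Japan</td> <td>Japanese</td></tr>
<tr align="center"><td>3</td> <td>China</td> <td>Chinese</td></tr>
<tr align="center"><td>4</td> <td>Middle-east</td> <td>Arabic</td></tr>
<tr align="center"><td>5</td> <td>India</td> <td>Hindi</td></tr>
<tr align="center"><td>6</td> <td>Thailand</td> <td>Thai</td></tr>
</table>
"""

# <td>\d+</td> matches the index cell but does not capture it.
# Each ([^<]+) group captures the text of one following cell,
# stopping at the next "<" so it cannot run past the closing tag.
row_pattern = re.compile(r"<td>\d+</td>\s*<td>([^<]+)</td>\s*<td>([^<]+)</td>")

# findall returns one (second cell, third cell) tuple per table row.
rows = row_pattern.findall(html)

for place, language in rows:
    print(place, language)
```

This works because every row in the file follows the same regular layout. For less predictable markup, an HTML parser such as Python's built-in `html.parser` module is usually a safer choice than a regular expression.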

If we load the HTML file in a browser, it should render as a bordered table with six rows and three columns: the numerical index, a place name, and a language.
