Feature #2: Locating Stock Data

Implementing the "Locating Stock Data" feature for our "Stock Scraper" project.

Description

Now, we need to identify which nodes of the website’s DOM tree contain the stock data. The data we are looking for are the dates on which a given stock’s price went up or down. Identifying stock data in arbitrary HTML can be hard, so we’ll use the following technique.

As in the previous lesson, we’ll traverse the DOM tree, assigning each node a score for how likely it is to be a date or a stock percentage, based on the text inside it. To make the process efficient, we also want to limit the DOM subtree that we are processing.

Here are the scoring criteria for how likely a node is to be a date:

  • A node whose text starts with a capital letter

  • A node whose text ends in a number

  • A node whose text contains the # symbol

  • A node whose text is under ten characters
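To make the date criteria concrete, here is a minimal sketch of a scoring function. It assumes each criterion is worth one point, which the lesson does not specify, and the name date_score is a hypothetical helper chosen for illustration.

```python
def date_score(text: str) -> int:
    """Count how many of the four date criteria the text satisfies (0 to 4).

    Assumption: every criterion carries the same weight of one point.
    """
    text = text.strip()
    if not text:
        return 0  # an empty node cannot be a date

    score = 0
    if text[0].isupper():   # starts with a capital letter
        score += 1
    if text[-1].isdigit():  # ends in a number
        score += 1
    if "#" in text:         # contains the # symbol
        score += 1
    if len(text) < 10:      # is under ten characters
        score += 1
    return score


print(date_score("Day #12"))  # satisfies all four criteria
print(date_score("+1.2%"))    # only satisfies the length criterion
```

With equal weights, a higher count simply means more criteria matched. A real scraper might weight some criteria more heavily than others.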

Here are the scoring criteria for how likely a node is to be a stock percentage:

  • A node whose text is short

  • A node whose text contains a number

  • A node whose text contains the + or - sign

  • A node whose text contains the % sign
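The stock percentage criteria can be sketched the same way. This is an illustration under two assumptions: each criterion is worth one point, and "short" means under ten characters (the max_len parameter is our own placeholder, since the lesson gives no threshold). The name percent_score is hypothetical.

```python
def percent_score(text: str, max_len: int = 10) -> int:
    """Count how many of the four stock percentage criteria the text satisfies.

    Assumptions: equal weights, and "short" means fewer than max_len characters.
    """
    text = text.strip()
    if not text:
        return 0  # an empty node cannot be a percentage

    score = 0
    if len(text) < max_len:               # text is short
        score += 1
    if any(ch.isdigit() for ch in text):  # contains a number
        score += 1
    if "+" in text or "-" in text:        # contains the + or - sign
        score += 1
    if "%" in text:                       # contains the % sign
        score += 1
    return score


print(percent_score("+1.25%"))             # satisfies all four criteria
print(percent_score("Quarterly results"))  # satisfies none of them
```

Note that this sketch only checks for the ASCII hyphen as a minus sign. Pages that use a different minus character would need an extra check.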

After this step, we’ll pick two nodes: one with a high date score and one with a high stock percentage score. We’ll then calculate the lowest common ancestor (LCA) of these two nodes. In most cases, the subtree rooted at the LCA will contain all the dates and their respective stock percentages, which saves us from searching the rest of the DOM tree.
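Here is one way the LCA step could look in code. It is a sketch that assumes every node keeps a reference to its parent. The Node class, its add method, and the tiny example tree are all hypothetical stand-ins for a real DOM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM node that knows its parent."""
    tag: str
    text: str = ""
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach child under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def lowest_common_ancestor(a: Node, b: Node) -> Optional[Node]:
    """Return the deepest node that has both a and b in its subtree."""
    # Record every node on the path from a up to the root.
    seen = set()
    node: Optional[Node] = a
    while node is not None:
        seen.add(id(node))
        node = node.parent

    # Walk up from b; the first node already seen is the LCA.
    node = b
    while node is not None:
        if id(node) in seen:
            return node
        node = node.parent
    return None  # the two nodes are in different trees


# A made-up page: <body> holds a <nav> and a <table> with two rows.
body = Node("body")
nav = body.add(Node("nav", "Home"))
table = body.add(Node("table"))
row1 = table.add(Node("tr"))
date1 = row1.add(Node("td", "Jan 3"))
pct1 = row1.add(Node("td", "+1.2%"))
row2 = table.add(Node("tr"))
date2 = row2.add(Node("td", "Jan 4"))
pct2 = row2.add(Node("td", "-0.8%"))

lca = lowest_common_ancestor(date1, pct2)
print(lca.tag)  # table
```

In this example the date node and the percentage node sit in different rows, so their LCA is the table, and only the table's subtree needs to be processed.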

Let’s try to understand this better with an illustration:
