Build a web crawler based on Celery, requests, and tenacity.
So far in this course, you have learned tools for developing scalable, efficient, and failure-resilient applications. This project gives you hands-on experience combining all of these concepts to build a real-world web crawler.

This project is broken down into three steps that help you progressively build your web crawler.
The details of the three steps are as follows:
Create a Celery queue. Create several tasks, where each task fetches a single web page (using requests), and add each of the tasks to the queue.
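A minimal sketch of this first step is shown below. The broker URL, module name (`crawler.py`), and URL list are assumptions for illustration; adapt them to your own setup.

```python
import requests
from celery import Celery

# Assumes a Redis broker running locally; any broker Celery supports will do.
# The result backend is configured here so later steps can collect results.
app = Celery(
    "crawler",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

@app.task
def fetch_page(url):
    """Fetch a single web page and return its URL, status code, and body."""
    response = requests.get(url, timeout=10)
    return {"url": url, "status": response.status_code, "text": response.text}

# Placeholder list of pages to crawl.
urls = [
    "https://example.com",
    "https://example.org",
]

if __name__ == "__main__":
    # Enqueue one task per URL; each .delay() call puts a task on the queue.
    pending = [fetch_page.delay(url) for url in urls]
```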
You will launch Celery workers that fetch tasks from the queue, process them, and store the results in a common store (e.g., a results list).
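The sketch below shows one way to do this, assuming the task and URL list from the previous snippet live in `crawler.py` and that a result backend is configured; the worker command in the comment is the standard Celery 5 invocation.

```python
# Start one or more workers in a separate terminal, e.g.:
#     celery -A crawler worker --loglevel=info --concurrency=4
#
# The workers pull tasks off the queue, run fetch_page, and write each
# return value to the result backend.
from crawler import fetch_page, urls

# Enqueue the tasks and keep the AsyncResult handles.
async_results = [fetch_page.delay(url) for url in urls]

# Collect the results into a common store (a plain list here).
results = []
for handle in async_results:
    # .get() blocks until a worker has finished that task.
    results.append(handle.get(timeout=30))

print(f"Fetched {len(results)} pages")
```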
Web crawling is not an error-free task: some web pages may respond in unexpected ways. In this step, you will incorporate failure handling (using tenacity) into your web crawler so that it waits and retries a few times before giving up on a given web page.
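One way to add this, sketched below, is to wrap the HTTP call in a helper decorated with tenacity's `retry`. The retry policy (three attempts with exponential backoff) and the error-handling behavior are illustrative choices, not requirements, and the `crawler` import refers to the module assumed in the earlier sketches.

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# Reuse the Celery app from the first sketch (assumed to live in crawler.py).
from crawler import app

@retry(stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=1, min=1, max=10))
def fetch_with_retries(url):
    """Fetch a page, retrying on network errors or bad status codes."""
    response = requests.get(url, timeout=10)
    # raise_for_status() turns 4xx/5xx responses into exceptions,
    # which tenacity then treats as a reason to wait and retry.
    response.raise_for_status()
    return response

@app.task
def fetch_page(url):
    """Revised task: retry a few times, then give up gracefully."""
    try:
        response = fetch_with_retries(url)
        return {"url": url, "status": response.status_code, "text": response.text}
    except Exception as exc:
        # Retries are exhausted; record the failure instead of crashing the worker.
        return {"url": url, "error": str(exc)}
```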
Let’s get started!