Organizing our code

Before we dive into coding, it’s important to discuss how to organize the code. Novice programmers tend to write everything in one file, no matter how large the file gets. While this might work, it hurts both readability and maintainability. A typical ML project can require thousands of lines of code covering data processing, model training, and other tasks.

Moreover, multiple people often work on different aspects of a project, and some of the code that a data scientist writes may be used in more than one project. In these conditions, it’s useful to organize code in a way that’s logically consistent and amenable to collaborative development. How can we organize our code to make it logically sound and readable?

We already started the process when we decided on our directory structure. Here’s our directory tree:

ml_pipeline_tutorial/ ...
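
To make the idea of separating code by responsibility concrete, here is a minimal sketch. It is shown as one block so it can be run as-is, but each section stands in for a separate module. The function names and the file names in the comments are hypothetical and are not taken from the directory tree above.

```python
# Sketch only: in a real project, each section below would live in its own
# module. All file names mentioned in the comments are hypothetical.

from statistics import mean


# --- could live in a data-processing module, e.g. data_processing.py ---
def clean_rows(rows):
    """Drop rows with missing values and convert the rest to floats."""
    return [
        (float(x), float(y))
        for x, y in rows
        if x is not None and y is not None
    ]


# --- could live in a training module, e.g. training.py ---
def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# --- could live in an entry-point script, e.g. run_pipeline.py ---
def run_pipeline(raw_rows):
    """Wire the steps together; each step can be tested and reused alone."""
    return fit_line(clean_rows(raw_rows))


if __name__ == "__main__":
    raw = [(1, 2), (2, 4), (None, 5), (3, 6)]
    print(run_pipeline(raw))
```

Because the cleaning step and the fitting step do not depend on each other's internals, a second project could reuse either one without copying the rest, and two people could work on them in parallel.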
