Code Organization

Learn how we will organize our code.

Organizing our code

Before we dive into coding, it’s important to discuss how to organize the code. Novice programmers tend to write everything in one file, no matter how large the file gets. While this might work, it hurts both readability and maintainability. A typical ML project can require thousands of lines of code, including code for data processing, model training, and other tasks.

Moreover, multiple people often work on different aspects of a project, and some of the code that a data scientist writes may be used in more than one project. Under these conditions, it’s useful to have code organized in a way that’s logically consistent and amenable to collaborative development. How can we organize our code to make it logically sound and readable?

We already started the process when we decided on our directory structure. Here’s our directory tree.

  • ml_pipeline_tutorial/
    • config/
      • projects/
    • data/
    • ml_pipeline/
      • datasets/
      • mixins/
      • models/
      • tests/
    • tests/
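If the directories do not exist yet, the tree above could be created with a few lines of Python. This is only a sketch: the helper name create_layout and the SUBDIRS list are hypothetical, and the directory names are copied from the tree.

```python
from pathlib import Path

# Sub-directories from the tree above, relative to the project root.
# (SUBDIRS and create_layout are illustrative names, not part of the course code.)
SUBDIRS = [
    "config/projects",
    "data",
    "ml_pipeline/datasets",
    "ml_pipeline/mixins",
    "ml_pipeline/models",
    "ml_pipeline/tests",
    "tests",
]


def create_layout(root):
    """Create the project skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # parents=True also creates intermediate directories such as config/;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_layout("ml_pipeline_tutorial") from the parent directory would build the whole skeleton, and running it a second time leaves existing directories untouched.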

Our directory structure dictates where the code for each piece of pipeline functionality lives. First, we need a top-level code file containing a main section. This is the Python file that we would run on the command line. Let’s call it pipeline.py and create it under the top-level directory ml_pipeline_tutorial. On Linux, we can create this empty file by running touch pipeline.py from inside that directory.
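As a rough idea of what a file with a main section might eventually look like, here is a minimal sketch of pipeline.py. The --config option, its default value, and the function names parse_args and main are assumptions made for illustration; the course may structure its entry point differently.

```python
# pipeline.py: top-level entry point (illustrative skeleton only)
import argparse


def parse_args(argv=None):
    """Parse command-line arguments. The --config option is hypothetical."""
    parser = argparse.ArgumentParser(description="Run the ML pipeline.")
    parser.add_argument(
        "--config",
        default="config/projects",
        help="Path to the project configuration (placeholder default).",
    )
    # With argv=None, argparse reads the arguments from sys.argv[1:].
    return parser.parse_args(argv)


def main(argv=None):
    """Entry point; the real pipeline steps would be called from here."""
    args = parse_args(argv)
    print(f"Running pipeline with config: {args.config}")
    return 0


# The "main section": runs only when the file is executed directly,
# not when it is imported by another module.
if __name__ == "__main__":
    main()
```

With this layout, running python pipeline.py from ml_pipeline_tutorial executes main(), while importing pipeline from a test module does not trigger the pipeline.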
