Deep learning

While I generally recommend starting with simple approaches when building model pipelines, deep learning is becoming a popular tool for data scientists to apply to new problems.

It’s great to explore this capability when tackling new problems, but scaling up deep learning in data science pipelines presents a new set of challenges. For example, PySpark does not currently have a native way of distributing the model application phase across large data sets. Fortunately, there are plenty of books for getting started with deep learning in Python, such as Deep Learning with Python (Chollet, 2017).

In this lesson, we’ll repeat the same task from the prior lessons, which is predicting which users are likely to buy a game based on their prior purchases. Instead of using a shallow learning approach to predict propensity scores, we’ll use the Keras framework to build a neural network for predicting this outcome.

In our pre-configured execution environment, the keras and tensorflow libraries are already installed, so you can skip the local installation instructions below and go straight to building the model.

Installing Keras

Keras is a general framework for working with deep learning implementations. We can install Keras and TensorFlow from the command line.
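A typical installation, assuming pip is available and that no specific versions need to be pinned, is: pip install --user tensorflow keras. As a minimal sketch for confirming that the installation worked, the snippet below checks whether both libraries can be located by the current Python interpreter. The helper name missing_packages is hypothetical and used only for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    # find_spec returns None when a top-level package is not installed;
    # it locates the package without fully importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed install command (versions are not pinned here):
#   pip install --user tensorflow keras
missing = missing_packages(["tensorflow", "keras"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("tensorflow and keras are available")
```

If either name is reported as not installed, rerun the install command in the same environment that the Python interpreter uses.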
