Read Data from Statistical Software Files

Learn to read data from statistical software files like SAS, Stata, and SPSS.

Introduction

Before the proliferation of open-source software for data science and analytics, most data practitioners relied on commercial software for statistical analysis. These software systems offer proprietary computational methods for different use cases. Furthermore, these systems come with professional customer support and are backed by extensive documentation and literature validated by experts.

However, expensive commercial licenses, subpar interfaces, and relatively complicated scripting languages have led to a drop in their usage. Nonetheless, these systems still play a significant role in certain fields, such as clinical research and econometrics. Let's look at three popular statistical software packages whose files can be read by pandas: SAS, SPSS, and Stata.

Read from SAS files

Data from SAS is commonly stored in the SAS7BDAT binary file format, which has the extension .sas7bdat. Another SAS storage format is XPORT, the SAS transport format, which has the extension .xpt.

We can use the read_sas() function to load a SAS file (.sas7bdat or .xpt) from a file path. The function returns the file's contents as a pandas DataFrame.
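As a minimal sketch of how such a call might look, the helper below maps a file extension to the value that read_sas() accepts for its format parameter and then loads the file. The names sas_format_for and load_sas are hypothetical and exist only for this illustration. By default, read_sas() infers the format from the extension on its own, so passing it explicitly is a choice made here for clarity.

```python
from pathlib import Path


def sas_format_for(path):
    """Return the `format` value for pandas.read_sas() based on the file extension.

    Hypothetical helper for illustration; pandas can also infer this itself.
    """
    suffix = Path(path).suffix.lower()
    formats = {".sas7bdat": "sas7bdat", ".xpt": "xport"}
    if suffix not in formats:
        raise ValueError(f"Not a recognized SAS file extension: {suffix!r}")
    return formats[suffix]


def load_sas(path, encoding=None):
    """Read a .sas7bdat or .xpt file into a pandas DataFrame.

    Hypothetical wrapper around pandas.read_sas(). With encoding=None,
    pandas leaves text columns as raw bytes instead of decoding them.
    """
    import pandas as pd  # imported here so sas_format_for() works without pandas

    return pd.read_sas(path, format=sas_format_for(path), encoding=encoding)
```

Calling load_sas("data/airline.sas7bdat", encoding="latin-1") would then return a DataFrame with decoded text columns. The path and the encoding are placeholders, so substitute the values that match your own file.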
