Learn about the preparation of data using R.

There are two things we must do before we move forward. First, we must make adequate logistic preparations, such as setting up a project folder, writing and executing a program in R, and locating and downloading the necessary data and codebook files. Second, we must develop a conceptual roadmap of the tasks to be completed in the next section and how they connect to one another. We discuss each of these in detail.

Logistic preparations

As a matter of principle and good practice, we should always obtain the codebook or readme file for any dataset we use. The codebook or readme file should include important information such as the format of the dataset, the sample information (e.g., year and country coverage), the unit of analysis (e.g., country-year or individual respondent), the number of variables, the number of observations, variable names, variable definitions and measurements, variable types, value labels if any, data sources, and sometimes descriptive statistics for the variables (mean, maximum, minimum, variance or standard deviation, and number of observations). A codebook is important for three reasons:

  • First, it lets us confirm that a dataset suits our purpose in terms of variable and sample coverage.
  • Second, it tells us which command or function to use to import the dataset into R, and it lets us verify that the data were read into R correctly.
  • Finally, it serves as our reference when we prepare the data for analysis.
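The second reason above can be illustrated with a minimal sketch. It assumes a CSV file and uses only base R. The file contents, the variable names (country, year, gdp_pc), and the codebook values are all hypothetical stand-ins for what a real dataset and its codebook would provide.

```r
# Hypothetical stand-in for a downloaded dataset: write a tiny CSV to a temp file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,gdp_pc",
  "A,2000,1500.5",
  "A,2001,1550.0",
  "B,2000,980.2",
  "B,2001,NA"
), csv_path)

# Values we would copy from the codebook (assumed here for illustration)
codebook <- list(
  n_obs     = 4,
  var_names = c("country", "year", "gdp_pc"),
  var_types = c(country = "character", year = "integer", gdp_pc = "numeric")
)

# Import with base R; stringsAsFactors = FALSE keeps text columns as character
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

# Compare what R actually read against what the codebook says
obs_ok   <- nrow(dat) == codebook$n_obs
names_ok <- identical(names(dat), codebook$var_names)
types_ok <- identical(sapply(dat, class), codebook$var_types)

# Inspect the structure and one variable's descriptive statistics by eye
str(dat)
summary(dat$gdp_pc)
```

If any of obs_ok, names_ok, or types_ok is FALSE, the import options (separator, header, missing-value codes) are the first thing to revisit. Other file formats need other import functions, which is why the codebook's statement of the file format matters.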

R doesn’t handle variable labels very well, so we need the codebook to know what variables we have and how to manage them.
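As a rough illustration of this point, the sketch below attaches labels from a codebook to the columns of a data frame by hand, using base R attributes. The data frame, the variable names, and the label text are hypothetical, and the attribute name "label" is only a convention assumed here, not something base R treats specially.

```r
# A small hypothetical data frame with terse variable names
dat <- data.frame(
  gdp_pc = c(1500.5, 1550.0, 980.2),
  regime = c(1L, 0L, 1L)
)

# Labels copied from a (hypothetical) codebook
var_labels <- c(
  gdp_pc = "GDP per capita, constant US dollars",
  regime = "Regime type: 1 = democracy, 0 = otherwise"
)

# Store each label as an attribute on the matching column
for (v in names(var_labels)) {
  attr(dat[[v]], "label") <- var_labels[[v]]
}

# The label can be looked up later
attr(dat$gdp_pc, "label")

# Subsetting the column vector drops the attribute, so the label is lost
sub_vec <- dat$gdp_pc[1:2]
is.null(attr(sub_vec, "label"))

# Value labels can be made explicit by converting codes to a factor
dat$regime_f <- factor(dat$regime, levels = c(0, 1),
                       labels = c("otherwise", "democracy"))
```

Because labels stored this way can disappear after routine operations, keeping the codebook at hand remains the more dependable way to track what each variable means.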
