Managing the variables (columns) of a data frame often involves creating new variables, renaming variables, recoding variable values, and creating variable labels. This section relies heavily on the earlier discussion of variable types.

Create new variables

To answer a research question, we often need to create new variables. Here, we provide some examples of how to create numeric, character, and factor variables. We also discuss how to construct leading, lagging, and growth rate variables, and show how to compute a new variable representing a group mean.

Numeric variables: Real investment per capita and total real investment

We begin with some simple examples of numeric variables. Suppose we want to use pwt7 to create two new variables on investment: real investment per capita and total real investment in a country. For this task, the relevant variables include ki, rgdpl, and POP, which are defined in the readme file as follows:

  • The variable ki is “Investment Share of PPP Converted GDP Per Capita at 2005 constant prices rgdpl in percent.”
  • The variable rgdpl is “PPP Converted GDP Per Capita (Laspeyres), derived from growth rates of c, g, i, at 2005 constant prices (2005 International dollar per person).”
  • The variable POP is “Population (in thousands).”

Therefore, real investment per capita (in 2005 international dollars) should be computed as rgdpl ∗ ki/100, and total real investment (in 2005 international dollars) should be computed as rgdpl ∗ POP ∗ 1000 ∗ ki/100 = rgdpl ∗ POP ∗ ki ∗ 10. The factor of 1000 converts POP from thousands to persons, and dividing by 100 converts ki from a percentage to a proportion. The R code for creating these two variables is as follows:
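As a minimal sketch of the two formulas, the code below uses a small made-up data frame standing in for pwt7. The country labels, the numeric values, and the new column names investpc and invest are illustrative assumptions, not taken from the dataset or its readme file.

```r
# Toy stand-in for pwt7; all values are made up for illustration.
pwt7 <- data.frame(
  country = c("A", "B"),
  rgdpl   = c(40000, 5000),      # real GDP per capita, 2005 international dollars
  ki      = c(20, 30),           # investment share of rgdpl, in percent
  POP     = c(300000, 1300000)   # population, in thousands
)

# Real investment per capita: GDP per capita times the investment share,
# with ki converted from a percentage to a proportion.
pwt7$investpc <- pwt7$rgdpl * pwt7$ki / 100

# Total real investment: per capita investment times population in persons
# (POP * 1000), which simplifies to rgdpl * POP * ki * 10.
pwt7$invest <- pwt7$rgdpl * pwt7$POP * pwt7$ki * 10

print(pwt7[, c("country", "investpc", "invest")])
```

With the real data, the same two assignment lines should apply unchanged, provided pwt7 is already loaded and ki, rgdpl, and POP are stored as numeric columns.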
