Create Well-documented and Annotated Code

One of the most important things you can do is write orderly, well-annotated code that not only functions well but also explains, in easy-to-read language, what is happening and why. This idea was introduced by computer scientist Donald Knuth and is known as literate programming. Literate programming is the practice of interspersing your computer code, in this case R code, with plain-language descriptions of what the code is doing, so that a reader has a fully formed idea of what is going on. In R, you do this with annotation, which is simply the process of leaving notes within the code that are not themselves code; R ignores everything on a line after a # character. It’s like you are Hansel and Gretel getting dragged into the woods: you want to leave plenty of clues for your future self (or others) to be able to discern the trail you took.
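As a minimal sketch of what annotation can look like, the short example below mixes R code with comments that say both what each step does and why. The data values and the object names (heights, mean_heights) are made up purely for illustration.

```r
# Goal: compare average plant height between two treatment groups.
# NOTE: these data are invented for illustration only.
heights <- data.frame(
  treatment = c("control", "control", "fertilized", "fertilized"),
  height_cm = c(10.2, 11.8, 14.1, 15.9)
)

# Compute the mean height within each treatment group.
# aggregate() is used here so the result is a small table with
# one row per group, which is easy to read and to share.
mean_heights <- aggregate(height_cm ~ treatment, data = heights, FUN = mean)

# Show the result so a reader can check it against the raw data above.
print(mean_heights)
```

Notice that the comments record intent (the goal, and why a function was chosen), not just a restatement of each line.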

Write good code for yourself and others

For any bit of R code you write, you should consider that you are writing for three audiences:

  1. Your future self
  2. Your collaborators
  3. Everyone else who might look at your code one day

1. Your future self: Seldom will you have the opportunity to sit down with a dataset and analyze it from start to finish in a single sitting. It is rare that you will even have the opportunity to work on it on consecutive days where what you did yesterday is still fresh in your mind. What is more realistic is that you work on something for some period of time (hours, days, maybe even weeks if you are really lucky!) and then have to put it down for some time because you are distracted by other tasks (teaching, other research demands, manuscript revisions, parenting, a pandemic, etc.). By the time you come back to your code, even a week later, you will likely have to invest some substantial time getting back to where you were. Writing good, clear code will reduce that restart time considerably.

2. Your collaborators: If you are, or are planning to be, a professional scientist, you are unlikely to work exclusively by yourself. There will be times when you collaborate with others. Maybe it’s your graduate advisor, maybe a colleague at another institution. Whatever the scenario, you might be responsible for analyzing or organizing some set of data and then sharing it with others. If that’s the case, you want to make sure that when you send your code, it is clear what you did and why you did it. Imagine the embarrassment of your collaborator sending you question after question trying to figure out what your code means!

3. Everyone else who might look at your code one day: Increasingly, it is necessary to post both the data that go into a scientific article and the code that was used to analyze them. This is a tremendous step towards increasing transparency in science and is to be applauded. But it also means that some stranger might look at your code a month or a year or more down the road, even after you thought you were long done with it all. Thus, just as when writing for your future self or your collaborators, you want to make sure that your code is clean, organized, and well-annotated.

Working from the script window

The biggest mistake that most new R users make is to just type commands at the command prompt. The problem with this is that once you hit enter, the command has run and is not saved anywhere you can easily edit it. If you hit the up-arrow, R will scroll through the previously executed commands one at a time, but that is a poor way to revise or rerun an analysis. It is of course reasonable to run lines from the command line from time to time, but it is much better to work from a script window.

The script window allows you to easily save and edit your code, and execute one or multiple lines of code at once. To open a blank script window, go to the “File” menu and click on “New Document,” or just hit command-N (Mac) or control-N (PC) on your keyboard.

In the script window, you can type in your commands and then execute them by hitting command-enter (Mac) or control-R (PC). This means you type code into the script window, and then the program sends the line of code to the command prompt for you. Do not cut and paste code from the script window to the command prompt; that is a waste of time. You can also highlight multiple lines of code and execute them all at once. To save your code, simply go to the “File” menu and save as you would any other file (or just hit command-S or control-S on your keyboard). A script allows you to edit, run, and tweak your code, save it, return to it later, send it to collaborators or mentors, and so on. Anything you think you will want to run more than once, or that you might want to edit, should be typed into a script window (which is pretty much everything).
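For illustration, a saved script might look something like the sketch below. The file name (analysis_sketch.R) and the numbers are hypothetical; the point is only that each step can be run on its own, edited, and rerun, or the whole file can be highlighted and run at once.

```r
# analysis_sketch.R  (hypothetical file name)
# Purpose: a tiny script that can be run line by line or all at once.

# Step 1: enter a small, made-up set of counts.
counts <- c(3, 7, 2, 9, 4)

# Step 2: summarize the counts. To try a different summary,
# edit these lines in the script and rerun them.
total_count <- sum(counts)
mean_count <- mean(counts)

# Step 3: report the results at the command prompt.
cat("Total:", total_count, "\n")
cat("Mean:", mean_count, "\n")
```

Once a script like this is saved, the whole file can also be run from the command prompt with source("analysis_sketch.R"), assuming the file sits in R's current working directory.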
