How to read delimited data files in R

What is delimited data?

In delimited data, fields are separated by a designated character (the delimiter) so that plain text can represent a table of rows and columns. This makes it a common, software-neutral way to store and exchange tabular data. Any character can serve as a delimiter; the comma, tab, and colon are among the most widely used. Such files can be read in R as follows.
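As a small illustration of the idea, the sketch below splits a single delimited line into its fields using base R's strsplit(). The sample line is made up for illustration; reading a whole file would normally be done with read.table() instead.

```r
# One row of comma-delimited text (illustrative value)
line <- "1,A,a"

# Split on the literal comma; fixed = TRUE disables regex interpretation
fields <- strsplit(line, ",", fixed = TRUE)[[1]]

fields          # "1" "A" "a"
length(fields)  # 3
```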

Reading delimited data files in R

The read.table() function in R reads a file into a data frame. The file can be comma-delimited, tab-delimited, or separated by any other delimiter, which is specified with the sep= argument. If the header= argument is set to TRUE, the first row is used for the column names.
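As a minimal sketch of the two arguments described above, the snippet below writes a tiny comma-delimited file to a temporary location and reads it back twice, once with header = TRUE and once with header = FALSE. The temporary file and the sample values are assumptions made for illustration.

```r
# Write a tiny comma-delimited file to a temporary path (illustrative data)
path <- tempfile(fileext = ".csv")
writeLines(c("id,label", "1,A", "2,B"), path)

# header = TRUE: the first row supplies the column names
with_header <- read.table(path, sep = ",", header = TRUE)

# header = FALSE: the first row is treated as data and R assigns V1, V2, ...
without_header <- read.table(path, sep = ",", header = FALSE)

names(with_header)     # "id" "label"
names(without_header)  # "V1" "V2"
nrow(with_header)      # 2
nrow(without_header)   # 3
```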

Comma-separated values aka CSVs:

  • Use commas to separate the data fields.
  • A newline character marks the end of each row.
  • They have the file extension .csv.

For example, a file named data.csv might contain:

C1,C2,C3
1,A,a
2,B,b
3,C,c
4,D,d
5,E,e
6,F,f
7,G,g
8,H,h

This file can be read in R as follows:

# main.r: read the comma-delimited file data.csv into a data frame
data <- read.table('data.csv',
                   sep = ',',
                   header = TRUE)
data

In the code above, the argument sep=',' specifies a comma as the field delimiter.
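Base R also provides read.csv(), a convenience wrapper around read.table() whose defaults are geared toward comma-delimited files with a header row. The sketch below, which uses a temporary file holding the first rows of the sample data, suggests that for a simple file like this one the two calls produce the same data frame.

```r
# Recreate the first rows of the sample data in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("C1,C2,C3", "1,A,a", "2,B,b", "3,C,c"), path)

# Explicit form used in this article
via_table <- read.table(path, sep = ",", header = TRUE)

# Shorthand: read.csv() defaults to sep = "," and header = TRUE
via_csv <- read.csv(path)

all.equal(via_table, via_csv)  # TRUE for this simple file
```

The wrappers differ from read.table() in a few other defaults (for example, how quotes and comment characters are handled), so results can diverge on messier files.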

Tab-separated values aka TSVs:

  • Use tabs to separate the data fields.
  • A newline character marks the end of each row.
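  • They have the file extension .tsv.

For example, a file named data.tsv might contain the following (fields are separated by tab characters, shown here as aligned columns):

food   calories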
Apple        95
Banana      111
Cherries      4
Dates        20
Grapes      104
Lime         20
Mango       202
Orange       62

This file can be read in R as follows:

# main.r: read the tab-delimited file data.tsv into a data frame
data <- read.table('data.tsv',
                   sep = '\t',
                   header = TRUE)
data

In the code above, the argument sep='\t' specifies a tab as the field delimiter.
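One practical consequence of sep = "\t" is that only tabs split fields, so a value containing a space stays in one piece. The sketch below illustrates this with a temporary file; the food names and calorie values are made up for the example.

```r
# Write a small tab-delimited file; "\t" inside the strings is a real tab
path <- tempfile(fileext = ".tsv")
writeLines(c("food\tcalories", "Green apple\t95", "Dried dates\t20"), path)

# sep = "\t" splits on tabs only, so the space in "Green apple" stays in the field
foods <- read.table(path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

foods$food      # "Green apple" "Dried dates"
foods$calories  # 95 20
```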

Commas often appear inside the data itself, and tab characters are easily mistaken for spaces, which makes files harder to edit by hand. For this reason, a character that rarely occurs in the data is often chosen as the delimiter.

For example, the pipe character | is a popular choice. A pipe-delimited file named data.txt might contain:

color|weight
red|6
green|1
blue|5
cyan|3
yellow|2
purple|7
pink|4
brown|8
gold|9

This file can be read in R as follows:

# main.r: read the pipe-delimited file data.txt into a data frame
data <- read.table('data.txt',
                   sep = '|',
                   header = TRUE)
data
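To illustrate why a rarely used character can be convenient, the sketch below reads a pipe-delimited file whose fields themselves contain commas. The names and cities are invented for the example, and the file is written to a temporary path.

```r
# Fields contain commas, but the pipe is the only delimiter
path <- tempfile(fileext = ".txt")
writeLines(c("name|city", "Doe, Jane|Paris", "Roe, Sam|Lima"), path)

people <- read.table(path, sep = "|", header = TRUE, stringsAsFactors = FALSE)

people$name   # "Doe, Jane" "Roe, Sam"
ncol(people)  # 2
```

Because the comma is not the separator here, it is treated as ordinary text and no quoting is needed.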

Similarly, any x-delimited file can be read in R by passing its delimiter as sep='x', where x is the single character that separates the fields.
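The general pattern can be sketched as a small wrapper. The function name read_delimited is hypothetical (it is not part of base R), and the semicolon- and colon-delimited files below are illustrative temporary files.

```r
# Hypothetical helper: read a file whose fields are separated by `delim`
read_delimited <- function(path, delim) {
  # read.table() expects a single-character separator
  stopifnot(is.character(delim), length(delim) == 1, nchar(delim) == 1)
  read.table(path, sep = delim, header = TRUE, stringsAsFactors = FALSE)
}

# Illustrative files using ';' and ':' as delimiters
semi <- tempfile(fileext = ".txt")
writeLines(c("color;weight", "red;6", "green;1"), semi)
colon <- tempfile(fileext = ".txt")
writeLines(c("color:weight", "red:6", "green:1"), colon)

a <- read_delimited(semi, ";")
b <- read_delimited(colon, ":")

a$color   # "red" "green"
b$weight  # 6 1
```

Both calls yield the same two-column table; only the value passed as the delimiter changes.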

Summary

File                      R Command
Comma-delimited (.CSV)    read.table(<filename>, sep=',', header=TRUE)
Tab-delimited (.TSV)      read.table(<filename>, sep='\t', header=TRUE)
Pipe-delimited (.TXT)     read.table(<filename>, sep='|', header=TRUE)
Any x-delimited (.TXT)    read.table(<filename>, sep='x', header=TRUE)

Copyright ©2024 Educative, Inc. All rights reserved