In delimited data, fields are separated by a delimiter character that marks the boundaries between columns, and each line holds one row. Because such files are plain text, they are a simple, application-neutral way to store and exchange tabular data. Any character can serve as a delimiter, but the comma, tab, and colon are among the most widely used, and files that use them can be read in R as follows.
The read.table() function in R reads a file into a data frame. The file can be comma-delimited, tab-delimited, or use any other delimiter specified by the sep= argument. If the header= argument is set to TRUE, the first row is used for the column names.
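To show what header= changes, the sketch below reads the same small comma-delimited sample twice. It uses read.table()'s text= argument so that no file is needed; the inline sample is illustrative and simply copies the first rows of the CSV example.

```r
# A small comma-delimited sample supplied as a string instead of a file
csv_text <- "C1,C2,C3
1,A,a
2,B,b"

# header = TRUE: the first row supplies the column names
with_header <- read.table(text = csv_text, sep = ",", header = TRUE)
names(with_header)  # "C1" "C2" "C3"
nrow(with_header)   # 2

# header = FALSE: the first row is kept as data and R names the columns V1, V2, V3
no_header <- read.table(text = csv_text, sep = ",", header = FALSE)
names(no_header)    # "V1" "V2" "V3"
nrow(no_header)     # 3
```

With header = FALSE the names row becomes an ordinary data row, which also forces the C1 column to be read as character rather than integer.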
Comma-separated values (CSV) files use the .csv extension. For example, data.csv contains:
C1,C2,C3
1,A,a
2,B,b
3,C,c
4,D,d
5,E,e
6,F,f
7,G,g
8,H,h
This file can be read in R as follows:
data <- read.table('data.csv', sep = ',', header = TRUE)
data  # print the resulting data frame
In the code above, the argument sep=',' specifies a comma as the field delimiter.
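R also provides read.csv(), a convenience wrapper around read.table() whose defaults are header = TRUE and sep = ",". The sketch below uses an inline string as a stand-in for data.csv and checks that both calls give the same data frame for simple data like this. Note that read.csv() also changes a few other defaults (for example quote and comment.char), so the two can differ on files that contain quotes or # characters.

```r
# Inline stand-in for the first rows of data.csv
csv_text <- "C1,C2,C3
1,A,a
2,B,b
3,C,c"

via_table <- read.table(text = csv_text, sep = ",", header = TRUE)
via_csv   <- read.csv(text = csv_text)  # header = TRUE and sep = "," are the defaults

identical(via_table, via_csv)  # TRUE for simple data like this
str(via_csv)  # C1 is read as integer; C2 and C3 as character (R >= 4.0)
```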
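Tab-separated values (TSV) files use the .tsv extension. For example, data.tsv contains the following, with a tab between the two fields on each line:
food calories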
Apple 95
Banana 111
Cherries 4
Dates 20
Grapes 104
Lime 20
Mango 202
Orange 62
This file can be read in R as follows:
data <- read.table('data.tsv', sep = '\t', header = TRUE)
data  # print the resulting data frame
In the code above, the argument sep='\t' specifies a tab as the field delimiter.
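Because sep='\t' matches an actual tab character, the file must contain tabs between fields rather than runs of spaces. The sketch below builds a small tab-separated sample with \t escapes (standing in for data.tsv) and reads it both with read.table() and with read.delim(), whose defaults are sep = "\t" and header = TRUE.

```r
# Inline stand-in for data.tsv: "\t" inserts a real tab between fields
tsv_text <- "food\tcalories\nApple\t95\nBanana\t111\nCherries\t4"

foods       <- read.table(text = tsv_text, sep = "\t", header = TRUE)
foods_delim <- read.delim(text = tsv_text)  # sep = "\t", header = TRUE by default

identical(foods, foods_delim)           # TRUE here
foods$food[which.max(foods$calories)]   # "Banana"
```

If a file separates fields with spaces instead of tabs, read.table()'s default sep = "" (any run of white space) is the better fit.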
However, commas often appear inside the data itself, and tab characters look like spaces, which makes tab-delimited files confusing to edit by hand. For this reason, a character that rarely appears in the data is often chosen as the delimiter.
For example, the pipe character | is a popular delimiter. The pipe-delimited file data.txt below can be read in R with the command that follows it:
color|weight
red|6
green|1
blue|5
cyan|3
yellow|2
purple|7
pink|4
brown|8
gold|9
data <- read.table('data.txt', sep = '|', header = TRUE)
data  # print the resulting data frame
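To see why a rarely used delimiter helps, the sketch below reads one record whose city field contains a comma (an illustrative value, not part of the original data). With sep = "|" the comma is just part of the field. With sep = "," the data line splits into three fields while the header has only two, and read.table() then silently uses the first field as the row name.

```r
# One record whose second field contains a comma
pipe_text  <- "name|city\nAlice|Paris, France"
comma_text <- "name,city\nAlice,Paris, France"

# Pipe-delimited: the comma stays inside the city field
good <- read.table(text = pipe_text, sep = "|", header = TRUE)
good$city      # "Paris, France"

# Comma-delimited: three data fields but only two header names,
# so read.table() turns the first field into the row name
bad <- read.table(text = comma_text, sep = ",", header = TRUE)
rownames(bad)  # "Alice"
bad$name       # "Paris"
```

Quoting the field, as in Alice,"Paris, France", is the other common fix, since read.table() recognizes double and single quotes by default.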
Similarly, any x-delimited file can be read in R by passing its delimiter as sep='x', where x is the delimiter character (read.table() accepts only a single character here).
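As a sketch of the general case, the code below writes a few rows from the pipe example to a temporary file using a semicolon as the delimiter (an arbitrary stand-in for x), then reads the file back with sep = ";". Passing row.names = FALSE and quote = FALSE to write.table() produces a plain semicolon-delimited file.

```r
# Sample rows borrowed from the pipe-delimited example
colors <- data.frame(color = c("red", "green", "blue"), weight = c(6L, 1L, 5L))

# Write them as a semicolon-delimited file in a temporary location
path <- tempfile(fileext = ".txt")
write.table(colors, path, sep = ";", row.names = FALSE, quote = FALSE)
readLines(path)  # "color;weight" "red;6" "green;1" "blue;5"

# Read the file back, naming the same delimiter with sep
back <- read.table(path, sep = ";", header = TRUE)
all.equal(colors, back)  # TRUE
```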
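File | R Command |
Comma-delimited (.csv) | read.table('data.csv', sep=',', header=TRUE) |
Tab-delimited (.tsv) | read.table('data.tsv', sep='\t', header=TRUE) |
Pipe-delimited (.txt) | read.table('data.txt', sep='|', header=TRUE) |
Any x-delimited (.txt) | read.table('data.txt', sep='x', header=TRUE) |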