The easiest form of data to import into R is a simple text file, and this will often be acceptable for problems
of small or medium scale. The primary function for importing from a text file is scan(), and it underlies most of the
more convenient data import functions (e.g., read.table()) discussed below.
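As a minimal sketch of what scan does on its own, the following writes a small made-up numeric file to a temporary location (the file and its values are assumptions for illustration, not from the text) and reads it back as a plain vector, which is then reshaped into a matrix.

```r
# Create a small whitespace-separated file of made-up numbers.
tmp <- tempfile(fileext = ".dat")
writeLines(c("1 2 3", "4 5 6"), tmp)

# scan() reads the values row by row and returns a plain numeric vector.
v <- scan(tmp, quiet = TRUE)

# Reshape into a 2 x 3 matrix; byrow = TRUE preserves the file's row layout.
m <- matrix(v, nrow = 2, byrow = TRUE)
print(m)

unlink(tmp)  # remove the temporary file
```

Because scan returns a bare vector, any rectangular structure has to be imposed by hand, which is the work that read.table does for you.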
Reading Rectangular Data Matrices
The function read.table is the most convenient way to read in a rectangular data matrix. Because data files come in many formats, several other functions call read.table with a different set of default arguments (e.g., read.csv, read.delim). Beware that read.table is an inefficient way to read in very large data sets.
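To illustrate how the wrapper functions relate to read.table, the sketch below reads the same made-up comma-separated file twice: once with read.csv and once with read.table using the defaults that read.csv sets. The file contents are hypothetical.

```r
# A small made-up comma-separated file with a header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2"), tmp)

# read.csv is read.table with a different group of defaults.
a <- read.csv(tmp)
b <- read.table(tmp, header = TRUE, sep = ",", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")

print(identical(a, b))

unlink(tmp)  # remove the temporary file
```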
The simplest call to the read.table function only requires a file name.
> X=read.table("C:/MyData/Cereal.dat")
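Note that we are using "/" and not "\" in the file path. In an R string the backslash is an escape character, so a Windows path must be written either with "/" or with doubled backslashes ("\\").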
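The simplest call can be tried without any existing data file. In this sketch the file is a temporary one with made-up rows (an assumption for illustration); it also shows the default column names R assigns and the two equivalent ways of spelling a Windows path.

```r
# Write a made-up whitespace-separated file to a temporary location.
tmp <- tempfile(fileext = ".dat")
writeLines(c("A 110 6", "B 120 2"), tmp)

# Only the file name is required; the result is a data frame.
X <- read.table(tmp)
print(names(X))   # default column names are V1, V2, ...
str(X)

# Two equivalent spellings of the same Windows path in R source code.
p1 <- "C:/MyData/Cereal.dat"
p2 <- "C:\\MyData\\Cereal.dat"   # each \\ is one backslash in the string

unlink(tmp)  # remove the temporary file
```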
There are several other options that can be passed to the function. Some of the most important options are summarized below:
Variable names. By default R assumes that the data file does not have a header in the first row. If, however, the first row contains the variable names, you need to set the header=TRUE option (header=T also works, but TRUE is safer because T can be reassigned).
> X=read.table("C:/MyData/Cereal.dat",header=TRUE)
If the file does not contain a header but you'd like to give the names explicitly, use the col.names option.
> X=read.table("C:/MyData/Cereal.dat",col.names=c("Name","Man","H/C"))
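The two ways of naming variables can be compared on small temporary files. The rows below are made up for illustration. The sketch also shows a side effect worth knowing: with the default check.names=TRUE, a name such as "H/C" is converted to a syntactically valid R name.

```r
# File whose first row holds the variable names (made-up data).
with_header <- tempfile()
writeLines(c("Name Man Cal", "A G 110", "B K 120"), with_header)
X1 <- read.table(with_header, header = TRUE)
print(names(X1))

# File without a header; names are supplied through col.names.
no_header <- tempfile()
writeLines(c("A G 110", "B K 120"), no_header)
X2 <- read.table(no_header, col.names = c("Name", "Man", "H/C"))

# check.names = TRUE (the default) turns "H/C" into a valid name.
print(names(X2))

unlink(c(with_header, no_header))  # remove the temporary files
```

Passing check.names=FALSE keeps the names exactly as given, at the cost of needing backticks to refer to them.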
The separator. The sep option is used to tell R how the columns or variables are separated in the text file. By default R assumes that columns are separated by white space, that is, spaces or tabs. However, sometimes other separators are used:
> read.table("C:/MyData/Cereal.dat",sep=",")
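A short sketch of why the separator matters: in the made-up file below, the first field contains a space, so it only stays in one column when the comma is declared as the separator. The file and its values are assumptions for illustration.

```r
# Made-up comma-separated file; the first field contains a space.
tmp <- tempfile(fileext = ".csv")
writeLines(c("brand one,100", "brand two,110"), tmp)

# With sep = "," the embedded space is kept inside the first field.
X <- read.table(tmp, sep = ",")
print(X)

unlink(tmp)  # remove the temporary file
```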
Missing data. Data files often contain missing values, and different programs use different symbols to represent them. SAS, for example, uses "." (without the quotes) to represent missing data. R, on the other hand, uses the character string NA. If your data file uses a string other than NA to represent missing data, you need to tell R with the na.strings option.
> read.table("C:/MyData/Cereal.dat",sep=",",na.strings=".")
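The effect of declaring the missing-value symbol can be checked on a small made-up file in which "." marks missing entries, as in the SAS convention described above. The data are hypothetical.

```r
# Made-up comma-separated file using "." for missing values.
tmp <- tempfile()
writeLines(c("a,1,.", "b,.,3"), tmp)

# Fields that consist solely of "." become NA, so the columns stay numeric.
X <- read.table(tmp, sep = ",", na.strings = ".")
print(X)
print(sum(is.na(X)))   # number of missing entries

unlink(tmp)  # remove the temporary file
```

Without the na.strings setting, the "." entries would be kept as text and the affected columns would not be read as numbers.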
Comments. Sometimes a data file contains comment lines that should be ignored when the data are imported into R. By default read.table assumes that comments begin with the # symbol. To change the symbol used for comments, use the comment.char option.
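As a sketch of the comment.char option, the made-up file below uses "%" for comments, both on a line of its own and at the end of a data line. The choice of "%" and the file contents are assumptions for illustration.

```r
# Made-up file that uses "%" rather than "#" to mark comments.
tmp <- tempfile()
writeLines(c("% comment line to be ignored",
             "1 2",
             "3 4 % trailing comment"), tmp)

# Everything from "%" to the end of a line is dropped.
X <- read.table(tmp, comment.char = "%")
print(X)

unlink(tmp)  # remove the temporary file
```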
Skipping rows of the data file. Sometimes there are several rows of text preceding the actual data set in the text file, and you want to ignore those lines when reading the data matrix. This can be accomplished by: (a) using the comment character in the text file; or (b) using the skip option.
> X=read.table("C:/MyData/junk.dat",skip=5)
Or you may only want to read in the first n rows of data. This is accomplished using the nrows option. The command
> X=read.table("C:/MyData/stuff.dat",nrows=5)
will read in the first five rows of data.
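The skip and nrows options can be tried together on a small made-up file with two lines of leading text; the contents are hypothetical and only meant to show which rows survive.

```r
# Made-up file: two lines of text followed by three data rows.
tmp <- tempfile()
writeLines(c("text line one", "text line two",
             "1 10", "2 20", "3 30"), tmp)

# skip = 2 ignores the two leading lines and reads all data rows.
A <- read.table(tmp, skip = 2)
print(nrow(A))

# Adding nrows = 2 stops after the first two data rows.
B <- read.table(tmp, skip = 2, nrows = 2)
print(B)

unlink(tmp)  # remove the temporary file
```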
All of the options discussed above can be used separately or together depending on the format of your data.
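A combined call might look like the sketch below, which reads a made-up file that has one line of leading text, a header row, comma separators, "." for missing values, and "!" for comments. All of these choices are assumptions for illustration.

```r
# Made-up file exercising several options at once.
tmp <- tempfile()
writeLines(c("Exported by a hypothetical program",
             "Name,Score,Weight",
             "a,1,.",
             "! a comment line in the middle of the data",
             "b,.,3"), tmp)

X <- read.table(tmp,
                skip = 1,            # ignore the first line of text
                header = TRUE,       # next line holds the variable names
                sep = ",",           # columns are comma separated
                na.strings = ".",    # "." marks missing values
                comment.char = "!")  # lines starting with "!" are comments
print(X)

unlink(tmp)  # remove the temporary file
```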
Describing Matrix Data
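The object X contains the data matrix (X=read.table("...). Strictly speaking, read.table returns a data frame rather than a matrix; if a true matrix is needed, it can be obtained with as.matrix(X). You can do various things with matrices.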