Regular Expressions and Apply Functions — R for Data Science

Aruna Singh
Artificial Intelligence in Plain English
4 min read · Jan 8, 2021


If you are comfortable with the basics of R, you should find this tutorial helpful for picking up a few more concepts. I have tried to keep it as simple as possible. So, let’s dive in.

Regular Expressions

A ‘regular expression’ is a pattern that describes a set of strings. R uses two types of regular expressions: extended regular expressions (the default) and Perl-like regular expressions, enabled with perl = TRUE. There is also fixed = TRUE, which treats the pattern as a literal string rather than a regular expression. Some of the functions that use regular expressions are described below:

grep() and grepl()

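Both functions search for matches to the pattern argument within each element of a character vector. They differ in the format and amount of detail in the results: grep() returns the indices of the matching elements (or the elements themselves with value = TRUE), while grepl() returns a logical vector with one TRUE or FALSE per element.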

For example:
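Here is a minimal sketch of the difference between the two functions. The fruits vector is made-up sample data, not something from the original article.

```r
# Hypothetical sample data for illustration
fruits <- c("apple", "banana", "cherry", "pineapple")

# grep() returns the positions of the elements that match the pattern
idx <- grep("apple", fruits)
print(idx)      # 1 4

# value = TRUE returns the matching elements instead of their positions
matched <- grep("apple", fruits, value = TRUE)
print(matched)  # "apple" "pineapple"

# grepl() returns one TRUE/FALSE per element of the input
flags <- grepl("apple", fruits)
print(flags)    # TRUE FALSE FALSE TRUE

# fixed = TRUE treats "." as a literal dot, not as "any character"
literal <- grepl(".", c("a.b", "ab"), fixed = TRUE)
print(literal)  # TRUE FALSE
```

The logical vector from grepl() is handy for subsetting, for example fruits[flags].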

sub() and gsub()

sub() replaces only the first match in each element of a character vector, while gsub() replaces all matches.
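A short sketch of the two replacement functions, using a made-up sentence as input:

```r
# Hypothetical input string for illustration
sentence <- "the cat sat on the mat"

# sub() replaces only the first match
first_only <- sub("the", "a", sentence)
print(first_only)   # "a cat sat on the mat"

# gsub() replaces every match
all_matches <- gsub("the", "a", sentence)
print(all_matches)  # "a cat sat on a mat"

# perl = TRUE switches to Perl-like regular expressions;
# here runs of whitespace are collapsed into a single space
collapsed <- gsub("\\s+", " ", "too    many   spaces", perl = TRUE)
print(collapsed)    # "too many spaces"
```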

Apply Functions

Apply functions are a family of functions in base R that let you repeatedly perform an action on multiple chunks of data, such as the elements of a list or vector. An apply function is essentially a loop, but it usually requires less code and can run faster than an explicit for loop, although the speed difference is often small.
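To make the comparison with a loop concrete, here is a small sketch that squares each number twice, once with a for loop and once with sapply() (one of the apply functions covered below). The values vector is made-up sample data.

```r
# Hypothetical sample data for illustration
values <- c(1, 2, 3, 4)

# Explicit loop: pre-allocate the result, then fill it element by element
loop_result <- numeric(length(values))
for (i in seq_along(values)) {
  loop_result[i] <- values[i]^2
}

# Apply version: the same computation in a single call
apply_result <- sapply(values, function(v) v^2)

print(apply_result)                         # 1 4 9 16
print(identical(loop_result, apply_result)) # TRUE
```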

lapply, sapply, vapply, tapply and mapply

They all apply a function repeatedly over the data in a list or vector.

1. lapply(): The output is always a list, with one element for each element of the input.
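A minimal sketch of lapply() on a small, made-up list:

```r
# Hypothetical sample data for illustration
numbers <- list(a = 1:3, b = 4:6)

# Apply sum() to each element; the result is a list with the same names
sums <- lapply(numbers, sum)

print(is.list(sums))  # TRUE
print(sums$a)         # 6
print(sums$b)         # 15
```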

To use a custom function with lapply(), pass the function name, followed by any additional argument values it needs.
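As a sketch, assume a small helper called add_n (a hypothetical name chosen for this example) that takes one extra argument:

```r
# Hypothetical custom function: adds n to its input
add_n <- function(x, n) {
  x + n
}

# Extra arguments after the function name are passed on to add_n
result <- lapply(1:3, add_n, n = 10)
print(unlist(result))  # 11 12 13

# An anonymous function works the same way
squares <- lapply(1:3, function(x) x^2)
print(unlist(squares)) # 1 4 9
```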

To flatten the list returned by lapply() into a vector, pass it to unlist().
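For instance, with made-up inputs:

```r
# lapply() returns a list of three numbers
squares <- lapply(1:3, function(x) x^2)

# unlist() flattens that list into a plain numeric vector
flat <- unlist(squares)
print(flat)        # 1 4 9

# Names on the list are carried over to the vector
named_flat <- unlist(list(a = 1, b = 2))
print(named_flat)  # a = 1, b = 2
```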

2. sapply(): It works like lapply() but tries to simplify the result to a vector or matrix, and falls back to a list when that is not possible. The difference is easiest to see by running lapply and sapply on the same input.
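A small side-by-side sketch, using a made-up character vector:

```r
# Hypothetical sample data for illustration
words <- c("apple", "kiwi", "banana")

# lapply() always returns a list
as_list <- lapply(words, nchar)
print(is.list(as_list))  # TRUE

# sapply() simplifies to a vector, named after the input strings
as_vector <- sapply(words, nchar)
print(as_vector)         # apple = 5, kiwi = 4, banana = 6

# When every result has length 2, sapply() simplifies to a matrix
as_matrix <- sapply(1:3, function(x) c(x, x^2))
print(dim(as_matrix))    # 2 3
```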

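3. vapply(): It works like sapply(), but you explicitly specify the type and length of each result through the FUN.VALUE argument, and you get an error if a result does not match.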
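A minimal sketch with made-up data, showing both a matching and a mismatching FUN.VALUE:

```r
# Hypothetical sample data for illustration
words <- c("apple", "kiwi", "banana")

# Each call to nchar() must return exactly one integer
word_lengths <- vapply(words, nchar, FUN.VALUE = integer(1))
print(word_lengths)  # apple = 5, kiwi = 4, banana = 6

# Declaring the wrong type makes vapply() stop with an error,
# which is caught here so the script keeps running
bad <- tryCatch(
  vapply(words, nchar, FUN.VALUE = character(1)),
  error = function(e) "type mismatch"
)
print(bad)           # "type mismatch"
```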

Note: If you are trying to decide which of these three functions to use, I would suggest sapply where possible, because it is the simplest. If you do not want your results simplified to a vector, use lapply. If you want to specify the type of result you expect, use vapply.

4. tapply()

Sometimes you may want to apply a function to some data, but separately for each level of a factor. In that case, you should use tapply. Let’s take a look at the information for tapply.

The arguments for tapply are tapply(X, INDEX, FUN). The only new argument is INDEX, which is the factor you want to use to separate the data.

Now, let's use column 1 as the index and find the mean of column 2.
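The article's original data set is not shown here, so the sketch below assumes a small, made-up data frame called scores whose first column is a grouping variable and whose second column is numeric.

```r
# Hypothetical data frame for illustration (not the article's data)
scores <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  value = c(10, 20, 30, 40, 20)
)

# X = column 2, INDEX = column 1, FUN = mean
group_means <- tapply(scores[[2]], scores[[1]], mean)
print(group_means)  # a = 20, b = 30
```

Group a contains 10, 30 and 20, so its mean is 20; group b contains 20 and 40, so its mean is 30.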

5. mapply(): Lastly, let's cover mapply(). Its arguments are mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE). First you give the function, followed by the vectors you are using. The rest of the arguments have default values, so they don’t need to be changed for now. When you have a function that takes 2 arguments, the elements of the first vector go into the first argument and the elements of the second vector go into the second argument.
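A short sketch of how the vectors are paired up, using made-up inputs:

```r
# Element-wise: (1, 4), (2, 5), (3, 6) are passed as (x, y)
totals <- mapply(function(x, y) x + y, 1:3, 4:6)
print(totals)    # 5 7 9

# Results of different lengths cannot be simplified, so a list is returned:
# rep(1, 3), rep(2, 2), rep(3, 1)
repeated <- mapply(rep, 1:3, 3:1)
print(repeated)

# MoreArgs holds arguments that stay the same for every call
powers <- mapply(function(base, exponent) base^exponent,
                 2:4, MoreArgs = list(exponent = 2))
print(powers)    # 4 9 16
```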

Well, we have covered regular expressions and apply functions in this tutorial. I hope you find it useful, and do share your feedback.


As a BIE at Amazon, I explore why we call data the new oil by interpreting it and generating meaningful insights.