If you are thinking of getting into R, this tutorial will give you a brief idea of how to begin. In it, I have tried to give a basic insight into data science using R.

Installation of R and RStudio

You can download the detailed setup for R and RStudio from “here”.

After downloading and installing both, you are all set to begin your programming journey with R. Now open RStudio and click File, then New File, then R Script.

Installing Packages and Importing Libraries

Let’s look at what packages and libraries are and the significant role they play in R programming.


In the previous article, we covered the fundamentals of Big Data with PySpark. In this article, we will explore PySpark SQL, which is Spark’s high-level API for working with structured data. You’ll learn how to interact with PySpark SQL using the DataFrame API and SQL queries. We’ll also cover some visualization methods that can help us make sense of the data in PySpark DataFrames. In the end, you’ll learn important Machine Learning algorithms.

PySpark SQL

PySpark SQL is a Spark library for structured data. Unlike the PySpark RDD API, PySpark SQL provides more information about the structure of data and its computation. …


This article introduces the exciting world of Big Data, as well as the various concepts and frameworks for processing it. You will understand why Apache Spark is considered the best framework for Big Data.

Big Data concepts and Terminology

What exactly is Big Data? It is a term which refers to the study and applications of data sets that are too complex for traditional data-processing software.

Three Vs are used to describe the characteristics of Big Data: Volume refers to the size of the data, Variety refers to its different sources and formats, and Velocity is the speed at which data…

Regression is widely used in finance, investing, and other disciplines to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables). Tree-based models use a series of if-then rules to generate predictions from one or more decision trees. All tree-based models can be used for either regression (predicting numerical values) or classification (predicting categorical values). We’ll explore five different types of tree-based models.
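To make the "series of if-then rules" idea concrete, here is a minimal sketch in Python of two tiny hand-written trees, one for classification and one for regression. The function names, feature names, thresholds, and leaf values are all hypothetical and chosen only for illustration. A real tree-based model would learn its rules from training data instead of having them typed in by hand.

```python
# Two hand-written decision trees. Every branch is an if-then rule and
# every return statement is a leaf. All names, thresholds, and leaf
# values below are hypothetical, not taken from any real dataset.


def classify_loan(income, debt_ratio):
    """Classification tree: each leaf holds a categorical label."""
    if income < 30000:
        return "reject"
    if debt_ratio > 0.4:
        return "review"
    return "approve"


def predict_price(area_sqm, has_garden):
    """Regression tree: each leaf holds a numerical value."""
    if area_sqm < 50:
        return 120000.0
    if has_garden:
        return 310000.0
    return 250000.0


if __name__ == "__main__":
    print(classify_loan(45000, 0.2))
    print(predict_price(80, True))
```

The structure is identical in both functions. Only the type of value stored at the leaves changes, which is why the same family of models can serve both tasks.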

Supervised Learning and Unsupervised Learning

Supervised learning is the subfield of machine learning in which you train a model using input data and…

In this article, you’ll learn the intertwined processes of data manipulation, extraction, and visualization using the tools dplyr and ggplot2. You’ll learn to manipulate data by filtering, sorting, and summarizing a real dataset in order to answer exploratory questions. Along the way, you’ll get a taste of exploratory data analysis and the power of the Tidyverse tools. If you have prior experience in R, you can continue with this article. Otherwise, I would recommend a glance at the tutorial Quick Tutorial on R first for a better understanding.

Data Wrangling

In this section, you’ll learn to do three things with a table: filter for…

If you have mastered the basics of R, you should find this tutorial quite helpful for picking up a few more concepts. I have tried to keep it as simple as I could. So, let’s dive in.

Regular Expressions

A ‘regular expression’ is a pattern that describes a set of strings. Two types of regular expressions are used in R: extended regular expressions (the default) and Perl-like regular expressions, enabled with perl = TRUE. There is also fixed = TRUE, which matches the pattern as a literal string rather than as a regular expression. Some of the functions that use them are described below:

grep() and grepl()

To search…

In this course, you will work through some tasks for the demonstrations and practices in SAS Viya for Learners. SAS Viya is the latest extension of the SAS Platform. Because this flexible computing environment is distributed, results from different sessions will be similar but likely not identical.


SAS Visual Analytics coupled with the power of Cloud Analytic Services (CAS) enables you to spend less time preparing and accessing the data and more time discovering trends, developing insights, and creating visually stunning reports to showcase your data.

1. Purpose and features of SAS Visual Analytics

Visual Analytics uses SAS high-performance technologies to accelerate analytic computations, which helps…

Aruna Singh

As an IT Analyst at RIL, I explored why data is considered the new oil, which eventually made me realize my affection for its analogy and interpretation.
