PySpark SQL and DataFrames

Aruna Singh
8 min read · Jun 12, 2021

In the previous article, we covered Fundamentals of BIG DATA with PySpark. In this article, we will explore PySpark SQL, Spark’s high-level API for working with structured data. You’ll learn how to interact with PySpark SQL using both the DataFrame API and SQL queries. We’ll also cover some visualization methods that help us make sense of the data in PySpark DataFrames. Finally, you’ll learn about some important machine learning algorithms.

As a BIE at Amazon, I explore why we call data the new oil by interpreting it and generating meaningful insights.