Apache Spark Introduction

This article was posted on Data Flair. Below is a quick overview of the original article.


This tutorial provides introduction to Apache Spark, what are its ecosystem components, Spark abstraction – RDD, transformation and action. The objective of this introductory guide is to provide detailed overview of spark, its history, architecture, deployment mode and RDD.


Apache Spark was introduced in 2009 in the UC Berkeley R&D Lab, later it become AMPLab. It was open sourced in 2010 under BSD license. In 2013 spark was donated to Apache Software Foundation where it became top-level project in 2014. Apache Spark became most popular project at apache in 2015.


Apache Spark is a general-purpose & lightning fast cluster computing system. It provides high-level API like Java, Scala, Python and R. Apache Spark is a tool for Running Spark Applications. Spark is 100 times faster than Hadoop and 10 times faster than accessing data from disk. Spark is written in Scala but provides rich APIs in Scala, Java, Python and R. I can be integrated with Hadoop and can process existing HDFS data.





Anuncio publicitario

Deja una respuesta

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Salir /  Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Salir /  Cambiar )

Conectando a %s

Este sitio usa Akismet para reducir el spam. Aprende cómo se procesan los datos de tus comentarios.