INTEGRATE SPARKR AND R FOR BETTER DATA SCIENCE WORKFLOW

by Yanbo Liang

R is one of the primary programming languages for data science, with more than 10,000 packages. It is open source software that is widely taught in colleges and universities as part of statistics and computer science curricula. R uses the data frame as its core API, which makes data manipulation convenient, and it has a powerful visualization infrastructure that lets data scientists interpret data efficiently.

However, data analysis in R is limited by the amount of memory available on a single machine, and because R is single-threaded, it is often impractical to use R on large datasets. To address R’s scalability issue, the Spark community developed the SparkR package, which is based on a distributed data frame that enables structured data processing with a syntax familiar to R users. Spark provides a distributed processing engine, data sources, and off-memory data structures. R provides a dynamic environment, interactivity, packages, and visualization. SparkR combines the advantages of both Spark and R.

Here’s the complete article:

https://es.hortonworks.com/blog/integrate-sparkr-and-r-for-better-data-science-workflow/
