Introduction to Machine Learning for non-developers


About Machine Learning

We all know that machine learning is about handling data, but it can also be seen as:

The art of finding order in data by browsing its inner information.

Some background on predictive models

There are several types of predictive models. These models usually have several input columns and one target or outcome column, which is the variable to be predicted.

So basically, a model performs a mapping between inputs and an output, finding (sometimes mysteriously) the relationships between the input variables in order to predict the target variable.

As you may notice, it has some commonalities with a human being who reads the environment => processes the information => and performs a certain action.

So what is this post about?

It’s about becoming familiar with Random Forest (official algorithm site), implemented in R. It is one of the most-used predictive models because it is simple to tune and robust across many different types of data.

If you’ve never done a predictive model before and you want to, this may be a good starting point 😉

Don’t get lost in the forest!

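The basic idea behind it is to build hundreds or even thousands of simple, less-robust models (aka decision trees) and combine them into a single model that is more stable than any individual tree. Strictly speaking, combining many trees mainly reduces variance (how much the predictions change with the particular data sample) rather than bias.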

But how?

Each of these ‘tiny’ decision trees will see just part of the whole data to produce its humble prediction. The final decision produced by the random forest model is then the result of voting by all the decision trees. Just like democracy.
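To make the "sees just part of the data" idea concrete, here is a minimal sketch in Python (the post itself is about R; Python is used here only for illustration). It assumes the usual random forest recipe, in which each tree is trained on rows drawn at random with replacement and considers only a random subset of the input columns at each split. The function names `bootstrap_rows` and `pick_columns` and the toy customer data are hypothetical, invented for this example.

```python
import random

def bootstrap_rows(rows, rng):
    """Draw len(rows) rows at random WITH replacement.

    Some rows show up more than once and some are left out,
    so every tree sees a slightly different version of the data.
    """
    return [rng.choice(rows) for _ in rows]

def pick_columns(columns, k, rng):
    """Pick k distinct input columns at random (as done at each split of a tree)."""
    return rng.sample(columns, k)

# Toy data, invented for illustration only.
rows = [
    {"visits": 7, "age": 31, "country": "AR"},
    {"visits": 2, "age": 45, "country": "US"},
    {"visits": 9, "age": 22, "country": "BR"},
    {"visits": 1, "age": 38, "country": "AR"},
    {"visits": 6, "age": 27, "country": "US"},
]
columns = ["visits", "age", "country"]

rng = random.Random(42)  # fixed seed so reruns give the same draws
for tree_id in range(3):
    sample = bootstrap_rows(rows, rng)
    cols = pick_columns(columns, 2, rng)
    print(tree_id, [r["visits"] for r in sample], cols)
```

Each printed line shows which rows (identified by their number of visits) and which columns one hypothetical tree would get to look at.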

And what is a decision tree?

You’re already familiar with decision tree outputs: they produce IF-THEN rules, such as "If the user has more than five visits, he or she will probably use the app."
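Written as code, such a rule is nothing more than an `if` statement. The sketch below is in Python for illustration; the function name `will_use_app` is hypothetical, and answering "no" when the rule does not fire is an assumption made to keep the example complete.

```python
def will_use_app(visits):
    """A decision tree with a single split, written as one IF-THEN rule."""
    if visits > 5:
        return "yes"  # more than five visits: will probably use the app
    return "no"       # assumption: otherwise we predict "no"

print(will_use_app(8))  # prints "yes"
print(will_use_app(2))  # prints "no"
```

A real decision tree simply chains several of these questions one after another, and it learns the thresholds (like the five visits) from the data instead of having them typed in by hand.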

Putting it all together…

Suppose a random forest has three trees (normally it has 500 or more) and a new customer arrives. The prediction of whether that customer will buy a certain product will be ‘yes’ if at least two of the three trees predict ‘yes’.
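The three-tree vote can be sketched in a few lines of Python. This is only an illustration of majority voting, not how any particular R package implements it; the function name `forest_predict` is hypothetical.

```python
from collections import Counter

def forest_predict(tree_votes):
    """Return the answer that the largest number of trees voted for."""
    return Counter(tree_votes).most_common(1)[0][0]

# Three trees, one vote each, for the new customer.
print(forest_predict(["yes", "no", "yes"]))  # prints "yes"
```

With two possible answers and an odd number of trees there can never be a tie, which is one reason the example uses three trees.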

Having hundreds of opinions (the decision trees) tends to produce a more accurate result on average (the random forest).
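Why do many mediocre opinions beat one? A rough way to see it is to compute how often a majority vote is right when each voter is right only 60% of the time. The Python sketch below makes a strong simplifying assumption: the trees are treated as fully independent voters, which real trees are not (they learn from overlapping data), so the real gain is smaller. The function name and the 60% figure are made up for illustration.

```python
from math import comb

def majority_correct_probability(n_trees, p_correct):
    """Chance that more than half of n_trees independent voters are right.

    Assumes every tree is right with the same probability p_correct
    and that the trees do not influence each other.
    """
    needed = n_trees // 2 + 1  # smallest number of votes that is a majority
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

for n in (1, 3, 101):
    print(n, round(majority_correct_probability(n, 0.6), 3))
```

With one tree the result is simply 0.6; with three trees it rises to 0.648, and it keeps rising as more trees are added.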

But don’t panic: all of the above is encapsulated by the implementation, so the data scientist doesn’t have to deal with it directly.

With this model, you will not be able to easily know how the model comes to assign a high or low probability to each input case. It acts more like a black box, similar to what is used for deep learning with neural networks, where every neuron contributes to the whole.

You can practice with these two friendly random forest tutorials using R from blopig.com and datascienceplus.com.

What language is convenient for learning machine learning?

If you want to develop your own data science projects, you could start with R. It has an enormous community from which you can learn (and teach). It’s not always just a matter of complex algorithms, but also about having support when things don’t go as expected.

And this occurs often when you’re doing new things.

Finally, some numbers about community support

Besides the fact that R (and Python with pandas and numpy) has lots of packages, libraries, free books, and free courses, check these metrics: more than 236,000 questions on stackoverflow.com and another ~18,000 on stats.stackexchange.com are tagged with R (as of May 2018).

The R community has grown a lot!

When I first posted this at https://auth0.com (Dec 2016), there were 160,000 questions on Stack Overflow.

The 236,000 questions now represent an increase of 47% in just one and a half years! 🎉

The R community definitely rocks.

That’s all for now, thanks! 🙂
