How To Get Started With Machine Learning in R (get results in one weekend)

How do you get started with machine learning in R?

R is a large and complex platform. It is also one of the most popular platforms among professional data scientists.

In this post you will discover the step-by-step process that you can use to get started using machine learning for predictive modeling on the R platform.

The steps are practical and simple enough that you could be building accurate predictive models after one weekend.

The process assumes that you are a developer, know a little machine learning, and will actually do the work. If you do, the process delivers results.

Let’s get started.

How To Get Started With Machine Learning in R
Photo by Sebastiaan ter Burg, some rights reserved.

Learn R The Wrong Way

Here is how I DON’T think you should study machine learning in R.

  • Step 1: Get really good at R programming and R syntax.
  • Step 2: Know the deep theory of every possible algorithm you could use in R.
  • Step 3: Study in great detail how to use each machine learning algorithm in R.
  • Step 4: Only lightly touch on how to evaluate models.

I think this is the wrong way.

  • It teaches you that you need to spend all your time learning how to use individual machine learning algorithms.
  • It does not teach you the process of building predictive machine learning models in R that you can actually use in practice to make predictions.

Sadly, this is the approach used to teach machine learning in R that I see in almost all books and online courses on the topic.

You don’t want to be a badass at R or even at machine learning algorithms in R. You want to be a badass at building accurate predictive models using R. This is the context.

You can take time to learn individual machine learning algorithms in great detail, so long as it aids you in building more accurate predictive models, more reliably.

Good Background For Machine Learning in R

You can just dive into R. Go for it.

In my opinion, though, you will get a lot more out of it if you have some background.

R is an advanced platform and you can get a lot out of it as a beginner. But, if you have a little machine learning and a little programming as a foundation, R will become a superpower for building accurate predictive models very quickly.

General Suggestions

Here are some suggestions for getting the most out of getting started with machine learning in R. I think these are reasonable for a modern developer interested in machine learning.

A developer who knows how to program. This helps because it won’t be a big deal to pick up the syntax of R, which at times can be a little odd. It is also helpful to know how to whip up scripts or script-lets (mini scripts) to do this or that task. R is a programming language after all.

Interested in predictive modeling machine learning. Machine learning is a big field that covers a variety of interesting algorithms. Predictive modeling is a subset that is only concerned with building models that make predictions on new data, not with explaining the relationships between data or with learning from data in general. Predictive modeling is where R really shines as a platform for machine learning.

Familiar with machine learning basics. You understand machine learning as an induction problem, where all algorithms are really just trying to estimate an underlying mapping function from an input space to an output space. All predictive machine learning makes sense through this lens, as do strategies of searching for good and best machine learning algorithms, algorithm parameters and data transforms.

Specific Suggestions

The approach I lay out in the next section also makes some assumptions about your background.

You are not an absolute beginner in machine learning. You could be, and the approach may work for you, but you will get a lot more out of it if you have some of the additional suggested background.

You want to use a top-down approach to studying machine learning. This is the approach I teach. Rather than starting with theory and principles and eventually touching on practical machine learning if there is time, you start with the goal of working through a project end-to-end and research details as you need them in order to deliver better results.

You are familiar with the steps in a predictive modeling machine learning project. Specifically:

  1. Define Problem
  2. Prepare Data
  3. Evaluate Algorithms
  4. Improve Results
  5. Present Results

You are at least familiar with some machine learning algorithms. Or you may know how to pick them up quickly, for example using the algorithm description template method. I think learning the details of how and why machine learning algorithms work is a separate task from learning how to use those algorithms on a machine learning platform like R. They are often conflated in books and courses, to the detriment of learning.

How To Learn Machine Learning in R

This section lays out a process that you can use to get started with building machine learning predictive models on the R platform.

It is divided into two parts:

  1. Map the tasks of a machine learning project onto the R platform.
  2. Work through predictive modeling projects using standard datasets.

1. Map Machine Learning Tasks Onto R

You need to know how to do the specific tasks of a machine learning project on the R platform. Once you know how to complete a discrete task using the platform and get a result reliably, you can do it again and again on project after project.

This process is straightforward:

  1. List out all of the discrete tasks of a predictive modeling machine learning project.
  2. Create recipes to complete each task reliably that you can copy-paste as a starting point on future projects.
  3. Add to and maintain the recipes as your understanding of the platform and machine learning improves.

Predictive Modeling Tasks

Below is a minimum list of predictive modeling tasks you may want to map onto the R platform and create recipes for. This is not complete, but it does cover the broad strokes of the platform:

  1. Overview of R syntax
  2. Prepare Data
    1. Loading Data
    2. Working With Data
    3. Data Summarization
    4. Data Visualization
    5. Data Cleaning
    6. Feature Selection
    7. Data Transforms
  3. Evaluate Algorithms
    1. Resampling Methods
    2. Evaluation Metrics
    3. Spot-Check Algorithms
    4. Model Selection
  4. Improve Results
    1. Algorithm Tuning
    2. Ensemble Methods
  5. Present Results
    1. Finalize Model
    2. Make New Predictions

You will notice the first task is an overview of R syntax. As a developer, you need to know the basics of the language before you can do anything, such as assignment, data structures, flow control, and creating and calling functions.
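
As a rough illustration of what a syntax overview recipe might cover, here is a minimal sketch in base R. The variable and function names (such as `mean_square`) are made up for the example and are not part of any library.

```r
# Assignment: "<-" is the conventional assignment operator in R.
x <- 42
name <- "iris"

# Data structures: vectors, lists and data frames.
v <- c(1.5, 2.5, 3.5)                            # numeric vector
person <- list(id = 1, tags = c("a", "b"))       # list with named elements
df <- data.frame(a = 1:3, b = c("x", "y", "z"))  # table made of columns

# Indexing is 1-based; "$" selects a column or list element by name.
first_value <- v[1]
column_b <- df$b

# Flow control: a for loop with an if statement inside.
total <- 0
for (value in v) {
  if (value > 2) {
    total <- total + value
  }
}

# Creating and calling a function (mean_square is a hypothetical helper).
mean_square <- function(values) {
  mean(values^2)
}
result <- mean_square(v)

print(total)
print(result)
```

A recipe like this is mostly a reminder of the odd corners of the language, so it is worth extending whenever the syntax surprises you.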

Library of Standalone Recipes

I recommend creating recipes that are standalone. That means that each recipe is a complete program that has everything it needs to achieve the task and produce an output. This means that you can copy it directly into a future predictive modeling project.

You can store the recipes in a directory or on GitHub.
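
To make the idea of a standalone recipe concrete, here is one possible sketch of a data summarization recipe. It assumes only base R and the iris dataset that ships with it; the choice of summaries is illustrative rather than a fixed standard.

```r
# Recipe: summarize a dataset (standalone, base R only).

# Load the iris dataset that is bundled with R.
data(iris)

# Dimensions: number of rows and columns.
print(dim(iris))

# Type of each attribute.
print(sapply(iris, class))

# Peek at the first few rows.
print(head(iris, n = 5))

# Class distribution of the output variable.
class_counts <- table(iris$Species)
print(class_counts)

# Quartiles and mean of each attribute.
print(summary(iris))

# Mean of every numeric attribute, grouped by class.
class_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(class_means)
```

Because the script loads its own data and prints its own output, it can be copied into a new project and pointed at a different data frame with very few changes.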

2. Small Predictive Modeling Projects

Recipes for common predictive modeling tasks with machine learning are not enough.

Again, this is where most books and courses stop. They leave it to you to piece together the recipes into end-to-end projects.

You need to piece the recipes together into end-to-end projects. This will show you how to actually deliver a result using the platform. I recommend using only small, well-understood machine learning datasets from the UCI Machine Learning Repository.

These datasets are available for free as CSV downloads, and most are available directly in R by loading third-party libraries. These datasets are excellent for practicing because:

  1. They are small, meaning they fit into memory and algorithms can model them in reasonable time.
  2. They are well behaved, meaning you often don’t need to do a lot of feature engineering to get a good result.
  3. They are standard, meaning that many people have used them before and you can get ideas of good algorithms to try and good results you should expect.
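
As a sketch of a loading recipe, the example below shows two ways to get a small dataset into a data frame using base R only. To stay self-contained it writes iris to a temporary CSV file and reads it back, which stands in for a file you have downloaded. The `csv_path` variable is a placeholder for your own file path.

```r
# Recipe: load a dataset into a data frame (base R only).

# Option 1: a dataset bundled with R.
data(iris)

# Option 2: a CSV file. This sketch first writes iris to a temporary CSV
# and then reads it back; in a real project csv_path would point at a
# file that you downloaded yourself.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)
dataset <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

# Option 3 (not run here): third-party packages such as mlbench bundle
# some standard datasets, assuming you have the package installed.

# Confirm that the data loaded as expected.
print(dim(dataset))
str(dataset)
```

Setting `stringsAsFactors = TRUE` is a choice made here so that the class column is read as a factor, which many modeling functions in R expect.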

I recommend at least three projects:

  1. Hello World Project (iris flowers). This is a quick pass through the project steps without much tuning or optimizing on a dataset that is widely used as the hello world of machine learning.
  2. Binary Classification end-to-end. Work through each step on a binary classification problem (e.g. the Pima Indians diabetes dataset).
  3. Regression end-to-end. Work through each step of the process with a regression problem (e.g. the Boston housing dataset).
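
A hello world project might be sketched as follows. To avoid depending on any add-on package, this example uses a hand-written nearest-centroid classifier as a deliberately simple stand-in for a real algorithm. The 80/20 split, the seed, and the `predict_species` helper are all assumptions made for illustration.

```r
# Hello World project sketch: iris flowers, base R only.
data(iris)
set.seed(7)  # make the random split repeatable

# 1. Prepare data: hold back 20% of the rows for validation.
n <- nrow(iris)
validation_index <- sample(n, size = round(0.2 * n))
validation <- iris[validation_index, ]
training <- iris[-validation_index, ]

# 2. Fit a model: a nearest-centroid classifier, that is, the mean of each
#    numeric attribute per class, computed on the training rows only.
features <- names(iris)[1:4]
centroids <- aggregate(training[, features],
                       by = list(Species = training$Species),
                       FUN = mean)

# 3. Predict: assign each row to the class with the closest centroid.
#    predict_species is a hypothetical helper written for this sketch.
predict_species <- function(rows) {
  apply(rows[, features], 1, function(row) {
    distances <- apply(centroids[, features], 1, function(center) {
      sum((row - center)^2)
    })
    as.character(centroids$Species[which.min(distances)])
  })
}
predictions <- predict_species(validation)

# 4. Evaluate: confusion matrix and accuracy on the held-back rows.
accuracy <- mean(predictions == validation$Species)
print(table(predicted = predictions, actual = validation$Species))
print(accuracy)
```

In a real project you would swap the classifier for algorithms from the packages you prefer and compare several of them, but the shape of the script (prepare, fit, predict, evaluate) stays the same.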

Add and Maintain Recipes

Machine learning with R does not stop at working through a few small standard datasets. You need to take on more and different challenges.

  • Standard Datasets: You could practice on additional standard datasets from the UCI Machine Learning Repository, overcoming the challenges of different problem types.
  • Competition Datasets: You could try working through some more challenging datasets, such as those from past Kaggle competitions or those from past KDD Cup challenges.
  • Your Own Projects: Ideally, you need to start working through your own projects.

All the while you will be dipping into help, adapting your scripts and learning how to get more out of machine learning in R.

It is important that you fold this knowledge back into your catalog of machine learning recipes. This will let you leverage this knowledge quickly on new projects and contribute greatly to your skill and speed at developing predictive models.

Your Outcomes From This Process

You could work through this process in one weekend. By the end of that weekend, you will have the recipes and project templates that you can use to start modeling your own problems using machine learning in R.

You will go from a developer who is interested in machine learning in R to a developer who has the resources and capability to work through a new dataset end-to-end using R and develop a predictive model to be presented and deployed.

Specifically, you will know:

  • How to achieve the subtasks of a predictive modeling problem in R.
  • How to learn new and different subtasks in R.
  • How to get help with R.
  • How to work through a small-to-medium-sized dataset end-to-end.
  • How to deliver a model that can make predictions on new unseen data.
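
For the last point, a recipe for finalizing a model and predicting on new data might look like the sketch below. It assumes a simple linear regression on the mtcars dataset bundled with R, chosen only to keep the example self-contained. The two new cars are made-up illustrative inputs.

```r
# Sketch: finalize a model and make predictions on new data (base R only).
data(mtcars)

# Finalize: fit the chosen model on all available training data.
final_model <- lm(mpg ~ wt + hp, data = mtcars)

# Save the model to disk so it can be reused later without retraining.
model_path <- tempfile(fileext = ".rds")
saveRDS(final_model, model_path)

# Later (for example in another script): load the saved model...
loaded_model <- readRDS(model_path)

# ...and predict for new, unseen rows that have the same input columns.
# These two cars are made-up inputs used only for illustration.
new_data <- data.frame(wt = c(2.5, 3.5), hp = c(100, 180))
predictions <- predict(loaded_model, newdata = new_data)
print(predictions)
```

Saving with `saveRDS` and restoring with `readRDS` keeps the training step separate from the prediction step, which is usually what you want once a model is handed over for use.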

From here you can start to dive into the specifics of the functions, techniques and algorithms used with the goal of learning how to use them better in order to deliver more accurate predictive models, more reliably in less time.

Summary

In this post you discovered a step-by-step process that you can use to study and get started with machine learning in R.

The three high-level steps of the process are:

  1. Map the steps of a predictive modeling process onto the R platform with recipes that you can reuse.
  2. Work through small standard machine learning datasets to piece the recipes together into projects.
  3. Work through more and different datasets, ideally your own, and add to your library of recipes.

You also discovered the philosophy behind the process and the reasons why this process is the best process for you.

Next Step

Do you want to get started in machine learning with R?

  1. Download and install R right now.
  2. Use the process outlined above, limit yourself to one weekend and go as far as you can.
  3. Report back. Leave a comment. I would love to hear how you went.

Do you have a question about this process? Leave a comment, I’ll do my best to answer it.

