How to Make Predictions with scikit-learn

How to predict classification or regression outcomes with scikit-learn models in Python.

Once you choose and fit a final machine learning model in scikit-learn, you can use it to make predictions on new data instances.

There is some confusion amongst beginners about how exactly to do this. I often see questions such as:

How do I make predictions with my model in scikit-learn?

In this tutorial, you will discover exactly how you can make classification and regression predictions with a finalized machine learning model in the scikit-learn Python library.

After completing this tutorial, you will know:

How to finalize a model in order to make it ready for making predictions.
How to make class and probability predictions in scikit-learn.
How to make regression predictions in scikit-learn.

Let’s get started.

Tutorial Overview

This tutorial is divided into 3 parts; they are:

First Finalize Your Model
How to Predict With Classification Models
How to Predict With Regression Models

1. First Finalize Your Model

Before you can make predictions, you must train a final model.

You may have trained models using k-fold cross validation or train/test splits of your data. This was done in order to give you an estimate of the skill of the model on out-of-sample data, e.g. new data.

These models have served their purpose and can now be discarded.

You now must train a final model on all of your available data.
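The two stages described above can be sketched as follows. This is a minimal illustration rather than a prescribed recipe: the LogisticRegression model, the synthetic make_blobs dataset, and the choice of 5 folds are all assumptions made here for the sake of the example.

```python
# sketch: estimate skill with k-fold cross-validation, then fit a final model on all data
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic dataset (an assumption for illustration)
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)

# stage 1: estimate out-of-sample skill; the models fit inside
# cross_val_score are temporary and are discarded afterwards
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")
print("Estimated accuracy: %.3f" % scores.mean())

# stage 2: fit the final model on all available data
final_model = LogisticRegression()
final_model.fit(X, y)
```

The cross-validation scores describe how well the chosen procedure is expected to perform; the model you keep is the one fit in stage 2.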

You can learn more about how to train a final model here:

2. How to Predict With Classification Models

Classification problems are those where the model learns a mapping between input features and an output feature that is a label, such as “spam” and “not spam.”

Below is sample code of a finalized LogisticRegression model for a simple binary classification problem.

Although we are using LogisticRegression in this tutorial, the same functions are available on practically all classification algorithms in scikit-learn.

# example of training a final classification model
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
# fit final model
model = LogisticRegression()
model.fit(X, y)

After finalizing your model, you may want to save the model to file, e.g. via pickle. Once saved, you can load the model any time and use it to make predictions. For an example of this, see the post:

For simplicity, we will skip this step for the examples in this tutorial.
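For reference, the save and load step mentioned above might look like the sketch below. The file name final_model.pkl and the use of the system temporary directory are hypothetical choices made for illustration only.

```python
# sketch: save a fitted model to file with pickle, then load it and predict
import os
import pickle
import tempfile
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# fit a final model on a synthetic dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
model = LogisticRegression()
model.fit(X, y)

# hypothetical file location, chosen for illustration
filename = os.path.join(tempfile.gettempdir(), "final_model.pkl")

# save the model to file
with open(filename, "wb") as f:
    pickle.dump(model, f)

# later: load the model from file
with open(filename, "rb") as f:
    loaded_model = pickle.load(f)

# the loaded model should make the same predictions as the original
same = (loaded_model.predict(X) == model.predict(X)).all()
print("Loaded model reproduces predictions: %s" % same)
```

Only unpickle files that you trust, and load the model with the same version of scikit-learn that was used to save it.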

There are two types of classification predictions we may wish to make with our finalized model; they are class predictions and probability predictions.

Class Predictions

A class prediction is: given the finalized model and one or more data instances, predict the class for the data instances.

We do not know the outcome classes for the new data. That is why we need the model in the first place.

We can predict the class for new data instances by calling the predict() function on our finalized classification model.

For example, we have one or more data instances in an array called Xnew. This can be passed to the predict() function on our model in order to predict the class values for each instance in the array.

Xnew = [[...], [...]]
ynew = model.predict(Xnew)

Multiple Class Predictions

Let’s make this concrete with an example of predicting multiple data instances at once.

# example of making multiple class predictions
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
# fit final model
model = LogisticRegression()
model.fit(X, y)
# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
	print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))

Running the example predicts the class for the three new data instances, then prints the data and the predictions together.

X=[-0.79415228  2.10495117], Predicted=0
X=[-8.25290074 -4.71455545], Predicted=1
X=[-2.18773166  3.33352125], Predicted=0

Single Class Prediction

If you have just one new data instance, you can provide it to the predict() function wrapped in an array; for example:

# example of making a single class prediction
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
# fit final model
model = LogisticRegression()
model.fit(X, y)
# define one new instance
Xnew = [[-0.79415228, 2.10495117]]
# make a prediction
ynew = model.predict(Xnew)
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))

Running the example prints the single instance and the predicted class.

X=[-0.79415228, 2.10495117], Predicted=0

A Note on Class Labels

When you prepared your data, you will have mapped the class values from your domain (such as strings) to integer values. You may have used a LabelEncoder.

This LabelEncoder can be used to convert the integers back into string values via the inverse_transform() function.

For this reason, you may want to save (pickle) the LabelEncoder used to encode your y values when fitting your final model.
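As a rough sketch of this round trip, the example below encodes string labels, fits a model, and maps the integer predictions back to strings. The tiny dataset and the "spam" / "not spam" labels are made up for illustration.

```python
# sketch: map integer class predictions back to the original string labels
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# hypothetical inputs and string class labels
X = [[0.1, 1.2], [0.3, 0.9], [2.5, 3.1], [2.8, 2.7]]
labels = ["not spam", "not spam", "spam", "spam"]

# encode strings as integers; classes are sorted, so "not spam" -> 0, "spam" -> 1
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# fit final model on the integer-encoded labels
model = LogisticRegression()
model.fit(X, y)

# predict integer classes for new instances, then convert back to strings
Xnew = [[0.2, 1.0], [2.6, 3.0]]
ynew = model.predict(Xnew)
names = encoder.inverse_transform(ynew)
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s (%s)" % (Xnew[i], ynew[i], names[i]))
```

The same fitted encoder object must be used for both directions, which is why it is worth saving alongside the model.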

Probability Predictions

Another type of prediction you may wish to make is the probability of the data instance belonging to each class.

This is called a probability prediction: given a new instance, the model returns the probability for each outcome class as a value between 0 and 1.

You can make these types of predictions in scikit-learn by calling the predict_proba() function, for example:

Xnew = [[...], [...]]
ynew = model.predict_proba(Xnew)

This function is only available on those classification models capable of making a probability prediction, which is most, but not all, models.
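One way to check whether a given model supports probability predictions is to test for the function before calling it. The sketch below compares LogisticRegression with LinearSVC, which is chosen here only as an example of a classifier that does not provide predict_proba().

```python
# sketch: check whether a classifier offers predict_proba() before relying on it
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

for model in (LogisticRegression(), LinearSVC()):
    supported = hasattr(model, "predict_proba")
    print("%s supports predict_proba: %s" % (type(model).__name__, supported))
```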

The example below makes a probability prediction for each data instance in the Xnew array.

# example of making multiple probability predictions
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs
# generate 2d classification dataset
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
# fit final model
model = LogisticRegression()
model.fit(X, y)
# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)
# make a prediction
ynew = model.predict_proba(Xnew)
# show the inputs and predicted probabilities
for i in range(len(Xnew)):
	print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))

Running the example makes the probability predictions and then prints each input data instance together with the probability of it belonging to the first and second classes (0 and 1).

X=[-0.79415228 2.10495117], Predicted=[0.94556472 0.05443528]
X=[-8.25290074 -4.71455545], Predicted=[3.60980873e-04 9.99639019e-01]
X=[-2.18773166 3.33352125], Predicted=[0.98437415 0.01562585]

This can be helpful in your application if you want to present the probabilities to the user for expert interpretation.
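It can also help to see how the probabilities relate to the class predictions. The columns returned by predict_proba() follow the order of the classes in the model's classes_ attribute, so taking the most probable column for each row recovers a class label. The sketch below assumes the same synthetic dataset used in the examples above.

```python
# sketch: relate probability predictions to class predictions
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# generate 2d classification dataset and fit final model
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
model = LogisticRegression()
model.fit(X, y)

# new instances where we do not know the answer
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)

# one row per instance, one column per class; each row sums to 1
probs = model.predict_proba(Xnew)

# column order matches model.classes_, so argmax picks the most probable class
labels_from_probs = model.classes_[np.argmax(probs, axis=1)]
print("Class order: %s" % model.classes_)
print("From probabilities: %s" % labels_from_probs)
print("From predict():     %s" % model.predict(Xnew))
```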

3. How to Predict With Regression Models

Regression is a supervised learning problem where, given input examples, the model learns a mapping to suitable output quantities, such as “0.1” and “0.2”, etc.

Below is an example of a finalized LinearRegression model. Again, the functions demonstrated for making regression predictions apply to all of the regression models available in scikit-learn.

# example of training a final regression model
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
# fit final model
model = LinearRegression()
model.fit(X, y)

We can predict quantities with the finalized regression model by calling the predict() function on the finalized model.

As with classification, the predict() function takes a list or array of one or more data instances.

Multiple Regression Predictions

The example below demonstrates how to make regression predictions on multiple data instances with an unknown expected outcome.

# example of making multiple regression predictions
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
# fit final model
model = LinearRegression()
model.fit(X, y)
# new instances where we do not know the answer
Xnew, _ = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
	print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))

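Running the example makes multiple predictions, then prints the inputs and predictions side by side for review. Because the training dataset in this example is generated without a fixed random_state, your specific predicted values will differ from those shown.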

X=[-1.07296862 -0.52817175], Predicted=-61.32459258381131
X=[-0.61175641 1.62434536], Predicted=-30.922508147981667
X=[-2.3015387 0.86540763], Predicted=-127.34448527071137

Single Regression Prediction

The same function can be used to make a prediction for a single data instance as long as it is suitably wrapped in a surrounding list or array.

For example:

# example of making a single regression prediction
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
# fit final model
model = LinearRegression()
model.fit(X, y)
# define one new data instance
Xnew = [[-1.07296862, -0.52817175]]
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
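To illustrate what "suitably wrapped" means in practice, the sketch below starts from a single instance stored as a one-dimensional NumPy array. Passing it directly to predict() is rejected, whereas reshaping it to one row and two columns works. The use of NumPy and the reshape(1, -1) idiom are choices made here for illustration; wrapping the values in a nested list, as in the example above, is equivalent.

```python
# sketch: a single instance must be 2D (one row, n_features columns) for predict()
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# generate regression dataset and fit final model
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
model = LinearRegression()
model.fit(X, y)

# one new instance as a flat array with shape (2,)
row = np.array([-1.07296862, -0.52817175])

# a 1D array is rejected because the model expects a 2D array of instances
try:
    model.predict(row)
    raised = False
except ValueError:
    raised = True
    print("1D input was rejected")

# reshape to one row and as many columns as there are features: shape (1, 2)
Xnew = row.reshape(1, -1)
ynew = model.predict(Xnew)
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
```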
