Perceptron

Implementing the perceptron algorithm using numerical programming, and demonstrating its use on synthetic data sets.
Author: Bell

Published: February 26, 2023

Image credits: BruceBlaus under CC BY 3.0 via Wikimedia Commons.

Python source: 451-blog/perceptron.py at main · doabell/451-blog

Instructions can be found at Implementing the Perceptron Algorithm.

Experiments

Linearly separable data

Through trial and error, we generate some linearly separable data and test the perceptron algorithm on it.

# Data
from sklearn.datasets import make_blobs
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import font_manager
font_manager.fontManager.addfont(r"C:\Windows\Fonts\FiraSans-Regular.ttf")
plt.rcParams["font.family"] = "Fira Sans"
np.random.seed(283)

n = 120
p_features = 3

X, y = make_blobs(
    n_samples=n,
    n_features=p_features - 1,
    centers=[(-1.7, -1.7), (1.7, 1.7)]
)

fig = plt.scatter(X[:, 0], X[:, 1], c=y)
xlab = plt.xlabel("Feature 1")
ylab = plt.ylabel("Feature 2")

There is a clear line that separates the two blobs here.

Let us find out if our perceptron algorithm can find that line.

First, we fit the algorithm with at most 1000 steps:

# Algorithm
from perceptron import Perceptron

p = Perceptron()
p.fit(X, y, max_steps=1000)

Then, we can visualize the accuracy of each iteration:

# score history
fig = plt.plot(p.history)
xlab = plt.xlabel("Iteration")
ylab = plt.ylabel("Accuracy")

We can see that for this example, w only changed 4 times.

Then the model reached 100% accuracy and stopped running.
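This behavior comes from the update rule: w changes only when the chosen point is misclassified, and fitting stops once the accuracy reaches 1.0. A minimal sketch of such a loop, assuming labels are mapped to {-1, 1} and a constant column is appended to X (the names, random-point selection, and stopping check here are illustrative, not the exact code in perceptron.py), might look like:

# illustrative perceptron update loop with early stopping
def perceptron_fit_sketch(X, y, max_steps=1000):
    X_ = np.append(X, np.ones((X.shape[0], 1)), 1)  # append a constant feature
    y_ = 2 * y - 1                                   # map labels {0, 1} -> {-1, 1}
    w = np.random.rand(X_.shape[1])                  # random initial weights
    history = []
    for _ in range(max_steps):
        i = np.random.randint(X_.shape[0])           # pick one random point
        # update w only if point i is currently misclassified
        w = w + ((y_[i] * (w @ X_[i])) < 0) * y_[i] * X_[i]
        acc = np.mean((X_ @ w > 0) == (y_ > 0))      # accuracy with current w
        history.append(acc)
        if acc == 1.0:                               # perfectly separated: stop
            break
    return w, history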

We can then visualize the dividing line found by the perceptron:

# visualize
def draw_line(w, x_min, x_max):
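    # the decision boundary satisfies w[0]*x + w[1]*y + w[2] = 0; solve for y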
    x = np.linspace(x_min, x_max, 101)
    y = -(w[0]*x + w[2])/w[1]
    plt.plot(x, y, color="black")


fig = plt.scatter(X[:, 0], X[:, 1], c=y)
fig = draw_line(p.w, -2, 2)

xlab = plt.xlabel("Feature 1")
ylab = plt.ylabel("Feature 2")

# score, should be 1
p.score(X, y)
1.0

That line divides the two blobs right down the middle, resulting in an accuracy of 1.0. Not bad!

Not linearly separable

Next, we consider 2-d data that is not linearly separable.

np.random.seed(227)

n = 120
p_features = 3

X, y = make_blobs(
    n_samples=n,
    n_features=p_features - 1,
    centers=[(-1.7, -1.7), (1.7, 1.7)]
)

fig = plt.scatter(X[:, 0], X[:, 1], c=y)
xlab = plt.xlabel("Feature 1")
ylab = plt.ylabel("Feature 2")

The dividing line should be at about the same position, but now the two blobs overlap with each other.
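Before running the algorithm, we can confirm the overlap by counting how many points fall on the wrong side of the line x1 + x2 = 0, which lies halfway between the two centers (this check is just for illustration and is not part of the perceptron):

# count points on the "wrong" side of x1 + x2 = 0, the natural separator
# between the centers (-1.7, -1.7) and (1.7, 1.7)
wrong_side = np.sum((X[:, 0] + X[:, 1] > 0) != (y == 1))
print(f"{wrong_side} points lie on the wrong side of the center line")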

Let us observe whether the perceptron algorithm still manages to find that line.

# Algorithm
p = Perceptron()
p.fit(X, y, max_steps=1000)

# score history
fig = plt.plot(p.history)
xlab = plt.xlabel("Iteration")
ylab = plt.ylabel("Accuracy")

We can see that given data that is not linearly separable, w does not converge.

Instead, the algorithm runs for the full 1000 iterations, with the accuracy fluctuating between roughly 0.95 and 0.97.

This is expected, because there is no line that perfectly separates these two blobs.
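One quick way to see this numerically (assuming p.history records one accuracy value per iteration, as in the plots above) is to look at the tail of the history:

# the accuracy never settles at 1.0, so the loop runs out of iterations
print(p.history[-5:])
print(f"Best accuracy reached: {max(p.history):.3f}")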

As for the dividing line:

# visualize
def draw_line(w, x_min, x_max):
    x = np.linspace(x_min, x_max, 101)
    y = -(w[0]*x + w[2])/w[1]
    plt.plot(x, y, color="black")


fig = plt.scatter(X[:, 0], X[:, 1], c=y)
fig = draw_line(p.w, -2, 2)

xlab = plt.xlabel("Feature 1")
ylab = plt.ylabel("Feature 2")

# score, should not be 1
p.score(X, y)
0.95

With about 6 of the 120 points misclassified, the algorithm yielded an accuracy of 0.95. Good enough.

Higher dimensions

Finally, we increase the number of features to reach higher dimensions.

Visualization is hard beyond 3-d, so we will instead look only at the score history during training.

We will be using 6 features (p_features = 7, counting the constant feature) and 1500 data points.

np.random.seed(149)

n = 1500
p_features = 7

# make centers with correct dimensions
c1 = np.empty(p_features - 1)
c1.fill(-1.5)

c2 = np.empty(p_features - 1)
c2.fill(1.5)

X, y = make_blobs(
    n_samples=n,
    n_features=p_features - 1,
    centers=[c1, c2]
)

# Algorithm
p = Perceptron()
p.fit(X, y, max_steps=1000)

# score history
fig = plt.plot(p.history)
xlab = plt.xlabel("Iteration")
ylab = plt.ylabel("Accuracy")

# score
p.score(X, y)
1.0

On this dataset, our perceptron algorithm reached 100% accuracy!

As for the accuracy during model fitting, the process took more than 500 iterations, but ended with a perfect score.

Because our perceptron converged, we can say that this particular dataset is linearly separable.
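If we want to check this without reading the plot (assuming, as in the first experiment, that fitting stops as soon as the accuracy reaches 1.0), we can inspect the history directly:

# if the model converged early, the history is shorter than max_steps and ends at 1.0
print(f"Stopped after {len(p.history)} iterations")
print(f"Final accuracy: {p.history[-1]}")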

Note on complexity

What is the runtime complexity of a single iteration of the perceptron algorithm update as described by Equation 1? Assume that the relevant operations are addition and multiplication. Does the runtime complexity depend on the number of data points n? What about the number of features p?

The relevant line is Line 51 in perceptron.py:

self.w = self.w + ((y_[i] * self.w @ X_[i]) < 0) * y_[i] * X_[i]

We can use numpy.ndarray.shape to find out the shape of these variables involved.

# From line 39
X_ = np.append(X, np.ones((X.shape[0], 1)), 1)

print(f"Shape of self.w is {p.w.shape}")
print(f"Shape of y_[i] is {y[0].shape}")
print(f"Shape of X_[i] is {X_[0].shape}")
Shape of self.w is (7,)
Shape of y_[i] is ()
Shape of X_[i] is (7,)

We can see that self.w and X_[i] both have length p, while y_[i] is a single scalar.

Therefore, the runtime complexity is O(p) and does not depend on the number of data points n.

Intuitively, a single update does not need to look at all of the data points when changing self.w. We only do that when computing the score, and that happens elsewhere in the function, outside the update described by Equation 1.
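As a rough sanity check on that O(p) claim, we can count the scalar additions and multiplications in the single update above (a back-of-the-envelope count, not a benchmark):

# w @ X_[i]: p multiplications and p - 1 additions
# w + c * X_[i]: p multiplications and p additions (c is the scalar factor)
p_dims = p.w.shape[0]
dot_ops = p_dims + (p_dims - 1)
update_ops = 2 * p_dims
print(f"About {dot_ops + update_ops} scalar operations per update, linear in p = {p_dims}")
print("No term depends on the number of data points n.")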