Iterative Algorithms

0. Learning Objectives

  • Apply the gradient descent algorithm to estimate minima/maxima.
  • Apply Newton’s method to estimate zeros.
  • Explain k-means clustering.

1. Definitions

An iterative algorithm:

  • Starts with an initial guess
  • Produces approximate solutions based on previous ones
  • Has a termination condition

Direct methods, by contrast, produce an exact solution in a finite number of operations (assuming exact arithmetic).
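As a concrete illustration of the three ingredients (a minimal sketch, using the Babylonian square-root iteration, which is not part of the notes above):

# Babylonian method for estimating sqrt(a), assuming a > 0.
def babylonian_sqrt(a, tolerance=1e-10):
    guess = a / 2                             # starts with an initial guess
    while True:
        better = (guess + a / guess) / 2      # new approximation from the previous one
        if abs(better - guess) < tolerance:   # termination condition
            return better
        guess = better

print(babylonian_sqrt(2))   # 1.414213562...

A direct method, by contrast, reaches its answer in a fixed number of steps, e.g. solving $ax + b = 0$ exactly as $x = -b/a$.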

2. Convergence

An algorithm converges if its approximate solutions approach a value $S$: for any error tolerance $\epsilon > 0$, the approximations stay within $\epsilon$ of $S$ after sufficiently many iterations, i.e. $\lim_{n \to \infty} x_n = S$.

Example: Newton’s Method

For finding roots of $f(x)$: $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$
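A minimal sketch of this update in Python, using $f(x) = x^2 - 2$ (a function chosen here purely for illustration; its positive root is $\sqrt{2}$):

f = lambda x: x ** 2 - 2       # positive root at sqrt(2)
f_prime = lambda x: 2 * x      # exact derivative

x = 1.0                        # initial guess
for _ in range(6):
    x = x - f(x) / f_prime(x)  # Newton update
    print(x)                   # approaches 1.414213562...

When the initial guess is close enough to a simple root, each step roughly doubles the number of correct digits (quadratic convergence).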

3. Gradient Descent

Example: the loop below minimizes a shifted quartic $f$ by stepping against a finite-difference estimate of $f'$; note the learning rate gamma, which distinguishes this update from the Newton update above:

g = lambda x: x ** 4 / 4 - 2 * x ** 3 / 3 - x ** 2 / 2 + 2 * x + 2
f = lambda x: g(x - 2)                       # shift g two units to the right
h = 1e-4
f_prime = lambda x: (f(x + h) - f(x)) / h    # forward-difference estimate of f'
gamma = 0.01                                 # learning rate
guess = 7                                    # initial guess
for _ in range(10):
    print(guess)
    guess = guess - gamma * f_prime(guess)   # step downhill along the slope

Local vs Global Optima

  • Local Minimum: $f(x) \leq f(y)$ for all $y$ near $x$
  • Global Minimum: $f(x) \leq f(y)$ for all $y$ in the domain of $f$

Algorithm

$x_{n+1} = x_n - \eta \nabla f(x_n)$

  • $\eta$: learning rate
  • $\nabla f(x_n)$: gradient at $x_n$

Convergence of Gradient Descent

Under suitable conditions (e.g., a sufficiently small learning rate and a smooth function), gradient descent converges to a local minimum, which need not be the global one.

Avoiding Local Minima

  • Multiple starting points (random restarts, sketched after this list)
  • Stochastic jumps
  • Accepting occasional increases
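A sketch of the first idea, random restarts, reusing f and f_prime from the example above (the search interval and restart count here are arbitrary choices):

import random

def descend(f_prime, x, gamma=0.01, steps=1000):
    # plain gradient descent from a single starting point
    for _ in range(steps):
        x = x - gamma * f_prime(x)
    return x

starts = [random.uniform(-5, 5) for _ in range(20)]  # multiple starting points
ends = [descend(f_prime, x0) for x0 in starts]
best = min(ends, key=f)   # keep the end point with the lowest f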

Stopping Criteria

Any of these, often combined (see the sketch below):

  • Maximum iterations
  • Small changes in guesses
  • Small changes in function value
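A sketch combining all three criteria, with f, f_prime, and gamma as in the example above (the tolerance values are illustrative):

x = 7.0              # initial guess
max_iters = 10_000   # criterion 1: cap on iterations
tol = 1e-8

for _ in range(max_iters):
    step = gamma * f_prime(x)
    if abs(step) < tol:                  # criterion 2: the guess barely moved
        break
    if abs(f(x - step) - f(x)) < tol:    # criterion 3: f barely changed
        break
    x = x - step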

4. K-means Clustering

K-means groups data into $k$ clusters $S_1, \dots, S_k$ with cluster means $\mu_i$, minimizing the within-cluster sum of squares (WCSS):

\[\text{WCSS} = \sum_{i=1}^k \sum_{x \in S_i} \|x - \mu_i\|^2\]
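The formula translates directly into code (a sketch: points is assumed to be a list of vectors, labels their integer cluster assignments, and centers the cluster means $\mu_i$):

import numpy as np

def wcss(points, labels, centers):
    # squared distance from each point to its cluster mean, summed over all points
    return sum(np.linalg.norm(p - centers[i]) ** 2
               for p, i in zip(points, labels))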

Lloyd’s Algorithm

  1. Pick $k$ random points as initial centers
  2. Assign points to nearest center
  3. Recompute centers
  4. Repeat steps 2-3 until convergence

Limitations

  • May converge to a local minimum of the WCSS, depending on the initial centers
  • Sensitive to outliers, which pull cluster means toward them
  • Produces only convex clusters, since every point is assigned to its nearest center

Python implementation

import pandas as pd
import plotnine as p9
import random
import numpy as np

k = 3
df = pd.read_csv('https://gist.githubusercontent.com/netj/8836201/raw/iris.csv')

def normalize(series):
    return (series - series.mean()) / series.std()  # z-score: mean 0, standard deviation 1

df['petal.length.normalized'] = normalize(df['petal.length'])
df['petal.width.normalized'] = normalize(df['petal.width'])

pts = [np.array(pt) for pt in zip(df['petal.length.normalized'], df['petal.width.normalized'])]
centers = random.sample(pts, k)          # step 1: k random points as initial centers
old_cluster_ids, cluster_ids = None, []  # arbitrary but different
while cluster_ids != old_cluster_ids:    # step 4: repeat until assignments stop changing
    old_cluster_ids = list(cluster_ids)
    cluster_ids = []
    for pt in pts:                       # step 2: assign each point to its nearest center
        min_cluster = -1
        min_dist = float('inf')
        for i, center in enumerate(centers):
            dist = np.linalg.norm(pt - center)
            if dist < min_dist:
                min_cluster = i
                min_dist = dist
        cluster_ids.append(min_cluster)
    df['cluster'] = cluster_ids
    # step 3: recompute each center as the mean of its cluster
    # (an empty cluster would raise ZeroDivisionError here)
    cluster_pts = [[pt for pt, cluster in zip(pts, cluster_ids) if cluster == match] for match in range(k)]
    centers = [sum(members) / len(members) for members in cluster_pts]

df['cluster'] = df['cluster'].astype(str)  # categorical labels give a discrete color scale
(p9.ggplot(df, p9.aes(x="petal.length.normalized", y="petal.width.normalized", color="cluster"))
 + p9.geom_point()).draw()