Time Series EDA with STUMPY

About Me

  • Senior Machine Learning Engineer @ Union.ai
  • Maintainer for scikit-learn


Contents

  • Motivation 💭
  • Matrix Profile 🪪
  • Applications 🚀
  • Computation 💻

Motivation 💭

Matrix Profile 🪪

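Detail: Z-normalized Euclidean Distance

import numpy as np

x = np.asarray([0, 1, 3, 2], dtype=np.float64)
y = np.asarray([1, 2, 2, 10], dtype=np.float64)

# Z-normalize each subsequence: zero mean, unit standard deviation
x_normed = (x - np.mean(x)) / np.std(x)
y_normed = (y - np.mean(y)) / np.std(y)

# Euclidean distance between the z-normalized subsequences
z_normed_distance = np.sqrt(np.sum((x_normed - y_normed)**2))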
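
The z-normalized distance is the building block of the matrix profile: for every length-m subsequence, record the distance to its nearest neighbor elsewhere in the series, and where that neighbor is. Below is a minimal brute-force sketch of that idea in plain Python (no NumPy, so it runs anywhere). The function names, the toy series, and the `ceil(m / 4)` exclusion zone are assumptions made for illustration; `stumpy.stump` computes the same kind of result far more efficiently.

```python
import math


def z_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # Constant subsequence: map to all zeros (a choice made for this sketch).
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def z_norm_distance(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    return math.dist(z_normalize(a), z_normalize(b))


def naive_matrix_profile(ts, m):
    """Brute-force O(n^2 * m) matrix profile: (distances, neighbor indices)."""
    n_subs = len(ts) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial self-matches
    profile = [math.inf] * n_subs
    indices = [-1] * n_subs
    for i in range(n_subs):
        for j in range(n_subs):
            if abs(i - j) <= excl:
                continue
            d = z_norm_distance(ts[i:i + m], ts[j:j + m])
            if d < profile[i]:
                profile[i], indices[i] = d, j
    return profile, indices


# Toy series: the shape [0, 1, 3, 2] appears at index 0 and again at index 8,
# the second time scaled by 2 and shifted by 10.
ts = [0, 1, 3, 2, 5, 5, 6, 4, 10, 12, 16, 14, 7, 3, 8]
profile, indices = naive_matrix_profile(ts, m=4)

# The smallest profile value marks the best-matching pair of subsequences.
motif_idx = min(range(len(profile)), key=profile.__getitem__)
neighbor_idx = indices[motif_idx]
```

Because z-normalization removes offset and scale, the two copies of the shape are at distance zero from each other even though their raw values differ.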

Applications 🚀

  • Motif Discovery 💡
  • Anomaly / Novelty Discovery 👽
  • Semantic Segmentation ✂
  • Fast Pattern Matching 🪢
  • Time Series Chains ⛓️
  • Similarities Between Two Time Series 🧬
  • Shapelet + ML 💠

Tutorials: stumpy.readthedocs.io/en/latest/tutorials.html

Motif Discovery 💡

stumpy.stump

m = 640
mp_steam = stumpy.stump(steam_flow, m)

motif_idx = np.argsort(mp_steam[:, 0])[0]
nearest_neighbor_idx = mp_steam[motif_idx, 1]

Anomaly / Novelty Discovery 👽
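
Where a motif is the subsequence with the smallest matrix profile value, an anomaly (often called a discord) is usually taken to be the subsequence with the largest one: its nearest neighbor is farther away than anyone else's. A minimal sketch of that lookup follows; `top_discord` and the toy distances are hypothetical, and with STUMPY the input would be the distance column of the matrix profile.

```python
import math


def top_discord(profile):
    """Return the index of the largest finite matrix-profile value."""
    finite = [(d, i) for i, d in enumerate(profile) if math.isfinite(d)]
    if not finite:
        raise ValueError("profile has no finite values")
    # Tuples compare by distance first, so max() picks the farthest neighbor.
    return max(finite)[1]


# Made-up matrix-profile distances; index 3 has no close match anywhere.
toy_profile = [0.4, 0.3, 0.5, 2.9, 0.6, math.inf, 0.2]
discord_idx = top_discord(toy_profile)
```

Infinite entries are skipped here because they typically mean "no valid neighbor was found" rather than "very unusual".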

Semantic Segmentation ✂

stumpy.fluss

m = L = 210
mp_abp = stumpy.stump(abp, m=m)

cac, regime_locations = stumpy.fluss(
    mp_abp[:, 1],
    L=L,
    n_regimes=2,
    excl_factor=1,
)
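
To give a feel for what the corrected arc curve (`cac`) measures, here is a rough plain-Python sketch. Each subsequence draws an "arc" to its nearest neighbor (the `mp[:, 1]` column); positions crossed by few arcs are likely regime boundaries. The function name, the toy indices, and the parabola `2 * p * (n - p) / n` used as the idealized arc curve are assumptions for illustration, and STUMPY's own normalization may differ.

```python
def corrected_arc_curve(nn_indices, excl):
    """Arc crossings per position, divided by an idealized parabola, capped at 1."""
    n = len(nn_indices)
    marks = [0] * n
    for i, j in enumerate(nn_indices):
        lo, hi = min(i, j), max(i, j)
        marks[lo] += 1  # the arc starts covering positions at lo ...
        marks[hi] -= 1  # ... and stops covering them at hi
    cac, running = [], 0
    for p in range(n):
        running += marks[p]  # number of arcs passing over position p
        ideal = 2 * p * (n - p) / n  # assumed idealized arc curve
        cac.append(min(running / ideal, 1.0) if ideal > 0 else 1.0)
    # Too few arcs can cross near the edges, so pin those positions to 1.
    for p in range(n):
        if p < excl or p >= n - excl:
            cac[p] = 1.0
    return cac


# Toy nearest-neighbor indices with two regimes: positions 0..19 only point
# at each other, and positions 20..39 only point at each other.
nn = [(i + 5) % 20 for i in range(20)] + [20 + (i + 5) % 20 for i in range(20)]
cac = corrected_arc_curve(nn, excl=3)
boundary = min(range(len(cac)), key=cac.__getitem__)
```

No arc crosses from one half of the toy series to the other, so the curve drops to zero right at the boundary between the two regimes.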

Fast Pattern Matching 🪢

stumpy.mass

distance_profile = stumpy.mass(query, ts)

idx = np.argmin(distance_profile)

stumpy.match

matches = stumpy.match(
    query, ts,
    max_distance=lambda D: max(np.mean(D) - 4 * np.std(D), np.min(D))
)
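
The `max_distance` callable receives the distance profile `D` and returns a cutoff. The rule above keeps anything more than four standard deviations below the mean, and falls back to `np.min(D)` so that at least the single best match always qualifies. A plain-Python sketch of that filtering step is shown below; `threshold` and `naive_matches` are hypothetical helpers, and as far as I know the real `stumpy.match` does more (for example, keeping matches from overlapping), which this sketch ignores.

```python
import statistics


def threshold(distance_profile):
    """max(mean - 4 * std, min), with the population std (like np.std)."""
    mean = statistics.fmean(distance_profile)
    std = statistics.pstdev(distance_profile)
    return max(mean - 4 * std, min(distance_profile))


def naive_matches(distance_profile, max_distance):
    """Return (distance, index) pairs within the cutoff, closest first."""
    cutoff = max_distance(distance_profile)
    hits = [(d, i) for i, d in enumerate(distance_profile) if d <= cutoff]
    return sorted(hits)


# Made-up distance profile: one position (index 2) matches the query well.
toy_distances = [5.0, 4.8, 0.3, 5.1, 4.9]
matches = naive_matches(toy_distances, max_distance=threshold)
```

On a profile this short, `mean - 4 * std` is negative, so the fallback to the minimum is what lets the best match through.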

Time Series Chains ⛓️

stumpy.allc

m = 20
mp_volume = stumpy.stump(volume, m=m)

all_chain_set, unanchored_chain = stumpy.allc(
    mp_volume[:, 2],
    mp_volume[:, 3],
)

Similarities Between Two Time Series 🧬

stumpy.stump

m = 500
under_pressure_mp = stumpy.stump(
    T_A=under_pressure,
    m=m,
    T_B=ice_ice_baby,
    ignore_trivial=False,
)

under_pressure_motif_index = under_pressure_mp[:, 0].argmin()

vanilla_ice_motif_index = under_pressure_mp[under_pressure_motif_index, 1]

Shapelet 💠

m = 38

P_Point_Point_mp = stumpy.stump(T_A=point, m=m)

P_Point_Gun_mp = stumpy.stump(T_A=point, m=m, T_B=gun, ignore_trivial=False)

Shapelet + ML 💠

print(train_ts.shape)
# (50, 150)

# Uses `stumpy.mass`
X_train = distance_to_shapelets(train_ts, point_shapelets)
print(X_train.shape)
# (50, 10)

X_test = distance_to_shapelets(test_ts, point_shapelets)

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

clf = RandomForestClassifier()

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(f"Accuracy for ML model: {round(accuracy_score(y_test, y_pred), 3)}")
# Accuracy for ML model: 0.927

Computation 💻

GPUs

stumpy.gpu_stump

import stumpy

mp = stumpy.gpu_stump(time_series, m=m)

Distributed STUMP

import stumpy
from dask.distributed import Client

with Client() as dask_client:
    mp = stumpy.stumped(dask_client, time_series, m=m)

Fast Approximate Matrix Profiles

stumpy.scrump

approx = stumpy.scrump(time_series, m, ...)

Pan Matrix Profile

stumpy.stimp

Streaming Data

stumpy.stumpi

stream = stumpy.stumpi(initial_time_series, m, egress=False)

stream.update(new_data_point)
