Kalman Filter

Originally developed for tracking rocket trajectories, the Kalman Filter has become the gold standard for dynamically estimating linear model parameters. Ideal for estimating hedge ratios in pairs trading applications.
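As an illustration of the underlying technique (a minimal numpy sketch, not this model's API), a scalar Kalman filter can track a drifting hedge ratio beta in the observation model y_t = beta_t * x_t + noise, with a random-walk state; the noise variances `q` and `r` here are hypothetical tuning parameters:

```python
import numpy as np

def kalman_hedge_ratio(x, y, q=1e-5, r=1e-3):
    """Scalar Kalman filter tracking beta_t in y_t = beta_t * x_t + noise.

    q: process-noise variance (how quickly beta is allowed to drift)
    r: observation-noise variance
    """
    beta, p = 0.0, 1.0                      # state estimate and its variance
    betas = []
    for xt, yt in zip(x, y):
        p += q                              # predict: beta_t = beta_{t-1} + noise
        k = p * xt / (xt * xt * p + r)      # Kalman gain
        beta += k * (yt - beta * xt)        # correct with the innovation
        p *= (1.0 - k * xt)                 # update state variance
        betas.append(beta)
    return np.array(betas)
```

Feeding in two price (or return) series yields a hedge ratio that adapts over time, rather than the single static coefficient an ordinary regression would give.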

Density Estimation / Anomaly Detection

Kernel Density Estimator - The KDE is a non-parametric method that learns the shape of the underlying distribution of the input dataset. KDEs are extremely useful for outlier / anomaly detection and for simulating correlated random variables.
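A minimal 1-D Gaussian KDE sketch (assuming Silverman's rule-of-thumb bandwidth; the model's own bandwidth selection may differ):

```python
import numpy as np

def gaussian_kde(data, bandwidth=None):
    """1-D Gaussian kernel density estimate; returns a callable density."""
    data = np.asarray(data, dtype=float)
    n = data.size
    if bandwidth is None:
        # Silverman's rule of thumb for a roughly Gaussian sample
        bandwidth = 1.06 * data.std() * n ** (-1 / 5)

    def pdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - data[None, :]) / bandwidth
        # average of one Gaussian bump per datapoint
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

    return pdf
```

Low density at a new point flags it as an outlier; sampling from the fitted mixture simulates new data with the learned shape.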

BasicA2D

This model performs univariate anomaly detection. It outputs a score between 0 and 1 for each new datapoint: 0.5 is defined as "normal", while scores near 0 or 1 indicate anomalously low or high values, respectively. Perfect for distributed anomaly detection in a trading or social media setting.
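As one hypothetical way to build such a 0-to-1 score (an assumption for illustration only, not this model's actual internals), the empirical CDF of a history window maps typical values to about 0.5 and the extremes toward 0 and 1:

```python
import numpy as np

def cdf_score(history, x):
    """Score a new point by its empirical-CDF position in the history:
    ~0.5 for typical values, -> 0 anomalously low, -> 1 anomalously high."""
    history = np.sort(np.asarray(history, dtype=float))
    # fraction of historical points strictly below x, plus half of any ties
    below = np.searchsorted(history, x, side="left")
    ties = np.searchsorted(history, x, side="right") - below
    return (below + 0.5 * ties) / history.size
```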

PNN Classifier

The probabilistic neural network (PNN for short) is a model for predicting the label of new datapoints. Based on a combination of Bayesian networks and Fisher discriminant analysis, the PNN is ideal for non-linear classification problems.
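The PNN decision rule can be sketched with Gaussian Parzen windows: estimate a density per class and predict the class whose estimated density is highest at the new point (`sigma`, the kernel width, is a hypothetical smoothing parameter):

```python
import numpy as np

def pnn_predict(X_train, y_train, X_new, sigma=0.5):
    """Parzen-window sketch of a PNN: one Gaussian kernel density per class,
    predict the class with the highest estimated density at each new point."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        # squared distances from each new point to every training point of class c
        d2 = ((X_new[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        scores.append(np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```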

Minimum Spanning Tree (MST)

A simple yet powerful algorithm for inducing network graphs from input data. The graph model treats every datapoint as a node in a network, and similar nodes are connected to each other via links. The MST must maintain a tree structure without any loops. Especially useful for modeling the correlation structure of a dataset.
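A minimal sketch of MST construction using Prim's algorithm; for correlation data a common convention (an assumption here, not necessarily this model's) is the distance d_ij = sqrt(2 * (1 - rho_ij)), so highly correlated points sit close together:

```python
import numpy as np

def mst_edges(dist):
    """Prim's algorithm: return the n-1 edges of the minimum spanning tree
    of a symmetric distance matrix."""
    n = dist.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # cheapest link from any node already in the tree to any node outside it
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i, j] < best[2]):
                    best = (i, j, dist[i, j])
        edges.append(best[:2])
        in_tree.add(best[1])
    return edges
```

The n-1 surviving links form a loop-free backbone of the strongest similarities in the dataset.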

Correlation Filtered Graph (CFG)

Similar to an MST, the CFG begins by inducing a network graph from the correlation matrix of the input dataset. Unlike an MST, CFGs can contain loops and therefore carry far more link information. Ideal for analyzing correlation structure and cluster dynamics within time series data.

Isolation Forest

Multivariate anomaly detection algorithm based on random forests. Perfect for detecting anomalous / outlier data points in complex, high-dimensional datasets.

Neighbor Network Graph

Similar to a CFG but even more flexible in its construction. While the MST and CFG look primarily at correlation, the Neighbor Network can be constructed using a number of different distance metrics. Perfect for clustering.
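The flexible construction can be sketched as a k-nearest-neighbour graph over an arbitrary precomputed distance matrix (a generic sketch, not this model's API), which is why any distance metric works:

```python
import numpy as np

def knn_graph(dist, k):
    """Boolean adjacency matrix of the k-nearest-neighbour graph for a
    symmetric distance matrix produced by any metric or kernel."""
    n = dist.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = [j for j in order if j != i][:k]   # skip self
        adj[i, neighbors] = True
    return adj | adj.T   # symmetrize: link if either side lists the other
```

Connected components of this graph give a simple, metric-agnostic clustering.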

Lazy Matrix

A specialized distance matrix model designed for larger matrices (up to a million cells). Optimized for recommendation system applications.

Matrix Minimum Spanning Tree

Creates a minimum spanning tree directly from a distance matrix. This provides more flexibility, as any distance metric / kernel can be employed. Useful for abstracting the similarity structure of a dataset.

Matrix Kernel PCA

Performs kernel PCA decomposition directly on a distance matrix model given as input.

Matrix Agglomerator

An integral part of supervised manifold learning, the matrix agglomerator model transforms a distance matrix model using label class data. This model increases the differentiation in a distance matrix by pulling points with the same label together.

Kernel PCA

Non-linear dimensionality reduction. KPCA employs the kernel trick to provide a nonlinear extension of Principal Components Analysis. Select from Linear, RBF, and Polynomial kernels.
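The RBF case can be sketched in a few lines: build the kernel matrix, double-center it in feature space, and project onto the top eigenvectors (`gamma` is a hypothetical kernel-width parameter, not necessarily this model's name for it):

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA sketch with an RBF kernel: center the kernel matrix in
    feature space, then project onto the leading eigenvectors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # double centering
    vals, vecs = np.linalg.eigh(Kc)                # eigenvalues ascending
    idx = np.argsort(vals)[::-1][:n_components]    # take the largest
    # scale eigenvectors by sqrt(eigenvalue) to get the projections
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 1e-12))
```

Swapping the RBF line for a dot product recovers linear PCA; a polynomial kernel is `(X @ X.T + c) ** p`.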

Local Linear Embedder

Embeds high-dimensional data in a lower-dimensional space by preserving the distances in the local neighborhood of each point.

Laplacian Eigenmapper

Performs non-linear dimensionality reduction using the graph Laplacian.

Isomap

Embeds high-dimensional data in a low-dimensional space by aiming to preserve the global geodesic distance matrix, where the geodesic distance is the length of the shortest path between two nodes in the neighborhood graph. Because of its innate network structure, Isomap is an incredibly versatile model, useful for network graph statistics, classification, visualization, and regression.
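The standard Isomap pipeline can be sketched as: kNN graph, shortest-path (geodesic) distances via Floyd-Warshall, then classical MDS (a generic sketch; the model's own construction may differ):

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    """Isomap sketch: kNN graph -> geodesic distances -> classical MDS.
    Assumes the kNN graph is connected (otherwise infinities remain)."""
    n = X.shape[0]
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    g = np.full((n, n), np.inf)
    np.fill_diagonal(g, 0.0)
    for i in range(n):                           # keep only kNN edges
        nbrs = np.argsort(d[i])[1:n_neighbors + 1]
        g[i, nbrs] = d[i, nbrs]
        g[nbrs, i] = d[i, nbrs]
    for k in range(n):                           # Floyd-Warshall shortest paths
        g = np.minimum(g, g[:, k:k + 1] + g[k:k + 1, :])
    J = np.eye(n) - np.full((n, n), 1.0 / n)     # classical MDS on geodesics
    B = -0.5 * J @ (g ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```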

KNN Regressor

Multi-output regression model based on the k-nearest neighbors algorithm. It can learn highly nonlinear relationships, but can also be subject to the curse of dimensionality.
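A minimal sketch of the idea: predict each query as the mean target of its k nearest training points, which handles multi-output targets for free:

```python
import numpy as np

def knn_regress(X_train, y_train, X_new, k=3):
    """Predict each new point as the mean target of its k nearest
    training points (y_train may be 1-D or 2-D for multi-output)."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # pairwise distances from every query to every training point
    d = np.sqrt(((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1))
    nbrs = np.argsort(d, axis=1)[:, :k]
    return y_train[nbrs].mean(axis=1)
```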

Random Forest Regressor

Multi-output regression model based on the random forest algorithm. A powerful algorithm for nonlinear datasets.

Kernel Ridge Regressor

Multi-output regression model based on the kernel trick and ridge regression. This model can learn both linear and nonlinear relationships.
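A sketch of the dual-form solution with an RBF kernel: fit weights a = (K + alpha*I)^-1 y, then predict with the cross-kernel between new and training points (`alpha` and `gamma` are hypothetical regularization and kernel-width parameters):

```python
import numpy as np

def kernel_ridge_fit_predict(X, y, X_new, alpha=1e-3, gamma=1.0):
    """Kernel ridge regression sketch: solve the dual system on the
    training kernel matrix, predict via the cross-kernel."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K = rbf(X, X)
    a = np.linalg.solve(K + alpha * np.eye(len(X)), y)   # dual weights
    return rbf(X_new, X) @ a
```

Replacing the RBF with a plain dot product reduces this to ordinary ridge regression, which is why the same model covers both linear and nonlinear relationships.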