Originally developed for tracking rocket trajectories, the Kalman Filter has become the gold standard for dynamically estimating linear model parameters. It is ideal for estimating hedge ratios in pairs trading applications.
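As a minimal sketch of the pairs-trading use case, the hedge ratio can be modeled as a random-walk state and updated observation by observation. The noise variances `q` and `r` below are illustrative defaults, not calibrated values, and the function name is my own.

```python
import numpy as np

def kalman_hedge_ratio(x, y, q=1e-5, r=1e-3):
    """Estimate a time-varying hedge ratio beta_t in y_t = beta_t * x_t + noise.
    q: process-noise variance (how fast beta may drift); r: observation-noise
    variance. Both are illustrative, uncalibrated defaults."""
    beta, p = 0.0, 1.0                    # state estimate and its variance
    betas = np.empty(len(x))
    for t, (xt, yt) in enumerate(zip(x, y)):
        p += q                            # predict: beta follows a random walk
        k = p * xt / (xt * xt * p + r)    # Kalman gain
        beta += k * (yt - beta * xt)      # update with the new observation
        p *= (1.0 - k * xt)
        betas[t] = beta
    return betas

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.5 * x + rng.normal(scale=0.1, size=500)   # true hedge ratio is 1.5
print(kalman_hedge_ratio(x, y)[-1])             # converges near 1.5
```

Because the state is allowed to drift, the same filter tracks a hedge ratio that changes over time, which a static regression cannot.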
Density Estimation / Anomaly Detection
Kernel Density Estimator - The KDE is a non-parametric method that learns the shape of the underlying distribution of the input dataset. KDEs are extremely useful for outlier / anomaly detection and for simulating correlated random variables.
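A short sketch of the idea using scikit-learn's `KernelDensity` (one common KDE implementation; the bandwidth choice below is illustrative): a bimodal sample is fit without assuming any parametric form, and the learned density is high at the modes and low in the gap between them.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(42)
# Bimodal sample: a KDE should recover both modes without assuming a form.
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 1.0, 500)])

kde = KernelDensity(kernel="gaussian", bandwidth=0.4).fit(data.reshape(-1, 1))
# Log-density at the two modes (-2 and 3) and in the gap between them (0.5).
log_dens = kde.score_samples(np.array([[-2.0], [0.5], [3.0]]))
```

Points in low-density regions can then be flagged as outliers by thresholding the learned density.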
This model is for univariate anomaly detection. It outputs a score between 0 and 1 for each new datapoint, where 0.5 is defined as "normal" and scores near 0 or 1 indicate anomalously low or high values, respectively. Perfect for distributed anomaly detection in a trading or social media setting.
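The 0-to-1 scoring scheme can be illustrated with an empirical-CDF score against a reference window; this is a minimal stand-in for the idea, not the product's actual scoring rule, and the function name is my own.

```python
import numpy as np

def anomaly_score(history, x):
    """Score x in [0, 1] against a reference window: ~0.5 is typical,
    values near 0 or 1 are anomalously low or high. Illustrative sketch
    using the empirical CDF; the actual model may differ."""
    history = np.asarray(history)
    return (history < x).mean()

ref = np.random.default_rng(1).normal(0, 1, 10_000)
print(anomaly_score(ref, 0.0))   # near 0.5: a typical value
print(anomaly_score(ref, 4.0))   # near 1.0: anomalously high
```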
The Probabilistic Neural Network (or PNN for short) is a model for determining the label of new datapoints. Based on a combination of a Bayesian network and Fisher discriminant analysis, the PNN is ideal for non-linear classification problems.
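A minimal sketch of the PNN's core mechanic, assuming the classic Parzen-window formulation: place one Gaussian kernel on every training point, sum the kernels per class, and predict the class with the largest sum. The `sigma` bandwidth and function name are illustrative choices.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_new, sigma=0.5):
    """Minimal PNN sketch: one Gaussian Parzen kernel per training point,
    summed per class; predict the class with the largest kernel sum."""
    X_new = np.atleast_2d(X_new)
    classes = np.unique(y_train)
    # Squared distances between every new point and every training point.
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * sigma**2))
    votes = np.stack([k[:, y_train == c].sum(1) for c in classes], axis=1)
    return classes[votes.argmax(1)]

X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y = np.array([0, 0, 1, 1])
print(pnn_predict(X, y, [[0.1, 0.0], [2.9, 3.1]]))   # → [0 1]
```

Because each class's score is a kernel density estimate, the decision boundary can be arbitrarily nonlinear.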
Minimum Spanning Tree (MST)
A simple yet powerful algorithm for inducing network graphs from input data. The graph model treats every datapoint as a node in a network, and similar nodes are connected to each other via links. The MST must maintain a tree structure without any loops. Especially useful for modeling the correlation structure of a dataset.
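A brief sketch of the correlation use case, assuming the common recipe of converting correlations to distances and running SciPy's MST routine: a tree over n series has exactly n-1 links and no loops.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
# Five series; columns 0-2 share a common factor and so correlate strongly.
factor = rng.normal(size=1000)
X = rng.normal(scale=0.5, size=(1000, 5))
X[:, :3] += factor[:, None]

corr = np.corrcoef(X.T)
dist = np.sqrt(2 * (1 - corr))       # standard correlation-to-distance map
mst = minimum_spanning_tree(dist)    # sparse result: n - 1 surviving links
print(mst.nnz)                       # → 4 links for 5 nodes, loop-free
```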
Correlation Filtered Graph (CFG)
Similar to an MST, the CFG begins by inducing a network graph from the correlation matrix of the input dataset. Unlike an MST, CFGs can contain loops and carry far more link information. Ideal for analyzing correlation structure and cluster dynamics within time series data.
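As a hedged illustration of the filtering idea (the product's exact filter may differ), one simple construction keeps only links whose correlation clears a threshold. Unlike an MST, this graph can contain loops, so a correlated cluster shows up as a densely connected subgraph.

```python
import numpy as np

rng = np.random.default_rng(0)
factor = rng.normal(size=2000)
X = rng.normal(scale=0.7, size=(2000, 6))
X[:, :3] += factor[:, None]          # assets 0-2 form a correlated cluster

corr = np.corrcoef(X.T)
np.fill_diagonal(corr, 0.0)
# Keep only links whose |correlation| clears a threshold; loops are allowed.
edges = [(i, j) for i in range(6) for j in range(i + 1, 6)
         if abs(corr[i, j]) > 0.4]
print(sorted(edges))   # the 0-1-2 cluster appears as a triangle (a loop)
```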
Multivariate anomaly detection algorithm based on random forests. Perfect for detecting anomalous / outlier datapoints in complex, high-dimensional datasets.
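One widely used forest-based anomaly detector is scikit-learn's `IsolationForest`; whether that is the exact algorithm behind this model is an assumption, but it illustrates the approach: points that are easy to isolate with random splits are scored as outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))        # the normal high-dimensional bulk
X_out = np.full((5, 8), 6.0)         # obvious outliers far from the bulk

clf = IsolationForest(random_state=0).fit(X)
# predict returns +1 for inliers and -1 for outliers.
print(clf.predict(X_out))            # the far-away points are flagged -1
```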
Neighbor Network Graph
Similar to a CFG but even more flexible in its construction. While the MST and CFG look primarily at correlation, the Neighbor Network can be constructed using a number of different distance metrics. Perfect for clustering.
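A short sketch of the construction using scikit-learn's `kneighbors_graph` with a non-correlation metric (Manhattan here; the metric choice is the point of the flexibility claim): with two well-separated clusters, no neighbor link crosses between them.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
# Two well-separated clusters of 20 points each.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])

# k-nearest-neighbor graph under the Manhattan metric; any metric works.
A = kneighbors_graph(X, n_neighbors=3, metric="manhattan", mode="connectivity")
cross_links = A[:20, 20:].nnz
print(cross_links)   # → 0: the clusters stay disconnected
```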
A specialized distance matrix for larger matrices (up to a million cells). Optimized for recommendation system applications.
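For scale, a million cells corresponds to roughly a 1000 x 1000 matrix. The sketch below builds one with scikit-learn's `pairwise_distances` on a hypothetical user-item ratings matrix, using cosine distance as is common in recommendation settings; the product's internals may differ.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(1000, 50)).astype(float)  # users x items

# A 1000 x 1000 user-user cosine distance matrix: one million cells.
D = pairwise_distances(ratings, metric="cosine")
print(D.shape)   # → (1000, 1000)
```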
Matrix Minimum Spanning Tree
Creates a minimum spanning tree directly from a distance matrix. This provides more flexibility, as any distance metric / kernel can be employed. Useful for abstracting the similarity structure of a dataset.
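A small sketch of the workflow with SciPy: compute a distance matrix under whatever metric you like (cityblock here), then hand it straight to the MST routine.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.2]])
# Any metric works at this step; the MST never sees the raw coordinates.
D = squareform(pdist(X, metric="cityblock"))
mst = minimum_spanning_tree(D)
print(mst.toarray().sum())   # total tree weight: 1.0 + 1.2 + 5.0 = 7.2
```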
Matrix Kernel PCA
Performs matrix decomposition directly on a distance matrix model supplied as input.
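One way to realize this with scikit-learn is `KernelPCA(kernel="precomputed")`. That API expects a similarity matrix, so the distances are first mapped through an RBF kernel; this conversion step is a common choice and an assumption here, not necessarily the product's internal transform.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
D = pairwise_distances(X)             # the distance-matrix input

# Convert distances to similarities with an RBF kernel (illustrative choice),
# then decompose the precomputed kernel directly.
K = np.exp(-D**2 / (2 * np.median(D) ** 2))
emb = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
print(emb.shape)   # → (100, 2)
```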
An integral part of supervised manifold learning, the matrix agglomerator model transforms a distance matrix model using label class data. This model increases the differentiation in a distance matrix by pulling points with the same label together.
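A hedged sketch of the "pulling same-label points together" idea: shrink within-class distances by a fixed factor. The function name and the `pull` parameter are my own illustrations, not the product's API.

```python
import numpy as np

def agglomerate(D, labels, pull=0.5):
    """Sketch of a matrix agglomerator: scale down distances between points
    that share a label, increasing class separation. `pull` in (0, 1) is an
    illustrative shrink factor."""
    D = D.copy()
    same = labels[:, None] == labels[None, :]
    D[same] *= (1.0 - pull)
    np.fill_diagonal(D, 0.0)
    return D

D = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 4.0],
              [4.0, 4.0, 0.0]])
labels = np.array([0, 0, 1])
print(agglomerate(D, labels))   # same-label pair 0-1 moves from 2.0 to 1.0
```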
Non-linear dimensionality reduction. KPCA employs the kernel trick to provide a nonlinear extension of Principal Components Analysis. Select from Linear, RBF, and Polynomial kernels.
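A quick sketch with scikit-learn's `KernelPCA` on a classically non-linear dataset, two concentric rings, using the RBF kernel (the `gamma` value is illustrative); swapping in `kernel="linear"` or `kernel="poly"` matches the other listed options.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
# Two concentric rings: no linear projection of the raw coordinates
# separates them, but kernel components can.
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.repeat([1.0, 5.0], 100)
X = np.c_[r * np.cos(theta), r * np.sin(theta)] + rng.normal(0, 0.1, (200, 2))

emb = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(X)
print(emb.shape)   # → (200, 2)
```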
Local Linear Embedder
Embeds high-dimensional data in a lower-dimensional space by preserving the distances in the local neighborhood of each point.
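A minimal sketch with scikit-learn's `LocallyLinearEmbedding` (one standard implementation of this technique): a one-dimensional curve embedded in 3-D is unrolled back to a single coordinate, because each point's local neighborhood is preserved.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# A 1-D helix living in 3-D space.
t = np.linspace(0, 3 * np.pi, 300)
X = np.c_[np.cos(t), np.sin(t), t]

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=1)
emb = lle.fit_transform(X)
print(emb.shape)   # → (300, 1)
```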
Laplacian Eigenmapper
Performs non-linear dimensionality reduction using the graph Laplacian.
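One standard implementation of Laplacian eigenmaps is scikit-learn's `SpectralEmbedding` (assumed equivalent here): build a neighbor graph, then embed the data with the bottom eigenvectors of the graph Laplacian.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

# Points along a 3-D helix; the neighbor graph is a simple chain here.
t = np.linspace(0, 3 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t), t]

# Neighbor graph -> graph Laplacian -> eigenvector embedding.
emb = SpectralEmbedding(n_components=1, n_neighbors=10).fit_transform(X)
print(emb.shape)   # → (200, 1)
```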
Embeds high-dimensional data in a low-dimensional space by aiming to preserve the global geodesic distance matrix. The geodesic distance is the number of edges on the shortest path between two nodes. Because of its innate network structure, the Isomap is an incredibly versatile model, useful for network graph statistics, classification, visualization, and regression.
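A sketch with scikit-learn's `Isomap`: points on a helix are far apart in straight-line distance across turns, but the geodesic (along-the-curve) distances are preserved, so the 1-D embedding recovers the order along the curve almost exactly.

```python
import numpy as np
from sklearn.manifold import Isomap

t = np.linspace(0, 3 * np.pi, 300)
X = np.c_[np.cos(t), np.sin(t), 0.1 * t]   # a helix on a 1-D manifold

iso = Isomap(n_neighbors=8, n_components=1)
emb = iso.fit_transform(X)
# Geodesic order along the curve survives in the 1-D embedding.
order_corr = abs(np.corrcoef(emb[:, 0], t)[0, 1])
print(round(order_corr, 2))   # → 1.0
```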
Multi-output regression model based on the k-nearest neighbors algorithm. It can learn highly nonlinear relationships, but can also be subject to the curse of dimensionality.
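A minimal multi-output sketch with scikit-learn's `KNeighborsRegressor` (assumed comparable to this model): two nonlinear targets are predicted at once by averaging the targets of the five nearest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (400, 1))
# Two outputs learned jointly: sin(x) and x^2.
Y = np.c_[np.sin(X[:, 0]), X[:, 0] ** 2]

knn = KNeighborsRegressor(n_neighbors=5).fit(X, Y)
pred = knn.predict([[1.0]])
print(pred.shape)   # → (1, 2): one row, both targets
```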
Random Forest Regressor
Multi-output regression model based on the random forest algorithm. A powerful algorithm for nonlinear datasets.
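A short multi-output sketch with scikit-learn's `RandomForestRegressor` (assumed comparable to this model), fitting two nonlinear targets jointly; hyperparameters here are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 2))
# Two nonlinear targets learned by one forest.
Y = np.c_[np.sin(X[:, 0]) * X[:, 1], X.sum(1) ** 2]

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, Y)
print(rf.score(X, Y))   # in-sample R^2; high on this nonlinear data
```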
Kernel Ridge Regressor
Multi-output regression model based on the kernel trick and ridge regression. This model can learn both linear and nonlinear relationships.
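A minimal sketch with scikit-learn's `KernelRidge` (assumed comparable to this model): with an RBF kernel it fits two nonlinear targets jointly, while `kernel="linear"` reduces to ordinary ridge regression. The `gamma` and `alpha` values are illustrative.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (300, 1))
Y = np.c_[np.sin(X[:, 0]), np.cos(X[:, 0])]   # two outputs at once

# RBF kernel captures the nonlinearity; kernel="linear" gives plain ridge.
kr = KernelRidge(kernel="rbf", gamma=0.5, alpha=1e-2).fit(X, Y)
pred = kr.predict([[0.0]])
print(pred.shape)   # → (1, 2)
```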