StatSimVis Version: 0.3.1

Visualize high-dimensional data online

Explore CSV datasets using dimensionality reduction and feature importance methods

Feature extraction is the process of reducing the number of variables (i.e., columns) in a dataset by obtaining a smaller representative set of features using dimensionality reduction methods (Wiki). You can apply dimensionality reduction to any tabular CSV dataset using StatSim Vis, a 100% free and open-source tool for feature extraction and visualization. It supports PCA, t-SNE, UMAP, SOM, and Autoencoders. To understand your data even better, try the feature importance method to see how strongly the target depends on each input variable.

Dimensionality reduction

In many cases, real-world datasets contain more than two or three variables (i.e., columns). For us humans, it's tough to analyze and reason in high dimensions. Computers work nicely with high-dimensional data, but sometimes we still need a bird's-eye view of a dataset. Luckily, various projection methods can map data with many variables to a low-dimensional space we can understand.
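As a rough illustration of what such a projection does, the sketch below implements PCA, one of the supported methods, in plain Python using power iteration. It is not the StatSim Vis implementation: the function name `pca_2d`, the synthetic four-column data, and the iteration count are assumptions made for this example, and it expects data whose covariance matrix has rank of at least two.

```python
import random


def pca_2d(rows, iterations=200, seed=0):
    """Project rows onto their top two principal components (illustrative sketch)."""
    n, d = len(rows), len(rows[0])
    # Center every column so the covariance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iterations):
            # Power iteration step: multiply by the covariance matrix.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Keep the vector orthogonal to components found so far.
            for comp in components:
                dot = sum(wi * ci for wi, ci in zip(w, comp))
                w = [wi - dot * ci for wi, ci in zip(w, comp)]
            norm = sum(wi * wi for wi in w) ** 0.5
            v = [wi / norm for wi in w]
        components.append(v)

    # Coordinates of each centered row along the two components.
    projected = [[sum(ci * vi for ci, vi in zip(c, comp)) for comp in components]
                 for c in centered]
    return projected, components


# Hypothetical dataset: four columns driven by two hidden factors plus noise.
rng = random.Random(1)
data = []
for _ in range(60):
    t, s = rng.gauss(0, 3), rng.gauss(0, 1)
    data.append([t, t + rng.gauss(0, 0.1), s, 0.5 * s + rng.gauss(0, 0.1)])

projected, components = pca_2d(data)
print(projected[:3])  # first three rows, now with two columns each
```

Each row of `projected` is a 2D point that could be drawn on a scatter plot. Methods such as t-SNE and UMAP pursue the same goal with non-linear mappings.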

2D/3D view of a dataset

All dimensionality reduction techniques support mapping from a high-dimensional space to two dimensions, which works for most datasets. In 2D mode, you can also select and save a data subset using the lasso tool. However, plotting data in three dimensions can make patterns easier to recognize. You can rotate and zoom the view, then choose the most effective 3D-to-2D projection.
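To see why the choice of viewing angle matters, here is a minimal sketch of rotating 3D points and flattening them to 2D with an orthographic projection. The helper names `rotate_y` and `project_to_2d` and the two tiny clusters are hypothetical, and the code does not describe how StatSim Vis renders its 3D view.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project_to_2d(points, angle):
    """Rotate the points, then drop the depth coordinate (orthographic projection)."""
    return [rotate_y(p, angle)[:2] for p in points]


# Two hypothetical clusters that differ only in depth (z).
cluster_a = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.0), (-0.1, 0.1, 0.0)]
cluster_b = [(0.0, 0.0, 2.0), (0.1, 0.2, 2.0), (-0.1, 0.1, 2.0)]
points = cluster_a + cluster_b

front = project_to_2d(points, 0.0)          # clusters land on top of each other
side = project_to_2d(points, math.pi / 2)   # clusters are clearly separated
print(front)
print(side)
```

Viewed head-on, the two clusters coincide. After a quarter turn, they sit about two units apart, which is the kind of projection worth keeping.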

Feature importance

Sometimes we are interested in a specific column of a dataset and how it depends on other variables. Let's call that column the target variable. Historically, a correlation coefficient was used as a measure of such dependency. However, the usual (Pearson) correlation coefficient captures only linear relationships and can miss other kinds of dependency entirely. Feature importance relies on a more complex model under the hood. It estimates non-linear dependencies and variable interactions.
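The contrast can be sketched in a few lines. In the hypothetical dataset below, the target equals `x0` squared, so it depends entirely on `x0`, yet the Pearson correlation between them is essentially zero. The text above does not say which model StatSim Vis uses, so this example substitutes a k-nearest-neighbours regressor with permutation importance purely as a stand-in. All function names and parameters here are assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def knn_predict(train_X, train_y, row, k=5):
    """Average the targets of the k training rows closest to `row`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def knn_mse(train_X, train_y, test_X, test_y):
    """Mean squared error of the k-NN model on the test rows."""
    errors = [(knn_predict(train_X, train_y, r) - t) ** 2
              for r, t in zip(test_X, test_y)]
    return sum(errors) / len(errors)


def permutation_importance(train_X, train_y, test_X, test_y, seed=0):
    """Increase in test error after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = knn_mse(train_X, train_y, test_X, test_y)
    scores = []
    for j in range(len(test_X[0])):
        column = [r[j] for r in test_X]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(test_X, column)]
        scores.append(knn_mse(train_X, train_y, shuffled, test_y) - baseline)
    return scores


# Hypothetical data: the target depends on x0 non-linearly; x1 is pure noise.
rng = random.Random(42)
x0 = [(i - 100) / 100 for i in range(201)]   # symmetric grid on [-1, 1]
X = [[x, rng.uniform(-1, 1)] for x in x0]
y = [row[0] ** 2 for row in X]

r = pearson(x0, y)

order = list(range(len(X)))
rng.shuffle(order)
train, test = order[:150], order[150:]
importance = permutation_importance(
    [X[i] for i in train], [y[i] for i in train],
    [X[i] for i in test], [y[i] for i in test],
)
print(f"Pearson r(x0, y) = {r:.3f}")
print("importance [x0, x1] =", [round(s, 3) for s in importance])
```

Shuffling `x0` sharply increases the model's error while shuffling `x1` barely changes it, so the importance scores expose a dependency that the correlation coefficient misses.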

All processing and visualization happen in your browser. We don't see, collect, or sell the data you explore.
