StatSimSelect Version: 0.2.0

Feature selection with Boruta.js online

Find all relevant variables in a dataset using a robust feature importance method

Feature selection methods try to identify important variables in a dataset. Many of them search for a minimal feature subset that predicts a target variable well. When you want to understand relationships between variables, it is more important to select all relevant variables, not just the minimal set. Boruta (Kursa and Rudnicki, 2010) is one of the most advanced methods for such all-relevant feature selection. We ported the original Boruta algorithm to JavaScript, so you can use it online without installing R or Python and without sending data to a web server. All processing happens in the browser, on your local machine.

Select all relevant variables

Boruta selects all relevant features rather than a minimal subset of them. That works even when some of the features are correlated.

Compare with shadow features

The script adds "shadow" variables during data processing. Comparing the real variables against them shows which variables in a dataset are more informative than noise.
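As a rough illustration of the idea, the sketch below builds shadow variables the way the original Boruta method describes them: as shuffled copies of the real columns, which keep each column's distribution but break any link to the target. The function names `shuffled` and `addShadowFeatures`, the `shadow_` prefix, and the column-object data layout are assumptions made for this example and are not taken from the Boruta.js code.

```javascript
// Fisher-Yates shuffle on a copy of the array.
// `rng` must return a float in [0, 1); Math.random is the default.
function shuffled(values, rng = Math.random) {
  const out = values.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Returns a new dataset that contains every original column plus
// a permuted "shadow_<name>" copy of it. The input is not modified.
function addShadowFeatures(columns, rng = Math.random) {
  const result = { ...columns };
  for (const [name, values] of Object.entries(columns)) {
    result[`shadow_${name}`] = shuffled(values, rng);
  }
  return result;
}

// Hypothetical toy dataset: one array of values per variable.
const data = { height: [150, 160, 170, 180], weight: [50, 60, 70, 80] };
const withShadows = addShadowFeatures(data);
console.log(Object.keys(withShadows));
```

A model is then trained on the extended dataset, and a real variable counts as useful in that run only if its importance exceeds the best importance reached by any shadow variable.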

Choose from multiple models

The original Boruta library uses the Random Forest method under the hood. We added more models that Boruta can use to calculate feature importance.

Make statistically grounded conclusions

Instead of having a hard threshold for feature selection, the script uses a probabilistic approach to reject or accept variables in a dataset.
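One way to picture the probabilistic approach is the binomial test used in the original Boruta method: over several runs, count how often a variable beats the best shadow variable (a "hit"), then ask whether that count is unlikely under pure chance (hit probability 0.5). The sketch below is a simplified version under that assumption. The names `choose`, `upperTail`, and `decide`, the default `alpha`, and the omission of a multiple-comparison correction are choices made for this example, not a description of the Boruta.js internals.

```javascript
// Binomial coefficient C(n, k), built up iteratively to avoid large factorials.
function choose(n, k) {
  let c = 1;
  for (let i = 1; i <= k; i++) {
    c = (c * (n - k + i)) / i;
  }
  return c;
}

// P(X >= hits) for X ~ Binomial(n, 0.5).
function upperTail(hits, n) {
  let sum = 0;
  for (let k = hits; k <= n; k++) {
    sum += choose(n, k);
  }
  return sum / Math.pow(2, n);
}

// Classify a variable from its hit count after n runs.
// With p = 0.5 the distribution is symmetric, so P(X <= hits)
// equals P(X >= n - hits) and the same tail function serves both tests.
function decide(hits, n, alpha = 0.01) {
  const pManyHits = upperTail(hits, n);
  const pFewHits = upperTail(n - hits, n);
  if (pManyHits < alpha) return "confirmed";
  if (pFewHits < alpha) return "rejected";
  return "tentative";
}

console.log(decide(19, 20)); // far more hits than chance would give
console.log(decide(10, 20)); // indistinguishable from chance so far
console.log(decide(1, 20));  // far fewer hits than chance would give
```

Variables that stay "tentative" need more runs before the test can separate them from the shadow variables.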

All processing and visualization happen in your browser. We don't see, collect, or sell the data you explore.

