Learn how to efficiently apply state-of-the-art Dimensionality Reduction methods and boost your Machine Learning models.
When approaching a Machine Learning task, have you ever felt overwhelmed by the massive number of features?
Most Data Scientists experience this overwhelming challenge on a daily basis. While adding features enriches data, it often slows the training process and makes it harder to detect hidden patterns, resulting in the (in)famous Curse of Dimensionality.
Moreover, surprising phenomena arise in high-dimensional spaces. As an analogy, think of the novel Flatland, where characters living in a flat (two-dimensional) world are stunned when they encounter a three-dimensional being. In the same way, we struggle to grasp that, in high-dimensional spaces, most points behave like outliers, and distances between points are usually larger than we would expect. If not handled correctly, these phenomena can have disastrous implications for our Machine Learning models.
In this post, I will explain some advanced Dimensionality Reduction techniques used to mitigate these issues.
In my previous post, I introduced the relevance of Dimensionality Reduction in Machine Learning problems and how it helps tame the Curse of Dimensionality, and I covered both the theory and the Scikit-Learn implementation of the Principal Component Analysis (PCA) algorithm.
In this follow-up, I will dive into additional Dimensionality Reduction algorithms, such as Kernel PCA (kPCA) and Locally Linear Embedding (LLE), that overcome the limitations of PCA.
Do not worry if you haven’t read my Dimensionality Reduction introduction yet: this post is a stand-alone guide, as I will explain each concept in simple terms. However, if you would like to learn more about PCA first, I’m confident that guide will serve your goal: