Principal Component Analysis, or PCA, is one of the minor miracles of machine learning. It's a dimensionality-reduction technique that reduces the number of dimensions in a dataset without sacrificing a commensurate amount of information. While that might seem underwhelming on the face of it, it has profound implications for engineers and software developers working to build predictive models from their data. What if I told you that you could take a dataset with 1,000 columns, use PCA to reduce it to 100 columns, and retain 90% or more of the information in the original dataset? That's pretty common, believe it or not. And it lends itself to a variety of practical uses, including:

- Reducing high-dimensional data to 2 or 3 dimensions so it can be plotted and explored
- Reducing data to n dimensions and then restoring the original number of dimensions, which finds application in anomaly detection and noise filtering
- Obfuscating datasets so they can be shared with others without revealing the nature or meaning of the data

And that's not all. A side effect of applying PCA to a dataset is that less important features – columns of data that have less relevance to the outcome of a predictive model – are removed, while multicollinearity among columns is eliminated. As a rule of thumb, you typically want a dataset used for machine learning to have at least 5 times as many rows as it has columns. In datasets with a low ratio of samples (rows) to features (columns), PCA can be used to increase that ratio.
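The 1,000-columns-to-100 scenario above can be sketched with scikit-learn's `PCA`. This is a minimal illustration, not the article's own code: the synthetic dataset, its shapes, and the choice of 100 components are assumptions made for the example.

```python
# Sketch: reduce a 1,000-column dataset to 100 columns with PCA,
# then check how much of the original variance those columns retain.
# The data here is synthetic (low-rank signal plus noise) so that
# the example is self-contained.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 50))          # hidden low-dimensional structure
mixing = rng.normal(size=(50, 1000))         # maps it into 1,000 correlated columns
X = latent @ mixing + 0.1 * rng.normal(size=(500, 1000))

pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)             # shape: (500, 100)

# Fraction of the original variance captured by the 100 components
retained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, round(retained, 3))

# Restoring the original dimensionality, as in the denoising /
# anomaly-detection use case mentioned above
X_restored = pca.inverse_transform(X_reduced)  # shape: (500, 1000)
```

Because the synthetic data has only 50 underlying dimensions, 100 components recover well over 90% of its variance; on real data, the retained fraction depends on how much structure the columns share.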