1. What is feature scaling?
Feature scaling is a method used to normalize the range of the independent variables in our data. It is also called data normalization and is generally performed before running machine learning algorithms.

2. Why do we need feature scaling?
In practice the range of raw data can be very wide, and without normalization many objective functions do not work properly (for example, they get stuck at a local optimum) or become time-consuming. K-Means, for instance, can give completely different solutions depending on the preprocessing used, because an affine transformation implies a change in the metric space: the Euclidean distance between two samples is different after that transformation. When we apply gradient descent, feature scaling also helps it converge much faster than it would without normalized data.

Fig: gradient descent with and without feature scaling

3. Methods in feature scaling
Rescaling
The simp...
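To make the scaling idea concrete, here is a minimal sketch using scikit-learn's MinMaxScaler and StandardScaler; the small array X is made-up example data and the choice of these two scalers is an assumption on my part, since this excerpt does not show the post's own code.

```python
# A hedged sketch of two common feature scaling methods with scikit-learn.
# The array X is made-up example data, not taken from the post.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Rescaling (min-max): map each feature to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: transform each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_std)
```

After either transform both columns live on comparable scales, which is what lets gradient descent take well-conditioned steps and keeps the Euclidean distances used by K-Means from being dominated by the larger-valued feature.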
I. What is manifold learning?

Dimensionality Reduction
The accuracy of training algorithms generally improves with the amount of data we have. However, managing a large number of features is usually a burden on our algorithm, and some of those features may be irrelevant, so it is important to make sure the final model is not affected by them.

What is Dimensionality?
Dimensionality refers to the minimum number of coordinates needed to specify any point within a space or an object.

Why do we need Dimensionality Reduction?
If we keep the dimensionality high, the representation stays faithful, but it may not be easy to analyse because of the complexity involved. Apart from simplifying the data, visualization is also an interesting and challenging application of dimensionality reduction.

Linear Dimensionality Reduction
Fig 1: PCA illustration (source: http://evelinag.com/blog )
Principal Component Analysis
Given a data set, PCA finds the directions along which the d...
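To illustrate the linear case, here is a minimal sketch of PCA with scikit-learn; the Iris dataset is used only as a convenient stand-in, since this part of the post does not name a specific dataset.

```python
# A small sketch of linear dimensionality reduction with PCA (scikit-learn).
# The Iris data (150 samples, 4 features) is an assumed stand-in dataset.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Keep the two directions along which the data varies the most.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured by each direction
```

The explained_variance_ratio_ values report how much of the original variance each retained direction carries, which is the usual criterion for deciding how many components to keep.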
I. Introduction
Feature selection is the process of selecting a feature or a subset of features for building a model in machine learning. A good feature selection method leads to a better-performing model and hence helps to improve our results. In particular, there are two reasons why feature selection is important. First, if we are able to reduce the number of features, we reduce over-fitting and improve the generalization of our models; it also reduces the measurement and storage requirements and shortens training and utilization times. Second, we gain a better understanding of the underlying structure and characteristics of the data, which leads to better intuition about the algorithm.

In this blog, we briefly present some popular feature selection methods. We use Python and scikit-learn to run all the experiments. The data, consisting of 506 samples and 13 features, is available at (http://scikit-learn.org/stable/datasets/ ). ...
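As a taste of what such an experiment looks like, here is a minimal sketch of a filter-style selection method with scikit-learn; because the exact loader for the 506-sample, 13-feature dataset is not shown in this excerpt, a synthetic regression problem of the same shape stands in for it, and SelectKBest with a univariate F-test is only one of the possible methods.

```python
# A hedged sketch of univariate (filter) feature selection with scikit-learn.
# make_regression is a stand-in for the 506 x 13 dataset used in the post.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=506, n_features=13, n_informative=5,
                       random_state=0)

# Score each feature with a univariate F-test and keep the five best.
selector = SelectKBest(score_func=f_regression, k=5)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)        # (506, 5)
print(selector.get_support())  # boolean mask of the selected features
```

get_support() reports which of the original 13 columns survived, so the reduced matrix can be mapped back to the original feature names.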