I. What is manifold learning?

Dimensionality Reduction

The accuracy of training algorithms generally improves with the amount of data we have. However, managing a large number of features is usually a burden on our algorithm. Some of these features may be irrelevant, so it is important to make sure the final model is not affected by them.

What is Dimensionality?

Dimensionality refers to the minimum number of coordinates needed to specify any point within a space or an object.

Why do we need Dimensionality Reduction?

If you keep the dimensionality high, the representation stays rich and unique, but it may not be easy to analyse because of the complexity involved. Apart from simplifying data, visualization is also an interesting and challenging application.

Linear Dimensionality Reduction

Fig 1 : PCA illustration (source: http://evelinag.com/blog )

Principal Component Analysis

Given a data set, PCA finds the directions along which the d...
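The post's own code is not shown here, so as a rough illustration of the idea, below is a minimal PCA sketch in NumPy via eigendecomposition of the covariance matrix. The `pca` helper, the toy data, and the choice of two components are assumptions for illustration, not part of the original post.

```python
# Minimal PCA sketch (illustrative only): project data onto the directions
# of largest variance found from the covariance matrix.
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal directions."""
    # Center the data so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # eigh is used because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort eigenvectors by decreasing eigenvalue (variance explained).
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # Project the centered data onto the top-k directions.
    return X_centered @ components

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))   # toy data: 100 samples, 5 features
    X_reduced = pca(X, k=2)         # reduce to 2 dimensions
    print(X_reduced.shape)          # (100, 2)
```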
Problem: Write a program to find all pairs of integers whose sum is equal to a given number. For example, if the input integer array is {1, 2, 3, 4, 5} and the given sum is 5, the code should return two pairs, {1, 4} and {2, 3}.

This is a common interview question about arrays. A quick search on Google returns several pages discussing the problem, to list but a few [1, 2, 3, 4]. Those pages discuss the solutions only at the conceptual level; although they also provide the code, they do not back the comparison up with measured numbers. In this note I re-implement all these approaches and compare their efficiency in practice.

Common approaches

There are four approaches which are often mentioned together: brute force, HashSet, sorting with search, and sorting with binary search. Each has its own pros and cons; interested readers can find the details in the list of sources below. Here is a summary of the storage complexity and computational complexity, followed by a sketch of the HashSet idea.

Approach Storage co...
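To make the HashSet approach concrete, here is a minimal sketch in Python, where the built-in `set` plays the role of the HashSet; the `find_pairs` name and single-pass complement lookup are my illustration, not the note's benchmarked implementations.

```python
# Hash-set approach (sketch): one pass over the array, O(1) average lookups.
def find_pairs(nums, target):
    """Return all pairs (a, b) with a + b == target."""
    seen = set()   # values encountered so far
    pairs = []
    for x in nums:
        complement = target - x
        if complement in seen:          # O(1) average membership test
            pairs.append((complement, x))
        seen.add(x)
    return pairs

print(find_pairs([1, 2, 3, 4, 5], 5))   # [(2, 3), (1, 4)]
```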
1. What is feature scaling?

Feature scaling is a method used to normalize the independent variables of our data. It is also called data normalization and is generally performed before running machine learning algorithms.

2. Why do we need to use feature scaling?

In practice the range of raw data can be very wide, and without normalization the objective functions may not work properly (for example, they may get stuck at a local optimum) or may be time-consuming to optimize. K-Means, for instance, can give you totally different solutions depending on the preprocessing methods that you used. This is because an affine transformation implies a change in the metric space: the Euclidean distance between two samples is different after that transformation. Feature scaling also helps gradient descent converge much faster than it would on unnormalized data.

Fig : With and without feature scaling in gradient descent

3. Methods of feature scaling

Rescaling

The simp...
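As a concrete illustration of rescaling (min-max normalization to [0, 1]), here is a minimal NumPy sketch; the `rescale` helper and the toy matrix are assumptions for illustration, not the post's own code.

```python
# Min-max rescaling sketch: each feature column is mapped to [0, 1]
# via x' = (x - min) / (max - min).
import numpy as np

def rescale(X):
    """Rescale each column of X independently to the [0, 1] range."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
print(rescale(X))
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]]
```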