Posts

Showing posts from September, 2015

Find all pairs in an array of integers whose sum is equal to a given number

Problem: Write a program to find all pairs of integers whose sum is equal to a given number. For example, if the input integer array is {1, 2, 3, 4, 5} and the given sum is 5, the code should return two pairs, {1, 4} and {2, 3}. This is a common interview question about arrays. A quick Google search returns several pages discussing the problem, to list but a few [1, 2, 3, 4]. Those pages provide code but discuss the solutions only at a conceptual level, without numbers to support a comparison. In this note I re-implement all of these approaches and compare their efficiency in practice.

Common approaches

There are four approaches that are often mentioned together: brute force, HashSet, sorting with search, and sorting with binary search. Each has its own pros and cons. Interested readers can find the details in the list of sources below. Here is a summary of the storage complexity and computational complexity.

Approach Storage co
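To make the HashSet idea mentioned above concrete, here is a minimal sketch in Python. It is not the code from the original post: the function name `find_pairs_with_sum` is hypothetical, a Python `set` stands in for a HashSet, and the choice to report each pair only once, smaller value first, is an assumption.

```python
def find_pairs_with_sum(numbers, target):
    """Return the set of unordered pairs (a, b), a <= b, with a + b == target."""
    seen = set()    # values visited so far
    pairs = set()   # a set avoids reporting the same pair twice
    for value in numbers:
        complement = target - value
        # A pair exists if the complement was already visited.
        if complement in seen:
            pairs.add((min(value, complement), max(value, complement)))
        seen.add(value)
    return pairs


print(sorted(find_pairs_with_sum([1, 2, 3, 4, 5], 5)))  # [(1, 4), (2, 3)]
```

The sketch makes a single pass over the input and keeps the visited values in memory, so it trades extra storage for speed compared with checking every pair by brute force.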

Feature selection [TODO]

I. Introduction

Feature selection is the process of selecting a feature or a subset of features for building a model in machine learning. A good feature selection method leads to a better-performing model and hence helps to improve our results. In particular, there are two reasons why feature selection is important. First, reducing the number of features reduces over-fitting and improves the generalization of models; it also reduces measurement and storage requirements, as well as training and utilization times. Second, we gain a better understanding of the underlying structure and characteristics of the data, which leads to better intuition about the algorithm.

In this blog, we briefly present some popular methods for feature selection. We use Python and scikit-learn to run all the feature selection experiments. The data set, which has 506 samples and 13 features, is available at http://scikit-learn.org/stable/datasets/.
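As a rough illustration of what selecting a subset of features means, the sketch below ranks features by the absolute Pearson correlation between each feature and the target, then keeps the top k. This is only one simple filter-style method chosen for illustration, not necessarily one used in the post. It is written in plain Python rather than scikit-learn so that it runs without dependencies; the function names and the tiny data set are made up. In scikit-learn, a comparable univariate filter is `SelectKBest` with `f_regression` from `sklearn.feature_selection`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_k_best(rows, target, k):
    """Return the sorted indices of the k features most correlated with the target."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


# Made-up data: feature 0 drives the target, feature 1 is constant,
# feature 2 is only loosely related to the target.
rows = [[1, 7, 3], [2, 7, 1], [3, 7, 4], [4, 7, 1], [5, 7, 5]]
target = [2, 4, 6, 8, 10]

print(select_k_best(rows, target, 1))  # [0]
print(select_k_best(rows, target, 2))  # [0, 2]
```

The constant feature scores zero and is dropped first, which matches the intuition that a feature carrying no information about the target only adds storage and training cost.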