OPTIMIZATION ON SPHERE Problem: Given the unit sphere $S^n$ in $\mathbb{R}^{n+1}$ and a smooth function $f: S^n \rightarrow \mathbb{R}$, find an element $x_{*} \in S^n$ such that there is a neighbourhood $V$ of $x_{*}$ with $f(x_{*}) \leq f(x)$ for all $x \in V$. In these slides, I introduce some notation and the line-search method for performing this minimization task on the sphere. slides: CSE_SMU seminar 30/01/2016
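The slides are not reproduced in this excerpt, so the following is only a minimal sketch of what one line-search scheme on the sphere can look like, assuming Riemannian gradient descent with the Euclidean gradient projected onto the tangent space and retraction by normalization; the test function, the helper name `line_search_on_sphere`, and the backtracking parameters are illustrative assumptions, not material from the slides.

```python
import numpy as np

def line_search_on_sphere(f, grad_f, x0, max_iter=100, alpha0=1.0, beta=0.5, c=1e-4, tol=1e-8):
    """Minimize f on the unit sphere by projected-gradient line search.

    At each iterate x (with ||x|| = 1):
      1. project the Euclidean gradient onto the tangent space at x,
      2. backtrack along the negative tangent direction (Armijo-type condition),
      3. retract back onto the sphere by normalization.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        rg = g - np.dot(g, x) * x          # Riemannian gradient: remove the radial component
        if np.linalg.norm(rg) < tol:
            break
        alpha, fx = alpha0, f(x)
        while True:
            x_new = x - alpha * rg
            x_new /= np.linalg.norm(x_new)  # retraction onto S^n
            if f(x_new) <= fx - c * alpha * np.dot(rg, rg) or alpha < 1e-12:
                break
            alpha *= beta                   # backtracking
        x = x_new
    return x

# Assumed example: minimize f(x) = x^T A x on the sphere; the minimizer is an
# eigenvector associated with the smallest eigenvalue of A.
A = np.diag([3.0, 1.0, 2.0])
x_star = line_search_on_sphere(lambda x: x @ A @ x,
                               lambda x: 2 * A @ x,
                               np.array([1.0, 1.0, 1.0]))
```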
Posts
MM Algorithm [unfinished]
By Unknown
MM algorithm
1. Overview of MM algorithm
The MM algorithm is not an algorithm, but a prescription for constructing optimization algorithms. The EM algorithm from statistics is a special case. An MM algorithm operates by creating a surrogate function that minorizes or majorizes the objective function. When the surrogate function is optimized, the objective function is driven uphill or downhill as needed. In minimization MM stands for majorize/minimize, and in maximization MM stands for minorize/maximize.
2. Rationale for the MM principle
- It can generate an algorithm that avoids matrix inversion.
- It can separate the parameters of a problem.
- It can linearize an optimization problem.
- It can deal gracefully with equality and inequality constraints.
- It can turn a non-differentiable problem into a smooth problem.
3. Majorization and definition of algorithms
A function $g(\theta \mid \theta^n)$ is said to majorize the function $f(\theta)$ ...
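As a concrete illustration of the majorize/minimize idea (my own sketch, not taken from the post): to minimize $f(\theta) = \sum_i |\theta - a_i|$, whose minimizer is the sample median, each term $|\theta - a_i|$ can be majorized at the current iterate $\theta^n$ by the quadratic $\frac{(\theta - a_i)^2}{2|\theta^n - a_i|} + \frac{|\theta^n - a_i|}{2}$; minimizing the surrogate then gives a simple weighted-average update.

```python
import numpy as np

def mm_median(a, theta0, n_iter=50, eps=1e-12):
    """MM iteration for minimizing f(theta) = sum_i |theta - a_i| (the sample median).

    Each |theta - a_i| is majorized at theta^n by a quadratic, so the surrogate
    minimizer is a weighted average with weights 1 / |theta^n - a_i|.  The
    objective value decreases monotonically at every step.
    """
    theta = float(theta0)
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(theta - a), eps)  # guard against division by zero
        theta = np.sum(w * a) / np.sum(w)
    return theta

a = np.array([1.0, 2.0, 7.0, 9.0, 100.0])
print(mm_median(a, theta0=a.mean()))  # moves towards the median, 7.0
```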
Conditional Random Fields (CRF) -- unfinished
By James Hoang
Parameter estimation in text modeling [unfinished]
By James Hoang
In this post, we would like to present some parameter estimation methods that are common for discrete probability distributions, which are very popular in text modeling. Then we explain the model of Latent Dirichlet Allocation (LDA) in detail.
I. Introduction
There are two inference problems in parameter estimation: (1) how to estimate values for a set of distribution parameters $\theta$ that can best explain a set of observed data; (2) how to calculate the probability of a new observation given the previous observations. We introduce Bayes' rule to solve the problems above. Bayes' rule is defined as $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$, and its terms may be called posterior, likelihood, prior, and evidence, respectively. We will introduce maximum likelihood, maximum a posteriori and Bayesian estimation, along with central concepts like conjugate distributions and Bayesian networks, to tackle these two problems.
II. Parameter estimation methods
1. Maximum likelihood estimation
Maximum likelihood (ML) tries to find the parameters that maximize the likelihood. The common way to obtain the parameter es...
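As a minimal sketch of maximum likelihood estimation for the kind of discrete distribution used in text modeling (the function, vocabulary, and data below are illustrative assumptions, not taken from the post): for a categorical distribution over a vocabulary, the ML estimate of each word probability is its relative frequency, and adding Dirichlet pseudo-counts gives the corresponding MAP estimate.

```python
from collections import Counter

def estimate_word_probs(tokens, vocab, alpha=0.0):
    """Estimate categorical word probabilities from observed tokens.

    alpha = 0 gives the maximum-likelihood estimate (relative frequencies);
    alpha > 0 adds Dirichlet pseudo-counts, i.e. the MAP estimate under a
    symmetric Dirichlet(alpha + 1) prior.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
print(estimate_word_probs(tokens, vocab))             # ML: "dog" gets probability 0
print(estimate_word_probs(tokens, vocab, alpha=1.0))  # smoothed: no zero probabilities
```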
Neural Networks in Machine Learning [unfinished]
By Loc Do
(Last updated: 05/12/2015)
1. Why do we need to study neural computation?
- To understand how the brain actually works.
- To understand a style of parallel computation inspired by neurons and their adaptive connections.
- To solve practical problems by using novel learning algorithms inspired by the brain.
2. What are neural networks?
A typical cortical neuron has a gross physical structure consisting of 1) a cell body, 2) an axon, where it sends messages to other neurons, and 3) a dendritic tree, where it receives messages from other neurons.
- The axon contacts dendritic trees at synapses.
- Spike generation: when there is enough charge in its dendritic tree to depolarize the axon hillock (part of the cell body), the neuron sends a spike out along its axon. (A spike is a wave of depolarization travelling along the axon.)
Synapses contain little vesicles of transmitter chemical (some implementing positive weights and some implementing negative weights) ...
Feature scaling
By James Hoang
1. What is feature scaling?
Feature scaling is a method used to normalize the independent variables (features) of our data. It is also called data normalization and is generally performed before running machine learning algorithms.
2. Why do we need to use feature scaling?
In practice the ranges of raw features vary widely, so without normalization many objective functions do not work properly (for example, optimization may get stuck at a poor local optimum) or become time-consuming. For example, K-Means might give you totally different solutions depending on the preprocessing method you used. This is because an affine transformation implies a change in the metric space: the Euclidean distance between two samples will be different after that transformation. When we apply gradient descent, feature scaling also helps it to converge much faster than it would with unnormalized data. (Figure: gradient descent with and without feature scaling.)
3. Methods in feature scaling?
Rescaling
The simp...
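The excerpt is cut off before the rescaling formula, but the usual min-max rescaling maps each feature into $[0, 1]$ via $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$. The sketch below is an illustrative implementation under that assumption, not code from the post.

```python
import numpy as np

def min_max_rescale(X):
    """Rescale each column (feature) of X into the range [0, 1].

    Constant columns are mapped to 0 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
print(min_max_rescale(X))  # every column now spans [0, 1]
```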
Notes on Statistics
By Loc Do
Topic #1. Distributions of functions of random variables
Given random variables drawn from some known distribution and a function of these random variables, we are interested in finding the distribution of the function's value. We will go through three techniques for finding the probability distributions of functions of random variables: the distribution function technique, the change-of-variable technique, and the moment-generating function technique. First, we consider functions of one random variable with two techniques: the distribution function technique and the change-of-variable technique. Then, we extend the same techniques to transformations of two random variables. Next, we consider cases where the random variables are independent, and sometimes identically distributed. After that, we spend some time with the third technique, the moment-generating function technique, to find the distribution of functions of random variables.
- Source 1) https://onlinecourses.science.psu.edu/stat414/node/127
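As a small worked example of the distribution function technique (my own illustration, not taken from the notes themselves): let $X \sim \text{Uniform}(0,1)$ and $Y = -\ln X$. For $y > 0$,

$$F_Y(y) = P(Y \leq y) = P(-\ln X \leq y) = P(X \geq e^{-y}) = 1 - e^{-y},$$

so differentiating gives $f_Y(y) = e^{-y}$, i.e. $Y \sim \text{Exponential}(1)$. The change-of-variable technique gives the same density directly: $f_Y(y) = f_X(e^{-y}) \left| \tfrac{d}{dy} e^{-y} \right| = e^{-y}$.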