Cracking Machine Learning Interviews - 01 Feature Engineering
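1. Why do we need to apply normalization to numerical features?

There are two common ways of normalization:

a. Min-Max Scaling

$$X_{norm} = \frac{X-X_{min}}{X_{max}-X_{min}}$$

This method scales the data linearly into the range [0, 1]: the minimum value maps to 0 and the maximum value maps to 1.

b. Z-Score Normalization

$$z = \frac{x-\mu}{\sigma}, \quad \sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2}$$

Here $\mu$ is the mean of the feature and $\sigma$ its standard deviation. This method shifts and rescales the data so that the new data has a mean of 0 and a standard deviation of 1.

When features have different scales, the gradients with respect to their weights can differ greatly, so each weight effectively learns at a different pace. With a single learning rate, a step size that is safe for the large-scale feature is far too small for the small-scale one. On a contour plot of the loss, the contours become elongated, and gradient descent zig-zags toward the minimum instead of heading straight for it...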
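A minimal sketch of both formulas in plain Python, using only the standard library (Python 3.8+ for `statistics.fmean`). The function names `min_max_scale` and `z_score`, and the choice to return zeros for a constant feature (where both formulas would divide by zero), are illustrative assumptions rather than part of the original notes. `statistics.pstdev` divides by $N$, matching the population standard deviation in the z-score formula.

```python
from statistics import fmean, pstdev


def min_max_scale(xs):
    """Map values linearly onto [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Constant feature: the formula would divide by zero.
        # Returning zeros is an assumption made for this sketch.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = fmean(xs)
    sigma = pstdev(xs, mu)  # divides by N, as in the formula above
    if sigma == 0:
        # Same constant-feature convention as min_max_scale (assumption).
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


if __name__ == "__main__":
    heights = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up example data
    print(min_max_scale(heights))  # [0.0, 0.25, 0.5, 0.75, 1.0]
    z = z_score(heights)
    print("mean:", fmean(z), "std:", pstdev(z))
```

Note that min-max scaling depends only on the two extreme values, so a single outlier can squeeze the rest of the data into a narrow band, while the z-score uses every value through $\mu$ and $\sigma$.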
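To make the "different learning pace" concrete, here is a small, hypothetical illustration. It assumes a linear model with squared loss and zero-mean, uncorrelated features. Under those assumptions the loss around its optimum is a bowl $f(w) = \tfrac{1}{2}\sum_j c_j w_j^2$, where $w_j$ is the offset of weight $j$ from its optimal value and the curvature $c_j$ equals the mean of $x_j^2$. A feature that is 10 times larger therefore gets 100 times the curvature. The helper `gd_steps` and all numbers below are invented for this demo.

```python
def gd_steps(curvatures, lr, w0, tol=1e-6, max_steps=100_000):
    """Gradient descent on f(w) = 0.5 * sum(c_j * w_j**2), where w_j is the
    offset of weight j from its optimum. Stops once every |w_j| < tol.
    Returns (number of steps taken, list of iterates starting with w0)."""
    w = list(w0)
    path = [tuple(w)]
    for step in range(1, max_steps + 1):
        # d/dw_j of 0.5 * c_j * w_j**2 is c_j * w_j, so each coordinate
        # is multiplied by (1 - lr * c_j) on every step.
        w = [wj - lr * cj * wj for wj, cj in zip(w, curvatures)]
        path.append(tuple(w))
        if all(abs(wj) < tol for wj in w):
            return step, path
    return max_steps, path


if __name__ == "__main__":
    # Unscaled: feature 2 is 10x larger than feature 1 -> curvatures 1 and 100.
    # w_2 only converges if lr < 2 / 100 = 0.02.
    raw_steps, raw_path = gd_steps([1.0, 100.0], lr=0.019, w0=[1.0, 1.0])
    # w_2 flips sign every step (the zig-zag) while w_1 shrinks by only 1.9% per step.
    print("raw:", raw_steps, "steps; first iterates:", raw_path[:3])

    # Standardized: both curvatures equal 1, so lr = 1 reaches the optimum in one step.
    scaled_steps, _ = gd_steps([1.0, 1.0], lr=1.0, w0=[1.0, 1.0])
    print("scaled:", scaled_steps, "step(s)")  # scaled: 1 step(s)
```

With these numbers the unscaled run needs several hundred steps, because the learning rate is capped by the steep direction while the flat direction decays by a factor of only 0.981 per step. The one-step convergence of the scaled run relies on both curvatures being exactly 1 in this toy setup; real standardized features are rarely perfectly uncorrelated, so the improvement is usually smaller.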