Cracking Machine Learning Interviews - 01 Feature Engineering

1. Why do we need to apply normalization to numerical features? There are two common ways of normalization: a. Min-Max Scaling $$X_{norm} = \frac{X-X_{min}}{X_{max}-X_{min}}$$ This method scales the data into the range [0,1]. b. Z-Score Normalization $$z = \frac{x-\mu}{\sigma}, \quad \sigma=\sqrt{\frac{\sum(x_i-\mu)^2}{N}}$$ This method rescales the data so that the mean and standard deviation of the new data become 0 and 1 respectively. When the scales of features differ, the gradients of their weights can be very different, leading to a different 'learning pace' for each weight, which shows up as a zig-zag pattern on the gradient plot....
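A minimal NumPy sketch of both normalizations on a toy feature column (the array and variable names below are illustrative, not from the post):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # toy numerical feature

# Min-Max Scaling: maps the data into [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-Score Normalization: resulting mean ~ 0, standard deviation ~ 1
# np.std defaults to the population formula (divide by N), matching the equation above
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)                          # [0.   0.25 0.5  0.75 1.  ]
print(x_zscore.mean(), x_zscore.std())   # ~0.0, 1.0
```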

May 15, 2022 · 3 min · Weipeng Zhang

Cracking Machine Learning Interviews - 02 Model Evaluation

The limitation of metrics. 1. What's the limitation of accuracy? When the positive and negative samples are imbalanced, accuracy may not correctly reflect the performance of the model. 2. How do we balance precision and recall? We can use the Precision-Recall curve, the ROC curve, or the F1 score to evaluate the performance of a ranking/classification model. $$F1 = \frac{2\times precision \times recall}{precision + recall}$$ 3. The RMSE of the model is high even though 95% of the samples in the test set are predicted with a small error; why is that?...
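A quick sketch of computing precision, recall, and F1 from toy binary predictions (the labels below are made up for illustration, not from the post):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Confusion-matrix counts for the positive class
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)  # 0.75 0.75 0.75
```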

May 15, 2022 · 2 min · Weipeng Zhang