Due date: April 5 (midnight), on GitHub.
Build a gradient boosting model in R (using tidymodels) or Python (using scikit-learn); a minimal scikit-learn sketch is given further below. For each step, include a paragraph explaining why you did that step the way you did (what components were included and, possibly, what you decided not to do).

Derive \(\gamma_{jm}\) for both the MSE (\(L_2\) norm) and binomial deviance loss functions.
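For the model-building item above, here is a minimal, illustrative sketch in Python with scikit-learn. The synthetic data, the pipeline components, and the tuning grid are placeholders for the assignment's actual dataset and your own design choices; it shows the shape of a solution, not the required one.

```python
import numpy as np
from sklearn.datasets import make_regression            # placeholder data, not the course dataset
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data; substitute the dataset specified for the assignment.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Each pipeline component is a "step" that would get its own explanatory paragraph
# (e.g., scaling is not needed for tree-based models; keeping or dropping it is
# exactly the kind of decision the write-up should justify).
pipe = make_pipeline(
    StandardScaler(),
    GradientBoostingRegressor(random_state=0),
)

# Tune the main regularisation knobs by cross-validation.
param_grid = {
    "gradientboostingregressor__n_estimators": [100, 300],
    "gradientboostingregressor__learning_rate": [0.05, 0.1],
    "gradientboostingregressor__max_depth": [2, 3],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("held-out R^2:", search.best_estimator_.score(X_test, y_test))
```

The tidymodels route would mirror this structure with a recipe, a boost_tree() model specification, and tune_grid() over a resampling scheme.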
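For the derivation item, the following outline (under one common convention: Friedman-style gradient tree boosting, with the deviance written for labels \(y_i \in \{0, 1\}\) and \(F\) on the log-odds scale) indicates where the two answers should land; treat it as a check on notation, not a substitute for showing the intermediate steps. The terminal-node constant solves
\[
\gamma_{jm} \;=\; \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\bigl(y_i,\, F_{m-1}(x_i) + \gamma\bigr).
\]
For MSE, \(L(y, F) = \tfrac{1}{2}(y - F)^2\), setting the derivative with respect to \(\gamma\) to zero gives the mean residual in the node,
\[
\gamma_{jm} \;=\; \frac{1}{|R_{jm}|} \sum_{x_i \in R_{jm}} \bigl(y_i - F_{m-1}(x_i)\bigr).
\]
For the binomial deviance, \(L(y, F) = -\,y F + \log\bigl(1 + e^{F}\bigr)\), there is no closed-form minimiser; the standard approach is a single Newton–Raphson step from \(\gamma = 0\),
\[
\gamma_{jm} \;\approx\; \frac{\sum_{x_i \in R_{jm}} \bigl(y_i - p_i\bigr)}{\sum_{x_i \in R_{jm}} p_i \bigl(1 - p_i\bigr)},
\qquad
p_i = \frac{1}{1 + e^{-F_{m-1}(x_i)}}.
\]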
Do the same for Newton boosting (Chen and Guestrin 2016), where we use a second-order rather than a first-order approximation to the loss function.
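Again only as an outline to check against (following the shape of the derivation in Chen and Guestrin 2016, with their regularisation terms dropped for clarity): write \(g_i\) and \(h_i\) for the first and second derivatives of the loss at the current prediction and expand to second order,
\[
L\bigl(y_i,\, F_{m-1}(x_i) + \gamma\bigr) \;\approx\;
L\bigl(y_i,\, F_{m-1}(x_i)\bigr) + g_i \gamma + \tfrac{1}{2} h_i \gamma^2,
\qquad
g_i = \left.\frac{\partial L(y_i, F)}{\partial F}\right|_{F = F_{m-1}(x_i)},
\quad
h_i = \left.\frac{\partial^2 L(y_i, F)}{\partial F^2}\right|_{F = F_{m-1}(x_i)}.
\]
Summing over the observations in \(R_{jm}\) and minimising the resulting quadratic in \(\gamma\) gives
\[
\gamma_{jm} \;=\; -\,\frac{\sum_{x_i \in R_{jm}} g_i}{\sum_{x_i \in R_{jm}} h_i},
\]
to which XGBoost adds a penalty \(\lambda\) in the denominator. For MSE, \(h_i = 1\) and this reduces to the mean residual; for the binomial deviance, \(g_i = p_i - y_i\) and \(h_i = p_i(1 - p_i)\), so it coincides with the one-step Newton approximation above.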
Cite and comment on all references that you used.
Bujokas, Eligijus. 2022. “Gradient Boosting in Python from Scratch.” Medium. https://towardsdatascience.com/gradient-boosting-in-python-from-scratch-788d1cf1ca7.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. https://doi.org/10.1145/2939672.2939785.