Salo-Zeros
Bucket path
gs://pyregence-tree-mortality/zeros/
Preprocessing
The extent of the salo-naip response data is rather small, but we want to be able to apply the model statewide. To capture a broader range of feature data, we created a series of "zeros" datasets. These are blank (all-zero) raster files covering several different ecotypes in the state (ag, high-peaks, desert, forest, woodland).
The goal of including these data was to increase the diversity of feature samples so the model could generalize statewide. When we first added them to our modeling attempts, we inadvertently created a new problem for ourselves. The response data were already highly imbalanced: non-zero pixels made up <2% of the pixels in the Sierra National Forest regions. Adding many new all-zero sites made the dataset so imbalanced that the model predicted zero for every pixel.
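The arithmetic behind this failure is simple. Zero-site pixels only grow the denominator, so every all-zero pixel added dilutes the positive share further. A model that always predicts zero then looks better and better by plain accuracy. The sketch below illustrates this. The function name `positive_fraction` and the pixel counts are hypothetical and chosen for illustration; they are not the project's actual numbers. The 2% rate is the upper bound quoted above.

```python
def positive_fraction(n_response_pixels: int, positive_rate: float, n_zero_pixels: int) -> float:
    """Share of non-zero (positive) pixels after pooling response and zero-site pixels.

    Hypothetical helper for illustration. Zero-site pixels are all zero by
    construction, so they only grow the denominator; the positive count is fixed.
    """
    n_positive = n_response_pixels * positive_rate
    return n_positive / (n_response_pixels + n_zero_pixels)


# Hypothetical counts for illustration only (not the project's real pixel counts).
n_response = 1_000_000
rate = 0.02  # the "<2%" figure from the text, used as an upper bound
for n_zero in (0, 1_000_000, 4_000_000):
    p = positive_fraction(n_response, rate, n_zero)
    # A model that always predicts zero is "correct" on (1 - p) of all pixels.
    print(f"zero pixels added={n_zero:>9,}  positive share={p:.2%}  all-zero accuracy={1 - p:.2%}")
```

With these made-up counts, quadrupling the pool with zero-site pixels drops the positive share from 2% to 0.4%. At that point a constant-zero prediction is "right" on 99.6% of pixels.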
We'll therefore probably want to draw only a limited number of samples from these zero sites. That should be enough to widen the range of features the model sees, but not so many that we end up with a model that predicts only zeros.
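One way to limit the zero-site draw is to keep every response-site sample and add a random subset of zero-site samples. The subset is capped so that it never exceeds a chosen share of the combined training set. Below is a minimal sketch under several assumptions. Per-pixel samples are assumed to be already extracted into Python sequences. The helper `sample_with_capped_zeros` and its `max_zero_fraction` parameter are hypothetical, not part of the project code. The 20% cap in the usage line is arbitrary.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_with_capped_zeros(
    response_samples: Sequence[T],
    zero_site_samples: Sequence[T],
    max_zero_fraction: float,
    seed: int = 0,
) -> List[T]:
    """Hypothetical helper: keep all response samples, add a capped subset of zero-site samples.

    max_zero_fraction (hypothetical parameter) is the largest share of the
    combined training set that may come from zero sites.
    """
    if not 0.0 <= max_zero_fraction < 1.0:
        raise ValueError("max_zero_fraction must be in [0, 1)")
    n_resp = len(response_samples)
    # From z / (n_resp + z) <= f it follows that z <= f * n_resp / (1 - f).
    # The tiny tolerance keeps float rounding (e.g. 199.999...) from dropping one sample.
    cap = math.floor(max_zero_fraction * n_resp / (1.0 - max_zero_fraction) + 1e-9)
    n_zero = min(cap, len(zero_site_samples))
    rng = random.Random(seed)  # seeded so the draw is reproducible
    # Sample zero-site pixels without replacement and append them after the response samples.
    return list(response_samples) + rng.sample(list(zero_site_samples), n_zero)


# Usage with placeholder integer IDs standing in for pixel samples.
response = list(range(800))
zeros = list(range(1000, 6000))
training = sample_with_capped_zeros(response, zeros, max_zero_fraction=0.2)
print(len(training), sum(1 for s in training if s >= 1000))
```

Capping by share rather than by a fixed count keeps the zero-site contribution proportional to however much response data is available. The cap is a tuning knob: raising it exposes the model to more statewide feature variety, at the cost of more imbalance.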