ELMFIRE¶
ELMFIRE
is a computationally efficient fire behavior simulator written by Chris Lautenberger. We use ELMFIRE as our in-house fire behavior model, primarily for generating quantitative estimates of burn probability, flame length, spread rate, and wildfire hazard. It is difficult to work with, however, and we've written a number of scripts and wrappers to parameterize and run ELMFIRE
simulations, merge the outputs of each run into composite metrics (e.g. times_burned
), and mosaic the outputs from multiple, spatially-independent tile runs.
We completed the first statewide runs of wildfire hazard in July 2021, which covered the full state of California. We simulated 1.8 million fires using hourly weather data, 10 m resolution Forest Observatory fuels data, and many custom fire behavior model settings.
These simulations are run using tiled raster data, which cover a small extent with a large buffer to allow fires to spread beyond the border of each tile's internal boundary. Ignitions are only seeded inside of the internal tile boundaries, then the perimeters of each simulated fire grow based on the fuel, topography, and weather conditions at the time of the simulation. Many simulations are run for each tile, which then get merged into per-tile metrics like times_burned
(the count across all simulations that each pixel on the landscape is burned) during postprocessing. These per-tile metrics are then merged together in a process we creatively and descriptively call post-postprocessing.
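As a rough illustration of the times_burned metric described above, the sketch below counts, for each pixel, how many simulations burned it. The list-of-lists burn masks and the `times_burned` function are hypothetical stand-ins for illustration only; the real pipeline works from ELMFIRE's binary outputs, not Python lists.

```python
from typing import List

Mask = List[List[bool]]  # one boolean grid per simulated fire (hypothetical format)


def times_burned(masks: List[Mask]) -> List[List[int]]:
    """Count, per pixel, how many simulations burned that pixel."""
    if not masks:
        return []
    rows, cols = len(masks[0]), len(masks[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for mask in masks:
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    counts[r][c] += 1
    return counts


# Three toy simulations on a 2x2 landscape.
sims = [
    [[True, False], [False, False]],
    [[True, True], [False, False]],
    [[True, False], [False, True]],
]
print(times_burned(sims))  # [[3, 1], [0, 1]]
```

The top-left pixel burned in all three toy simulations, so its count is 3.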
California tiles¶
In order to run the hundred million+ fire simulations across the state of California, the fuels and weather data were clipped to small square "tiles" of data. This was done to reduce memory use, disk use, disk reads, and CPU costs on a per-simulation basis, and it also enables distributed processing across a cluster of compute nodes.
The image above shows the corners of the internal boundary for each tile. The full spatial extent of each tile, however, is buffered by 1 tile in each direction (i.e. a 3x3 external boundary for a 1x1 internal boundary). Ignitions start within the internal boundary and are allowed to spread to areas within the external boundary.
There were 1,434 total tiles for California. After the first sweep of statewide fire simulations, 1,323 tiles ignited fires, while 111 had zero ignitions. This is because the ignition probability system uses distance to road to seed ignitions, and the tiles with no roads had no ignitions.
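The 3x3 buffering described above can be sketched as a small calculation. This is an illustrative example only: the `Extent` type, the `external_extent` function, and the coordinates are hypothetical and are not taken from the tiling code.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def external_extent(internal: Extent) -> Extent:
    """Buffer an internal tile extent by one tile width/height on every side."""
    width = internal.xmax - internal.xmin
    height = internal.ymax - internal.ymin
    return Extent(
        internal.xmin - width,
        internal.ymin - height,
        internal.xmax + width,
        internal.ymax + height,
    )


# A hypothetical 1x1 internal tile becomes a 3x3 external tile.
inner = Extent(10.0, 20.0, 11.0, 21.0)
outer = external_extent(inner)
print(outer)
```

The external extent is three times as wide and three times as tall as the internal one, which is why each tile run carries roughly nine times the data of its internal boundary.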
Running a simulation for an individual tile¶
To run a suite of fire simulations for a given tile and generate the
times_burned.tif
and related outputs, refer to
elmfire/slurm-scripts/elmfire-pipeline.sh
. At a high level,
elmfire-pipeline.sh
manages the SLURM job coordination and file system setup
for running the fire simulations (i.e. elmfire
run) and the post-processing
of the fire simulation results.
The arguments are as follows:
Simulate a single tile.
Usage: ./slurm-scripts/elmfire-pipeline.sh [-s|--simulate <arg>] [-p|--post <arg>] [-o|--out <arg>] [-j|--job <arg>] [--sim-dir <arg>] [--gcs <arg>] [--(no-)tar] [--(no-)cleanup] [--(no-)run-tile] [--(no-)postprocess] [-h|--help]
-s, --simulate: Arguments to run_tile.py. Required. (no default)
-p, --post: Arguments to times_burned.py. (no default)
-o, --out: Directory to output the completed results. Default is /jobs/$SLURM_JOBID/outputs. (no default)
-j, --job: Directory to output the logs and to use as a scratch directory during the run. Default is /jobs/$SLURM_JOBID/job. (no default)
--sim-dir: Directory containing the simulation scripts and elmfire.data files. Default is $HOME/salo-wildfire/elmfire/slurm-scripts/sim. (default: '/home/alex_adamson_salo_ai/salo-wildfire/elmfire/slurm-scripts/sim')
--gcs: GCS bucket to copy results of run tile step to. If not provided, the results will not be copied to GCS. (no default)
--tar, --no-tar: Compress and upload as tar.gz GCS bucket instead of rsync to GCS bucket at end of postprocessing. Disabled by default. (off by default)
--cleanup, --no-cleanup: If enabled, we will leave the intermediate outputs and the scratch space in place. Enabled by default. (on by default)
--run-tile, --no-run-tile: If enabled, run the simulation. Enabled by default. (on by default)
--postprocess, --no-postprocess: If enabled, run the postprocessing. Enabled by default. (on by default)
-h, --help: Prints help
You'll notice that the --simulate
and --post
arguments allow you to specify
the arguments provided to elmfire/src/run_tile.py
and
elmfire/src/times_burned.py
respectively. Naturally, in order to know what to
provide here, you'll need to know what these scripts do.
run_tile.py
¶
run_tile.py
manages mounting the California Forest Observatory data that are
used as inputs to elmfire, the process of configuring elmfire to use the
desired wildfire simulation parameters, the launching of an elmfire run on
SLURM, and finally, the collection and storage of the outputs of each ignition
simulation.
Arguments¶
The arguments are as follows:
usage: run_tile.py [-h] [-j JOBDIR] [-l LOGDIR] [--no-cleanup] [--partition SLURM_PARTITION] [--cpus SLURM_CPUS] [-o OUTDIR] [--format {raw,tgz}] [--log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
[-b BURNTIME] [-f FUELS_TYPE] [-v VERSION] [-i INDIR] [-s SIMDIR] [--gcs GCS_DIR] [-g IGNITION_SCALER] [--mpi-cpus MPI_CPUS] [--debug]
tile year start stop
Simulate a single tile
positional arguments:
tile The tile to simulate in format "XX_YY".
year The year of weather data to load from "in".
start The hour in the weather raster to start the simulation at.
stop The hour in the weather raster to stop the simulation at.
optional arguments:
-h, --help show this help message and exit
-b BURNTIME, --burntime BURNTIME
How long (in hours) to allow fires to sim. Default is 24.
-f FUELS_TYPE, --fuels_type FUELS_TYPE
What type of fuels input to use: cfo or landfire. Default is cfo (forest observatory).
-v VERSION, --version VERSION
Version of ELMFIRE to use. Default is 0.6550.
-i INDIR, --in INDIR Directory containing the weather, fuel, etc. data to use for the simulation. Default is gs://elmfire/input.
-s SIMDIR, --sim-dir SIMDIR
Directory containing the simulation scripts and elmfire.data files. Default is $HOME/salo-wildfire/elmfire/slurm-scripts/sim.
--gcs GCS_DIR GCS bucket upload path.
-g IGNITION_SCALER, --igscaler IGNITION_SCALER
Value for Ignition mask scaling factor. Default is None (use elmfire.data default)
--mpi-cpus MPI_CPUS Number of CPUs to have MPI use
--debug If enabled, ELMFIRE will run in debug mode. Disabled by default.
Slurm:
-j JOBDIR, --job-dir JOBDIR
Directory to use for temporary storage and logging. If not provided, a directory will be created.
-l LOGDIR, --log-dir LOGDIR
Directory to store logs in. If not provided, logs will be stored in jobdir.
--no-cleanup If enabled, we will leave the intermediate outputs and the scratch space in place. Enabled by default.
--partition SLURM_PARTITION
The SLURM partition to use when scheduling jobs. Default is None (use SLURM default).
--cpus SLURM_CPUS The number of CPUs (tasks) to request from SLURM. Default is None (use SLURM default).
-o OUTDIR, --out OUTDIR
Directory to output the intermediate and completed results. Default is None (do not output anywhere).
--format {raw,tgz} Format to use when outputting the results. 'raw' means upload the results as they are. 'tgz' (default) means pack the results in a tar.gz archive before uploading.
--log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Provide logging level. Default is "warning".
If used through elmfire-pipeline.sh
, --sim-dir
(as --sim-dir
),
--gcs
(as --gcs
), --format
(as --raw
), --out
(as <--job>/intermediate
),
and --job-dir
(as <--job>/run_tile
) are provided by elmfire-pipeline.sh
and should not be modified through the --simulate
argument.
--burntime
and --version
control the elmfire configuration file that will
be used. elmfire configuration files are located in
elmfire/slurm-scripts/sim/elmfire_data
and have the format
elmfire-<--version>-<--burntime>hr.data
. --in
specifies the location of the
GCS bucket containing the CFO input data.
Additionally, --version
is used to set which elmfire binary will be used to
run the simulation.
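To make the naming convention concrete, here is a minimal sketch of how the configuration file path could be assembled from --version and --burntime. The `config_path` helper is hypothetical; only the directory and the file name pattern come from the text above.

```python
from pathlib import Path

# Directory named in the text above, relative to the repository root.
ELMFIRE_DATA_DIR = Path("elmfire/slurm-scripts/sim/elmfire_data")


def config_path(version: str, burntime: int) -> Path:
    """Build the expected configuration file path for a version and burn time."""
    return ELMFIRE_DATA_DIR / f"elmfire-{version}-{burntime}hr.data"


# Using the documented defaults for --version and --burntime.
print(config_path("0.6550", 24).name)  # elmfire-0.6550-24hr.data
```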
The positional arguments tile
and year
control which tile to simulate for
what year, and therefore which subdirectory of --in
to load input data from
since the bucket is expected to be organized by tile and year
(see elmfire/slurm-scripts/sim/01-setup-cfo.sh
and its usage in run_tile.py
).
The remaining positional arguments are start
and stop
. These arguments
control the endpoints of the half-open range ([start, stop)
) of weather band
rasters that will be used during the simulation. start
and stop
control
the METEOROLOGY_BAND_START
and METEOROLOGY_BAND_STOP
parameter values
of the elmfire-[...].data
file (discussed above, see
elmfire/src/elmfire-simulate-tile.sh
for how they're provided to elmfire).
Similarly, --igscaler
controls the IGNITION_MASK_SCALE_PARAMETER
parameter
value in the elmfire data file.
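The half-open range means that start is included and stop is excluded. A minimal sketch, assuming the bands are plain integer indices (the `weather_bands` helper is hypothetical):

```python
def weather_bands(start: int, stop: int) -> range:
    """Return the band indices in the half-open range [start, stop)."""
    if stop <= start:
        raise ValueError("stop must be greater than start")
    return range(start, stop)


bands = weather_bands(2925, 3000)
print(len(bands), bands[0], bands[-1])  # 75 2925 2999
```

So a run with start 2925 and stop 3000 covers 75 hourly bands, and band 3000 itself is not simulated.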
The --partition
argument controls which SLURM partition the invocations of
elmfire-simulate-tile.sh
(where elmfire itself is finally invoked) will run
on.
Respectively, the --cpus
and --mpi-cpus
arguments control how many CPUs to
request from SLURM for each run (you should specify the number of logical cores
on each machine in the SLURM cluster) and how many CPUs to expose to MPI (you
should specify the number of physical cores on each machine, i.e. half of the
number of logical cores since hyperthreading is disabled).
Summary of operation¶
When running in Monte Carlo mode, elmfire produces a binary file for each
simulated ignition. The goal of run_tile.py
is to get elmfire to successfully
simulate some number of ignitions (determined by weather conditions,
the ignition scale factor, and other inputs) for each weather band, and then
collect the binary files (formatted as toa_<band>_<ignition number>.bin
) for
post-processing (see below). elmfire can be extremely flaky and is prone to
falling over unexpectedly, so we need a wrapper script to babysit, monitor
progress, and get it to resume when it falls over. That is where run_tile.py
comes in.
run_tile.py
begins by mounting the GCS bucket (or local directory) and
then calling 01-setup-cfo.sh
, a script which will extract the appropriate
inputs for the tile and year to be simulated. Once this is done, run_tile.py
will repeatedly submit SLURM jobs to launch elmfire-simulate-tile.sh
(which
will run elmfire itself) until all ignitions for all weather bands have
successfully completed. When launched, elmfire will output a file called
cases_to_run.csv
into its output directory which contains a listing of how
many ignitions (and therefore binary files) are expected per weather band.
When the SLURM job completes, we can check the output directory for how many
ignitions (and therefore binary files) have been completed per band (since,
as mentioned above, the naming scheme for the files indicates the corresponding
band). If the job did not complete successfully, some bands will be missing some
ignitions. In this case, we take note of the first incomplete band, and for all
prior successful bands, collect the binary files into a separate output
directory. We then launch a new SLURM job specifying to
elmfire-simulate-tile.sh
that the new start band should be the first incomplete band from the
previous run. This process continues until all bands have been
successfully simulated.
Once all bands have been successfully simulated, we collect the binary files and output them to the specified directory.
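The resume logic described above can be sketched roughly as follows. This is not the code in run_tile.py: the function names are hypothetical, the expected counts are passed in as a plain dictionary (the layout of cases_to_run.csv is not documented here), and the file name pattern assumes the band and ignition number are plain integers.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Matches the documented toa_<band>_<ignition number>.bin naming scheme.
TOA_PATTERN = re.compile(r"^toa_(\d+)_(\d+)\.bin$")


def completed_per_band(filenames: Iterable[str]) -> Counter:
    """Count completed ignitions per band from the output file names."""
    counts: Counter = Counter()
    for name in filenames:
        match = TOA_PATTERN.match(name)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def first_incomplete_band(
    expected: Dict[int, int], filenames: Iterable[str]
) -> Optional[int]:
    """Return the lowest band with fewer outputs than expected, or None."""
    done = completed_per_band(filenames)
    for band in sorted(expected):
        if done[band] < expected[band]:
            return band
    return None


# Hypothetical expectations (in practice derived from cases_to_run.csv).
expected = {10: 2, 11: 2, 12: 1}
files = ["toa_10_1.bin", "toa_10_2.bin", "toa_11_1.bin", "notes.txt"]
print(first_incomplete_band(expected, files))  # 11
```

In this toy case band 10 is complete, so its files could be collected, and the next job would restart from band 11.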
postprocess_tile.py
¶
postprocess_tile.py
manages converting the binary files output by
run_tile.py
into the final times_burned
, flame_length
, etc. output
rasters for the tile.
Arguments¶
usage: postprocess_tile.py [-h] [-j JOBDIR] [-l LOGDIR] [--no-cleanup] [--partition SLURM_PARTITION] [--cpus SLURM_CPUS] [-o OUTDIR] [--format {raw,tgz}]
[--log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-i INTERMEDIATE_DIR] [-v VERSION] [--mpi-cpus MPI_CPUS] [-s SIMDIR] [--gcs GCS_DIR] [--sync-dir SYNC_DIR] [--rerun-postproc]
tile
Postprocess a tile simulation
positional arguments:
tile The tile to simulate in format "XX_YY".
optional arguments:
-h, --help show this help message and exit
-i INTERMEDIATE_DIR, --intermediate INTERMEDIATE_DIR
Directory containing the intermediate results to be postprocessed.
-v VERSION, --version VERSION
Version of ELMFIRE to use. Default is 0.6550.
--mpi-cpus MPI_CPUS Number of CPUs to have MPI use
-s SIMDIR, --sim-dir SIMDIR
Directory containing the simulation scripts and elmfire.data files. Default is $HOME/salo-wildfire/elmfire/slurm-scripts/sim.
--gcs GCS_DIR GCS bucket upload path.
--sync-dir SYNC_DIR Directory to sync to GCS bucket.
--rerun-postproc Flag to rerun postprocessing after initial failure
Slurm:
-j JOBDIR, --job-dir JOBDIR
Directory to use for temporary storage and logging. If not provided, a directory will be created.
-l LOGDIR, --log-dir LOGDIR
Directory to store logs in. If not provided, logs will be stored in jobdir.
--no-cleanup If enabled, we will leave the intermediate outputs and the scratch space in place. Enabled by default.
--partition SLURM_PARTITION
The SLURM partition to use when scheduling jobs. Default is None (use SLURM default).
--cpus SLURM_CPUS The number of CPUs (tasks) to request from SLURM. Default is None (use SLURM default).
-o OUTDIR, --out OUTDIR
Directory to output the intermediate and completed results. Default is None (do not output anywhere).
--format {raw,tgz} Format to use when outputting the results. 'raw' means upload the results as they are. 'tgz' (default) means pack the results in a tar.gz archive before uploading.
--log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Provide logging level. Default is "warning".
If used through elmfire-pipeline.sh
, --sim-dir
(as --sim-dir
),
--gcs
(as --gcs
), --format
(as --raw
), --out
(as <--out>
),
--intermediate
(as <--job>/intermediate
) and --job-dir
(as
<--job>/postprocess_tile
) are provided by elmfire-pipeline.sh
and should not be modified through the --post
argument.
The usage of --partition
, --cpus
, and --mpi-cpus
is identical to their
usage in run_tile.py
.
--version
controls which binary of elmfire_post
will be used (see
elmfire/slurm-scripts/sim/postprocess.sh
).
Summary of operation¶
postprocess_tile.py
manages launching a SLURM job that will invoke
postprocess.sh
. postprocess.sh
will construct the elmfire.data
file using
input parameters gathered from the metadata files placed (by run_tile.py
) in
the directory specified by the --intermediate
argument. It will collect the
CELLSIZE
, location of the lower-left corner, and a few other inputs from the
elmfire.data used by elmfire when running the simulations. The script will then
run elmfire_post_<--version>
to create the final output rasters from the
binary files output by run_tile.py
and copy them to the directory specified by
--out
.
Using elmfire-pipeline.sh
¶
Taking the above together, a typical invocation will look something like
slurm-scripts/elmfire-pipeline.sh --out "mydir/postprocessed_out" --job "mydir" \
--gcs "gs://elmfire/my_runs/${tile}/${ignition_factor}" \
-s "--log DEBUG --burntime 24 --version 0.6551.salo --in gs://elmfire/inputs ${tile} 2020 2925 3000 --partition=elmfire-pe --cpus 32 --mpi-cpus 16 -g ${ignition_factor}" \
-p "--log DEBUG --version 0.6551.salo --partition=elmfire-pe --cpus 32 --mpi-cpus 16"
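Because --simulate and --post each take a whole argument string, quoting can get fiddly when launching many tiles from a script. One possible approach, sketched here with hypothetical values for `tile` and `ignition_factor`, is to assemble the same invocation in Python and let `shlex.join` handle the quoting. All flags are copied from the example above; nothing is executed.

```python
import shlex

tile = "05_12"            # hypothetical tile ID in "XX_YY" format
ignition_factor = "1.0"   # hypothetical ignition scaling value

simulate_args = [
    "--log", "DEBUG", "--burntime", "24", "--version", "0.6551.salo",
    "--in", "gs://elmfire/inputs", tile, "2020", "2925", "3000",
    "--partition=elmfire-pe", "--cpus", "32", "--mpi-cpus", "16",
    "-g", ignition_factor,
]
post_args = [
    "--log", "DEBUG", "--version", "0.6551.salo",
    "--partition=elmfire-pe", "--cpus", "32", "--mpi-cpus", "16",
]

command = [
    "slurm-scripts/elmfire-pipeline.sh",
    "--out", "mydir/postprocessed_out",
    "--job", "mydir",
    "--gcs", f"gs://elmfire/my_runs/{tile}/{ignition_factor}",
    "-s", " ".join(simulate_args),
    "-p", " ".join(post_args),
]

# shlex.join quotes each element so the line can be pasted into a shell.
print(shlex.join(command))
```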
Simulation post-processing¶
Someone should document this!
Merging tile runs in post-postprocessing¶
The postprocessing routines generate a series of composite files that are then merged across tiles into statewide mosaic products. This is mostly handled by the elmfire/src/merge_tiles.py
script.
merge_tiles.py¶
merge_tiles.py [-h] [--local data_directory] [--mosaic output_directory]
[--remote elmfire_directory] [--split tile_index] [-p {10,20,30,40,50,60,70,80,90}]
[--te xmin ymin xmax ymax] [--tr xres yres] [--blocksize xsize ysize]
[--just_burned] [--just_center] [-f] [--multithread num_threads] [--skip]
Merges tiled ELMFIRE outputs into mosaics of burn probability, flame length, spread rate & hazard.
optional arguments:
-h, --help show this help message and exit
--local data_directory
Local directory to store tiled ELMFIRE data.
--mosaic output_directory
Local directory to store mosaicked output data.
--remote elmfire_directory
Remote directory where the xx_yy format ELMFIRE tiles are stored.
--split tile_index The index for the number of subdirectories in which to find the TILE_IDs.
-p {10,20,30,40,50,60,70,80,90}, --percentile {10,20,30,40,50,60,70,80,90}
Percentile aggregation for flame length and spread rate.
--te xmin ymin xmax ymax
Output spatial extent.
--tr xres yres Output spatial resolution.
--blocksize xsize ysize
The raster internal data read block size.
--just_burned Compute burn probability using only the pixels that burned.
--just_center Compute burn probability using ignition counts inside the internal tile boundary.
-f, --force Force overwriting local data if already downloaded.
--multithread num_threads
Number of threads to use for multithreading. Set to -1 to use all threads.
--skip Skips the tile download step.
This script takes an input directory of ELMFIRE postprocessing outputs, downloads the tiled data locally, computes derivative products, and merges the tiles into large, single-band mosaics. The current outputs are [burn_probability
, flame_length
, spread_rate
and wildfire_hazard
].
Here are a few tips for passing command line arguments that may not be immediately obvious from the help text above.
--remote - Requires a path to a cloud storage bucket that contains directories with each tiled, postprocessed dataset (e.g. gs://elmfire/ca-2020-hazard-v1/).
--local - The local directory where all tile outputs will be downloaded.
--mosaic - An alternative local directory to write mosaic data products. Default is to use the --local directory. We set this up to reduce consecutive read/write operations on disks to speed up the merge process.
--split - The number of "splits" in the file path to make to find the directory with the tile outputs. The default (5) assumes a structure like gs://{BUCKET}/{HAZARD_RUN}/{TILE_IDS}, where the TILE_IDs are the 5th indexed element if you were to run .split('/') on the remote directory string. If the tiles are under gs://{BUCKET}/{HAZARD_RUN}/{SUBDIRECTORY}/{TILE_IDS} then set --split 6.
--blocksize - Sets the tile read/write size. Increase this to perform fewer I/O operations.
--just_center - Uses inverse distance weighting in the burn probability calculations. This reduces tile edge effects because it discounts the number of potential ignitions by the distance from where the ignitions can start (the internal tile boundary).
--force - Useful for rewriting existing files (currently, this script will not overwrite). But it will also force re-downloading the data, which could be time intensive. To re-run just the local data operations like data merging, run both --force and --skip, which will skip the download step.
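The --split value is easier to reason about with a concrete path. The sketch below assumes that the value is a 1-based position in the result of .split('/'), which matches the two examples given for --split; the `tile_id_from_path` helper and the tile ID "05_12" are hypothetical, and merge_tiles.py may index the path differently internally.

```python
def tile_id_from_path(path: str, split: int = 5) -> str:
    """Return the path element that --split is described as pointing to.

    Assumes `split` is a 1-based position in path.split('/').
    """
    parts = path.rstrip("/").split("/")
    return parts[split - 1]


# 'gs:', '', 'elmfire', 'ca-2020-hazard-v1', '05_12' -> 5th element
print(tile_id_from_path("gs://elmfire/ca-2020-hazard-v1/05_12"))
# One extra subdirectory pushes the tile ID to the 6th element.
print(tile_id_from_path("gs://elmfire/ca-2020-hazard-v1/subdir/05_12", split=6))
```

Note that the empty string between the two slashes in gs:// counts as an element, which is why the tile ID lands in position 5 and not 4.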
CreateMosaicBlocks.ipynb¶
We were still generating postprocessed tiles with about 36 hours to go before the webinar, and running merge_tiles.py
on the full state simultaneously would have taken more time than we had. So we created an intermediate mosaic product using a set of six blocks
.
If you squint you can make out the shape of California.
These blocks included all X
blocks within 10-tile Y
blocks, which break the state down into the shape above. This also included a buffer of one set of Y
tiles to eliminate edge effects between blocks. The notebook for block generation can be found here:
elmfire/notebooks/CreateMosaicBlocks.ipynb
This notebook computes the spatial extents for each block and copies the associated tiles into separate block subdirectories. We then created six separate compute instances to run merge_tiles.py
for each block (using elmfire/launch_elmfire_mosaic_nodes.sh
as an instance creation template). We logged in to each node and ran commands like this to process each block.
BLOCK_NUMBER=6
conda activate wildfire
python /home/cba/salo-wildfire/elmfire/src/merge_tiles.py --split 6 --remote gs://elmfire/ca-2020-hazard-v1-fusion/5-${BLOCK_NUMBER} --local /scratch/read --mosaic /scratch/write --multithread 1 --blocksize 2048 2048 --just_center
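For reference, the blocking scheme described above (10-tile Y blocks with a one-tile Y buffer) can be sketched as follows. This is not the notebook's code: the `assign_blocks` function is hypothetical, and it assumes tile IDs are "XX_YY" strings with integer Y indices starting at 0.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def assign_blocks(
    tile_ids: Iterable[str], block_size: int = 10, buffer: int = 1
) -> Dict[int, List[str]]:
    """Group "XX_YY" tile IDs into Y blocks, adding `buffer` rows of overlap."""
    tiles = sorted(tile_ids)
    if not tiles:
        return {}
    y_values = [int(t.split("_")[1]) for t in tiles]
    n_blocks = max(y_values) // block_size + 1
    blocks: Dict[int, List[str]] = defaultdict(list)
    for tile, y in zip(tiles, y_values):
        for block in range(n_blocks):
            lo = block * block_size - buffer
            hi = (block + 1) * block_size + buffer
            # Tiles in the buffer rows are copied into both neighbouring blocks.
            if lo <= y < hi:
                blocks[block].append(tile)
    return dict(blocks)


demo = assign_blocks(["00_08", "00_09", "00_10", "00_11"])
print(demo)
```

In this toy case tiles 00_09 and 00_10 fall in the overlap and appear in both blocks, which is what lets each block be merged independently without edge effects at the seams.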
Once each block finished processing, all merged datasets were downloaded to an instance, merged into statewide mosaics, and pushed to the gs://elmfire
bucket. This was done with the following scripts:
elmfire/block-scripts/download-merge-burn-probability.sh
elmfire/block-scripts/download-merge-flame-length.sh
elmfire/block-scripts/download-merge-spread-rate.sh
elmfire/block-scripts/download-merge-hazard.sh
These scripts require a copy of ca-landmask.shp
in the directory where the scripts are run. This group of shapefiles can be found in gs://elmfire/vector/ca-landmask.*
.
What does the ELMFIRE acronym stand for?¶
ELMFIRE is short for "Eulerian Level set Model of FIRE spread."