ELMFIRE¶

Central California flame length

ELMFIRE is a computationally efficient fire behavior simulator written by Chris Lautenberger. We use ELMFIRE as our in-house fire behavior model, primarily for generating quantitative estimates of burn probability, flame length, spread rate, and wildfire hazard. It is difficult to work with, however, and we've written a number of scripts and wrappers to parameterize and run ELMFIRE simulations, merge the outputs of each run into composite metrics (e.g. times_burned), and mosaic the outputs from multiple, spatially-independent tile runs.

We completed the first state-wide runs of wildfire hazard in July 2021, which covered the full state of California. We simulated 1.8 million fires using hourly weather data, 10m resolution Forest Observatory fuels data and a lot of custom fire behavior model settings.

These simulations are run using tiled raster data, which cover a small extent with a large buffer to allow fires to spread beyond the border of each tile's internal boundary. Ignitions are only seeded inside of the internal tile boundaries, then the perimeters of each simulated fire grow based on the fuel, topography, and weather conditions at the time of the simulation. Many simulations are run for each tile, which then get merged into per-tile metrics like times_burned (the count across all simulations that each pixel on the landscape is burned) during postprocessing. These per-tile metrics are then merged together in a process we creatively and descriptively call post-postprocessing.
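As a rough illustration of the times_burned metric described above, the sketch below counts, per pixel, how many simulations burned it. The nested-list burn masks and the function name are assumptions made for this example; the real postprocessing works from ELMFIRE's binary outputs rather than in-memory lists.

```python
from typing import List

Grid = List[List[int]]


def times_burned(burn_masks: List[Grid]) -> Grid:
    """Count, for each pixel, the number of simulations in which it burned."""
    if not burn_masks:
        return []
    rows, cols = len(burn_masks[0]), len(burn_masks[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for mask in burn_masks:
        for r in range(rows):
            for c in range(cols):
                # Any truthy value counts as "burned" in this simulation.
                counts[r][c] += 1 if mask[r][c] else 0
    return counts


# Three hypothetical simulations over a 2x3 tile (1 = burned, 0 = unburned).
sims = [
    [[1, 1, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 0, 0]],
]
print(times_burned(sims))  # [[3, 2, 0], [0, 1, 0]]
```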

California tiles¶

California ELMFIRE tiles

In order to run the hundred million+ fire simulations across the state of California, the fuels and weather data were clipped to small square "tiles" of data. This was done to reduce memory use, disk usage, disk reads, and CPU costs on a per-simulation basis, and it also enables distributed processing across a cluster of compute nodes.

The image above shows the corners of the internal boundary for each tile. The full spatial extent of each tile, however, is buffered by 1 tile in each direction (i.e. a 3x3 external boundary for a 1x1 internal boundary). Ignitions start within the internal boundary and are allowed to spread to areas within the external boundary.
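The relationship between the two boundaries can be sketched as below. The `Extent` type, the function names, and the 1000-unit tile size are hypothetical and chosen only for illustration; the sketch simply buffers an internal extent by one tile width and height on every side.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def external_extent(internal: Extent) -> Extent:
    """Buffer a 1x1 internal extent by one tile on every side (3x3 result)."""
    width = internal.xmax - internal.xmin
    height = internal.ymax - internal.ymin
    return Extent(
        internal.xmin - width,
        internal.ymin - height,
        internal.xmax + width,
        internal.ymax + height,
    )


def can_ignite(x: float, y: float, internal: Extent) -> bool:
    """Ignitions are only seeded inside the internal boundary."""
    return internal.xmin <= x < internal.xmax and internal.ymin <= y < internal.ymax


# Hypothetical tile, in arbitrary map units.
inner = Extent(0.0, 0.0, 1000.0, 1000.0)
outer = external_extent(inner)
print(outer)  # Extent(xmin=-1000.0, ymin=-1000.0, xmax=2000.0, ymax=2000.0)
print(can_ignite(500.0, 500.0, inner), can_ignite(1500.0, 500.0, inner))  # True False
```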

There were 1,434 total tiles for California. After the first sweep of statewide fire simulations, 1,323 tiles ignited fires, while 111 had zero ignitions. This is because the ignition probability system uses distance to road to seed ignitions, and the tiles with no roads had no ignitions.

Running a simulation for an individual tile¶

To run a suite of fire simulations for a given tile and generate the times_burned.tif and related outputs, refer to elmfire/slurm-scripts/elmfire-pipeline.sh. At a high level, elmfire-pipeline.sh manages the SLURM job coordination and file system setup for running the fire simulations (i.e. elmfire run) and the post-processing of the fire simulation results.

The arguments are as follows:

Simulate a single tile.
Usage: ./slurm-scripts/elmfire-pipeline.sh [-s|--simulate <arg>] [-p|--post <arg>] [-o|--out <arg>] [-j|--job <arg>] [--sim-dir <arg>] [--gcs <arg>] [--(no-)tar] [--(no-)cleanup] [--(no-)run-tile] [--(no-)postprocess] [-h|--help]
        -s, --simulate: Arguments to run_tile.py. Required. (no default)
        -p, --post: Arguments to times_burned.py. (no default)
        -o, --out: Directory to output the completed results. Default is /jobs/$SLURM_JOBID/outputs. (no default)
        -j, --job: Directory to output the logs and to use as a scratch directory during the run. Default is /jobs/$SLURM_JOBID/job. (no default)
        --sim-dir: Directory containing the simulation scripts and elmfire.data files. Default is $HOME/salo-wildfire/elmfire/slurm-scripts/sim. (default: '/home/alex_adamson_salo_ai/salo-wildfire/elmfire/slurm-scripts/sim')
        --gcs: GCS bucket to copy results of run tile step to. If not provided, the results will not be copied to GCS. (no default)
        --tar, --no-tar: Compress and upload as tar.gz GCS bucket instead of rsync to GCS bucket at end of postprocessing. Disabled  by default. (off by default)
        --cleanup, --no-cleanup: If enabled, we will leave the intermediate outputs and the scratch space in place. Enabled by default. (on by default)
        --run-tile, --no-run-tile: If enabled, run the simulation. Enabled by default. (on by default)
        --postprocess, --no-postprocess: If enabled, run the postprocessing. Enabled by default. (on by default)
        -h, --help: Prints help

You'll notice that the --simulate and --post arguments allow you to specify the arguments provided to elmfire/src/run_tile.py and elmfire/src/times_burned.py respectively. Naturally, in order to know what to provide here, you'll need to know what these scripts do.

`run_tile.py`¶

run_tile.py manages mounting the California Forest Observatory data that are used as inputs to elmfire, the process of configuring elmfire to use the desired wildfire simulation parameters, the launching of an elmfire run on SLURM, and finally, the collection and storage of the outputs of each ignition simulation.

Arguments¶

The arguments are as follows:

usage: run_tile.py [-h] [-j JOBDIR] [-l LOGDIR] [--no-cleanup] [--partition SLURM_PARTITION] [--cpus SLURM_CPUS] [-o OUTDIR] [--format {raw,tgz}] [--log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
                   [-b BURNTIME] [-f FUELS_TYPE] [-v VERSION] [-i INDIR] [-s SIMDIR] [--gcs GCS_DIR] [-g IGNITION_SCALER] [--mpi-cpus MPI_CPUS] [--debug]
                   tile year start stop

Simulate a single tile

positional arguments:
  tile                  The tile to simulate in format "XX_YY".
  year                  The year of weather data to load from "in".
  start                 The hour in the weather raster to start the simulation at.
  stop                  The hour in the weather raster to stop the simulation at.

optional arguments:
  -h, --help            show this help message and exit
  -b BURNTIME, --burntime BURNTIME
                        How long (in hours) to allow fires to sim. Default is 24.
  -f FUELS_TYPE, --fuels_type FUELS_TYPE
                        What type of fuels input to use: cfo or landfire. Default is cfo (forest observatory).
  -v VERSION, --version VERSION
                        Version of ELMFIRE to use. Default is 0.6550.
  -i INDIR, --in INDIR  Directory containing the weather, fuel, etc. data to use for the simulation. Default is gs://elmfire/input.
  -s SIMDIR, --sim-dir SIMDIR
                        Directory containing the simulation scripts and elmfire.data files. Default is $HOME/salo-wildfire/elmfire/slurm-scripts/sim.
  --gcs GCS_DIR         GCS bucket upload path.
  -g IGNITION_SCALER, --igscaler IGNITION_SCALER
                        Value for Ignition mask scaling factor. Default is None (use elmfire.data default)
  --mpi-cpus MPI_CPUS   Number of CPUs to have MPI use
  --debug               If enabled, ELMFIRE will run in debug mode. Disabled by default.

Slurm:
  -j JOBDIR, --job-dir JOBDIR
                        Directory to use for temporary storage and logging. If not provided, a directory will be created.
  -l LOGDIR, --log-dir LOGDIR
                        Directory to store logs in. If not provided, logs will be stored in jobdir.
  --no-cleanup          If enabled, we will leave the intermediate outputs and the scratch space in place. Enabled by default.
  --partition SLURM_PARTITION
                        The SLURM partition to use when scheduling jobs. Default is None (use SLURM default).
  --cpus SLURM_CPUS     The number of CPUs (tasks) to request from SLURM. Default is None (use SLURM default).
  -o OUTDIR, --out OUTDIR
                        Directory to output the intermediate and completed results. Default is None (do not output anywhere).
  --format {raw,tgz}    Format to use when outputting the results. 'raw' means upload the results as they are. 'tgz' (default) means pack the results in a tar.gz archive before uploading.
  --log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Provide logging level. Default is "warning".

If used through elmfire-pipeline.sh, --sim-dir (as --sim-dir), --gcs (as --gcs), --format (as --raw), --out (as <--job>/intermediate), and --job-dir (as <--job>/run_tile) are provided by elmfire-pipeline.sh and should not be modified through the --simulate argument.

--burntime and --version control the elmfire configuration file that will be used. elmfire configuration files are located in elmfire/slurm-scripts/sim/elmfire_data and have the format elmfire-<--version>-<--burntime>hr.data. --in specifies the location of the GCS bucket containing the CFO input data. Additionally, --version is used to set which elmfire binary will be used to run the simulation.
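A minimal sketch of the naming convention above, assuming the configuration path is built by simple string formatting. The helper name is hypothetical, and run_tile.py may assemble the path differently.

```python
from pathlib import PurePosixPath


def elmfire_data_path(sim_dir: str, version: str, burntime: int) -> PurePosixPath:
    """Build the path elmfire_data/elmfire-<version>-<burntime>hr.data under sim_dir."""
    return PurePosixPath(sim_dir) / "elmfire_data" / f"elmfire-{version}-{burntime}hr.data"


# Using the documented defaults for --version and --burntime.
path = elmfire_data_path("elmfire/slurm-scripts/sim", "0.6550", 24)
print(path)  # elmfire/slurm-scripts/sim/elmfire_data/elmfire-0.6550-24hr.data
```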

The positional arguments tile and year control which tile to simulate for what year, and therefore which subdirectory of --in to load input data from since the bucket is expected to be organized by tile and year (see elmfire/slurm-scripts/sim/01-setup-cfo.sh and its usage in run_tile.py).

The remaining positional arguments are start and stop. These arguments control the endpoints of the half-open range ([start, stop)) of weather band rasters that will be used during the simulation. start and stop control the METEOROLOGY_BAND_START and METEOROLOGY_BAND_STOP parameter values of the elmfire-[...].data file (discussed above, see elmfire/src/elmfire-simulate-tile.sh for how they're provided to elmfire).
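To make the half-open convention concrete, the sketch below lists the bands covered by a given start and stop. The helper name is hypothetical; the values 2925 and 3000 are taken from the example invocation later in this document.

```python
from typing import List


def weather_bands(start: int, stop: int) -> List[int]:
    """Bands in the half-open range [start, stop): start is included, stop is not."""
    return list(range(start, stop))


bands = weather_bands(2925, 3000)
print(len(bands), bands[0], bands[-1])  # 75 2925 2999
```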

Similarly, --igscaler controls the IGNITION_MASK_SCALE_PARAMETER parameter value in the elmfire data file.

The --partition argument controls which SLURM partition the invocations of elmfire-simulate-tile.sh (where elmfire itself is finally invoked) will run on.

Respectively, the --cpus and --mpi-cpus arguments control how many CPUs to request from SLURM for each run (you should specify the number of logical cores on each machine in the SLURM cluster) and how many CPUs to expose to MPI (you should specify the number of physical cores on each machine, i.e. half of the number of logical cores since hyperthreading is disabled).

Summary of operation¶

When running in Monte Carlo mode, elmfire produces a binary file for each simulated ignition. The goal of run_tile.py is to get elmfire to successfully simulate some number of ignitions (determined by weather conditions, the ignition scale factor, and other inputs) for each weather band, and then collect the binary files (formatted as toa_<band>_<ignition number>.bin) for post-processing (see below). elmfire can be extremely flaky and is prone to falling over unexpectedly, so we need a wrapper script to babysit, monitor progress, and get it to resume when it falls over. That is where run_tile.py comes in.

run_tile.py begins by mounting the GCS bucket (or local directory) and then calling 01-setup-cfo.sh, a script which will extract the appropriate inputs for the tile and year to be simulated. Once this is done, run_tile.py will repeatedly submit SLURM jobs to launch elmfire-simulate-tile.sh (which will run elmfire itself) until all ignitions for all weather bands have successfully completed. When launched, elmfire will output a file called cases_to_run.csv into its output directory which contains a listing of how many ignitions (and therefore binary files) are expected per weather band. When the SLURM job completes, we can check the output directory for how many ignitions (and therefore binary files) have been completed per band (since, as mentioned above, the naming scheme for the files indicates the corresponding band). If the job did not complete successfully, some bands will be missing some ignitions. In this case, we take note of the first incomplete band, and for all prior successful bands, collect the binary files into a separate output directory. We then launch a new SLURM job, telling elmfire-simulate-tile.sh that the new start band should be the first incomplete band from the previous run. This process continues until all bands have been successfully simulated.
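The completeness check described above can be sketched roughly as follows. This is not the code in run_tile.py: the function names are hypothetical, and the expected per-band counts are assumed to have already been parsed from cases_to_run.csv into a dictionary, since the layout of that file is not described here. Only the toa_<band>_<ignition number>.bin naming scheme comes from this document.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Matches the documented naming scheme toa_<band>_<ignition number>.bin.
TOA_PATTERN = re.compile(r"^toa_(\d+)_(\d+)\.bin$")


def completed_per_band(filenames: Iterable[str]) -> Counter:
    """Count how many ignition binaries exist for each weather band."""
    counts: Counter = Counter()
    for name in filenames:
        match = TOA_PATTERN.match(name)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def first_incomplete_band(expected: Dict[int, int], filenames: Iterable[str]) -> Optional[int]:
    """Return the lowest band with fewer binaries than expected, or None if all are done."""
    done = completed_per_band(filenames)
    for band in sorted(expected):
        if done[band] < expected[band]:
            return band
    return None


# Hypothetical state after a job that fell over partway through band 11.
expected = {10: 2, 11: 2, 12: 1}
files = ["toa_10_1.bin", "toa_10_2.bin", "toa_11_1.bin", "cases_to_run.csv"]
print(first_incomplete_band(expected, files))  # 11
```

In this sketch the next job would restart at band 11, and the binaries for band 10 could be collected right away.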

Once all bands have been successfully simulated, we collect the binary files and output them to the specified directory.

`postprocess_tile.py`¶

postprocess_tile.py manages converting the binary files output by run_tile.py into the final times_burned, flame_length, etc. output rasters for the tile.

Arguments¶

usage: postprocess_tile.py [-h] [-j JOBDIR] [-l LOGDIR] [--no-cleanup] [--partition SLURM_PARTITION] [--cpus SLURM_CPUS] [-o OUTDIR] [--format {raw,tgz}]
                           [--log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-i INTERMEDIATE_DIR] [-v VERSION] [--mpi-cpus MPI_CPUS] [-s SIMDIR] [--gcs GCS_DIR] [--sync-dir SYNC_DIR] [--rerun-postproc]
                           tile

Postprocess a tile simulation

positional arguments:
  tile                  The tile to simulate in format "XX_YY".

optional arguments:
  -h, --help            show this help message and exit
  -i INTERMEDIATE_DIR, --intermediate INTERMEDIATE_DIR
                        Directory containing the intermediate results to be postprocessed.
  -v VERSION, --version VERSION
                        Version of ELMFIRE to use. Default is 0.6550.
  --mpi-cpus MPI_CPUS   Number of CPUs to have MPI use
  -s SIMDIR, --sim-dir SIMDIR
                        Directory containing the simulation scripts and elmfire.data files. Default is $HOME/salo-wildfire/elmfire/slurm-scripts/sim.
  --gcs GCS_DIR         GCS bucket upload path.
  --sync-dir SYNC_DIR   Directory to sync to GCS bucket.
  --rerun-postproc      Flag to rerun postprocessing after initial failure

Slurm:
  -j JOBDIR, --job-dir JOBDIR
                        Directory to use for temporary storage and logging. If not provided, a directory will be created.
  -l LOGDIR, --log-dir LOGDIR
                        Directory to store logs in. If not provided, logs will be stored in jobdir.
  --no-cleanup          If enabled, we will leave the intermediate outputs and the scratch space in place. Enabled by default.
  --partition SLURM_PARTITION
                        The SLURM partition to use when scheduling jobs. Default is None (use SLURM default).
  --cpus SLURM_CPUS     The number of CPUs (tasks) to request from SLURM. Default is None (use SLURM default).
  -o OUTDIR, --out OUTDIR
                        Directory to output the intermediate and completed results. Default is None (do not output anywhere).
  --format {raw,tgz}    Format to use when outputting the results. 'raw' means upload the results as they are. 'tgz' (default) means pack the results in a tar.gz archive before uploading.
  --log {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Provide logging level. Default is "warning".

If used through elmfire-pipeline.sh, --sim-dir (as --sim-dir), --gcs (as --gcs), --format (as --raw), --out (as <--out>), --intermediate (as <--job>/intermediate) and --job-dir (as <--job>/postprocess_tile) are provided by elmfire-pipeline.sh and should not be modified through the --post argument.

The usage of --partition, --cpus, and --mpi-cpus is identical to their usage in run_tile.py.

--version controls which binary of elmfire_post will be used (see elmfire/slurm-scripts/sim/postprocess.sh).

Summary of operation¶

postprocess_tile.py manages launching a SLURM job that will invoke postprocess.sh. postprocess.sh will construct the elmfire.data file using input parameters gathered from the metadata files placed (by run_tile.py) in the directory specified by the --intermediate argument. It will collect the CELLSIZE, location of the lower-left corner, and a few other inputs from the elmfire.data used by elmfire when running the simulations. The script will then run elmfire_post_<--version> to create the final output rasters from the binary files output by run_tile.py and copy them to the directory specified by --out.

Using `elmfire-pipeline.sh`¶

Taken together, a typical invocation looks something like this:

slurm-scripts/elmfire-pipeline.sh --out "mydir/postprocessed_out" --job "mydir" \
      --gcs "gs://elmfire/my_runs/${tile}/${ignition_factor}" \
      -s "--log DEBUG --burntime 24 --version 0.6551.salo --in gs://elmfire/inputs ${tile} 2020 2925 3000 --partition=elmfire-pe --cpus 32 --mpi-cpus 16 -g ${ignition_factor}" \
      -p "--log DEBUG --version 0.6551.salo --partition=elmfire-pe --cpus 32 --mpi-cpus 16"
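When scripting many tiles, it can help to assemble this command programmatically so that the nested -s and -p argument strings stay quoted correctly. The sketch below is one possible way to do that; the helper name, the tile ID "05_10", and the ignition factor "0.5" are hypothetical, and the flag values are copied from the invocation above.

```python
import shlex
from typing import List


def pipeline_command(tile: str, ignition_factor: str, job_dir: str = "mydir") -> List[str]:
    """Build the argument list for a single elmfire-pipeline.sh run."""
    slurm_args = ["--partition=elmfire-pe", "--cpus", "32", "--mpi-cpus", "16"]
    simulate_args = [
        "--log", "DEBUG", "--burntime", "24", "--version", "0.6551.salo",
        "--in", "gs://elmfire/inputs", tile, "2020", "2925", "3000",
        *slurm_args, "-g", ignition_factor,
    ]
    post_args = ["--log", "DEBUG", "--version", "0.6551.salo", *slurm_args]
    return [
        "slurm-scripts/elmfire-pipeline.sh",
        "--out", f"{job_dir}/postprocessed_out",
        "--job", job_dir,
        "--gcs", f"gs://elmfire/my_runs/{tile}/{ignition_factor}",
        # Each nested argument set is passed as one string, as in the example above.
        "-s", shlex.join(simulate_args),
        "-p", shlex.join(post_args),
    ]


cmd = pipeline_command("05_10", "0.5")
print(shlex.join(cmd))  # a copy-pasteable shell command line
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the repository's elmfire directory.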

Simulation post-processing¶

Someone should document this!

Merging tile runs in post-postprocessing¶

The postprocessing routines generate a series of composite files that are then merged across tiles into statewide mosaic products. This is mostly handled by the elmfire/src/merge_tiles.py script.

merge_tiles.py¶

merge_tiles.py [-h] [--local data_directory] [--mosaic output_directory] 
               [--remote elmfire_directory] [--split tile_index] [-p {10,20,30,40,50,60,70,80,90}]
               [--te xmin ymin xmax ymax] [--tr xres yres] [--blocksize xsize ysize]
               [--just_burned] [--just_center] [-f] [--multithread num_threads] [--skip]

Merges tiled ELMFIRE outputs into mosaics of burn probability, flame length, spread rate & hazard.

optional arguments:
  -h, --help            show this help message and exit
  --local data_directory
                        Local directory to store tiled ELMFIRE data.
  --mosaic output_directory
                        Local directory to store mosaicked output data.
  --remote elmfire_directory
                        Remote directory where the xx_yy format ELMFIRE tiles are stored.
  --split tile_index    The index for the number of subdirectories in which to find the TILE_IDs.
  -p {10,20,30,40,50,60,70,80,90}, --percentile {10,20,30,40,50,60,70,80,90}
                        Percentile aggregation for flame length and spread rate.
  --te xmin ymin xmax ymax
                        Output spatial extent.
  --tr xres yres        Output spatial resolution.
  --blocksize xsize ysize
                        The raster internal data read block size.
  --just_burned         Compute burn probability using only the pixels that burned.
  --just_center         Compute burn probability using ignition counts inside the internal tile boundary.
  -f, --force           Force overwriting local data if already downloaded.
  --multithread num_threads
                        Number of threads to use for multithreading. Set to -1 to use all threads.
  --skip                Skips the tile download step.

This script takes an input directory of ELMFIRE postprocessing outputs, downloads the tiled data locally, computes derivative products, and merges the tiles into large, single-band mosaics. The current outputs are [burn_probability, flame_length, spread_rate and wildfire_hazard].

Here are a few tips for passing command line arguments that may not be immediately obvious from the help text above.

--remote - Requires a path to a cloud storage bucket that contains directories with each tiled, postprocessed dataset (e.g. gs://elmfire/ca-2020-hazard-v1/).
--local - The local directory where all tile outputs will be downloaded.
--mosaic - An alternative local directory to write mosaic data products. Default is to use the --local directory. We set this up to reduce consecutive read/write operations on disks to speed up the merge process.
--split - the number of "splits" in the file path to make to find the directory with the tile outputs. The default (5) assumes a structure like gs://{BUCKET}/{HAZARD_RUN}/{TILE_IDS}, where the TILE_IDs are the 5th element if you were to run .split('/') on the remote directory string. If the tiles are under gs://{BUCKET}/{HAZARD_RUN}/{SUBDIRECTORY}/{TILE_IDS} then set --split 6.
--blocksize - sets the tile read/write size. Increase this to perform fewer i/o operations.
--just_center - uses inverse distance weighting in the burn probability calculations. This reduces tile edge effects because it discounts the number of potential ignitions by the distance from where the ignitions can start (the internal tile boundary).
--force is useful for rewriting existing files (currently, this script will not overwrite). But it will also force re-downloading the data, which could be time intensive. To re-run just the local data operations like data merging, run both --force and --skip, which will skip the download step.
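The --split tip above is easier to see with a concrete path. The sketch below treats the value as a 1-based position in the result of .split('/'), which is an interpretation consistent with the examples in this document rather than a reading of merge_tiles.py itself. The helper name and the tile ID "05_10" are hypothetical.

```python
def tile_id_from_path(path: str, split: int = 5) -> str:
    """Return the element at 1-based position `split` of path.split('/')."""
    return path.split("/")[split - 1]


parts = "gs://elmfire/ca-2020-hazard-v1/05_10".split("/")
print(parts)  # ['gs:', '', 'elmfire', 'ca-2020-hazard-v1', '05_10']

# Default layout: gs://{BUCKET}/{HAZARD_RUN}/{TILE_IDS}
print(tile_id_from_path("gs://elmfire/ca-2020-hazard-v1/05_10"))  # 05_10

# One extra subdirectory level: use --split 6
print(tile_id_from_path("gs://elmfire/ca-2020-hazard-v1-fusion/5-6/05_10", split=6))  # 05_10
```

Note that the empty string produced by the "//" after "gs:" counts as an element, which is why the tile ID lands in fifth position rather than fourth.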

CreateMosaicBlocks.ipynb¶

We were still generating postprocessed tiles with about 36 hours to go before the webinar, and running merge_tiles.py on the full state simultaneously would have taken more time than we had. So we created an intermediate mosaic product using a set of six blocks.

California ELMFIRE blocks

If you squint you can make out the shape of California.

Each block included every tile along X within a band of 10 tiles along Y, which breaks the state down into the shape above. Each block also included a buffer of one row of Y tiles to eliminate edge effects between blocks. The notebook for block generation can be found here:

elmfire/notebooks/CreateMosaicBlocks.ipynb

This notebook computes the spatial extents for each block and copies the associated tiles into separate block subdirectories. We then created six separate compute instances to run merge_tiles.py for each block (using the elmfire/launch_elmfire_mosaic_nodes.sh as an instance creation template). We logged in to each node and ran commands like this to process each block.
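The tile-to-block assignment can be sketched as below. This is not the notebook's code. It assumes the "XX_YY" tile IDs hold integer indices, that blocks are numbered from zero, that the first Y index is zero, and that the one-row buffer is applied on both sides of each band. All of these are guesses made for illustration.

```python
from typing import Iterable, List


def block_tiles(
    tile_ids: Iterable[str],
    block_index: int,
    rows_per_block: int = 10,
    buffer_rows: int = 1,
) -> List[str]:
    """Select the tiles whose Y index falls in a block's band, plus a buffer row."""
    y_start = block_index * rows_per_block - buffer_rows
    y_stop = (block_index + 1) * rows_per_block + buffer_rows  # exclusive
    selected = []
    for tile_id in tile_ids:
        _, y = tile_id.split("_")  # "XX_YY" -> keep YY
        if y_start <= int(y) < y_stop:
            selected.append(tile_id)
    return selected


# Hypothetical tile IDs: one X column, Y rows 0 through 24.
tiles = [f"03_{y:02d}" for y in range(25)]
print(block_tiles(tiles, 1))
# ['03_09', '03_10', ..., '03_20']: rows 10-19 plus one buffer row on each side
```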

BLOCK_NUMBER=6
conda activate wildfire
python /home/cba/salo-wildfire/elmfire/src/merge_tiles.py --split 6 --remote gs://elmfire/ca-2020-hazard-v1-fusion/5-${BLOCK_NUMBER} --local /scratch/read --mosaic /scratch/write --multithread 1 --blocksize 2048 2048 --just_center

Once each block finished processing, all merged datasets were downloaded to an instance, merged into statewide mosaics, and pushed to the gs://elmfire bucket. This was done with the following scripts:

elmfire/block-scripts/download-merge-burn-probability.sh
elmfire/block-scripts/download-merge-flame-length.sh
elmfire/block-scripts/download-merge-spread-rate.sh
elmfire/block-scripts/download-merge-hazard.sh

These scripts require a copy of ca-landmask.shp in the directory where the scripts are run. This group of shapefiles can be found in gs://elmfire/vector/ca-landmask.*.

What does the ELMFIRE acronym stand for?¶

ELMFIRE is short for "Eulerian Level set Model of FIRE spread."
