
Getting started

The cloudy package is the primary resource managed and maintained by the salo-cloud-services repository. It is a python package for working with cloud services like Google Cloud Platform (GCP).

These routines are developed primarily for use in production environments - in automated deployments, scheduled tasks, and massively parallel workflows. They're less useful in interactive programming contexts like jupyter notebooks.

Functions are organized by cloud provider, and then by the different services available from each provider. For working with Google Cloud Storage, you would first run:

from cloudy.google import storage

You would then be able to access routines like storage.upload_directory_to_cloud_storage() or storage.list_cloud_storage_blobs_recursively() for interacting with the Cloud Storage API.
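As a rough sketch, a round trip with these routines might look like the following. The argument values and ordering here are illustrative assumptions, not the documented signatures:

from cloudy.google import storage

# upload a local directory to a bucket, then list the uploaded blobs.
# the arguments shown are assumptions for illustration - check each
# function's docs for the actual signature.
storage.upload_directory_to_cloud_storage("local-data/", "gs://my-bucket/data/")
for blob in storage.list_cloud_storage_blobs_recursively("gs://my-bucket/data/"):
    print(blob)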

This from cloudy.{provider} import {service} pattern is used throughout the package. Each service has its own quirks and concepts, and it made sense from a development perspective to compartmentalize these functions by service.

These pieces and parts can then be combined to create complex workflows that chain services and operations together to deploy cloud resources. You can find some examples of this in the Google Cloud Platform intro.


Setting up the development environment

Clone the repository and run make init in order to create a local conda environment:

git clone git@github.com:salosciences/salo-cloud-services.git
cd salo-cloud-services
make init

You'll then be able to activate the development environment with conda activate salo-cloud-services.

make init installs the cloudy python package into this conda environment along with all of its dependencies.

Run make test to run the full package test suite and determine if there are any issues with the installation or the code.

Run make docs to start a local server hosting the package web documentation. The command output should include the URL to enter into your browser to view the docs.

Once you've made changes to the code, run make build to build a local conda package. The built package is stored in a newly-created package/ directory, which can be indexed with conda index to install the package with conda instead of pip (see the sketch below).
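As a sketch, installing from that local channel might look something like this (the exact channel path depends on your checkout):

conda index package/
conda install --channel file://$(pwd)/package cloudy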

You don't need to build the package as part of any routine development workflow, but it is worth doing before pushing changes to the main or production branches to verify that the package builds properly for deployment.


Repository structure

The cloudy/ directory contains the source code for the package. This includes modules for working with google and aws services, each contained in a separate python file. It also contains the unit tests for each function, which are run with pytest. Running make test from the base directory is the easiest way to run the tests, though you can also invoke pytest directly, as sketched below.
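A sketch of a direct pytest invocation, assuming the tests live alongside the modules in cloudy/ as described above:

conda activate salo-cloud-services
pytest cloudy/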

The conda_recipe/ directory contains configuration files and shell scripts for building the conda package using mamba and boa. mamba is a fast dependency solver and boa a build tool based on it, and together they make working with conda much faster. They also require more complex build commands, which are mostly contained in this directory and referenced by make init.

The .circleci/ directory contains configurations and scripts for building and deploying the conda package (to gs://salo-packages) and the documentation (to docs.salo.ai). The CI workflows run whenever you push to the main or production branches. CircleCI handles automatic package deployment, which requires some active version management: the logic of .circleci/build_package.sh will overwrite packages deployed with the same version number on the main branch, but will fail if there's an existing {package}-{version} on the production branch.

The Makefile defines a series of repository helper functions. Run make help to learn more about them.

The Dockerfile defines a container environment for building and testing the cloudy package. This is useful if you need to test functions in an isolated container, but is not used in any deployment context at this time.
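A minimal sketch of using it, assuming the image's working directory contains the repository (the image tag here is arbitrary):

docker build -t cloudy-dev .
docker run --rm -it cloudy-dev make test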

Documentation is managed by mkdocs.yml and written in markdown in the docs/ directory. API documentation is generated from docstrings by the mkdocstrings plugin, and should update automatically as code and docstrings are written and pushed. These updates are automatically deployed to the web via CircleCI.


Below are the shared/common routines included in the base package directory:

run_command_line(command)

Run a shell command via subprocess and assert no error.

Parameters:

    Name       Type    Description                   Default
    command    str     the shell command to run.     required

Returns:

    Type    Description
    str     stdout from completed process.
Source code in cloudy/shared.py
import logging
import shlex
import subprocess

_logger = logging.getLogger(__name__)  # assumed module-level logger setup in shared.py


def run_command_line(command: str) -> str:
    """Run a shell command via subprocess and assert no error.

    Args:
      command: the shell command to run.

    Returns:
      stdout from completed process.
    """
    _logger.debug("Running command:  {}".format(command))
    completed = subprocess.run(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    message = "command:  {}\n\nstdout:  {}\n\nstderr:  {}".format(
        command, completed.stdout.decode("utf-8"), completed.stderr.decode("utf-8")
    )
    # check the return code directly so failures that write nothing to stderr are still caught
    assert completed.returncode == 0, message
    return completed.stdout.decode("utf-8")
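
A quick usage sketch:

from cloudy.shared import run_command_line

stdout = run_command_line("echo hello")
print(stdout)  # prints: hello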