Getting started¶
The `cloudy` package is the primary resource managed and maintained by the salo-cloud-services repository. It is a Python package for working with cloud services like GCP.
These functions are developed primarily for use in production environments: automated deployments, scheduled tasks, and massively parallel workflows. They are less useful in interactive programming contexts like Jupyter notebooks.
Functions are organized by cloud provider, and then by the different services available from each provider. For working with Google Cloud Storage, you would first run:
```python
from cloudy.google import storage
```
You would then be able to access routines like `storage.upload_directory_to_cloud_storage()` or `storage.list_cloud_storage_blobs_recursively()` for interacting with the Cloud Storage API.
This `from cloudy.{provider} import {service}` pattern is used throughout the package. Each service has its own quirks and concepts, and compartmentalizing the functions by service made sense from a development perspective.
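To make the pattern concrete, here is a minimal sketch of a round trip with the Cloud Storage routines named above. The parameter names and argument order are assumptions for illustration, not the documented signatures; check the API reference before use.

```python
from cloudy.google import storage

# Hypothetical arguments: a local directory and a destination bucket path.
# The real signatures may differ; see the storage API reference.
storage.upload_directory_to_cloud_storage("local-data/", "gs://my-bucket/data/")

# List everything under the destination prefix to verify the upload.
for blob in storage.list_cloud_storage_blobs_recursively("gs://my-bucket/data/"):
    print(blob)
```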
These pieces can then be combined into complex workflows that chain services and operations together to deploy cloud resources. You can find some examples of this in the Google Cloud Platform intro.
Setting up the development environment¶
Clone the repository and run `make init` to create a local conda environment:
```bash
git clone git@github.com:salosciences/salo-cloud-services.git
cd salo-cloud-services
make init
```
You'll then be able to activate the development environment with `conda activate salo-cloud-services`.
`make init` installs the `cloudy` Python package into this conda environment along with all of its dependencies.
Run `make test` to run the full package test suite and determine whether there are any issues with the installation or the code.
Run `make docs` to start a local server hosting the package web docs. This command should print the URL to enter into your browser to view them.
Once you've made changes to the code, run `make build` to build a local conda package. The package will be stored in a newly created `package/` directory and can be used with `conda index` to install the package with conda instead of pip. You don't need to do this as part of any routine development workflow, but it is worth doing before pushing changes to the `main` or `production` branches to debug and ensure that the package builds properly for deployment.
Repository structure¶
The `cloudy/` directory contains the source code for the package. This includes a set of files for working with `google` and `aws` services, which are each contained in separate Python files. It also contains the unit tests for each function, which are run with `pytest`; running `make test` from the base directory is the easiest way to run them.
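For a rough sense of what these tests look like, here is a hypothetical sketch of a unit test for the `run_command_line()` routine documented below; the test name and assertions are illustrative assumptions, not copied from the repository.

```python
# A hypothetical pytest-style unit test; the real tests live alongside the
# modules in cloudy/ and are run via `make test`.
from cloudy.shared import run_command_line


def test_run_command_line_returns_stdout():
    # `echo` exits 0 and writes to stdout, so the call should succeed
    # and return the echoed text.
    assert run_command_line("echo hello").strip() == "hello"
```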
The `conda_recipe/` directory contains configuration files and shell scripts for building the conda package using `mamba` and `boa`. These two packages are fast dependency solvers and make working with conda much faster, but they also require more complex build commands, which are mostly contained in this directory and referenced by `make init`.
The `.circleci/` directory contains configurations and scripts for building and deploying the conda package (to `gs://salo-packages`) and the documentation (to docs.salo.ai). The CI workflows are triggered whenever you push to the `main` or `production` branches. CircleCI handles automatic package deployment, which requires some active version management: the logic in `.circleci/build_package.sh` will overwrite packages deployed with the same version number on the `main` branch, but will fail if an existing `{package}-{version}` is already deployed on the `production` branch.
The `Makefile` defines a series of repository helper functions. Run `make help` to learn more about them.
The `Dockerfile` defines a container environment for building and testing the `cloudy` package. This is useful if you need to test functions in an isolated container, but it is not used in any deployment context at this time.
Documentation is managed by `mkdocs.yml` and written in markdown in the `docs/` directories. Function documentation is generated automatically from docstrings by the `mkdocstrings` plugin, and should update as code and docstrings are written and pushed; the `run_command_line()` entry below is an example of this rendered output. These updates are automatically deployed to the web via CircleCI.
Below are the shared/common routines included in the base package directory:
run_command_line(command)¶
Run a shell command via subprocess and assert no error.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `command` | `str` | the shell command to run. | required |

Returns:

| Type | Description |
|---|---|
| `str` | stdout from completed process. |
Source code in `cloudy/shared.py`:

```python
# Module-level dependencies used by the function below.
import logging
import shlex
import subprocess

_logger = logging.getLogger(__name__)


def run_command_line(command: str) -> str:
    """Run a shell command via subprocess and assert no error.

    Args:
        command: the shell command to run.

    Returns:
        stdout from completed process.
    """
    _logger.debug("Running command: {}".format(command))
    completed = subprocess.run(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    message = "command: {}\n\nstdout: {}\n\nstderr: {}".format(
        command, completed.stdout.decode("utf-8"), completed.stderr.decode("utf-8")
    )
    # Assert on the return code directly so that failures which write
    # nothing to stderr are still caught.
    assert completed.returncode == 0, message
    return completed.stdout.decode("utf-8")
```
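For completeness, a quick usage sketch, assuming the package has been installed into the active environment (e.g. via `make init`):

```python
# Minimal usage sketch for run_command_line.
from cloudy.shared import run_command_line

# Returns stdout as a string; raises an AssertionError carrying the
# command's stdout/stderr if the command exits with a nonzero code.
listing = run_command_line("ls -la")
print(listing)
```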