GES DISC access from a container

Ari Gesher
Feb 15, 2023
Copernicus Sentinel-5P TROPOMI Methane Product (Orbit# 9397): a false-color map of methane concentrations over South America

The NASA GES DISC data archive is an amazing resource to access many scientific and Earth observation datasets.

Many of its datasets are also hosted in AWS’s us-west-2 region, and you must request a set of temporary AWS credentials to access them from a machine running in that region.

There’s a tutorial on how to set up access on their website. The tutorial is geared towards interactive users with persistent home directories.

In container land, passing authentication credentials is often handled with environment variables rather than persistent configuration files.

With a few clever environment tricks, we can use a temporary .netrc file in order to drive the whole authentication flow via environment variables.
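To make the .netrc format concrete, here is a minimal sketch that writes one into a temporary directory and reads it back with Python's standard netrc module. The helper name write_netrc and the example-user / example-pass values are placeholders for illustration, not real credentials.

```python
import netrc
import tempfile
from pathlib import Path


def write_netrc(directory, machine, login, password):
    """Write a single-entry .netrc file into `directory` and return its path.

    Hypothetical helper for illustration only.
    """
    path = Path(directory) / ".netrc"
    path.write_text(f"machine {machine}\nlogin {login}\npassword {password}\n")
    # keep the credentials readable by the current user only
    path.chmod(0o600)
    return path


with tempfile.TemporaryDirectory() as td:
    netrc_path = write_netrc(
        td, "urs.earthdata.nasa.gov", "example-user", "example-pass"
    )
    # parse the file back to confirm the entry is well formed
    login, _account, password = netrc.netrc(str(netrc_path)).authenticators(
        "urs.earthdata.nasa.gov"
    )
```

One caveat with this simple format: a password containing whitespace or other characters the netrc parser treats specially may not round-trip, so it is worth checking with your own credentials.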

The following snippet expects to find the username and password for EARTHDATA in environment variables:

import os
from pathlib import Path
import tempfile
import requests

AWS_SESSION_TOKEN_ENDPOINT = "https://data.gesdisc.earthdata.nasa.gov/s3credentials"

# username for EARTHDATA account
USERNAME = os.environ["GESDISC_USERNAME"]
# password for EARTHDATA account
PASSWORD = os.environ["GESDISC_PASSWORD"]
# machine we'll be redirected through for authentication
MACHINE = 'urs.earthdata.nasa.gov'

# remember the original $HOME (it may be unset in a container)
_home = os.environ.get("HOME")

# make a temporary folder to use as $HOME
with tempfile.TemporaryDirectory() as td:
    # move $HOME
    os.environ["HOME"] = td
    try:
        # write a temp netrc file
        with (Path.home() / ".netrc").open("w") as f:
            f.write(f"machine {MACHINE}\nlogin {USERNAME}\npassword {PASSWORD}\n")
        # make the request that needs to find the ~/.netrc file
        resp = requests.get(AWS_SESSION_TOKEN_ENDPOINT)
    finally:
        # put $HOME back how we found it, even if the request raised
        if _home is None:
            # $HOME was unset to begin with; assigning None would raise a TypeError
            os.environ.pop("HOME", None)
        else:
            os.environ["HOME"] = _home
# deal with any errors
resp.raise_for_status()

# extract the dict with temporary AWS credentials
s3_creds = resp.json()

# use GESDISC data in s3...
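
In keeping with the environment-variable theme, one way to hand the temporary credentials to an AWS client is to export them under the standard AWS variable names. The sketch below assumes the response JSON carries the keys accessKeyId, secretAccessKey and sessionToken; check the actual response before relying on that. The helper name aws_env_from_s3_creds and the sample values are hypothetical.

```python
import os


def aws_env_from_s3_creds(creds, region="us-west-2"):
    """Map a credentials dict onto the standard AWS environment variable names.

    Assumes the keys accessKeyId / secretAccessKey / sessionToken are present.
    """
    return {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
        "AWS_DEFAULT_REGION": region,
    }


# placeholder values standing in for the dict returned by resp.json()
sample_creds = {
    "accessKeyId": "EXAMPLE-KEY-ID",
    "secretAccessKey": "EXAMPLE-SECRET",
    "sessionToken": "EXAMPLE-TOKEN",
}

aws_env = aws_env_from_s3_creds(sample_creds)
# export them so AWS SDKs and tools started from this process can pick them up
os.environ.update(aws_env)
```

Because the credentials are temporary, a long-running container would need to repeat the request and refresh these variables when they expire.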

Leveraging the netrc support built into requests is the simplest way to jump through these hoops. I hope this snippet helps someone else out there.
