Getting Started

How to get started using the ENTR data stack.

Installing the ENTR Runtime

This section contains information on how to install and begin using the ENTR Runtime for analysis. For instructions on how to develop the components of the ENTR runtime itself, please see the Developer Setup section.

Pull and Run the Image

  1. Install Docker Desktop on your workstation (see instructions).

    • We recommend following this guide to install Docker on Windows. After installing the WSL 2 backend and Docker, you should be able to run containers from Windows PowerShell.
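    • To confirm Docker is working before pulling the ENTR image, you can run Docker's standard smoke test (nothing here is ENTR-specific); this should print a version string and run Docker's hello-world container:

docker --version
docker run --rm hello-world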

  2. Pull our image from the GitHub Container Registry:

docker pull ghcr.io/entralliance/entr_runtime:latest

Note: There are numerous security considerations when pulling and running images from the public internet. Users should take the necessary steps to ensure operational security.
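One mitigation, using standard Docker commands (not an ENTR-published procedure), is to record the image digest after pulling, then pin future pulls to that exact digest:

docker images --digests ghcr.io/entralliance/entr_runtime
# pin later pulls to the digest reported above:
# docker pull ghcr.io/entralliance/entr_runtime@sha256:<digest>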

  3. Run the ENTR runtime container, forwarding the necessary ports:

docker run -p 8888:8888 ghcr.io/entralliance/entr_runtime:latest

  4. Open the Jupyter link printed to the terminal in your web browser.

Building your own Image

In most cases, we recommend using the pre-built entr_runtime image available from the GitHub Container Registry. If you need to rebuild the image yourself, follow the instructions below:

  1. Install Git and Docker Desktop on your workstation.

  2. Clone the ENTR Runtime repository:

git clone git@github.com:entralliance/entr_runtime.git
cd entr_runtime
git checkout dev

  3. From the entr_runtime directory, run the following, replacing yourname with your username:

docker build -t yourname/entr-runtime docker

Note: Use the option --no-cache to force rebuilding of each layer.
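For example, to force a clean rebuild of every layer:

docker build --no-cache -t yourname/entr-runtime docker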

  4. Run the image you just built:

docker run -p 8888:8888 yourname/entr-runtime

Developing with the ENTR Runtime

This section contains information on how to begin developing the components of the ENTR Runtime environment.

Manually Running ENTR in Dev Mode

The ENTR runtime contains the following preinstalled components: OpenOA, entr_warehouse, and py-entr. To develop these components, check out development versions of the packages to your local filesystem, start the entr image with those paths mounted as volumes, and then install the packages from the volumes in editable mode. This lets you edit the code on your local machine and see the changes immediately reflected in the runtime. If $ENTR_HOME is the directory you’d like to work from:

  1. cd $ENTR_HOME

  2. git clone https://github.com/entralliance/entr_warehouse.git

  3. git clone https://github.com/entralliance/OpenOA.git

  4. git clone https://github.com/entralliance/py-entr.git

  5. git clone https://github.com/entralliance/entr_runtime.git

  6. Optionally, build the entr image. You can also use the dev image from the container registry as discussed in the quickstart guide.

  7. Now, start the entr container in dev mode, mapping the directories you checked out to paths within the container (the image argument below uses the pre-built image from the quickstart; substitute your own tag if you built the image in step 6):

docker run -p 8888:8888 -v $ENTR_HOME/OpenOA:/home/jovyan/src/OpenOA -v $ENTR_HOME/entr_warehouse:/home/jovyan/src/entr_warehouse -v $ENTR_HOME/py-entr:/home/jovyan/src/py-entr ghcr.io/entralliance/entr_runtime:latest

  8. Once inside the container, you will then need to re-install OpenOA in editable mode, or run dbt run as needed to materialize any changes to the dbt model code in the warehouse.

    • To install OpenOA in editable mode:

      • cd /home/jovyan/src/OpenOA

      • pip install -e .

    • To re-run DBT:

      • cd /home/jovyan/src/entr_warehouse

      • dbt run

Updating the Warehouse

Changes to the warehouse may require re-running dbt. To do this:

  1. Open a terminal from Jupyter (File > New > Terminal) and navigate to the directory where your dbt project is installed (see the “Assumed Repository Structure” section below) using cd ~/src/entr_warehouse. Then run dbt debug to test your connection to the Spark warehouse.

  2. Once the connection to the warehouse is confirmed, install the dbt packages for your project using dbt deps.

  3. Seed the metadata tables contained in the entr_warehouse repo using dbt seed to instantiate them in the Spark warehouse.

  4. (Re-)register example or newly added source data files with dbt run-operation stage_external_sources.

  5. Run dbt run to build all models in the Spark warehouse; they can then be consumed by any application connected to the Spark warehouse, such as OpenOA.
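Taken together, the sequence from a Jupyter terminal looks like this (assuming the dbt project is mounted at ~/src/entr_warehouse, as above):

cd ~/src/entr_warehouse
dbt debug                                   # test the connection to the Spark warehouse
dbt deps                                    # install dbt package dependencies
dbt seed                                    # instantiate the metadata tables
dbt run-operation stage_external_sources    # register source data files
dbt run                                     # build all models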

Advanced Topics

Forwarding extra ports (8888 is Jupyter; 8080 and 4040 are typically the Spark master and Spark application web UIs):

docker run -p 8888:8888 -p 8080:8080 -p 4040:4040 entr/entr-runtime
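If you also want to reach the warehouse from your host machine (e.g. with beeline, below), forward port 10000 as well; this assumes the image exposes its Thrift server on the default port, per the beeline connect string below:

docker run -p 8888:8888 -p 8080:8080 -p 4040:4040 -p 10000:10000 entr/entr-runtime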

Override OpenOA and entr_warehouse with local versions:

docker run -p 8888:8888 -p 8080:8080 -p 4040:4040 -v <path-to-local-clone-of-OpenOA>:/home/jovyan/src/OpenOA -v <path-to-local-clone-of-entr_warehouse>:/home/jovyan/src/entr_warehouse entr/entr-runtime-dev

Note: you will then need to re-install OpenOA in editable mode, or run dbt run, as needed to update the container with the new code.

Beeline connect string for ENTR warehouse:

beeline
!connect jdbc:hive2://localhost:10000
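Equivalently, using beeline's standard command-line flags, you can pass the connect string and a test query in one shot:

beeline -u jdbc:hive2://localhost:10000 -e 'SHOW TABLES;'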

Using VSCode Dev Container

We provide an example VSCode Dev Container config which can be used to quickly get up and running developing the ENTR platform. This is the recommended method if you use VSCode and are familiar with VSCode Dev Containers.

git clone https://github.com/jordanperr/entr_dev_environment.git
cd entr_dev_environment
git submodule update --init --remote

Then, open the project with VSCode and follow the prompts to initialize the dev container.

Run ENTR on Your Data

If you’re already using dbt, install dbt-openoa in your dbt project and follow the guidelines below to build models that feed the ENTR transformation pipeline and leverage OpenOA.

The ENTR Warehouse is an example dbt project available in the ENTR Runtime. If you aren’t already using dbt, the easiest way to test the functionality of the ENTR data stack on your own data is to load it as a CSV, as described below.

Welcome to the ENTR Data Warehouse

Background

The ENTR Warehouse is a dbt project with the goal of providing common data formats and transformation methods upon which renewable energy industry users can build and share analytical applications. Once users integrate their data into the generic fact and dimension tables of the ENTR model, they can make use of any application built on top of the standard ENTR table schema.

Getting Started

The ENTR Runtime Docker image contains all of the dependencies needed for this tutorial, including a standalone Apache Spark warehouse that can be used for running everything contained within the ENTR Warehouse dbt project. See the Getting Started section above for how to build and set up the ENTR Runtime.

ENTR Data Model

dbt docs for the entr_warehouse dbt project can be found at https://entralliance.github.io/entr_warehouse. This interface is useful for exploring and understanding the ENTR data model.
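The hosted docs track the main project; if you want to browse docs built from your local checkout instead, dbt's standard docs commands work from the warehouse project directory (assuming it is mounted at ~/src/entr_warehouse, as in the dev setup above):

cd ~/src/entr_warehouse
dbt docs generate   # compile the project and generate the documentation site
dbt docs serve      # serve the generated docs locally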

How to Bring Your Own Data

Note: the following steps require at least basic experience with building models in dbt.

Loading New Data from Files

The ENTR Runtime image contains pre-built models defined by the ENTR Warehouse based on open-source example data; however, for users wishing to bring their own data, the ENTR Warehouse supports setup of new sources from CSV and other Spark-readable file types by leveraging the dbt-external-tables package from dbt-labs.

  • With a clone of this entr_warehouse project mounted to the ENTR Runtime, drop a copy of the file you’d like to process through the ENTR data model into the data/ directory

  • Within the models/staging/ directory, write out the source definition for the new file in a YML file, using the dbt-external-tables guides as needed

    • Note: the new files can be added to any YML file in the models folder, but must be mapped under the entr_warehouse source:

      sources:
        - name: entr_warehouse
          tables:
            - name: <new table name>
              description: <description of new source table>
              external:
                location: '<path to data file within the container>' # e.g. "/home/jovyan/src/entr_warehouse/data/la_haute_borne_plant_data_sample.csv" - depends on where you've mounted the entr_warehouse dir in the container
                using: csv # specify for different file types accordingly
                options:
                  header: 'true' # optional but used with the ENTR sample data

  • Run dbt run-operation stage_external_sources to make the file available as a table in the ENTR runtime Spark warehouse and as a source relation in dbt, from which you can start building further transformations (a full-refresh variant is shown after this list)

  • See the four files within the data/ folder and their corresponding source definitions within the entr_sample_data.yml file for examples
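If you later change a source definition that has already been staged, dbt-external-tables documents a full-refresh mode that drops and recreates the external tables:

dbt run-operation stage_external_sources --vars "ext_full_refresh: true"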

Transforming New Data to ENTR Standard Formats

Once the new file is set up as a source, you will need to transform the data into the standard ENTR fact table format. To build the dbt transformations, you’ll need to define and map the dimensional components of the new data using the standard ENTR dimension table formats.

1. (Optional) Create an Intermediate Model to Facilitate Table Reshaping
  • You’ll likely notice that the initial step in the transformation of the example sources (files) is type casting, e.g. the int_entr_scada_sample__cast model, which casts the raw data as a preliminary step (see the examples within the models/staging/entr_sample_data/intermediate directory)

    • This prepares the data for reshaping/pivoting; we expect this will be a frequently necessary staging step for source files with tags corresponding to data types

2. Assign Dimensional Keys
  • Define and map the dimensional components of the new data using the standard ENTR dimension tables (see below for further detail)

3. Align Staging Model with Associated ENTR Fact Table Schema
  • Once all metadata about the newly loaded data is available, the last staging step is transforming the data into the relevant ENTR generic fact table schema, which can be found in this project’s dbt docs, e.g. fct_entr_wtg_scada for the generic wind turbine SCADA data fact table. The staging model stg_entr_scada_sample performs this final transformation on the example SCADA data from La Haute Borne to match the schema of the fct_entr_wtg_scada model. The generic ENTR fact tables (e.g. fct_entr_wtg_scada and fct_entr_reanalysis_data) are listed in the project’s dbt docs.

4. Add Newly Staged Data to ENTR Fact Table

Once a staging model has been created for your new source data that matches the associated generic ENTR fact table schema, you just need to union that staging model with the generic ENTR fact table to make the new data ready for consumption by ENTR-based applications. The fct_entr_reanalysis_data model shows how multiple staging models are combined in the generic ENTR reanalysis model.
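After adding the union, you can rebuild just the affected fact table and its upstream models using dbt's node selection syntax (substitute whichever fact table you modified):

dbt run --select +fct_entr_reanalysis_data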

Running OpenOA on ENTR

The ENTR Runtime includes example analysis notebooks that demonstrate operational wind plant data analytics use cases using OpenOA with example data stored in the ENTR warehouse. The example notebooks are located at /examples in the ENTR Runtime Docker workspace. All examples use two years of data for the 4-turbine “La Haute Borne” wind plant.

Running the Examples

  1. Complete the installation steps above and open Jupyter Hub in your web browser.

  2. On the left-hand side of Jupyter Hub, navigate to the examples directory.

  3. Double click on any example notebook to open it.

List of Examples

The ENTR runtime contains two example notebooks, located in the /examples directory described above.

OpenOA documentation

OpenOA documentation is hosted on ReadTheDocs.

Data are stored and organized in OpenOA using a PlantData object. The PlantData class uses the plantdata.from_entr method from the py-entr package to load data into OpenOA from the ENTR Warehouse.