Data analysis

We use Jupyter, Pandas and GeoPandas, accessible at http://gis.auroville.org.in/notebooks.

To integrate notebooks into our processes (executing them non-interactively), we use papermill. Systemd timers schedule the notebooks automatically on the server, e.g. for the dashboards.
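As an illustration of the kind of script a scheduled job could call, here is a minimal sketch that builds and runs a papermill command line. The function names, notebook paths and the "days" parameter are hypothetical and not taken from the actual setup; only the papermill CLI form (input notebook, output notebook, "-p name value" pairs) is standard.

```python
import shlex
import subprocess
from pathlib import Path


def build_papermill_command(notebook, output_dir, parameters=None):
    """Build the papermill CLI command that executes one notebook.

    The executed copy is written to output_dir under the same file name.
    """
    notebook = Path(notebook)
    output = Path(output_dir) / notebook.name
    command = ["papermill", str(notebook), str(output)]
    # Each "-p name value" pair is passed to the notebook as a parameter.
    for name, value in (parameters or {}).items():
        command += ["-p", str(name), str(value)]
    return command


def run_notebook(notebook, output_dir, parameters=None):
    """Execute the notebook; raises CalledProcessError if papermill fails."""
    command = build_papermill_command(notebook, output_dir, parameters)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)


# Hypothetical usage (paths and parameter are assumptions):
# run_notebook("dashboards/rain.ipynb", "/tmp/executed", {"days": 7})
```

Raising on a non-zero exit code lets the calling service report the failure, so a broken notebook does not go unnoticed.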

There's a dedicated virtual machine for Jupyter, accessible from our local network at jupyter.csr.av.

Organization of notebooks

The setup is organized in two parts, which run as two separate Jupyter instances for security reasons.

Admin

The notebooks in the admin instance are mostly for maintenance: operations on the database, etc.

Users

The notebooks are organized in folders, all of which live in Gisaf's source code Git repository, except the "Sandbox" folder.

This notebook server connects to the database as a dedicated user (jupyter). On the database server, this user has read-only access to all data, plus write access to a few tables dedicated to storing analysis results.
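As a rough sketch of what reading data with this user could look like from a notebook, the following assumes a PostgreSQL/PostGIS database reached through SQLAlchemy. The host, port, database name and geometry column are assumptions chosen for illustration; only the user name jupyter comes from the setup described above.

```python
from urllib.parse import quote


def database_url(password, user="jupyter", host="localhost", port=5432, name="gisaf"):
    """Build a PostgreSQL URL for SQLAlchemy.

    host, port and name are hypothetical defaults. The password is
    percent-encoded so characters like "@" or "/" do not break the URL.
    """
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{name}"


def read_query(sql, password, geom_col="geom"):
    """Run a query and return the result as a GeoDataFrame.

    geom_col is an assumed name for the geometry column.
    """
    # Imported here so database_url() works without these packages installed.
    import geopandas as gpd
    from sqlalchemy import create_engine

    engine = create_engine(database_url(password))
    return gpd.read_postgis(sql, engine, geom_col=geom_col)
```

With read-only permissions on the data tables, a query like this cannot modify the source data; results would be written only to the dedicated analysis tables.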

Integration with Gisaf

The notebook in Templates demonstrates how to use notebooks with Gisaf: mainly, how to use the gisaf.ipynb_tools module to access Gisaf models and the data from the database.

This module is part of gisaf: https://redmine.auroville.org.in/projects/gisaf/repository/revisions/master/entry/gisaf/ipynb_tools.py

References

GeoPandas

Some nice examples of processing, using watershed and rain data: https://geohackweek.github.io/vector/06-geopandas-advanced/

Integration

A good example of how a company has integrated the same tools: https://medium.com/netflix-techblog/scheduling-notebooks-348e6c14cfd6

Other docs