Data analysis
We use Jupyter, Pandas and GeoPandas; the notebooks are accessible at http://gis.auroville.org.in/notebooks.
To integrate notebooks into our processes (i.e. to execute them non-interactively), we use papermill. Systemd timers schedule the notebooks to run automatically on the server, notably for the dashboards.
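As an illustration, here is a minimal sketch that renders a systemd service/timer pair running a notebook through the papermill command line. The helper name `render_papermill_units`, the papermill location `/usr/local/bin/papermill`, the output naming scheme and the example paths are assumptions for illustration, not the actual configuration of the server.

```python
import shlex
from pathlib import Path


def render_papermill_units(name, notebook, output_dir, on_calendar,
                           parameters=None,
                           papermill="/usr/local/bin/papermill"):
    """Return (service, timer) unit texts that run `notebook` with papermill.

    Hypothetical helper: paths, naming and the papermill location are
    assumptions, not the server's actual configuration.
    """
    output = Path(output_dir) / f"{Path(notebook).stem}-output.ipynb"
    args = [papermill, notebook, str(output)]
    # papermill CLI: "-p NAME VALUE" overrides a default defined in the
    # notebook cell tagged "parameters".
    for key, value in (parameters or {}).items():
        args += ["-p", key, str(value)]
    # ExecStart accepts shell-style quoting; "%" must be doubled because
    # systemd treats it as a specifier prefix.
    exec_start = " ".join(shlex.quote(a) for a in args).replace("%", "%%")

    service = "\n".join([
        "[Unit]",
        f"Description=Run notebook {name} with papermill",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={exec_start}",
        "",
    ])
    timer = "\n".join([
        "[Unit]",
        f"Description=Schedule notebook {name}",
        "",
        "[Timer]",
        f"OnCalendar={on_calendar}",
        # Trigger a run missed while the timer was inactive (e.g. machine off).
        "Persistent=true",
        "",
        "[Install]",
        "WantedBy=timers.target",
        "",
    ])
    return service, timer


if __name__ == "__main__":
    service, timer = render_papermill_units(
        "dashboard", "/srv/notebooks/dashboard.ipynb",
        "/srv/notebooks/output", "hourly", parameters={"days": 7})
    print(service)
    print(timer)
```

The two texts would be saved as, for example, `/etc/systemd/system/dashboard.service` and `/etc/systemd/system/dashboard.timer`, then activated with `systemctl daemon-reload` and `systemctl enable --now dashboard.timer` (a timer triggers the service of the same name by default).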
There's a dedicated virtual machine for Jupyter, accessible from our local network at jupyter.csr.av.
Organization of notebooks
The setup is organized in two parts, run as two separate Jupyter instances for security reasons.
Admin
The notebooks in the Admin instance are mostly for maintenance: database operations, etc.
Users
The notebooks are organized in folders, all stored in Gisaf's source code git repository except the "Sandbox" folder.
This notebook server connects to the database with a dedicated user (jupyter), which is configured on the database server with read access to all data (read-only), plus write access to some tables dedicated to storing analysis results.
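To illustrate the split between read and write permissions, here is a minimal sketch that reads raw data, aggregates it with Pandas and writes the result to a results table. The table and column names (`rainfall`, `station`, `rain_mm`, `rainfall_totals`) are hypothetical, and an in-memory SQLite database stands in for the real server: in practice the connection would be a SQLAlchemy engine logged in as the `jupyter` user, whose connection URL is not documented here.

```python
import sqlite3

import pandas as pd


def store_rainfall_totals(con):
    """Aggregate a (hypothetical) rainfall table and store the totals.

    `con` can be a sqlite3 connection (as below) or a SQLAlchemy engine.
    """
    # On the real server, this SELECT relies on the read-only access to all data.
    df = pd.read_sql("SELECT station, rain_mm FROM rainfall", con)
    totals = df.groupby("station", as_index=False)["rain_mm"].sum()
    # Appends to (or creates) the results table; on the real server, writes
    # are only allowed on the tables dedicated to analysis results.
    totals.to_sql("rainfall_totals", con, if_exists="append", index=False)
    return totals


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE rainfall (station TEXT, rain_mm REAL);
        INSERT INTO rainfall VALUES ('A', 1.0), ('A', 2.5), ('B', 4.0);
    """)
    print(store_rainfall_totals(con))
```

Using `if_exists="append"` rather than `"replace"` keeps the write limited to inserting rows, which fits an account that is not allowed to drop tables.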
Integration with Gisaf
The notebook in the Templates folder demonstrates how to use notebooks together with Gisaf, mostly how to use the gisaf.ipynb_tools module to access Gisaf models and the data in the database.
This module is part of gisaf: https://redmine.auroville.org.in/projects/gisaf/repository/revisions/master/entry/gisaf/ipynb_tools.py
References
GeoPandas
Good examples of processing with watershed and rain data: https://geohackweek.github.io/vector/06-geopandas-advanced/
Integration
A good example of how a company (Netflix) has integrated the same tools: https://medium.com/netflix-techblog/scheduling-notebooks-348e6c14cfd6