Data Analysis » History » Version 1

Giulio Di Anastasio, 03/05/2021 10:56

h1. %{color:BLUE}  Data analysis%

We use "Jupyter":https://jupyter.org , "Pandas":https://pandas.pydata.org/ and "GeoPandas":http://geopandas.org/ , accessible at http://gis.auroville.org.in/notebooks .

To integrate notebooks into our processes (automated execution of notebooks), we use "papermill":https://github.com/nteract/papermill . Systemd "timers":https://wiki.archlinux.org/index.php/Systemd/Timers schedule the notebooks automatically on the server, i.e. for the dashboards.

There's a dedicated virtual machine for Jupyter, accessible from our local network at @jupyter.csr.av@.
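
As an illustration of the scheduled execution with papermill mentioned above, a script along the following lines could be what a systemd timer launches. This is only a sketch, not the actual server setup: the notebook path, the output folder and the @run_date@ parameter are hypothetical, and only @papermill.execute_notebook@ is papermill's documented entry point.

```python
from datetime import date
from pathlib import Path


def output_path_for(notebook: Path, out_dir: Path, run_date: date) -> Path:
    """Return a dated output path so that each scheduled run keeps its own copy."""
    return out_dir / f"{notebook.stem}-{run_date.isoformat()}.ipynb"


def run_notebook(notebook: Path, out_dir: Path, run_date: date) -> Path:
    """Execute a notebook with papermill and return the path of the executed copy."""
    # Imported here so that the path helper above works without papermill installed.
    import papermill as pm

    out_dir.mkdir(parents=True, exist_ok=True)
    output = output_path_for(notebook, out_dir, run_date)
    # papermill injects these values in a new cell placed after the cell
    # tagged "parameters". "run_date" is a hypothetical parameter name.
    pm.execute_notebook(
        str(notebook),
        str(output),
        parameters={"run_date": run_date.isoformat()},
    )
    return output
```

A timer-triggered script could then call, for example, @run_notebook(Path("dashboards/rain.ipynb"), Path("executed"), date.today())@, where both paths are placeholders.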

h2. %{color:BLUE}  Organization of notebooks%

The setup is split into 2 parts, which run as 2 separate instances of Jupyter for security reasons.

h3. %{color:BLUE}  Admin%

The admin notebooks are mostly for maintenance: operations on the database, etc.

h3. %{color:BLUE}  Users%

The notebooks are organized in folders, all under Gisaf's source code git repository, except the "Sandbox" one.

This notebook server connects to the database as a specific user (@jupyter@), which has been set up on the database server with permission to read all data (@readonly@) and with write access to some tables dedicated to storing analysis results.
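
To make this database access more concrete, the following sketch loads the result of a query into a GeoDataFrame as the @jupyter@ user. It is not taken from the actual setup: the host, the port, the database name and the geometry column are assumptions, and the password is assumed to be supplied outside the code (for example through a @~/.pgpass@ file).

```python
def database_url(user: str = "jupyter", host: str = "localhost",
                 port: int = 5432, database: str = "gisaf") -> str:
    """Build a PostgreSQL URL; host, port and database name are assumptions."""
    return f"postgresql://{user}@{host}:{port}/{database}"


def read_layer(sql: str, geom_col: str = "geom"):
    """Run a SELECT query and return the result as a GeoDataFrame."""
    # Imported here so that database_url() works without these packages installed.
    import geopandas as gpd
    from sqlalchemy import create_engine

    engine = create_engine(database_url())
    # geom_col names the geometry column in the query result (assumed "geom").
    return gpd.read_postgis(sql, engine, geom_col=geom_col)
```

For example, @read_layer("SELECT * FROM some_schema.some_table")@, where the table name is a placeholder. Writing results back would only succeed on the tables for which @jupyter@ has been granted write access.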

h2. %{color:BLUE}  Integration with Gisaf%

The notebook in @Templates@ demonstrates how to use notebooks with Gisaf: mostly, how to use the @gisaf.ipynb_tools@ module to access Gisaf models and the data from the database.

This module is part of Gisaf: https://redmine.auroville.org.in/projects/gisaf/repository/revisions/master/entry/gisaf/ipynb_tools.py

h2. %{color:BLUE}  References%

h3. %{color:BLUE}  GeoPandas%

Some nice processing examples, using watershed and rain data: https://geohackweek.github.io/vector/06-geopandas-advanced/

h3. %{color:BLUE}  Integration%

A good example of how a company has integrated the same tools: https://medium.com/netflix-techblog/scheduling-notebooks-348e6c14cfd6

h2. %{color:BLUE}  Other docs%