Performance

Gisaf is written in a largely object-oriented (OO) and asynchronous style.

When manipulating potentially large datasets, the performance of SQLAlchemy (actually, asyncpg and Gino) has become a concern.

A few techniques are being put in place to tackle this problem.

Use Pandas (Numpy) instead of OO models

This is a work in progress, but it already shows improvements of roughly 4 times with a few thousand records.
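
As a rough illustration of where the gain comes from, the sketch below compares a per-object loop with the equivalent column-wise Pandas operation. The `Point` model, its fields and the synthetic data are hypothetical and do not reflect Gisaf's actual models; no timing is claimed here.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Point:
    """Hypothetical OO model: one Python object per record."""
    id: int
    easting: float
    northing: float
    elevation: float


def mean_elevation_oo(points):
    # Per-object loop: one Python attribute access per record.
    return sum(p.elevation for p in points) / len(points)


def mean_elevation_df(df):
    # Vectorized: a single NumPy reduction over the whole column.
    return float(df["elevation"].mean())


# Synthetic records standing in for rows fetched from the database.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "id": np.arange(n),
    "easting": rng.uniform(0, 1000, n),
    "northing": rng.uniform(0, 1000, n),
    "elevation": rng.uniform(0, 100, n),
})

# The same data as a list of model instances, as an ORM would return it.
points = [Point(*row) for row in df.itertuples(index=False)]
```

Both functions return the same value; the difference is that the DataFrame version never materializes one Python object per row, which is where most of the overhead of the OO path tends to come from.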

Parallel processing

Vector-based processing (Pandas) serves as the base for future improvements: parallel processing and JIT compilation of the code.

For future reference, see https://towardsdatascience.com/how-i-learned-to-love-parallelized-applies-with-python-pandas-dask-and-numba-f06b0b367138
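
One possible shape for this, as a minimal sketch only: split a DataFrame into chunks and process them concurrently. The column names and the `distance_to_origin` function are hypothetical. A thread pool is used here for simplicity; it only helps where the per-chunk work releases the GIL (as large NumPy operations generally do), and a process pool, Dask or Numba, as discussed in the article above, are alternatives.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def distance_to_origin(chunk: pd.DataFrame) -> pd.Series:
    # Hypothetical per-chunk computation, vectorized with NumPy.
    return np.hypot(chunk["easting"], chunk["northing"])


def parallel_apply(df: pd.DataFrame, func, n_chunks: int = 4) -> pd.Series:
    """Apply func to row-wise chunks of df in a thread pool."""
    if df.empty:
        return func(df)
    size = max(1, -(-len(df) // n_chunks))  # ceiling division
    chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map preserves chunk order, so the result keeps row order.
        parts = list(pool.map(func, chunks))
    return pd.concat(parts)


df = pd.DataFrame({
    "easting": np.arange(10, dtype=float),
    "northing": np.arange(10, dtype=float) * 2.0,
})
distances = parallel_apply(df, distance_to_origin, n_chunks=3)
```

Because each chunk keeps its original index, the concatenated result lines up row by row with the input DataFrame.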

Geographical clustering

Gisaf uploads complete layers; restricting them to bounding boxes based on the desired visualization, if possible with Mapbox, would bring substantial speed-ups. Generating vector tiles on the fly, rather than GeoJSON, seems to be the most promising track.
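
The bounding-box idea can be sketched as follows, assuming point features held in a DataFrame with hypothetical `lon` and `lat` columns and a box given as (west, south, east, north) by the map client. This is only an illustration of the principle; in practice the filtering would more likely happen in the database, and boxes crossing the antimeridian are not handled.

```python
import pandas as pd


def filter_bbox(df: pd.DataFrame, west: float, south: float,
                east: float, north: float) -> pd.DataFrame:
    """Keep only the points inside the bounding box (edges included)."""
    mask = df["lon"].between(west, east) & df["lat"].between(south, north)
    return df[mask]


# Hypothetical layer with four point features.
layer = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "lon": [79.80, 79.81, 79.95, 80.10],
    "lat": [12.00, 12.01, 12.20, 11.90],
})

# Only the features inside the current map view would be sent to the client.
visible = filter_bbox(layer, west=79.79, south=11.99, east=79.82, north=12.02)
```

With the sample data above, only features 1 and 2 fall inside the box, so half of the layer would not need to be serialized or transferred.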