Performance » History » Version 2

Philippe May, 30/03/2019 03:05

h1. Performance

Gisaf is written in an object-oriented (OO), asynchronous style.

When manipulating potentially large datasets, the performance of SQLAlchemy (more precisely, asyncpg and Gino) has become a concern.

A few techniques are being put in place to tackle this problem.


h2. Use Pandas (NumPy) instead of OO models

This is work in progress, but it already shows improvements of about 4 times with only a few thousand records.
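
To illustrate the idea behind replacing OO models with Pandas, here is a minimal sketch that computes the same aggregate twice: once by looping over one Python object per record, and once on a DataFrame column. The @SurveyPoint@ class, the column names and the random data are hypothetical and are not taken from the Gisaf code base; no timing is claimed here, since the actual gain depends on the workload.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class SurveyPoint:
    """Hypothetical OO model (not a Gisaf class): one instance per record."""
    x: float
    y: float
    elevation: float


def mean_elevation_oo(points):
    """Row-by-row approach: iterate over Python objects."""
    total = 0.0
    for point in points:
        total += point.elevation
    return total / len(points)


def mean_elevation_df(frame):
    """Vectorized approach: the whole column is processed as one NumPy array."""
    return float(frame["elevation"].mean())


# Made-up data: a few thousand records, as in the case described above.
rng = np.random.default_rng(seed=0)
frame = pd.DataFrame({
    "x": rng.uniform(0, 1000, size=5000),
    "y": rng.uniform(0, 1000, size=5000),
    "elevation": rng.uniform(0, 100, size=5000),
})
points = [SurveyPoint(*row) for row in frame.itertuples(index=False)]
```

Both functions return the same value up to floating-point rounding; the difference lies in where the loop runs (Python interpreter versus compiled NumPy code).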


h2. Parallel processing

Vector-based processing (Pandas) serves as the base for future improvements: parallel processing and shameless JIT compilation of the code.

For future reference, see https://towardsdatascience.com/how-i-learned-to-love-parallelized-applies-with-python-pandas-dask-and-numba-f06b0b367138
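
As a rough sketch of what a parallelized apply could look like, the example below splits a DataFrame into row chunks, processes them concurrently and reassembles the result. It deliberately uses only the standard library (@ThreadPoolExecutor@) instead of the Dask and Numba tools discussed in the linked article, so it is a stand-in for the pattern rather than the approach planned for Gisaf. The helper names @parallel_apply@ and @distance_to_origin@ and the @x@/@y@ columns are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def distance_to_origin(chunk):
    """Vectorized work on one chunk; returns a Series sharing the chunk's index."""
    return np.hypot(chunk["x"], chunk["y"])


def parallel_apply(frame, func, n_chunks=4):
    """Split the frame into row chunks, run func on each concurrently, reassemble."""
    if frame.empty:
        return func(frame)
    size = -(-len(frame) // n_chunks)  # ceiling division: rows per chunk
    chunks = [frame.iloc[start:start + size] for start in range(0, len(frame), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map preserves the order of the chunks.
        results = list(pool.map(func, chunks))
    return pd.concat(results)


# Hypothetical data, for illustration only.
frame = pd.DataFrame({
    "x": [3.0, 6.0, 5.0, 8.0, 0.0],
    "y": [4.0, 8.0, 12.0, 15.0, 0.0],
})
distances = parallel_apply(frame, distance_to_origin, n_chunks=2)
```

Threads only help where the underlying NumPy code releases the GIL; for pure-Python functions a process pool or a tool such as Dask would likely be needed, so any speed-up should be measured rather than assumed.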


h2. Geographical clustering

Gisaf uploads complete layers. Restricting them to bounding boxes based on the desired visualization, if possible with Mapbox, would bring substantial speed-ups. Generating vector tiles on the fly, rather than GeoJSON, seems to be the most promising track.
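
The bounding-box idea can be sketched as follows: derive the geographic extent of a map tile, then keep only the records that fall inside it. @tile_bounds@ uses the usual Web Mercator ("slippy map") tile formula; @within_bbox@, the @lon@/@lat@ column names and the sample points are hypothetical, and the sketch assumes plain point data in WGS84 rather than Gisaf's actual geometry handling.

```python
import math

import pandas as pd


def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for a z/x/y Web Mercator tile."""
    n = 2 ** z  # number of tiles along each axis at this zoom level
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return west, south, east, north


def within_bbox(frame, west, south, east, north):
    """Keep only the rows whose point lies inside the bounding box (edges included)."""
    mask = frame["lon"].between(west, east) & frame["lat"].between(south, north)
    return frame[mask]


# Made-up points, for illustration only.
points = pd.DataFrame({
    "name": ["a", "b", "c"],
    "lon": [10.0, -10.0, 100.0],
    "lat": [45.0, 45.0, -20.0],
})
visible = within_bbox(points, *tile_bounds(1, 1, 0))
```

Only the rows inside the requested tile would then need to be serialized, instead of the complete layer.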