Performance » History » Version 2
Philippe May, 30/03/2019 03:05
1 | 1 | Philippe May | h1. Performance |
---|---|---|---|
2 | 1 | Philippe May | |
3 | 1 | Philippe May | Gisaf is written in a basically object-oriented (OO) and asynchronous style. |
4 | 1 | Philippe May | |
5 | 1 | Philippe May | When manipulating potentially large datasets, the performance of SQLAlchemy (actually, asyncpg and Gino) has become a concern. |
6 | 1 | Philippe May | |
7 | 1 | Philippe May | A few techniques are being put in place to tackle this problem. |
8 | 1 | Philippe May | |
9 | 1 | Philippe May | |
10 | 1 | Philippe May | h2. Use Pandas (NumPy) instead of OO models |
11 | 1 | Philippe May | |
12 | 1 | Philippe May | This is a work in progress, but it already shows a speedup of about 4 times with a few thousand records. |
13 | 1 | Philippe May | |
14 | 1 | Philippe May | |
15 | 1 | Philippe May | h2. Parallel processing |
16 | 1 | Philippe May | |
17 | 1 | Philippe May | Vector-based processing (Pandas) serves as the base for future improvements: parallel processing and shameless JIT compilation of the code. |
18 | 1 | Philippe May | |
19 | 1 | Philippe May | For future reference, see https://towardsdatascience.com/how-i-learned-to-love-parallelized-applies-with-python-pandas-dask-and-numba-f06b0b367138 |
20 | 2 | Philippe May | |
21 | 2 | Philippe May | |
22 | 2 | Philippe May | h2. Geographical clustering |
23 | 2 | Philippe May | |
24 | 2 | Philippe May | Gisaf uploads complete layers; using bounding boxes based on the desired visualization, if possible with Mapbox, would bring substantial speedups. Generating vector tiles on the fly, rather than GeoJSON, seems to be the most promising track. |
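
To make the bounding box idea concrete, here is a minimal sketch in plain Python of clipping a GeoJSON layer to the visible extent before sending it. It is not Gisaf code: the names `in_bbox`, `clip_points`, `layer` and `visible`, as well as the sample coordinates, are hypothetical and only serve the illustration. It handles Point features only, and a real implementation would more likely push the filter down to the database.

```python
from typing import Tuple

# (min_lon, min_lat, max_lon, max_lat): the axis order GeoJSON uses for "bbox"
BBox = Tuple[float, float, float, float]


def in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def clip_points(collection: dict, bbox: BBox) -> dict:
    """Return a new FeatureCollection with only the Point features inside bbox."""
    kept = [
        feature
        for feature in collection["features"]
        if feature["geometry"]["type"] == "Point"
        and in_bbox(*feature["geometry"]["coordinates"][:2], bbox)
    ]
    return {"type": "FeatureCollection", "bbox": list(bbox), "features": kept}


def point(name: str, lon: float, lat: float) -> dict:
    """Build a GeoJSON Point feature (helper for the hypothetical sample data)."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }


# Hypothetical layer: two points close together and one far away.
layer = {
    "type": "FeatureCollection",
    "features": [
        point("A", 79.81, 12.00),
        point("B", 79.83, 12.01),
        point("C", 80.27, 13.08),
    ],
}

# Extent of the map currently shown by the client (assumed values).
visible = clip_points(layer, (79.80, 11.99, 79.85, 12.02))
print([f["properties"]["name"] for f in visible["features"]])
```

The payload shrinks with the visible extent instead of growing with the layer. This is the same property that vector tiles provide, with the extent fixed by the tile grid rather than by each request.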