Performance » History » Version 1

Philippe May, 30/03/2019 03:00

h1. Performance

Gisaf is written in an object-oriented (OO), asynchronous style.

For manipulating potentially large datasets, the performance of SQLAlchemy (actually, of asyncpg and Gino) has become a concern.

A few techniques are being put in place to tackle this problem.

h2. Use Pandas (Numpy) instead of OO models

This is work in progress, but it already shows an improvement of roughly 4 times with a few thousand records.
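
To illustrate the idea, here is a minimal sketch contrasting per-object processing with column-wise processing in Pandas. The @Point@ model, its @distance_to_origin@ method and the sample records are hypothetical and are not taken from the Gisaf code base; the actual speed-up depends on the data and the operations involved.

```python
import math
from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd


@dataclass
class Point:
    """Hypothetical OO model: one Python object per record."""
    x: float
    y: float

    def distance_to_origin(self) -> float:
        return math.hypot(self.x, self.y)


def distances_oo(points: List[Point]) -> List[float]:
    # One Python-level method call per record.
    return [p.distance_to_origin() for p in points]


def distances_vectorized(df: pd.DataFrame) -> pd.Series:
    # One NumPy call over whole columns; the loop runs in compiled code.
    return np.hypot(df["x"], df["y"])


# Illustrative sample data (assumption, not real Gisaf records).
records = [(3.0, 4.0), (6.0, 8.0), (5.0, 12.0)]
points = [Point(x, y) for x, y in records]
df = pd.DataFrame(records, columns=["x", "y"])

oo_result = distances_oo(points)
vectorized_result = distances_vectorized(df)
```

Both functions compute the same values. The vectorized version avoids creating one Python object per row, which is where most of the overhead tends to come from on larger datasets.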

h2. Parallel processing

Vector-based processing (Pandas) serves as the base for future improvements: parallel processing and shameless just-in-time (JIT) compilation of the code.

For future reference, see https://towardsdatascience.com/how-i-learned-to-love-parallelized-applies-with-python-pandas-dask-and-numba-f06b0b367138
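
As a rough sketch of what chunked parallel processing could look like, the example below splits a DataFrame into contiguous chunks and processes them concurrently. It uses a thread pool from the Python standard library rather than Dask or Numba, which are the tools named in the article title. The @process_chunk@ and @parallel_apply@ functions and the random data are assumptions made for illustration. Threads can only help here to the extent that the underlying NumPy operations release the GIL, so any gain should be measured rather than assumed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def process_chunk(chunk: pd.DataFrame) -> pd.Series:
    # Vectorized work on one chunk (hypothetical computation).
    return np.hypot(chunk["x"], chunk["y"])


def parallel_apply(df: pd.DataFrame, n_chunks: int = 4) -> pd.Series:
    # Split the rows into contiguous chunks by position.
    bounds = np.linspace(0, len(df), n_chunks + 1, dtype=int)
    chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Executor.map() returns results in the order of the inputs.
        results = list(pool.map(process_chunk, chunks))
    # Reassemble the per-chunk results into a single Series.
    return pd.concat(results)


# Illustrative random data (assumption, not a real Gisaf dataset).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"x": rng.random(10_000), "y": rng.random(10_000)})

parallel_result = parallel_apply(df)
serial_result = process_chunk(df)
```

Because the chunks are contiguous and are concatenated in order, the parallel result keeps the same index and values as the single-pass computation.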