Poor performance when writing directly to fefs disk
Profiling of datasequence.py shows that half of its run time is spent in provenance capture, in particular in writing the logs to the prov.log file. We could greatly improve the performance of provenance capture by logging provenance once per run instead of once per sub-run (as we do now). This would substantially reduce the time the whole pipeline spends in the datasequence.py step.
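One way to reproduce a measurement like this is to use Python's standard `cProfile` and `pstats` modules. The sketch below does not use the real datasequence.py code. `datasequence`, `process_subrun` and `capture_provenance` are hypothetical stand-ins, and the artificial `time.sleep` calls take the place of the real reduction and prov.log writing. The function reports the fraction of profiled time spent inside the provenance step.

```python
import cProfile
import pstats
import time


def capture_provenance(subrun):
    # Hypothetical stand-in for provenance capture (e.g. writing to prov.log)
    time.sleep(0.01)


def process_subrun(subrun):
    # Hypothetical stand-in for the actual processing of one sub-run
    time.sleep(0.01)
    capture_provenance(subrun)


def datasequence(n_subruns):
    # Hypothetical stand-in: provenance is captured once per sub-run
    for subrun in range(n_subruns):
        process_subrun(subrun)


def provenance_fraction(n_subruns=20):
    """Return the share of profiled time spent in capture_provenance."""
    profiler = cProfile.Profile()
    profiler.enable()
    datasequence(n_subruns)
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, funcname) -> (cc, nc, tt, ct, callers);
    # ct is the cumulative time, including time spent in sub-calls.
    prov_time = sum(
        ct
        for (_file, _line, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items()
        if name == "capture_provenance"
    )
    return prov_time / stats.total_tt


if __name__ == "__main__":
    print(f"provenance share: {provenance_fraction():.0%}")
```

On real code, running `python -m cProfile -s cumtime datasequence.py ...` and looking at the cumulative time of the provenance functions gives the same information without modifying the script.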
This requires some refactoring of the /provenance/capture.py and scripts/provprocess.py files. We would still have one prov.log file per day, though much smaller in size.
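The difference between the current and proposed logging granularity can be sketched as follows. Both functions and the JSON-lines record layout are hypothetical and do not reflect the actual prov.log format or the API of capture.py. The per-run record's summary fields (`n_subruns`, `first_subrun`, `last_subrun`) are an assumption chosen for illustration. The point is that one record and one file open per run replace one per sub-run, so the daily log shrinks accordingly.

```python
import json
import tempfile
from pathlib import Path


def log_subrun_provenance(log_path, run_id, activity, subrun_id):
    # Current behaviour (as described): one record, and one file open, per sub-run.
    # Hypothetical record layout.
    record = {"run": run_id, "subrun": subrun_id, "activity": activity}
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def log_run_provenance(log_path, run_id, activity, subrun_ids):
    # Proposed behaviour: a single record summarising all sub-runs of the run.
    # Hypothetical record layout.
    subrun_ids = list(subrun_ids)
    if not subrun_ids:
        raise ValueError("a run must contain at least one sub-run")
    record = {
        "run": run_id,
        "activity": activity,
        "n_subruns": len(subrun_ids),
        "first_subrun": min(subrun_ids),
        "last_subrun": max(subrun_ids),
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def count_lines(log_path):
    # Number of provenance records in a JSON-lines log
    with Path(log_path).open(encoding="utf-8") as fh:
        return sum(1 for _ in fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        per_subrun = Path(tmp) / "prov_subrun.log"
        per_run = Path(tmp) / "prov_run.log"
        for sub in range(100):
            log_subrun_provenance(per_subrun, 1234, "datasequence", sub)
        log_run_provenance(per_run, 1234, "datasequence", range(100))
        print(count_lines(per_subrun), count_lines(per_run))
```

With 100 sub-runs, the per-sub-run log gets 100 records and 100 append operations, while the per-run log gets one. Fewer small writes matter most on a shared parallel file system, where each open/append has a relatively high fixed cost.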
Edited by Daniel Morcuende