
Accelerating Pandas concatenation

P. Cotte, Towards Data Science, November 26th, 2020.

Abstract: I recently had to concatenate a fair number of MultiIndexed Pandas Series (stacked DataFrames) into a single DataFrame. This can take considerable time when the Series are numerous and/or large, and because of the MultiIndex, Dask cannot be used. I first present sample code that borrows Dask’s logic to concatenate the Series pairwise in parallel jobs. I then show the speed-up obtained with different parallel processing methods, for numbers of CPUs ranging from 8 to 80 and numbers of Series ranging from 10 to 90.
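To make the idea of pairwise concatenation concrete, here is a minimal sketch of a tree reduction over a list of MultiIndexed Series. It is an illustration written for this summary, not the code from the article: the function names `concat_pair` and `pairwise_concat`, the `max_workers` parameter, and the sample data are all hypothetical. It also assumes a thread pool from the standard library for simplicity; threads are unlikely to give a real speed-up here, so a process pool or another parallel backend would probably be swapped in for actual benchmarking.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def concat_pair(left, right):
    """Concatenate two pandas objects column-wise, aligning on their index."""
    return pd.concat([left, right], axis=1)


def pairwise_concat(series_list, max_workers=4):
    """Tree reduction: concatenate neighbours pairwise until one object remains.

    Hypothetical helper; a thread pool is assumed only to keep the sketch simple.
    """
    items = list(series_list)
    if not items:
        raise ValueError("series_list must not be empty")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(items) > 1:
            # With an odd count, the last element waits for the next round.
            leftover = [items[-1]] if len(items) % 2 else []
            lefts, rights = items[0::2], items[1::2]
            # map() stops at the shorter iterable, so the odd element is skipped here.
            items = list(pool.map(concat_pair, lefts, rights)) + leftover
    result = items[0]
    # A single input Series is returned as a one-column DataFrame.
    return result.to_frame() if isinstance(result, pd.Series) else result


# Sample data (made up): five Series sharing a (date, asset) MultiIndex.
index = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", periods=3), ["A", "B"]],
    names=["date", "asset"],
)
rng = np.random.default_rng(0)
series_list = [
    pd.Series(rng.normal(size=len(index)), index=index, name=f"s{i}")
    for i in range(5)
]

result = pairwise_concat(series_list)
expected = pd.concat(series_list, axis=1)  # single-call reference result
```

Each round halves the number of objects, so N Series need about log2(N) rounds, and the concatenations within a round are independent of one another. That independence is what allows them to be dispatched to parallel jobs.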