
Using Non-negative matrix factorization to classify companies

C. Geissler, Towards Data Science, October 1st, 2020.

Abstract: Companies are complex entities that evolve over time. As a data scientist working in investment, I have long asked myself what dimensionality is most appropriate for modeling company data: in what space do these things live? The best answer I could find was “Far too many!” Another way to put the question is to ask how many independent criteria suffice to characterize a company. Examples of such criteria include market capitalization, industrial sector, number of employees, level of current earnings, carbon footprint, opinion of financial analysts, and many more: the number of available criteria easily exceeds one hundred. These criteria are not independent; on the contrary, they are linked by a whole network of implicit correlations. This makes choosing a relevant subset of variables a very hard task.