Classification vs regression

Created by: TarekHC

We can use two approaches: multi-class classification and regression

Multi-class classification: The performance of most algorithms is really bad (roughly 35-40% precision), but generally I chose the algorithms that if they don't label properly an event, they are usually relatively close:

Each of these plots is a different energy bin in log scale, each showing the confusion matrix of the classifier: the Y axis are the true event types and the X axis the predicted one.

As you can see, it seems the "bad" events are generally well labeled across all energies (event type 3), while best events are more or less also well labeled. The intermediate event types seem rather random to me... But we will probably need to wait for the IRFs to see how good the separation really is. Best algorithm seems to be a One vs One ensemble of random forest classificators.

Regression: Instead of just dividing into 4 groups, we can also try to estimate the expected angular difference between true and reconstructed direction. For that, I used the same variables as in the previous step.

Following a similar approach as before, I show the true (Y) vs reconstructed (X) log10(angular difference):

For the moment the best classification is given by a Ridge linear regression, but I probably need to play around more.

The good thing of performing a regression is that we can decide the statistics falling into each event type during IRF production, while in the case of classification we can only control the training statistics. I have not compared yet which classification method provides better classifications, but it will be trivial to do.