iact_event_types issues
https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues
2021-01-19T16:16:35+01:00

https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues/7
Separate tree for cuts?
2021-01-19T16:16:35+01:00
Tarek Hassan
*Created by: orelgueta*
The fact that the cuts are saved in a separate tree makes it harder to read the tree in batches, as the uproot docs recommend. Is there anything we can do, or do we need to hack it to work (or continue using arrays)?

https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues/6
DL2 variables to use
2021-01-19T16:14:35+01:00
Tarek Hassan
*Created by: orelgueta*
A few points that might require Gernot's input (add to this list as we go):
- tgrad_x (gradient in samples/deg along long-axis of image) doesn't seem to be filled in the DL2 file. I see only -99 in all entries. It might be important as this parameter is used in the ED BDT. (I think it is used squared, see [here](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp#L162))
- We are missing the "asym" variable, which I think is used in the training.
- The ED BDT uses the width/length values in the training. I am not sure how it is done, since these vectors have one entry per telescope with an image, so the vector length changes from event to event. How does the TMVA BDT deal with that? It would be good to use the same inputs in our regressors instead of the reduced width/length.
- In [the BDT input variables](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp#L162) there is also a variable called "wol". I assume it is the width over length. Should we ask for confirmation and add it as a variable in our training?
- The [TMVA code](https://github.com/Eventdisplay/Eventdisplay/blob/771ed53460f69f870f85022147f992cfa28e539b/src/trainTMVAforAngularReconstruction.cpp) used for angular resolution says it uses one MVA per telescope type. I do not understand this: does each type (LST, MST, SST) have a separate MVA? Are the results of each combined afterwards? Why is it done like that? Is it a way to deal with the variable vector length per event?

https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues/5
Cone angular diff
2021-01-11T11:06:23+01:00
Tarek Hassan
*Created by: orelgueta*
There seems to be an issue with the log_ang_diff variable for the diffuse gamma file, see plot below. I would have expected a similar distribution to the on-source file. Or is this expected?
![image](https://user-images.githubusercontent.com/22728856/103544599-a1ba3100-4ea0-11eb-854c-3b29c734b431.png)

https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues/4
Regression performance
2021-01-07T11:25:23+01:00
Tarek Hassan
*Created by: orelgueta*
Performance isn't great at the moment. At low energies the prediction is quite poor. At higher energies we see some improvement, but it still might not be good enough (see plots below).
Questions/ideas:
- Search for more useful variables.
- Would feature selection in each energy bin improve things (I doubt it)?
- Why do we have a bias in our predictions? It seems like we generally predict a better PSF than the true one. Such a "consistent" bias points to a problem in the logic or a bug, no?
- Would more events for training or using diffuse gamma improve things?
Plots below are for `gamma_onSource.S.3HB9-FD_ID0.eff-0.root`.
![image](https://user-images.githubusercontent.com/22728856/103529490-103dc580-4e86-11eb-9a10-2a7d248d5419.png)
![MLP_small](https://user-images.githubusercontent.com/22728856/103529588-37949280-4e86-11eb-9065-53e24b1f247b.png)
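One way to look into the bias question above is to compute the mean residual (predicted minus true log10 angular difference) in each energy bin. The sketch below is only an illustration in plain Python: the function name, the bin edges and the toy values are assumptions and do not come from the DL2 file. A negative mean residual in a bin means the predicted angular difference is smaller (a better PSF) than the true one.

```python
from statistics import mean


def bias_per_energy_bin(log_energy, true_log_ang_diff, pred_log_ang_diff, bin_edges):
    """Mean (predicted - true) log10 angular difference in each energy bin.

    Bins are half-open intervals [edge_i, edge_i+1). Empty bins yield None.
    """
    residuals = [[] for _ in range(len(bin_edges) - 1)]
    for energy, true_value, predicted in zip(
        log_energy, true_log_ang_diff, pred_log_ang_diff
    ):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= energy < bin_edges[i + 1]:
                residuals[i].append(predicted - true_value)
                break
    return [mean(r) if r else None for r in residuals]


# Toy values, for illustration only (hypothetical, not real data).
log_energy = [-1.5, -1.2, 0.3, 0.8]
true_vals = [-0.5, -0.7, -1.5, -1.7]
pred_vals = [-0.9, -0.9, -1.6, -1.8]

bias = bias_per_energy_bin(log_energy, true_vals, pred_vals, bin_edges=[-2.0, 0.0, 2.0])
print(bias)
```

If the residual keeps the same sign in every bin, that would support the suspicion of a systematic problem rather than statistical scatter.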

https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues/2
Classification vs regression
2021-01-11T10:49:09+01:00
Tarek Hassan
*Created by: TarekHC*
We can use two approaches: multi-class classification and regression.
Multi-class classification:
The performance of most algorithms is really bad (roughly 35-40% precision), but I generally chose algorithms that, when they mislabel an event, usually assign it to a neighbouring event type:
Each of these plots shows a different energy bin (log scale) with the confusion matrix of the classifier: the Y axis is the true event type and the X axis the predicted one.
![imagen](https://user-images.githubusercontent.com/7864276/100619339-49ca6080-331d-11eb-9b9e-98a509926719.png)
As you can see, the "bad" events (event type 3) seem to be generally well labeled across all energies, and the best events are also labeled more or less correctly. The intermediate event types seem rather random to me... But we will probably need to wait for the IRFs to see how good the separation really is. The best algorithm seems to be a One-vs-One ensemble of random forest classifiers.
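The confusion matrices described above, and the idea that mislabeled events should at least land in a neighbouring type, can be tallied with a few lines of plain Python. This is only a sketch: the event types are assumed to be integers 0 to 3 (with 3 the worst, as in the text), the function names are hypothetical, and the toy labels are made up.

```python
def confusion_matrix(true_types, predicted_types, n_types=4):
    """Rows are true event types, columns are predicted event types."""
    matrix = [[0] * n_types for _ in range(n_types)]
    for true_type, predicted_type in zip(true_types, predicted_types):
        matrix[true_type][predicted_type] += 1
    return matrix


def precision_per_type(matrix):
    """Fraction of events predicted as type k that truly are type k."""
    n = len(matrix)
    result = []
    for k in range(n):
        predicted_as_k = sum(matrix[row][k] for row in range(n))
        result.append(matrix[k][k] / predicted_as_k if predicted_as_k else None)
    return result


def mean_type_distance(true_types, predicted_types):
    """Average |true - predicted| type; small values mean near misses."""
    pairs = list(zip(true_types, predicted_types))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


# Toy labels, for illustration only (hypothetical, not classifier output).
true_types = [0, 0, 1, 1, 2, 2, 3, 3]
predicted_types = [0, 1, 1, 2, 1, 2, 3, 3]

matrix = confusion_matrix(true_types, predicted_types)
precision = precision_per_type(matrix)
distance = mean_type_distance(true_types, predicted_types)
print(matrix, precision, distance)
```

The mean type distance is one possible way to quantify "usually relatively close"; it is not a metric taken from the original study.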
Regression:
Instead of just dividing into 4 groups, we can also try to estimate the expected angular difference between true and reconstructed direction. For that, I used the same variables as in the previous step.
Following a similar approach as before, I show the true (Y) vs reconstructed (X) log10(angular difference):
![imagen](https://user-images.githubusercontent.com/7864276/100619370-52bb3200-331d-11eb-94ae-f452f1ecec20.png)
For the moment the best result is given by a Ridge linear regression, but I probably need to play around more.
The good thing about performing a regression is that we can decide the statistics falling into each event type during IRF production, while in the case of classification we can only control the training statistics. I have not yet compared which approach provides the better classification, but it will be trivial to do.
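To illustrate why the regression output leaves the event-type statistics open until IRF production, here is a minimal sketch that turns predicted log10 angular differences into event types using quantile thresholds chosen after the fact. It uses only the Python standard library (`statistics.quantiles`, Python 3.8+). The function names and toy predictions are hypothetical, and type 0 is assumed to hold the smallest predicted angular differences.

```python
from statistics import quantiles


def event_type_thresholds(pred_log_ang_diff, n_types=4):
    """Cut values splitting the predictions into n_types equally populated groups."""
    return quantiles(pred_log_ang_diff, n=n_types)


def assign_event_type(value, thresholds):
    """Number of thresholds below the value; type 0 is the best-reconstructed group."""
    return sum(value > t for t in thresholds)


# Toy regressor output, for illustration only (hypothetical values).
predictions = [-2.0, -1.8, -1.6, -1.4, -1.2, -1.0, -0.8, -0.6]

# Four equally populated event types ...
cuts4 = event_type_thresholds(predictions, n_types=4)
types4 = [assign_event_type(p, cuts4) for p in predictions]

# ... or two, decided later without retraining anything.
cuts2 = event_type_thresholds(predictions, n_types=2)
types2 = [assign_event_type(p, cuts2) for p in predictions]

print(types4, types2)
```

With a classifier the number and population of the classes are fixed at training time, whereas here only the thresholds change.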

https://gitlab.cta-observatory.org/thassan/iact_event_types/-/issues/1
Feature selection
2021-01-11T20:44:14+01:00
Tarek Hassan
*Created by: TarekHC*
The first thing I did was select the parameters that best separate the event types (from the PSF class, dividing the events into the 4 quartiles of the angular difference between true and reconstructed direction).
![imagen](https://user-images.githubusercontent.com/7864276/100619039-e0e2e880-331c-11eb-8ede-597388092de8.png)
As I define the event types as a function of the reconstructed energy, I chose the following variables to be used for the training:
log_reco_energy = log10 of the reco energy
log_NTels_reco = log10 of the number of telescopes used
array_distance = distance to the array center
img2_ang = Not sure how it is defined... angle between the showers of the second brightest telescope pair? No clue...
log_SizeSecondMax = log10 of the size of the second brightest image
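As a small illustration of how the training variables listed above could be built for one event, the sketch below applies the log10 transformations and passes the other two values through unchanged. The output keys are the variable names from the list; the function name, the input argument names and the toy numbers are hypothetical and do not reflect the actual DL2 branch names.

```python
import math


def build_features(reco_energy, n_tels_reco, array_distance, img2_ang, size_second_max):
    """Return the training variables for one event as a dictionary."""
    return {
        "log_reco_energy": math.log10(reco_energy),
        "log_NTels_reco": math.log10(n_tels_reco),
        "array_distance": array_distance,  # used as-is
        "img2_ang": img2_ang,  # used as-is; its definition is unclear in the text
        "log_SizeSecondMax": math.log10(size_second_max),
    }


# Toy event, for illustration only (hypothetical values and units).
features = build_features(
    reco_energy=1.0,
    n_tels_reco=4,
    array_distance=150.0,
    img2_ang=45.0,
    size_second_max=1000.0,
)
print(features)
```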