Explanation-based Learning with Feedforward Neural Networks

This thesis incorporates an argument loss into feedforward networks in order to maximize the contributions of selected input attributes to the respective prediction.

Completed Master Thesis

The idea of this thesis is to adapt Možina et al.'s method of exploiting experts' arguments to neural networks. This is done by incorporating the method of Ross et al., which adds an explanatory loss that penalizes attention on the wrong features. More specifically, the novelty of this work lies in handling both positively and negatively influencing features, depending on the experts' arguments. Feature contributions are computed using the Shapley value and approximations thereof, with the goal of improving training performance as well as sharpening the focus on the correct features. Additionally, by concentrating the neural network on the features named in the experts' arguments, the explanations generated for its predictions should be more similar to those of the experts, without relying on unfamiliar dependencies.
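To make the loss construction concrete, the following is a minimal PyTorch sketch in the spirit of the description above; it is not the thesis implementation. The names (ArgumentLoss, pos_mask, neg_mask, lambda_arg) are illustrative assumptions, and gradient×input is used here as a cheap proxy for the Shapley-value contributions the thesis actually computes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArgumentLoss(nn.Module):
    """Penalty on feature contributions that contradict expert arguments.

    Contributions are approximated by gradient*input with respect to the
    true-class score -- a stand-in for the Shapley values used in the thesis.
    Features the expert marked as positive should contribute positively to
    the prediction, negatively marked ones negatively.
    """

    def __init__(self, lambda_arg: float = 1.0):
        super().__init__()
        self.lambda_arg = lambda_arg

    def forward(self, model, x, y, pos_mask, neg_mask):
        # x: (batch, features); pos_mask/neg_mask: (batch, features) in {0, 1}
        x = x.clone().requires_grad_(True)
        logits = model(x)
        true_class_score = logits.gather(1, y.unsqueeze(1)).sum()
        # create_graph=True so the penalty itself can be backpropagated
        (grads,) = torch.autograd.grad(true_class_score, x, create_graph=True)
        contrib = grads * x
        # Hinge on the wrong sign: positively argued features must not
        # contribute negatively, negatively argued ones not positively.
        wrong_sign = F.relu(-contrib) * pos_mask + F.relu(contrib) * neg_mask
        return self.lambda_arg * wrong_sign.mean()


# Hypothetical usage: the argument penalty is simply added to the task loss.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
arg_loss = ArgumentLoss(lambda_arg=0.5)

x = torch.randn(4, 8)
y = torch.randint(0, 2, (4,))
pos_mask = torch.zeros(4, 8); pos_mask[:, 0] = 1  # expert: feature 0 speaks for the class
neg_mask = torch.zeros(4, 8); neg_mask[:, 1] = 1  # expert: feature 1 speaks against it

loss = F.cross_entropy(model(x), y) + arg_loss(model, x, y, pos_mask, neg_mask)
loss.backward()
```

Swapping the gradient×input proxy for a sampled Shapley approximation would leave the surrounding loss structure unchanged; only the computation of `contrib` would differ.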

Supervisors
