Organizations typically use simulation campaigns to train employees to detect phishing emails but are non-personalized and fail to account for human experiential learning and adaptivity. We propose a method to improve the effectiveness of training by combining cognitive modeling with machine learning methods. We frame the problem as one of scheduling and use the restless multi-armed bandit (RMAB) framework to select which users to target for intervention at each trial, while using a cognitive model of phishing susceptibility to inform the parameters of the RMAB. We compare the effectiveness of the RMAB solution to two purely cognitive approaches in a series of simulation studies using the cognitive model as simulated participants. Both approaches show improvement compared to random selection and we highlight the pros and cons of each approach. We discuss the implications of these findings and future research that aims to combine the benefits of both methods for a more effective solution.