Machine Learning for Violence Risk Assessment Using Dutch Clinical Notes




Natural language processing, Topic modeling, Electronic health records, BERT, Evaluation metrics, Interpretability, Document classification, LDA, Random forests


Violence risk assessment in psychiatric institutions enables interventions that prevent violence incidents. Clinical notes written by practitioners and stored in electronic health records are a valuable resource that captures unique information, but they are seldom used to their full potential. We explore conventional and deep machine learning methods for assessing violence risk in psychiatric patients from practitioner notes. Our best models perform comparably to the questionnaire-based method currently in use, with an area under the Receiver Operating Characteristic curve (AUC) of approximately 0.8. We find that the deep-learning model BERTje performs worse than conventional machine learning methods. We also analyze our data and our classifiers to better understand the performance of our models. Such analysis is particularly important for judging whether the evaluated classifiers apply to new data, and it is of great interest to practitioners given the increasing availability of data in electronic format.
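
The area under the ROC curve reported in the abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: iterable of 0/1 ground-truth values (1 = incident occurred).
    scores: iterable of classifier scores, higher meaning higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.35, 0.7, 0.1, 0.65, 0.4, 0.3]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # prints "AUC = 0.800"
```

In this toy example, 12 of the 15 positive/negative pairs are ranked correctly, giving an AUC of 0.8. The pairwise loop is quadratic in the number of examples, so a rank-based implementation from an established library would be preferable on real data.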





How to Cite

Mosteiro P, Rijcken E, Zervanou K, Kaymak U, Scheepers F, Spruit M. Machine Learning for Violence Risk Assessment Using Dutch Clinical Notes. JAIMS [Internet]. 2021 May 5 [cited 2023 Feb. 4];2(1-2):44-5. Available from: