We transformed the probability scores obtained from our authorship verification method into binary answers: scores above a calculated threshold counted as positive (i.e., the known and questioned documents are by the same author), and scores below the threshold counted as negative (i.e., the known and questioned documents are by different authors). To calculate the threshold for an author at a given cognitive load (CL), we compared the author's text at that CL against the same author's text at CL 1 and used the cosine similarity between the two as the threshold. We explain our research method and findings in detail in our article; I'll try to keep this post less technical.
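To make that concrete, here is a minimal sketch of the thresholding step, assuming each text has already been turned into a numeric stylometric feature vector; the function names are illustrative, not our actual code:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two stylometric feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def binarise_score(score: float,
                   author_text_at_cl: np.ndarray,
                   author_text_at_cl1: np.ndarray) -> bool:
    """Turn a verification probability score into a binary answer.

    The threshold is the cosine similarity between the author's text at
    the current cognitive load and the same author's text at CL 1.
    True means "same author"; False means "different authors".
    """
    threshold = cosine_similarity(author_text_at_cl, author_text_at_cl1)
    return score > threshold
```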
Our findings (and the details of our study) were published at ASCILITE 2022. In short, our results showed that authorship verification methods can work well for academic writing produced under varied cognitive loads: we could identify whether texts were written by the same student in most of the investigated cases (80% accuracy).
A few days ago we submitted a new paper with new evaluations on the use of automated authorship verification to validate software engineering students' assessments (e.g., text artefacts and reports) through their writing styles. This one is still under review, so I won't share as much about it as I'd love to. However, if we are right, the results of this study suggest that the authorship verification approach could be used successfully in software engineering education to mitigate academic cheating.
Why is this important, and what are the implications of our findings for academic cheating?
These findings have important implications for how academic cheating is handled in higher education (and other educational environments). Combined with anti-plagiarism tools such as Turnitin, authorship verification methods can support educators in identifying academic cheating, as we can extract stylistic writing features from students AND from ChatGPT to build consistent and recognisable profiles, much like a 'fingerprint'. ChatGPT and other AI-powered tools are becoming popular among students, but authorship verification methods are also growing in popularity among educational researchers and institutions. In the future, as AI tools become more humanised and powerful, authorship verification methods will also continue to evolve and perform better (as has been happening over the past few years).
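To give a feel for what such a 'fingerprint' looks like, here is a simple sketch of stylometric feature extraction based on character n-grams and function-word rates. This illustrates the general technique only; it is not the feature set from our study:

```python
from collections import Counter

# A small sample of English function words; real stylometric profiles
# typically use hundreds of features (n-grams, punctuation, POS patterns...).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "it", "for", "as"]

def stylometric_profile(text: str, n: int = 3) -> dict[str, float]:
    """Build a simple writing-style profile from a text: relative
    frequencies of character n-grams plus function-word rates."""
    text = text.lower()
    tokens = text.split()
    ngrams = Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 0)))
    total = sum(ngrams.values()) or 1
    profile = {f"ngram:{g}": c / total for g, c in ngrams.most_common(100)}
    for word in FUNCTION_WORDS:
        profile[f"fw:{word}"] = tokens.count(word) / (len(tokens) or 1)
    return profile
```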
What's next for us and our research?
Whether authorship verification in educational settings is used as a tool to educate students, to detect misconduct, or a combination of both, I strongly believe it is here to stay and could be used to promote better education and reflection on ethics and educational issues. Will authorship verification solve the current problem? Not completely! I don't think there is a silver bullet for this.
Instead, as I believe AI-powered tools and many new technologies will continue to be available to us all, our focus in the educational context should continue (or shift) to educational practices and processes. How can we incorporate these technologies to promote authentic assessment of the learning process?
Here at The University of Melbourne, we are currently working on a dataset that combines real answers from 50+ students with AI-generated answers to the same questions. In a few weeks we hope to have an even better understanding of the extent to which authorship verification methods can detect whether answers were written by the same student, and whether we can identify what was generated by AI. Watch this space :)
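For the curious, the evaluation we have in mind is conceptually as simple as the sketch below; `verify` stands in for any authorship verification method that answers "same author or not", and the pair format is my own illustration:

```python
def evaluation_accuracy(pairs, verify) -> float:
    """Accuracy over labelled (known_text, questioned_text, same_author)
    pairs, where questioned_text may be the same student's writing or
    an AI-generated answer to the same question."""
    correct = sum(verify(known, questioned) == same_author
                  for known, questioned, same_author in pairs)
    return correct / len(pairs)
```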
Is ChatGPT vs authorship verification the new King Kong vs Godzilla battle in academic cheating? Maybe! Not ChatGPT specifically, but any similar AI-powered tool. In any case, get your bucket of popcorn - it's a good one to watch closely from now on.
In the meantime, I hope others will find the information above useful. Please reach out to me with your own reflections and suggestions; I would love to incorporate them into this post.
This text was really written/generated by me, Eduardo :) No ChatGPT was used in this article.