Detecting AI Authorship: Analyzing Descriptive Features for AI Detection | CEUR-WS | Nov 2023

Motivated by the growing role of AI in text generation and the potential misuse of generative tools, this study investigates key features that differentiate AI-generated text from human-authored content. We produce a corpus of AI-generated counterparts to 2,100 research paper abstracts in order to compare formal linguistic and stylometric characteristics, such as perplexity, grammar, n-gram distributions, and function word frequencies, between human- and AI-generated texts. Key findings indicate that human-written abstracts tend to exhibit higher perplexity, more grammatical errors, and more diverse n-gram distributions. To distinguish between the two types of text, we employ various machine learning algorithms, with our Random Forest implementation achieving a precision of 0.986 on unseen data. Notably, feature importance analysis reveals that perplexity, grammar, and n-gram distributions are highly influential in AI-detection classification. Our research contributes a nuanced study of the discriminating characteristics of AI-generated text to the increasingly important field of AI authorship attribution.
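
The abstract does not include the authors' implementation. As a rough, hedged illustration of the kind of pipeline it describes, the sketch below computes two of the named stylometric features (bigram diversity and function-word frequency) with simple helpers, takes perplexity and grammar-error counts as precomputed inputs, trains a scikit-learn Random Forest, and reports precision and feature importances. The feature definitions, function-word list, and toy data are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch only -- the paper does not publish its code here.
# Feature definitions, the function-word list, and the toy data are assumptions.
import re
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

FUNCTION_WORDS = {"the", "of", "and", "to", "in", "that", "is", "for", "with", "as"}

def bigram_diversity(text: str) -> float:
    """Distinct word bigrams / total word bigrams (a simple n-gram diversity proxy)."""
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = list(zip(words, words[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0

def function_word_rate(text: str) -> float:
    """Share of tokens drawn from a small list of common English function words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in FUNCTION_WORDS for w in words) / len(words) if words else 0.0

def featurize(text: str, perplexity: float, grammar_errors: int) -> list:
    # Perplexity and grammar-error counts would come from an external language
    # model and grammar checker; here they are passed in as precomputed values.
    return [perplexity, grammar_errors, bigram_diversity(text), function_word_rate(text)]

# Toy stand-in for the human/AI abstract corpus (label 1 = human, 0 = AI).
texts = ["..."] * 8  # abstract texts would go here
perplexities = [42.1, 55.3, 18.2, 20.5, 61.0, 17.9, 48.7, 19.4]
grammar_errs = [3, 4, 0, 1, 5, 0, 2, 1]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

X = np.array([featurize(t, p, g) for t, p, g in zip(texts, perplexities, grammar_errs)])
y = np.array(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("precision on held-out data:", precision_score(y_te, clf.predict(X_te), zero_division=0))
for name, imp in zip(["perplexity", "grammar_errors", "bigram_diversity", "function_words"],
                     clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

With a real corpus, the text-based features would be computed from the abstracts themselves and the `feature_importances_` output would indicate which characteristics drive the human-versus-AI decision, mirroring the feature importance analysis reported in the abstract.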