AI and Human Reasoning: Qualitative Research in the Age of Large Language Models
DOI: https://doi.org/10.47289/AIEJ20240122
Keywords: Qualitative Research, ChatGPT, GPT-4, large language models
Abstract
Context: The advent of AI-driven large language models (LLMs), such as ChatGPT 3.5 and GPT-4, has stirred discussions about their role in qualitative research. Some view these models as tools to enrich human understanding, while others perceive them as threats to the core values of the discipline.
Problem: A significant concern is the disparity between AI-generated classifications and human comprehension, which raises questions about the reliability of AI-derived insights. An “AI echo chamber” could erode the diversity inherent in qualitative research, and minimal overlap between AI and human interpretations amplifies concerns about the fading human element in research.
Objective: This study aimed to compare and contrast the comprehension capabilities of humans and LLMs, specifically ChatGPT 3.5 and GPT-4.
Methodology: We conducted an experiment with a small sample of Alexa app reviews, initially classified by a human analyst. ChatGPT 3.5 and GPT-4 were then asked to classify the same reviews and to provide the reasoning behind each classification. We compared their results with the human classifications and reasoning.
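For illustration, asking an LLM to classify a review and explain its reasoning can be scripted along the following lines. This is a minimal sketch assuming the OpenAI Python client; the prompt wording and category labels are illustrative and do not reproduce the study's actual protocol.

# Sketch: classify one app review with an LLM and request its reasoning.
# Assumes the OpenAI Python client; categories and prompt are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_review(review_text: str, model: str = "gpt-4") -> str:
    """Ask the model to classify one review and explain its reasoning."""
    prompt = (
        "Classify the following Alexa app review into one category "
        "(e.g., usability, functionality, reliability, other) and briefly "
        "explain the reasoning behind your classification.\n\n"
        f"Review: {review_text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(classify_review("Alexa keeps misunderstanding my shopping list commands."))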
Results: The research indicated a significant alignment between human and ChatGPT 3.5 classifications in one-third of cases, and a slightly lower alignment with GPT-4 in over a quarter of cases. The two AI models showed higher alignment with each other, observed in more than half of the instances. However, consensus across all three methods was seen in only about one-fifth of the classifications. In comparing human and LLM reasoning, human analysts appear to lean heavily on their individual experiences, whereas LLMs, as expected, base their reasoning on the specific word choices found in the app reviews and on the functional components of the app itself.
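The alignment figures reported above are simple agreement proportions. The short sketch below, using made-up labels rather than the study's data, shows how pairwise and three-way agreement of this kind can be computed.

# Sketch with hypothetical labels: pairwise and three-way agreement proportions.
human = ["usability", "functionality", "other", "usability", "reliability"]
gpt35 = ["usability", "other", "other", "usability", "functionality"]
gpt4  = ["functionality", "other", "other", "usability", "reliability"]

def agreement(a, b):
    """Proportion of reviews on which two sets of labels coincide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

print("human vs GPT-3.5:", agreement(human, gpt35))
print("human vs GPT-4:  ", agreement(human, gpt4))
print("GPT-3.5 vs GPT-4:", agreement(gpt35, gpt4))
print("all three agree: ",
      sum(h == g35 == g4 for h, g35, g4 in zip(human, gpt35, gpt4)) / len(human))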
Conclusion: Our results highlight the potential for effective human-LLM collaboration, suggesting a synergistic rather than competitive relationship. Researchers must continuously evaluate LLMs’ role in their work, thereby fostering a future where AI and humans jointly enrich qualitative research.
License
Copyright (c) 2024 The AI Ethics Journal
This work is licensed under a Creative Commons Attribution 4.0 International License.