By Jim Shimabukuro (assisted by ChatGPT, Gemini, and Claude)
Editor
(Also see Review of “OpenAI (2023), GPT‑4 Technical Report” [4 March 2024] and A Review of Ouyang et al.’s 2022 Paper aka “InstructGPT”.)
Introduction: For this article, I collaborated with three chatbots: Gemini (formerly Google Bard), ChatGPT (GPT-3.5 free tier and GPT-4, OpenAI), and Claude (Sonnet 4, Anthropic). I asked each to name five seminal works in the development of chatbots. Three works were named by two of the chatbots, so the combined list came to twelve. They are listed below in order of publication. Three were published before 2000, and only one between 2000 and 2014. Five appeared between 2015 and 2019, and the remaining three in 2020 or later. Thus, eight of the twelve (67%) were published in the last ten years.
1. “Computing Machinery and Intelligence” by Turing (1950).
2. “ELIZA—A Computer Program for the Study of Natural Language Communication Between Man and Machine” by Weizenbaum (1966).
3. “Procedures as a Representation for Data in a Computer Program for Understanding Natural Language” by Winograd (1971), often referred to as the SHRDLU dissertation.
4. “Social Dialogue With Embodied Conversational Agents” by Bickmore & Cassell (2005).
5. “A Neural Conversational Model” by Vinyals & Le (2015).
6. “Attention Is All You Need” by Vaswani et al. (Google Brain Team, 2017).
7. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” by Devlin et al. (2018).
8. “Language Models Are Unsupervised Multitask Learners” by Radford et al. (2019).
9. “A Unified Framework of Five Principles for AI in Society” by Floridi & Cowls (2019).
10. “Language Models Are Few-Shot Learners” by Brown et al. (2020).
11. “Constitutional AI: Harmlessness from AI Feedback” by Bai et al. (2022).
12. “Training Language Models to Follow Instructions with Human Feedback” by Ouyang et al. (InstructGPT Paper, 2022).
See the chatbot listings below for details about each work. I’m drawn to the latest, especially Brown et al. and Ouyang et al. I’ll follow up this article with chatbot-generated reviews of these two.
Additional Publication: While reading Ouyang et al., I discovered “OpenAI (2023),”* a technical report about OpenAI’s GPT-4 whose sixth version (v6) was posted in July 2024. It was distributed through the arXiv preprint server, which is commonly used in computer science and AI for rapid open-access publication before, or instead of, peer-reviewed journal publication.
My path to chatbots is probably not unique. It began with the recent addition of AI to the Google search engine in my Windows browser. The search box is the same tiny door, but it now opens onto a whole new world of information. A search leads to an invitation to chat for more detailed and nuanced reports. Curiosity led me to ChatGPT, which I’d previously associated with student cheating. I tested it with a question about a 2022 article speculating on AI hastening the demise of college essays, and the result was “ChatGPT’s Take on Marche’s ‘The College Essay Is Dead,'” published in this journal, ETC, in early June 2025.
This experience opened a floodgate, releasing a surge of questions that had backed up for decades for want of a convenient, free or low-cost interactive medium such as a large language model (LLM). I decided that ETC, which has been more or less dormant for the last five years, might be a useful means of chronicling what appears to be a turning point in the history of technology in education.
It’s worth noting that chatbots, like humans, aren’t perfect. They make mistakes. But the exponential growth we’re witnessing today is directly attributable to progress in reducing errors while expanding the range, depth, and value of chatbot services. For example, OpenAI reports that GPT-4 scored in roughly the top 10% of test takers on a simulated bar exam, along with strong performances on standardized tests such as the SAT, LSAT, and GRE. These highlights, however, must be balanced against weaker showings on other tasks. The point, though, is that chatbots are improving at a rate that inspires growing confidence.
Perhaps the wisest approach to chatbots is to keep GIGO in mind: garbage in, garbage out. Test chatbots to get a feel for how they respond, and use the results to compose your prompt, or series of prompts, with care. This give and take, trial and error, and gradual shaping, this dialogue or chat, is the heart of chatbotting. When the bot’s responses and yours aren’t building toward a logical conclusion or output, troubleshoot the interaction to determine how to modify your input, that is, your prompts and responses. With repeated trials, you’ll learn whether the problem lies with your prompting or with the bot.
I hope you’ll find this list of works useful in deepening your knowledge of chatbots. -js
Prompt: What are the five most influential publications (articles, books) about chatbots that are guiding their development? Describe the authors and explain your choices. Please provide references, in APA style, that have informed your choices.
Gemini’s response
The development of chatbots, from rudimentary rule-based systems to the sophisticated large language models of today, has been guided by a fascinating evolution of ideas. Here are five influential publications that have significantly shaped this journey:
1. “Computing Machinery and Intelligence” (1950) by Alan Turing
- Author: Alan Turing (1912-1954) was a British mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. He is widely considered to be the father of theoretical computer science and artificial intelligence. His work during World War II, particularly his contributions to cracking the Enigma code, was pivotal.
- Explanation: This seminal paper introduced the concept of the “Imitation Game,” now famously known as the Turing Test. While not directly a blueprint for building chatbots, it laid the philosophical and conceptual groundwork for evaluating machine intelligence and, by extension, conversational ability. Turing’s question, “Can machines think?”, was reframed into a practical test of whether a machine could convincingly imitate human conversation. This set the aspirational goal for much of early AI and chatbot development: to create systems that could engage in human-like dialogue. His foresight in considering the nuances of human-machine interaction, even at a time when computers were in their infancy, is remarkable.
- Influence: The Turing Test [a human being should be unable to distinguish the machine from another human being] has been a guiding star and a point of contention for decades. It propelled researchers to develop more sophisticated natural language understanding and generation techniques, even if many contemporary AI researchers view it as a limited measure of true intelligence. It fundamentally shaped the goal of conversational AI.
2. “ELIZA—A Computer Program for the Study of Natural Language Communication Between Man and Machine” (1966) by Joseph Weizenbaum
- Author: Joseph Weizenbaum (1923-2008) was a German-American computer scientist and a professor at MIT. He was a pioneer in artificial intelligence, but later became a vocal critic of the uncritical acceptance and potential dangers of AI.
- Explanation: ELIZA was one of the earliest and most famous chatbots. Weizenbaum designed it to mimic a Rogerian psychotherapist, primarily by identifying keywords in user input and transforming them into generic, reflective questions (a simple sketch of this keyword-matching approach appears after this entry). What made ELIZA revolutionary wasn’t its deep understanding (it had none), but its ability to create a convincing illusion of understanding, often leading users to attribute human-like empathy to the program (a phenomenon now known as the “ELIZA effect”). Weizenbaum himself was disturbed by how readily people projected human qualities onto the machine.
- Influence: ELIZA demonstrated the power of relatively simple pattern-matching and substitution rules to generate seemingly intelligent conversation. It provided a tangible, albeit limited, example of human-computer dialogue and spurred both excitement and critical reflection on the nature of AI interaction. It was a foundational piece in showing how natural language processing could be applied to create interactive systems.
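To make the pattern-matching idea concrete, here is a minimal Python sketch of ELIZA-style keyword rules. It is a hypothetical illustration, not Weizenbaum’s program: the original also reflected pronouns (e.g., “my” to “your”) and ranked competing keywords.

```python
import random
import re

# Illustrative ELIZA-style rules: a keyword pattern mapped to reflective templates.
RULES = [
    (r"\bi need (.*)", ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (r"\bi am (.*)", ["How long have you been {0}?", "Why do you say you are {0}?"]),
    (r"\bbecause (.*)", ["Is that the real reason?", "What other reasons come to mind?"]),
]
DEFAULTS = ["Please go on.", "Tell me more.", "How does that make you feel?"]

def respond(user_input: str) -> str:
    text = user_input.lower().rstrip(".!?")
    for pattern, templates in RULES:
        match = re.search(pattern, text)
        if match:
            # Echo the captured phrase back inside a canned, reflective question.
            return random.choice(templates).format(*match.groups())
    return random.choice(DEFAULTS)

print(respond("I am worried about the exam"))
# e.g., "How long have you been worried about the exam?"
```

Even this toy version shows why ELIZA felt responsive: the reply echoes the user’s own words back as a question, which is precisely the illusion Weizenbaum found so unsettling.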
3. “Procedures as a Representation for Data in a Computer Program for Understanding Natural Language” (1971) by Terry Winograd (Often referred to as the SHRDLU dissertation)
- Author: Terry Winograd is an American computer scientist, currently a professor emeritus of computer science at Stanford University. He is known for his work in natural language processing and human-computer interaction, and later for his critical perspective on AI and advocacy for “human-centered design.”
- Explanation: Winograd’s SHRDLU program represented a significant leap forward from ELIZA. SHRDLU operated within a limited “blocks world” where it could understand and respond to natural language commands about manipulating virtual objects (e.g., “Pick up the red block,” “Put the blue pyramid on the green cube”). Its significance lay in its integration of natural language understanding with a rich internal model of the world and the ability to perform actions and answer questions based on that model. It used a procedural approach, where knowledge was encoded in procedures that could interpret and respond to commands.
- Influence: SHRDLU showcased the importance of world knowledge and context for truly understanding language. It moved beyond simple pattern matching to a system that could reason, plan, and execute tasks based on its understanding of the environment and the user’s intent. This work was crucial in demonstrating how AI systems could move towards more meaningful and goal-oriented conversations.
4. “Attention Is All You Need” (2017) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (Google Brain Team)
- Authors: This paper was a collaborative effort by eight researchers from the Google Brain team. These researchers are at the forefront of deep learning and natural language processing, having made significant contributions to the development of modern AI architectures.
- Explanation: This landmark paper introduced the Transformer, a novel neural network architecture that revolutionized natural language processing (NLP). The key innovation was the “attention mechanism,” which allows the model to weigh the importance of different words in a sequence when processing other words, vastly improving its ability to handle long-range dependencies in text. The architecture eliminated the need for recurrent or convolutional layers, making training much more efficient and enabling the development of much larger and more powerful language models. (A minimal sketch of the attention computation follows this entry.)
- Influence: The Transformer architecture is the bedrock of virtually all modern large language models (LLMs) like GPT-3, GPT-4, and the models that power sophisticated chatbots today. Its introduction led to an explosion in performance for tasks like machine translation, text summarization, and, crucially, conversational AI. Without the Transformer, the current capabilities of chatbots would be unimaginable.
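To give a sense of what the attention mechanism actually computes, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the paper. It is a simplified illustration (a single head, with no learned projections or masking), not the authors’ implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_k) holding queries, keys, values."""
    d_k = Q.shape[-1]
    # How strongly each position attends to every other position,
    # scaled by sqrt(d_k) so the softmax does not saturate.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of all value vectors, so any
    # token can draw on information from anywhere in the sequence in one step.
    return weights @ V

# Toy self-attention over a 4-token sequence with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```

In the full Transformer this operation runs in parallel across several “heads” with learned projection matrices, but the key point is already visible here: no recurrence is needed for one position to attend to any other.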
5. “Language Models are Few-Shot Learners” (2020) by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei (OpenAI)
- Authors: This paper was written by a large team of researchers from OpenAI, a leading AI research and deployment company. Many of these authors are prominent figures in the field, known for their work on large-scale deep learning models and pushing the boundaries of AI capabilities.
- Explanation: This paper introduced GPT-3 (Generative Pre-trained Transformer 3), a massive language model with 175 billion parameters. Its key finding was that by scaling up the model size and training data significantly, the model exhibited impressive “few-shot learning” abilities. This means it could perform new tasks (like translation, summarization, or even generating code) with only a few examples supplied in the prompt, without task-specific fine-tuning (see the short prompt illustration after this entry). This marked a paradigm shift from models requiring task-specific training to general-purpose language models.
- Influence: This publication demonstrated the emergent capabilities of extremely large language models and solidified the “scaling hypothesis” – the idea that simply making models bigger can lead to qualitative improvements in performance and emergent abilities. It directly paved the way for the current generation of highly capable conversational AI systems, including those used in sophisticated chatbots and virtual assistants, by showing that a single, very large model could handle a vast range of conversational tasks.
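To illustrate what “few-shot” looks like in practice, here is a hypothetical prompt of the kind the paper describes: a few worked examples followed by a new case, with no fine-tuning involved. The model is expected to infer the task from the examples and complete the final line.

```python
# A few-shot prompt is plain text: worked examples, then a new case for the
# model to complete. No gradient updates or task-specific training are involved.
few_shot_prompt = """Translate English to French.

English: The book is on the table.
French: Le livre est sur la table.

English: I would like a coffee, please.
French: Je voudrais un café, s'il vous plaît.

English: Where is the library?
French:"""

# Sent to a large language model (API details vary by provider), the expected
# completion is along the lines of "Où est la bibliothèque ?"
print(few_shot_prompt)
```

Zero-shot and one-shot prompting work the same way, just with no examples or a single example; the paper’s finding was that performance on such prompts climbs steeply as the model itself grows larger.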
References
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
- Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36-45.
- Winograd, T. (1971). Procedures as a representation for data in a computer program for understanding natural language (AI Technical Report 235). Massachusetts Institute of Technology, Artificial Intelligence Laboratory.
ChatGPT’s response
The development of chatbots—especially intelligent, conversational agents like ChatGPT—has been shaped by both foundational academic work and high-impact interdisciplinary publications across natural language processing (NLP), human-computer interaction (HCI), AI ethics, and educational technology.
Below are five of the most influential publications guiding current chatbot development. For each, I describe the content, explain its impact, and introduce the key authors behind the work. An APA-style reference list follows.
1. Weizenbaum, J. (1966). “ELIZA—a computer program for the study of natural language communication between man and machine.”
Why it matters:
This is the foundational paper that launched the field of chatbot development. ELIZA was the first chatbot to mimic a therapist using pattern-matching techniques. It sparked early debates about machine-human communication and emotional engagement with machines.
Author:
Joseph Weizenbaum was a professor at MIT and a pioneer in computer science. Despite creating ELIZA, he later became a strong critic of artificial intelligence’s encroachment into human domains.
Legacy: It set the philosophical and ethical stage for all chatbot research that followed.
2. Bickmore, T., & Cassell, J. (2005). “Social dialogue with embodied conversational agents.”
Why it matters:
This work introduced embodied conversational agents (ECAs) that use not only language but also gestures, facial expressions, and social cues. The authors argue that social and relational elements are crucial for user trust and sustained engagement.
Authors:
Tim Bickmore is a professor at Northeastern University focused on health informatics and relational agents.
Justine Cassell, formerly of MIT and CMU, is a pioneer in multimodal interfaces and the use of storytelling in HCI.
Legacy: This work influenced the design of emotionally intelligent and socially responsive agents like modern AI tutors, health bots, and therapy assistants.
3. Vinyals, O., & Le, Q. (2015). “A neural conversational model.”
Why it matters:
This paper introduced the first end-to-end neural network model for conversational agents using sequence-to-sequence learning. It showed how chatbots could learn directly from data, instead of relying on hand-crafted rules.
Authors:
Oriol Vinyals and Quoc Le were researchers at Google when the paper was published (Vinyals later joined DeepMind). Their work laid the technical groundwork for transformer-based models like ChatGPT, Bard, and Claude.
Legacy: It marks the shift from scripted bots to data-driven, learning-based models with scalable, open-domain capabilities.
4. Radford, A., Wu, J., Amodei, D., et al. (2019). “Language models are unsupervised multitask learners.”
Why it matters:
This is the OpenAI GPT-2 paper, which introduced large language models (LLMs) that could generalize across many tasks without fine-tuning. It redefined the field and established the viability of scaling up models to improve performance.
Authors:
Alec Radford and colleagues at OpenAI were at the forefront of generative AI. Their GPT series evolved into the architecture used in ChatGPT and beyond.
Legacy: It directly shaped the trajectory of scalable, general-purpose conversational AI tools in education, healthcare, and daily life.
5. Floridi, L., & Cowls, J. (2019). “A unified framework of five principles for AI in society.”
Why it matters:
As chatbots became powerful and widely deployed, ethical concerns grew. This paper synthesizes earlier AI ethics guidelines into five principles for AI systems, including conversational agents: beneficence, non-maleficence, autonomy, justice, and explicability. It bridges technical design with societal impact.
Authors:
Luciano Floridi is a leading philosopher of technology at Oxford and Yale. Josh Cowls is a research fellow specializing in AI governance. Together, they work on shaping global AI policy.
Legacy: Their work informs the ethical design of chatbots, particularly in education, healthcare, and child development, where stakes are high.
📘 Annotated References (APA Style)
- Weizenbaum, J. (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. https://doi.org/10.1145/365153.365168
Classic work that introduced the first chatbot and ignited debate about the psychological impact of machines simulating empathy.
- Bickmore, T., & Cassell, J. (2005). Social dialogue with embodied conversational agents. Advances in Natural Multimodal Dialogue Systems, 27, 23–54. https://doi.org/10.1007/1-4020-3933-6_2
Explores how agents can build long-term relationships with users through social dialogue and multimodal expression.
- Vinyals, O., & Le, Q. V. (2015). A neural conversational model. arXiv preprint arXiv:1506.05869. https://arxiv.org/abs/1506.05869
Landmark paper introducing sequence-to-sequence learning for conversational agents.
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Introduced GPT-2 and demonstrated emergent capabilities in large-scale language models.
- Floridi, L., & Cowls, J. (2019). A unified framework of five principles for AI in society. Harvard Data Science Review, 1(1). https://doi.org/10.1162/99608f92.8cd550d1
Influential work outlining ethical guidelines essential for AI systems used in education, healthcare, and public services.
Would you like a follow-up list focused on education-specific chatbot research or a comparison of Western vs. Asian approaches to chatbot development in learning contexts?
Claude’s response
I’ll search for current information about influential publications guiding chatbot development, as this is a rapidly evolving field where recent developments are crucial. Based on my knowledge and the search results, here are the five most influential publications that are guiding modern chatbot development:
Five Most Influential Publications in Chatbot Development
1. “Attention Is All You Need” (2017)
Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (Google Brain/Google Research)
This is the foundational paper that introduced the Transformer, a deep learning architecture built on the attention mechanism that underlies virtually every major modern chatbot, including ChatGPT, Claude, and Bard. The authors revolutionized natural language processing by showing that attention mechanisms alone, without recurrence or convolution, could achieve state-of-the-art results. This architecture enabled the scaling that made large language models possible.
2. “Language Models are Unsupervised Multitask Learners” (GPT-2 Paper, 2019)
Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (OpenAI)
This paper demonstrated that large language models could perform multiple tasks without task-specific training, simply through next-token prediction. It established the paradigm of emergent capabilities in large models and showed how scaling model size could lead to qualitatively different behaviors – a principle that directly led to ChatGPT’s conversational abilities.
3. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (2018)
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (Google AI Language)
While BERT itself isn’t primarily used for chatbots, this paper established crucial techniques for pre-training transformer models on large text corpora. The bidirectional training approach and masked language modeling influenced how modern chatbots understand context and meaning in conversations.
4. “Training Language Models to Follow Instructions with Human Feedback” (InstructGPT Paper, 2022)
Authors: Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe (OpenAI)
This paper showed how Reinforcement Learning from Human Feedback (RLHF) could transform powerful but unwieldy language models into helpful, harmless, and honest conversational agents. (A sketch of the reward-model step in RLHF appears after this entry.) This methodology is fundamental to making chatbots safe and useful for general users.
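As a rough illustration of one piece of the RLHF pipeline, the sketch below shows the pairwise loss commonly used to train a reward model from human preference rankings: the model is pushed to score the response labelers preferred above the one they rejected. This is a simplified, standalone calculation for intuition, not OpenAI’s code, and it omits the supervised fine-tuning and reinforcement learning (PPO) stages.

```python
import numpy as np

def reward_model_pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise preference loss, -log(sigmoid(r_preferred - r_rejected)).
    The loss shrinks when the preferred response is scored above the rejected one."""
    return float(-np.log(1.0 / (1.0 + np.exp(-(r_preferred - r_rejected)))))

# Toy scores from a hypothetical reward model for two candidate responses.
print(round(reward_model_pairwise_loss(1.2, 0.7), 3))  # small loss: ranking respected
print(round(reward_model_pairwise_loss(0.3, 1.5), 3))  # larger loss: ranking violated
```

In the full method, a reward model trained this way becomes the objective for reinforcement learning on top of a supervised, instruction-tuned model, which is how the “follow instructions” behavior is shaped.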
5. “Constitutional AI: Harmlessness from AI Feedback” (2022)
Authors: Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, and Jared Kaplan (Anthropic)
This paper introduced methods for training AI systems to be helpful, harmless, and honest through constitutional principles rather than just human feedback. It has significantly influenced how chatbots are aligned with human values and made safe for deployment.
Rationale for These Choices
These five publications represent the core technological and methodological breakthroughs that enabled modern conversational AI:
- Transformer architecture (Vaswani et al.) provided the fundamental neural network design
- Unsupervised pre-training (Radford et al.) showed how to create general-purpose language understanding
- Bidirectional context understanding (Devlin et al.) improved comprehension capabilities
- Human feedback alignment (Ouyang et al.) made models conversational and useful
- Constitutional training (Bai et al.) addressed safety and reliability concerns
These papers collectively address the technical architecture, training methodology, and safety considerations that define modern chatbot development.
References
Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073. This paper informed my understanding of how modern chatbots are trained to be safe and aligned with human values.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. This work provided context for understanding bidirectional language model training techniques.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744. This paper was crucial for understanding how reinforcement learning from human feedback shapes conversational AI behavior.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9. This work informed my understanding of how large-scale language modeling enables emergent conversational capabilities.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. This foundational paper was essential for understanding the transformer architecture underlying modern chatbots.
__________
* OpenAI. (2023). GPT-4 technical report (arXiv:2303.08774). arXiv. https://arxiv.org/html/2303.08774v6 (v6, last updated July 2024).