An AI Professor at Harvard: ChatLTV
This last semester, I ran a few generative AI experiments
Course Background
LTV is a course I created thirteen years ago in partnership with my colleague Professor Tom Eisenmann. This year, there were three sections of the course with over 80 Harvard MBA students each so a total of roughly 250 students. The course is taught using the case method
As part of the course material, I have written over 50 HBS cases and teaching notes, two books, and numerous book chapters. There are also dozens of PowerPoint slide decks and Excel spreadsheets that we use in the course. Thus, there is a large corpus of material tied to the course that has built up over our thirteen years. I also created an online version of the course last year which required me to write out a precise transcript, including a detailed glossary of various frameworks and acronyms that the course covers, as well as a large number of video interviews with case protagonists. Finally, I've been writing blog posts about entrepreneurship for nearly 19 years. The importance of this large corpus will become clear shortly.
A final important background point about the course is that we have a course Slack and require our students to post reflections in Slack as part of their grade as well as use Slack to share various course materials. This use of Slack had several benefits: first, we have three years' worth of Q&A content from the course Slack, and, second, Slack is emphasized as a part of the student workflow.
Building a Generative AI Chatbot: Chat LTV
With a small team, we developed a Slack-based chatbot called ChatLTV, which served as a faculty co-pilot throughout the semester. ChatLTV was trained on the entire corpus of my course -- including all the case studies, teaching notes, books, blog posts, and historical Slack Q&A mentioned above -- as well as selected and curated publicly available material. In total, the corpus contained roughly 200 documents and 15 million words.
We embedded ChatLTV into the course Slack in the form of a Slack app, allowing each of our 250 students to engage with the chatbot either privately or publicly. If the student chose for the engagement to be private, only the student and the faculty could see the interactions. If posted publicly, everyone in either the section or the full course could see the interaction.
Our technical approach was to respond to a student's query by providing an LLM (in our case, we chose OpenAI's ChatGPT v4) with two pieces of information: (a) the question being asked, (b) relevant context that the LLM can use to answer the question. The relevant context was retrieved from the corpus, which was stored in a vector database (in our case, we chose Pinecone). The most relevant content chunks were then served to the LLM using OpenAI's API. This technique is known as Retrieval Augmented Generation
An architecture for ChatLTV can be found here:
After a great deal of trial and error (see testing below), we settled on the following LLM prompt to provide an answer with the relevant chunked content as context: “You are a world-class algorithm to answer questions in a specific format. You use the context provided to answer the question and list your sources in the format specified. Do not make up answers.”
Since HBS has a copyright on much of the content being used, it was paramount that we ensure that the content would not flow into the public domain. Rather than using the OpenAI APIs directly, we used Microsoft's Azure OpenAI Service for both development and production use. Leveraging Azure allows us to take advantage of the security, privacy, and compliance benefits, as well as guarantee that the data fed into the service is not used to retrain models that are then made available to others. The content is itself stored within the Pinecone Vector Database, which is SOC2 Type II compliant, and only relevant segments of content (e.g., a few paragraphs of a particular case or teaching note) are sent to the Azure OpenAI Service depending on the query that is made. During the course of our development, Harvard made a private LLM available to faculty and we anticipate porting ChatLTV over to it.
The total ChatLTV code base was 8000 lines for the backend (including 800 lines for RAG and 900 lines for content indexing, and then backend APIs, tests, and deployment code). We also created a content management system (CMS) that allowed faculty to add or delete additional content and observe student queries. That CMS was 9000 lines of code and a simple web-based application. The importance of the CMS will become clear later. The code was written over the course of the late spring and summer and took roughly 2-3 person months. If written today, with the rapid improvement in the underlying development tools, the code base would be substantially smaller (perhaps half the size) and the person months similarly smaller.
We also made ChatGPT4 publicly available to the students in the course Slack alongside ChatLTV. That way, students could use the public chatbot or the course chatbot, depending on their needs. In addition to providing the answers through Slack, the LLM shared the document sources for the answers in the Slack reply to the answer so that students could see the source material references.
Training and Testing the GPT, Adding Admin Content
Given the inherent probabilistic and nondeterministic nature of LLMs and the large body of text involved in the inputs and outputs, the development of an LLM app is an iterative process
We also ran an automated evaluation by using OpenAI to compare the outputs to the ground truth data and generate a quality score as well as manual testing noted above. The mix of manual and automated testing allowed us to play around with our prompts (i.e., prompt engineering
Results: Student Experience
We launched the chatbot at the start of the semester in early September and used it throughout the semester, ending just last week. From my standpoint, the experiment was a smashing success. Throughout the semester, students found ChatLTV to be an invaluable resource for course preparation. They used the chatbot to ask clarifying and evaluative questions about case studies, analysis, acronyms, and a full range of administrative matters. Students expressed a high level of interest and excitement for the chatbot and described it as a valuable tool for enhancing their learning experience. A few quotes from a post-course evaluation:
I loved it -- I found that I could use it to check my answers but more importantly understand if my methodlogy was directionally correct, which helped me get farther in my case prep. I loved that I could use it almost like a professor by my side as I worked through the questions, and I feel like it definitely helped me learn the content better.
Recommended by LinkedIn
It was nice to have a walled garden of content that we could know and trust to be used in tandem with other resources, including ChatGPT.
Over half our students -- roughly 170 -- made over 3000 queries over the course of the semester of ChatLTV. The course has 28 sessions, including 24 cases (versus exercises). Thus, there were roughly 130 queries made per case. When surveyed, nearly 40% of the students who used the chatbot gave it a quality score of a "4" or "5". The usage and quality were frankly higher than I had anticipated. I was thrilled with both.
Interestingly, of the over 3000 queries, only a dozen or so were made using the public channel versus the private channel on Slack. Thus, 99% of our students elected for private queries with the chatbot rather than allow their peers to see what they were asking in advance of their case preparation.
Results: Faculty Experience
Perhaps most surprising to me over the course of this semester was the faculty experience. I had two fears: (1) ChatLTV would see no usage after all this work, or (2) students would use ChatLTV in a way that diminished the quality of the in-class conversation (e.g., getting "the answers" from the chatbot and spitting them back in a rote fashion). The latter was not at all the case. In fact, the quality of the in-class case conversation was excellent. Students appeared to have used the chatbot to prepare effectively for the case conversation and advance their understanding of the material, as noted in the student quotes above. When students provided answers to analytical questions that they were provided by ChatLTV, the faculty was able to push them to deconstruct their assumptions, methodology, and strategic implications rather than waste class time with "doing the math".
Most interestingly, as a faculty member, I had a unique window into what my students were asking about before walking into the classroom. Each morning before class, I would inspect the admin CMS to see what queries had been made by what students (typically the night before -- ChatLTV usage seemed to be most active between 10pm-2am!). From that resource, I had a unique opportunity to peer inside their minds and appreciate where they were at in terms of their comfort and knowledge of the material for that day and beyond.
A few examples will illustrate this point:
These and countless other examples demonstrated that ChatLTV was a useful tool not just for our students, but for me as a faculty member trying to meet my students where they were at any given moment to assist them in their individualized learning journeys.
Bonus: HBS LTV Project Feedback, a custom GPT
OpenAI launched a powerful new feature a few weeks ago, called custom GPTs. Using no code, Custom GPTs can create customized versions of ChatGPT, trained and tuned for a particular skill.
At the end of the semester (i.e., last weekend), I decided to create a custom GPT called "HBS LTV Feedback", a critical academic evaluator to provide feedback on LTV final course papers and startup ideas. The final project requires students to apply a course tool to a startup of their choice, often their own, and write their reflections and takeaways from the experience.
Typically, students work in teams of two. Thus, with over 250 students, we will receive around 125 papers. We grade them all, but historically we simply don't have the time to provide them with tangible written feedback on the quality of their paper or the quality of the idea. Thus, a custom GPT project evaluator.
It took me two hours to set up and train the custom GPT and zero lines of code. The functionality is ridiculously easy to use. As one of my genAI portfolio company founders likes to put it, "English is the cool new programming language for software."
The results were excellent. I had to prompt the GPT to be tougher and more critical than its instincts might normally be (LLMs are way softer than HBS professors -- in the face of widespread grade inflation, we still grade on the same forced curve from decades and decades ago). Students seemed happy with the results. Two feedback comments that were indicative:
Thank you so much for sending feedback. Honestly, will incorporate this feedback into the document ASAP since I am actually going to use this for real tests to validate the idea and business model over the next couple of months.
Thanks for the feedback and explanation - both human and in silico. A couple of the GPT points are pretty helpful! Especially the 2 areas for improvement for the project and startup idea.
One of my HBS faculty colleagues joked that this represented a historic moment as he believes no faculty in the 100+ year history of HBS has ever provided tangible, written, constructive feedback
Conclusion
This semester was a fun experiment. There is an enormous amount of usage of AI across the HBS faculty and curriculum and the school is racing ahead to embrace the tools even more on behalf of our students. Hopefully, this write-up will inspire other faculty around the world to run their own experiments.
Thank Yous
The ChatLTV project team consisted of Saswat Panda, Chiyoung Kim, Laura Whitmer, and Robin Lobo. Saswat was a total hero in writing every line of code. Special thank you to HBS' administrative and IT leaders for allowing me to run this experiment and taking the risks associated with it, particularly Prof Mitch Weiss. Also thanks to my LTV faculty colleagues Lindsay Hyde and Christina Wallace. Finally, thank you to our 250 LTV students from 2023 as well as the 2000+ students who have taken the course over the last 13 years. None of us would be here if not for you.
Private Equity Investor at American Securities
11moReally interesting use cases - Would love to use a GenAI tool that also gives feedback on communication skills and pitches!
"GenAI Agent/Agency Architect | Innovating Real-Time Systems for Robots and Automation"
11moThis marks the beginning of the reinvention of school/college 😎 Agree that "English is the cool new programming language for software." and soon hardware design also!
Founder, CEO @ OpenWater (4x founder with multiple exits)
1yLove this experiment.
Rutgers Universty, New Brunswick
1yBrent Lollis
Helping business leaders to find and win new business.
1yThanks for sharing