Can GenAI write code? A real story
There is much talk about Generative AI (GenAI) taking over programmer jobs, with headlines like:
"The Death of Programmers" or
"Software Developer Job Will Be Obsolete Next Year."
As a programmer with many years of experience, a Machine Learning Scientist, and an AI thought leader, I want to put this coding premise to the test. This article is not the result of a one-day quick look, a cursory try-out with a toy implementation, a cruise through scholarly papers, or a reading of self-appointed AI experts’ articles.
The following story results from two months of using GenAI in my daily Machine Learning programming job. I use Python on Jupyter Notebook/Lab and work mainly in CNN classification, Stable Diffusion, including the XL model, GPT4, Llama Index, and the latest Meta/Facebook Llama 3B.
The GenAI tools of choice are the Google Colab AI, powered by Google Codey; CoPilot by OpenAI and Microsoft; and Amazon CodeWhisperer, the three big boys for enterprise companies.
My overarching goal is [1] to use GenAI to improve the quality of my coding, [2, maybe] to shorten the development time, and above all else, [3] to provide more time for hugging trees. :-)
Let's get started.
Welcome
Welcome new friends and fellow readers to the latest article in the Demystified AI series. This article will be returning to the “[AI] how to” theme. It focuses on the actual story of GenAI writing and assisting in writing code.
Fun Fact: Did you know that a GenAI model, Leonardo.ai's DreamShaper-7, created the cover image of this article? I fed in the article title and a few choice adjectives, Figure 1. It got it right on the first try. In the past (about three months ago), I had to try dozens and even hundreds of times with various “prompt engineering” phrases before I could select a suitable picture. If AI continues progressing as projected, by the time I write the next article, could an AI read my thoughts and draw the picture while I write the essay?
I enjoy writing essays like this because they clarify the meaning of the "AI Demystify series" title. Demystifying means showing how to do something rather than just explaining it. Providing my opinion and interpreting the data and findings can be beneficial, but it is essential to consider intentional or unintentional biases. When I show my process and findings along with my insights, we can have a friendly conversation and share stories.
It is so much more fun to talk with you than to throw dogmatic opinions at each other.
This GenAI article will cover the following topics:
We begin our story with the character and his struggle, i.e., the goal.
The Goal
We will describe our quest before leaving the comfort of our coding world to venture into the uncharted GenAI realm full of delightful secrets and monstrous bugs alike, Figure 2.
My goal as a programmer, whether experienced or novice, is to improve my code quality consistently. It entails writing clean, well-documented code, utilizing coding patterns, and adhering to standardized coding conventions, among other things.
Most will say the secondary goal should be the primary one: to shorten the coding time or, in other words, be more productive. I empathize with why managers want faster coding as a success metric. It is easy to measure and directly impacts the cost and revenue equation. However, in a holistic view, quality code makes the code base easier to maintain, i.e., fewer bugs to debug.
For a quest, you need more than the act of heading out to sea. Similarly, to keep the GenAI test focused on my goals, I use the following three projects with verifiable output:
I have completed these three tasks before but without the assistance of GenAI. I want to explore a lot of "what if" scenarios.
How I conduct the test is a good lead into the next section, the philosophy.
The Philosophy
I approach this task with the mindset of having a collaborator, not a replacement or a jokester. I do not look for reasons to criticize or belittle its accomplishments, nor do I view it through rose-colored glasses.
A colleague once shared a metaphor for treating AI like a horse instead of a bicycle. Both modes of transportation will take you to a destination, but you cannot expect a horse to behave or operate like a bicycle. By adopting the horse metaphor, you can deepen your understanding of AI, Figure 3.
Using GenAI may take longer at first, but it’s worth pushing past my comfort zone to learn. The greatest challenge is to unlearn what you've grown used to.
Beginner programmers starting with GenAI may have an advantage. They can freely explore new ideas, while I may dismiss them due to old prejudices.
Before we start our journey, let’s review the process and the tools.
The Process
We know what to do, our goals, and we know how to do it, our philosophy. This section describes the software tools and the setup.
I am using the GenAI to write GenAI code, i.e., Convolutional Neural Network (CNN) and Large Language Model (LLM) based on a transformer algorithm. Thus, there is only one choice for my tech stack.
Jupyter Notebook runs on Python 3.10+, Figure 4. I use Jupyter "Notebook" interchangeably with Jupyter "Lab." Many online options are available, and installing it on your device is simple. A powerful Nvidia GPU with 24+ GB of GPU RAM and 48+ GB of CPU RAM is required.
If GenAI is not integrated into Jupyter Notebook, I set up a dual-screen display with one side having the Notebook and the other having the GenAI.
I use a MacBook Pro and the Edge browser with a Google Colab Pro+ account, which gives me access to a Linux VM with an Nvidia GPU with 40 GB of GPU RAM and 128 GB of CPU RAM.
I have installed JupyterLab on my MacBook and can access several online notebooks, including Kaggle Notebook and Microsoft Azure AI Notebook. However, I prefer using Google Colab due to its similarity, even though it may not have significant hardware advantages.
Codey, the AI integrated into Google Colab, offers a more efficient experience by seamlessly integrating with Jupyter Notebook. This integration allows Codey to read comments and code within the Notebook easily. Additionally, I utilize a dual-screen display for CoPilot (OpenAI and Microsoft) and CodeWhisperer from Amazon, often copying and pasting between the Notebook and IDE. This workflow is not ideal because sometimes I don’t copy everything to the IDE before asking them to generate new code.
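Before loading a large model, it can help to confirm what hardware the Notebook actually sees. The following is a minimal sketch, assuming PyTorch is the framework in use. The helper name gpu_memory_gb is hypothetical, not something from my Notebooks, and it returns None when PyTorch or a CUDA GPU is missing.

```python
def gpu_memory_gb(device_index=0):
    """Return the total memory of a CUDA GPU in GB, or None if unavailable."""
    try:
        import torch  # imported lazily so the cell still runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3


memory = gpu_memory_gb()
if memory is None:
    print("No CUDA GPU detected.")
else:
    print(f"GPU memory: {memory:.1f} GB")
```

Comparing the printed number against the 24+ GB guideline above is a quick sanity check before committing to a long download.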
It’s time to begin our journey with GenAI coding assistance. The crew (consisting of myself, I, and me) has been briefed and is ready to go. The ship is prepared, the anchors are lifted, and the sails are set to the wind.
The Finding
The Jupyter Notebooks contain the entire process and valuable insights. They are available on GitHub. Through my journey, I have learned many valuable lessons and found nuggets of gold. Therefore, I will present my observations sequentially. It is a lengthy process, so you should view and hack the code in the Jupyter Notebooks on GitHub and draw your own conclusions.
I use all three GenAI tools at the same time: Codey (the AI integrated into Google Colab), Copilot (on a separate screen), and CodeWhisperer (on a different screen).
I will summarize the findings with examples from the Notebooks, particularly the following topics.
The first example is the Notebook, pluto_hugging_face_stable_diffusion.ipynb. We’ll start by defining the grading scale for the results from the prompt.
Grading Scale
For each prompt, I review and choose to use, update, or reject the recommended code from one of the three GenAI (CoPilot, CodeWhisperer, or Codey). I keep the prompt used in the code cell and the grade as a comment.
The grading is as follows:
If I don’t mention which GenAI (CoPilot, CodeWhisperer, or Codey), they all behave similarly, i.e., give similar code recommendations as in Figure 5.1.
If any of the GenAI tools (CoPilot, CodeWhisperer, or Codey) underperforms significantly compared to the others, I will call it out. Otherwise, they have the same grade.
The grading scale is set. Let's rock and roll. :-)
Fresh Start
I start the task with a blank Jupyter Notebook and organically add one task at a time. Figure 5.2 shows the completed result from the journey.
Notice, in Figure 5.2, that you can click on the Open in Colab blue button to copy it and run it on your Google Colab space.
It is time for a deep dive.
Insights through the code
First, I will not explain every code cell.
Out of all the options available, I have chosen a select few that I believe are the most intriguing and worthy of highlighting. I strongly urge you to delve into the Notebook and experiment with the code firsthand.
Second, CodeWhisperer from Amazon, as used with the Notebook, is NOT GenAI.
Although CodeWhisperer is a reliable code completion tool, it is essential to note that it is not a GenAI product. When used in conjunction with Jupyter Notebook, no specific prompt or inquiry section is available.
To activate CodeWhisperer on a MacBook, press Option+C, as illustrated in Figure 5.3. However, it is worth mentioning that CodeWhisperer falls short compared to more advanced tools like CoPilot and Codey, and therefore, I use it sparingly in this project.
Figure 5.1 demonstrates that the three GenAI are proficient in basic tasks such as importing and creating self-contained functions. However, they fall short on more demanding tasks, as illustrated in Figure 5.4.
The task is unfair as the gradio.load() function is relatively new, introduced less than a year ago. Despite its considerable usefulness, it remains underutilized by data scientists. The lack of documentation on the feature is concerning, and as far as I’m aware, no one has written about it in any form, whether it be a paper, article, or blog post. Thus, it is not a surprise that the GenAIs failed.
The silver lining is that all three GenAI did a decent job at documenting my function. Ultimately, I opted to follow Codey's suggestion as it seamlessly integrated with the Notebook and eliminated the need for additional copying and pasting, Figure 5.4.
The best of the worst award goes to CoPilot for giving a fair but incorrect answer, Figure 5.5.
For the record, the correct answer is one line of code, as in Figure 5.6.
With a quick check out of the way, we can focus on writing the code correctly. The first step is to fetch the LLM model from HuggingFace and use the GPU (CUDA), Figure 5.7.
The problem is that LLMs based on the transformer algorithm change almost monthly. Thus, the responses were out of date or deprecated. For a novice, attempting to debug the code provided by Copilot and Codey would have been a significant waste of time, as it may have seemed reasonable at first glance. However, even a single change in a dependent library could render the code unusable. For example, most of the time, Copilot recommends using the standard transformer pipeline, but it should use the latest StableDiffusionXLPipeline.
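To make the point concrete, here is a minimal sketch of loading the XL model with the diffusers library. It is not the code from my Notebook or from any of the GenAI tools. The checkpoint name "stabilityai/stable-diffusion-xl-base-1.0" and the helper names pick_device_and_dtype and load_sdxl_pipeline are assumptions for illustration, and the heavy imports sit inside the function so that the cell itself runs without a GPU.

```python
def pick_device_and_dtype(cuda_available):
    """Choose the device string and the torch dtype name for inference."""
    # Half precision is only worthwhile on the GPU; fall back to float32 on CPU.
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def load_sdxl_pipeline(model_id="stabilityai/stable-diffusion-xl-base-1.0"):
    """Download the weights from the Hugging Face Hub and move them to the device."""
    # Lazy imports: torch and diffusers are only needed when a model is loaded.
    import torch
    from diffusers import StableDiffusionXLPipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name),
        use_safetensors=True,
    )
    return pipe.to(device)


print(pick_device_and_dtype(cuda_available=False))
```

Calling load_sdxl_pipeline() triggers a multi-gigabyte download, so it is left out of the example run on purpose.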
The next step is to write a function to generate an image from the prompt and other inputs, the draw_me() function in Figure 5.8. Although the generated code contains a few bugs, it is not entirely incorrect once GenAI comprehends the pipeline. I would give it a grade of C+ since it can work with bugs. Despite trying several times, I was unable to achieve true success. However, I was able to correct the function for them manually.
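For readers who want a feel for the shape of such a function, here is a minimal sketch. It is not the draw_me() from my Notebook: the signature and the default values are assumptions, and the only thing it relies on is that a diffusers-style pipeline is callable and returns an object with an images list. The stub pipeline at the bottom is a stand-in so the cell runs without a GPU.

```python
from types import SimpleNamespace


def draw_me(pipe, prompt, negative_prompt=None, steps=30, guidance_scale=7.5):
    """Generate one image from a text prompt with a diffusers-style pipeline."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
    )
    # The pipeline output exposes the generated pictures as a list.
    return result.images[0]


def stub_pipe(**kwargs):
    """Stand-in for a real pipeline; echoes the prompt instead of drawing."""
    return SimpleNamespace(images=[f"image for: {kwargs['prompt']}"])


picture = draw_me(stub_pipe, "a red fox reading a book", steps=5)
print(picture)
```

With a real pipeline in place of stub_pipe, the returned object would be a PIL image that the Notebook can display or save.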
Next is to test the draw_me() function, Figure 5.9.
To effectively share my model with fellow AI researchers and colleagues, I will use Gradio to create a user-friendly web interface before pushing it to HuggingFace.
To my surprise, GenAI performed this task with many bugs, Figure 5.10. Despite being released over two years ago, Gradio may not be as widely recognized as I previously believed. Thus, the return code is more fanciful than helpful.
The closing step is to test the method, such as foxy.fetch_gradio_interface(foxy.draw_me).queue(4).launch(). The result is shown in Figure 5.11. You can visit the website HuggingFace under Duc Haba's space to try this out.
The evaluation of one of the three Notebooks is now complete. The Meta Llama and CNN image classification Notebooks will be available in Part 2 of the series.
GenAI could not write code about GenAI itself because the field develops so quickly, which makes the previous test unfair.
CoPilot utilizes OpenAI's GPT-3 model, whose knowledge is limited to mid-2021. Therefore, asking GenAI to write code for a release from 2022 is unjust. However, my code surpasses expectations by 120% thanks to its documentation and its ability to display 23 functions and their documentation through "help(pluto)," Figure 5.12. How cool was that?
The following section will be a more impartial assessment.
Fun Extra Evaluation
The Fun Extra Evaluation section of the Notebook includes two GenAI-written functions: [1] one that displays the WordCloud plot and [2] one that lists all Python libraries used in the current environment. Data scientists frequently use these functions, and they do not require knowledge of the latest developments in Machine Learning.
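As an illustration of the second function, here is a minimal sketch that uses only the Python standard library (importlib.metadata). It is not the code the GenAI produced for my Notebook, and the function name list_installed_libraries is hypothetical.

```python
from importlib.metadata import distributions


def list_installed_libraries():
    """Return a sorted list of (name, version) for every installed distribution."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            found[name] = dist.metadata.get("Version", "unknown")
    return sorted(found.items(), key=lambda item: item[0].lower())


# Print the first few entries in the familiar requirements.txt style.
for name, version in list_installed_libraries()[:10]:
    print(f"{name}=={version}")
```

The output depends entirely on the environment, which is the point: it records what the Notebook was actually running against.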
I use Copilot through the GPT4 dashboard, which offers excellent coding assistance. It provides correct code, unit tests, and detailed explanations. GPT4 is an excellent teacher and mentor, especially for novice programmers. Plus, you can ask GPT4 to explain the code in more detail. I wish I had a teacher like this when I was learning to code, and perhaps I continue to learn from it today.
Figure 5.13 shows the prompt and the perfect, no-error function GPT4/Copilot wrote.
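For readers without the figure at hand, here is a minimal sketch of what a WordCloud helper can look like. It is not the function GPT4/Copilot wrote. The names word_frequencies and draw_word_cloud are hypothetical, and the plotting part assumes the wordcloud and matplotlib packages are installed, which is why they are imported inside the function.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return dict(counts)


def draw_word_cloud(texts, figsize=(10, 5), colormap="viridis"):
    """Render a word cloud from the texts; needs wordcloud and matplotlib."""
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white",
                      colormap=colormap)
    cloud.generate_from_frequencies(word_frequencies(texts))
    plt.figure(figsize=figsize)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


sample = ["GenAI writes code", "GenAI explains code", "code review"]
frequencies = word_frequencies(sample)
print(frequencies)
```

Counting the words separately keeps the data step easy to check by eye before any plotting happens.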
Let’s ask Copilot/GPT4 to generate test data and run a test. It seems simple enough, as shown in Figure 5.14.
Notice, there is no fancy prompt. I ask Copilot/GPT4 as I would a fellow coder. The results speak for themselves, Figure 5.15.
The code is clear and easy to understand, but what makes Copilot/GPT4 better than others is the explanation along with the code, Figure 5.16.
We can ask Copilot/GPT4 to explain any part of the code, as shown in Figure 5.17.
Copilot/GPT4 has now transitioned from being an assistant to becoming a mentor.
Once again, notice the prompt that I use. There is no fancy prompt engineering; I ask/talk/write as I would to a human colleague: "Good job. Explain what is plt figsize?"
Unlike humans, GPT4/Copilot is always patient and never gets snippy. You can ask him to explain any code line. Thus, GPT4/Copilot is an exceptional mentor for both experienced and novice programmers.
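As a side note on the question itself: in matplotlib, figsize is a (width, height) pair measured in inches, and the size in pixels is that pair multiplied by the dpi setting. The small sketch below only does the arithmetic. The helper name figure_size_in_pixels is hypothetical, and the default of 100 dpi is matplotlib's documented default, which a notebook backend may override.

```python
def figure_size_in_pixels(figsize, dpi=100):
    """Convert a (width, height) figsize in inches to pixel dimensions."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


print(figure_size_in_pixels((10, 5)))           # (1000, 500)
print(figure_size_in_pixels((10, 5), dpi=200))  # (2000, 1000)
```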
Why not add a color map while we are at it? See Figure 5.18.
We can ask our mentor to explain the color map, Figure 5.19.
Even though we didn't ask for a function to display all the possible colormaps, Copilot/GPT4 wrote the function to display 62 colormaps from matplotlib, Figure 5.20.
And the results speak for themselves, Figure 5.21.
The other "Fun extra" will be discussed in Part 2 of this article. It is time for the conclusion.
The Conclusion
We have discussed the article's goal: to determine if GenAI can write code, improve coding, and reduce development time. We have also explained the philosophy of approaching tasks with a fair mindset and the process of using Python and Jupyter Notebook to write code from scratch for the LLM Stable Diffusion XL model, Meta Llama-2 Text model, and CNN Image classifier.
I have provided a step-by-step guide on how to use CoPilot powered by OpenAI GPT4, Codey from Google, and CodeWhisperer from Amazon to generate code. In terms of integration with Jupyter Notebook, CodeWhisperer is a simple code completion plugin and ranks a distant third, while Codey comes in second place, and CoPilot/GPT4 is the clear winner.
The following is a list of lessons learned in the order of discovery. The first lesson started with a challenging task of loading and inferencing from the latest Stable Diffusion XL model, released three weeks ago. This task is an unfair test because GenAI has knowledge only up to mid-2022. Still, CoPilot and Codey provided an approximate answer, and they performed well for any task outside of the actual inferencing, such as saving images, importing the correct libraries, documentation, and writing other utility functions.
When it comes to routine tasks like creating the WordCloud plot, Copilot performs exceptionally well. He wrote the function with documentation, and the following prompt asked him to write the test data and store them in a Pandas DataFrame. Copilot wrote the Python code flawlessly and even explained how he wrote it, which would benefit novice programmers and reinforce the correctness of the code for experienced programmers.
At any time, you can ask Copilot to explain a line of code. For example, I asked for an explanation of “plt figsize.” The response was concise and easy to understand. It’s much better than using StackOverflow. Plus, the prompts you use are just everyday conversations with other programmers, with no fancy or repetitive prompt engineering required.
GenAI, such as Copilot, goes beyond collaboration and becomes a mentor for learning or becoming an expert in Python coding.
I understand that the above might sound like a powerful endorsement of GenAI, but can GenAI serve as a replacement for a programmer?
Absolutely NOT!
Programmers are human beings with independent thoughts and feelings, and like most programmers, myself included, we need a paycheck to support our lives. :-)
Joking aside, many people outside the programming community speculate that AI will someday replace humans in code writing. However, until AI can achieve “General Intelligence” and gain consciousness, it will remain a powerful tool for human programmers to write better code, shorten development time, and increase productivity.
GenAI is both lowering the floor and raising the ceiling. In other words, AI will enable more individuals to become proficient in programming while allowing experts to reach previously unattainable heights.
The first step towards remaining marketable in the tech industry is to unlearn past habits, adopt new working methods, and embrace GenAI as a collaborator. I, a human, will remain a productive member of the programming community long after my retirement or after winning the lotto. If you are in tech, you know that “change is the only constant.”
Epilogue
The conclusion of this article is unsurprising. While generative AI is a powerful tool for software programming, it comes with a warning. The concern lies in the potential misuse by those outside the programming community, such as managers, CEOs, founders, and business owners, who may believe that hiring cheap, untrained personnel and utilizing GenAI is the solution for reducing development costs.
An analogy that comes to mind is that of a sword. Would you arm teenagers and bullies with swords and expect them to be samurai?
The other scenario is that experienced programmers or mid-level managers might resist GenAI, claiming it’s just a toy and can’t help with coding without spending time learning it, or they might find a fault and deem it unusable.
I choose to embrace my fear, insecurity, and humility to unlearn old habits and make GenAI my collaborator. The optimistic view may not make me rich, but it keeps me smiling.
Part 2 of this article will feature the following two Python Notebooks, Figures 6.1 and 6.2.
Lastly, I am looking forward to reading your feedback. As always, I apologize for any unintentional errors. The intentional errors are mine and mine alone. :-)
Have a wonderful day, and I hope you enjoy reading this article as much as I enjoy writing it. Please give the article a “thumbs up, like, or heart.”
#AI, #GenAI, #Coding, #Python, #DucHaba
Book announcement
Before letting you go, I recently authored a book titled Data Augmentation with Python with Packt Publishing. If you’re interested, you can purchase it on Amazon and share your thoughts on the Amazon book review. It will make me happy as a clam. :-)
On GitHub, you can find the entire collection of Jupyter Notebooks for all nine chapters. You can customize the Notebooks to fit your specific project requirements. Additionally, you can run Python code without the need to install Python.
Demystify AI series