Researchers From Stanford Release Alpaca: An Instruction-Following Model Based on Meta AI LLaMA 7B

Instruction-following models such as GPT-3.5 (text-davinci-003), ChatGPT, Claude, and Bing Chat have become markedly more capable. Many people now use these models daily, and some rely on them at work. Despite their popularity, instruction-following models still have significant flaws: they can generate false information, perpetuate harmful social stereotypes, and produce toxic language.

Training a high-quality instruction-following model on an academic budget is difficult because it requires both a strong pretrained language model and abundant, high-quality instruction-following data. Academic research on instruction-following models has been hampered by the lack of a publicly available model whose capabilities are comparable to closed-source models such as OpenAI’s text-davinci-003.

Researchers at Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) recently released Alpaca, an instruction-following model fine-tuned from Meta AI’s LLaMA 7B. Using OpenAI’s text-davinci-003, the researchers generated 52K instruction-following demonstrations in the style of self-instruct and used them to train Alpaca. On the self-instruct evaluation set, Alpaca exhibits many of the same behaviors as text-davinci-003, yet it is remarkably small and both easy and cheap to reproduce.

To build the training data, the team generated instruction-following examples by extending the self-instruct method. They started from the self-instruct seed set, which consists of 175 human-written instruction-output pairs. The seed set was used to prompt text-davinci-003, which generated further instructions based on those examples. The team simplified the generation pipeline relative to the original self-instruct method, which cut its cost significantly. Using the OpenAI API, the researchers generated 52K unique instructions and their corresponding outputs for under $500.
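
The bootstrapping step described above can be sketched roughly as follows. This is a minimal illustration and not the researchers' pipeline: the `SEED_TASKS` entries, the prompt wording, and the `build_prompt` helper are all hypothetical, and the call to the language model is left out entirely.

```python
import random

# Hypothetical stand-ins for the 175 human-written seed pairs.
SEED_TASKS = [
    {"instruction": "Summarize the paragraph in one sentence.", "output": "A one-sentence summary."},
    {"instruction": "Translate 'good morning' into French.", "output": "Bonjour."},
    {"instruction": "List three uses for a paperclip.", "output": "Bookmark, zipper pull, SIM tool."},
    {"instruction": "Write a haiku about autumn.", "output": "Leaves drift, cold wind hums."},
]


def build_prompt(seed_tasks, num_examples=3, num_new=2, rng=None):
    """Build a few-shot prompt asking a model to continue a numbered task list."""
    rng = rng or random.Random()
    examples = rng.sample(seed_tasks, num_examples)
    lines = [f"Come up with {num_examples + num_new} diverse task instructions.", ""]
    for i, task in enumerate(examples, start=1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        lines.append(f"{i}. Output: {task['output']}")
    # Leave the next slot open so the model writes the new instruction itself.
    lines.append(f"{num_examples + 1}. Instruction:")
    return "\n".join(lines)


prompt = build_prompt(SEED_TASKS, rng=random.Random(0))
print(prompt)
```

Each model completion would then be parsed into new instruction-output pairs and added to the dataset. At the reported budget of under $500 for 52K examples, the cost works out to under one cent per example ($500 / 52,000 ≈ $0.0096).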

Using Hugging Face’s training framework with techniques such as Fully Sharded Data Parallel and mixed-precision training, they fine-tuned the LLaMA models on this instruction-following dataset. For their initial run, fine-tuning a 7B LLaMA model on eight 80GB A100s cost less than $100 on most cloud computing providers. The team notes that training efficiency can still be improved, which could reduce the cost further.
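
To make the fine-tuning input concrete, here is a minimal sketch of how one instruction-output record might be turned into a prompt and a target string for supervised training. The template wording, the `format_example` helper, and the sample record are assumptions for illustration; the article does not describe the exact format the team used.

```python
# Hypothetical prompt template; the wording is an assumption, not taken from the article.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_example(record):
    """Split a record into the text the model conditions on and the text it should produce."""
    prompt = PROMPT_TEMPLATE.format(instruction=record["instruction"])
    target = record["output"]
    return prompt, target


# Illustrative record, not drawn from the actual 52K dataset.
record = {"instruction": "Name the capital of France.", "output": "Paris."}
prompt, target = format_example(record)
print(prompt + target)
```

In a setup like this, the prompt and target would be tokenized and concatenated, and the training loss is commonly computed only on the target tokens so that the model learns to produce responses and not to reproduce the instructions.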

To assess how well Alpaca performs, the five student authors conducted a human evaluation on inputs from the self-instruct evaluation set. This evaluation set was compiled by the self-instruct authors and covers a wide range of user-oriented instructions, including email writing, social media, and productivity tools. In a blind pairwise comparison, text-davinci-003 and Alpaca 7B performed similarly.
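
A blind pairwise comparison of this kind can be sketched as follows. This is an illustrative outline, not the team's evaluation code: the labels `model_a` and `model_b`, the helper names, the sample outputs, and the toy rater are all hypothetical.

```python
import random
from collections import Counter


def blind_pair(output_a, output_b, rng):
    """Shuffle two model outputs so the rater cannot tell which model wrote which."""
    pair = [("model_a", output_a), ("model_b", output_b)]
    rng.shuffle(pair)
    return pair  # position 0 is shown as "Response 1", position 1 as "Response 2"


def tally(pairs, choices):
    """Map each rater choice (0 or 1) back to the hidden model label and count wins."""
    return Counter(pair[choice][0] for pair, choice in zip(pairs, choices))


rng = random.Random(0)
outputs = [("Short answer.", "A longer, detailed answer."), ("Hi!", "Hello there.")]
pairs = [blind_pair(a, b, rng) for a, b in outputs]

# Toy rater that always prefers the longer response; a real study would use people.
choices = [max((0, 1), key=lambda i: len(pair[i][1])) for pair in pairs]
wins = tally(pairs, choices)
print(wins)
```

Shuffling the presentation order per prompt keeps raters from learning which position belongs to which model, and the win counts are only recovered after all choices have been recorded.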

Beyond this static evaluation set, the researchers also tested Alpaca interactively. They found that it often behaves similarly to text-davinci-003 across a variety of inputs.

Alpaca shares many of the shortcomings common to other language models, including hallucination, toxicity, and stereotyping. Hallucination in particular is a frequent failure mode for Alpaca, even compared to text-davinci-003.

In future work, the team plans to study how the training recipe gives rise to the model’s capabilities. They also aim to better understand and mitigate the risks posed by Alpaca using techniques such as automatic red teaming, auditing, and adaptive testing.

Check out the GitHub, Web demo, and Blog. All credit for this research goes to the researchers on this project.

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advancements in technology and their real-life applications.
