Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
This article discusses Vicuna-13B, an open-source chatbot developed by the Vicuna team. It was trained by fine-tuning LLaMA on user-shared conversations from ShareGPT and, in preliminary evaluations using GPT-4, achieved over 90% of the quality of OpenAI ChatGPT and Google Bard. The project's code, weights, and an online demo are publicly available for non-commercial use.
Key Features and Performance
Vicuna-13B was fine-tuned on approximately 70,000 user-shared ChatGPT conversations, which led to more detailed and better-structured answers than those of comparable models such as Alpaca. With GPT-4 as the judge, Vicuna was preferred over other open-source models in more than 90% of cases, and its response was rated better than or equal to ChatGPT's in 45% of cases.
Training and Infrastructure
Training extended the existing Alpaca training scripts to handle multi-turn conversations and longer sequences. Longer contexts require substantially more GPU memory, so gradient checkpointing and flash attention were used to keep memory use manageable. Training costs were reduced by running on SkyPilot-managed spot instances.
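A common way to adapt a single-turn fine-tuning script to multi-turn data is to compute the loss only on the assistant's tokens, so the model learns to produce replies rather than to imitate user prompts. The sketch below illustrates that idea in plain Python. The function name `build_labels`, the turn format, and the toy token IDs are assumptions made for illustration and are not taken from the project's code; the value -100 matches the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss; positions with this
# label are skipped when the loss is computed.
IGNORE_INDEX = -100


def build_labels(turns: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Flatten (role, token_ids) turns into input_ids and masked labels.

    Hypothetical helper: tokens from non-assistant turns are labeled
    IGNORE_INDEX so they contribute nothing to the fine-tuning loss.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervise the reply
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # mask the prompt
    return input_ids, labels


# Toy two-round conversation with made-up token IDs.
conversation = [
    ("user", [11, 12, 13]),
    ("assistant", [21, 22]),
    ("user", [14]),
    ("assistant", [23, 24, 25]),
]
input_ids, labels = build_labels(conversation)
```

Every round of the conversation stays in the input, so later replies are still conditioned on the full history; only the supervision signal is restricted.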
Serving and Deployment
A lightweight, distributed serving system was built to serve multiple models and to let GPU workers be plugged in flexibly. It relies on fault-tolerant controllers and managed spot instances to reduce serving costs.
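To make the controller idea concrete, here is a minimal sketch of one way a controller could tolerate workers that disappear, as spot instances do when preempted. Everything in it is an assumption for illustration: the `Controller` class, the heartbeat mechanism, the timeout value, and the worker addresses are hypothetical and do not describe the project's actual serving code.

```python
import time
from typing import Dict, Optional


class Controller:
    """Toy registry that routes requests to live workers (illustrative only)."""

    def __init__(self, heartbeat_timeout: float = 30.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.workers: Dict[str, dict] = {}

    def heartbeat(self, address: str, model: str, queue_length: int,
                  now: Optional[float] = None) -> None:
        """Register a worker or refresh its status."""
        self.workers[address] = {
            "model": model,
            "queue_length": queue_length,
            "last_seen": time.monotonic() if now is None else now,
        }

    def pick_worker(self, model: str, now: Optional[float] = None) -> Optional[str]:
        """Return the least-loaded live worker for a model, or None."""
        now = time.monotonic() if now is None else now
        # Forget workers whose heartbeat is stale, e.g. preempted spot instances.
        self.workers = {
            addr: info for addr, info in self.workers.items()
            if now - info["last_seen"] <= self.heartbeat_timeout
        }
        candidates = [
            (info["queue_length"], addr)
            for addr, info in self.workers.items()
            if info["model"] == model
        ]
        return min(candidates)[1] if candidates else None


controller = Controller(heartbeat_timeout=30.0)
controller.heartbeat("gpu-a:8000", "vicuna-13b", queue_length=3, now=0.0)
controller.heartbeat("gpu-b:8000", "vicuna-13b", queue_length=1, now=0.0)
chosen = controller.pick_worker("vicuna-13b", now=10.0)
later = controller.pick_worker("vicuna-13b", now=100.0)
```

In this sketch `chosen` is the worker with the shorter queue, and `later` is `None` because neither worker has sent a heartbeat within the timeout. The `now` parameter exists only so the example is deterministic.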
Evaluation Methodology
Chatbot performance was assessed with a new evaluation framework that uses GPT-4 as the judge. The team wrote diverse questions across several categories and had GPT-4 compare the models' answers. The method is promising but preliminary, and further research is needed before it can be considered rigorous.
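As a rough illustration of how judge scores can be turned into headline figures, the sketch below aggregates per-question scores into a relative quality ratio and a win/tie/loss tally. The scores are made-up placeholders rather than results from the evaluation, the helper names are hypothetical, and the step of prompting GPT-4 and parsing its output is deliberately left out.

```python
from typing import Dict, List, Tuple

# Each pair is (candidate_score, baseline_score) for one question.
ScorePairs = List[Tuple[float, float]]


def relative_quality(pairs: ScorePairs) -> float:
    """Return the candidate's total score as a fraction of the baseline's total."""
    candidate_total = sum(c for c, _ in pairs)
    baseline_total = sum(b for _, b in pairs)
    if baseline_total == 0:
        raise ValueError("baseline total score is zero")
    return candidate_total / baseline_total


def tally(pairs: ScorePairs) -> Dict[str, int]:
    """Count questions where the candidate wins, ties, or loses."""
    counts = {"win": 0, "tie": 0, "loss": 0}
    for candidate, baseline in pairs:
        if candidate > baseline:
            counts["win"] += 1
        elif candidate == baseline:
            counts["tie"] += 1
        else:
            counts["loss"] += 1
    return counts


# Placeholder scores for four questions (not real evaluation data).
scores = [(8.0, 9.0), (9.0, 9.0), (7.0, 8.0), (9.0, 8.0)]
ratio = relative_quality(scores)  # 33 / 34
outcome = tally(scores)
```

The two summaries answer different questions: the ratio compares total scores, while the tally shows how often one model's answer was rated at least as good as the other's, which is why a model can score close to a baseline overall and still lose on most individual questions.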
Limitations
Like other large language models, Vicuna is weak at reasoning and mathematics and cannot guarantee the factual accuracy of its outputs. Safety measures, such as using the OpenAI moderation API, were implemented to mitigate potential risks.
Release and License
The training, serving, and evaluation code, along with the Vicuna-13B model weights, are available on GitHub. The online demo is for non-commercial use only, subject to relevant licenses and terms of use.
Conclusion
Vicuna-13B is a notable step forward for open-source chatbots. Its strong preliminary results, together with the availability of its code and weights, make it a useful resource for researchers and developers. Further research is needed to address its limitations and to make chatbot evaluation more rigorous.