WebLLM

🚀 Exciting AI tool alert! WebLLM lets you run large language models like Llama 2 7B/13B and Mistral 7B directly in your browser, with WebGPU acceleration, for better privacy and lower cost. Try out the different models in the chat demo! #AI #WebLLM #AIAssistant

  • The latest WebLLM brings models like Llama 2 7B/13B, Mistral 7B, and WizardMath to the browser with no server support, using WebGPU for acceleration.
  • Users on Apple Silicon Macs with 64GB or more of memory can run the Llama 2 70B model by downloading Chrome Canary.
  • The project aims to enable the creation of AI assistants with enhanced privacy, powered by open-source efforts like LLaMA and Alpaca.
  • This initiative seeks to simplify AI deployment by running large models directly in the client's browser for cost reduction and personalization.
  • Users can try out the models by selecting one, entering a prompt, and clicking "Send" in the chat demo (a programmatic sketch of this flow follows this list).
  • The initial model download may take a few minutes; subsequent runs are much faster. Llama-7B models require around 6GB of memory and RedPajama-3B around 3GB.
  • The chat demo currently features Llama 2, Mistral-7B, and RedPajama-INCITE-Chat-3B-v1, with support for more models planned.
  • WebGPU, which shipped in Chrome 113, is what makes WebLLM possible and opens the door to native AI in browsers (a quick availability check is sketched at the end of this page).
  • The project emphasizes bringing diversity to the AI ecosystem and tapping into the growing power of client-side hardware.
  • The demo is intended for research purposes and is subject to the model licenses of LLaMA, Vicuna, and RedPajama; potential violations can be flagged for review.
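
For readers who would rather drive the demo flow from code than from the UI, here is a minimal sketch using the @mlc-ai/web-llm package. It assumes the library's published engine API (CreateMLCEngine and the OpenAI-style chat.completions.create method), which may differ from the version behind this demo, and the model ID string is illustrative rather than taken from the demo's model list.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Illustrative model ID; consult the demo's model list for real IDs.
  const modelId = "Llama-2-7b-chat-hf-q4f32_1";

  // The first call downloads and caches the weights (this is the
  // "initial download may take a few minutes" step); later runs reuse
  // the browser cache and start much faster.
  const engine = await CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(report.text),
  });

  // The programmatic equivalent of typing a prompt and clicking "Send".
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```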
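
And since everything above hinges on WebGPU being available, a quick feature check before loading any model can save a multi-gigabyte download. The sketch below uses the standard navigator.gpu entry point; the typings assume the @webgpu/types package is installed.

```typescript
// Assumes @webgpu/types is installed so navigator.gpu is typed.
async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is absent in browsers without WebGPU (pre-Chrome-113).
  if (!("gpu" in navigator)) return false;
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU is available." : "WebGPU is not available."),
);
```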