WebLLM | Home
- WebLLM provides in-browser access to models such as Llama 2 7B/13B, Mistral 7B, and WizardMath, with no server support, using WebGPU for acceleration.
- Users on Apple Silicon Macs with 64GB+ of memory can run the Llama 2 70B model by downloading Chrome Canary.
- The project aims to enable the creation of AI assistants with enhanced privacy, powered by open-source efforts like LLaMA and Alpaca.
- This initiative seeks to simplify AI deployment by running large models directly in the client's browser for cost reduction and personalization.
- Users can try out models by selecting one, entering inputs, and clicking "Send" in the chat demo.
- The initial model download may take a few minutes; subsequent runs are faster. Running requires around 6GB of memory for Llama-7B models and 3GB for RedPajama-3B.
- The chat demo features Llama 2, Mistral-7B, RedPajama-INCITE-Chat-3B-v1, with more models planned for support.
- WebGPU, which shipped in Chrome 113, makes WebLLM possible and opens an opportunity for native AI in browsers.
- The project emphasizes bringing diversity to the AI ecosystem and tapping into the growing computing power of client devices.
- The demo is provided for research purposes and is subject to the model licenses of LLaMA, Vicuna, and RedPajama; potential violations are flagged for review.
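The memory figures above suggest a simple pre-flight check before a page loads a model. The sketch below is illustrative only: `pickModel` and its requirements table are hypothetical helpers, not part of the WebLLM API, and the memory numbers are the rough figures quoted above.

```javascript
// Approximate memory needed (GB) per model, from the figures quoted above.
// These are rough requirements, not official numbers.
const MODEL_MEM_GB = {
  "Llama-2-7B": 6,
  "RedPajama-INCITE-Chat-3B-v1": 3,
};

// Pick the largest model that fits within the given memory budget (GB).
// In a page, memGB could come from a user setting or a heuristic such as
// navigator.deviceMemory where the browser exposes it.
function pickModel(memGB) {
  const fits = Object.entries(MODEL_MEM_GB)
    .filter(([, need]) => need <= memGB)
    .sort((a, b) => b[1] - a[1]); // prefer the biggest model that fits
  return fits.length ? fits[0][0] : null;
}
```

With an 8GB budget this selects the Llama-7B model; with 4GB it falls back to RedPajama-3B; below 3GB it returns `null`, signaling that the page should warn the user rather than attempt a download.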
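Because WebLLM depends on WebGPU (shipped in Chrome 113), a page can detect support via `navigator.gpu` before trying to load a model. A minimal sketch, written to accept a navigator-like object so the logic can be exercised outside a browser:

```javascript
// Returns true if the given navigator-like object exposes WebGPU.
// In a real page, pass the global `navigator`; Chrome 113+ exposes navigator.gpu.
function hasWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav && nav.gpu != null;
}

// Typical in-page usage:
//   if (!hasWebGPU(navigator)) {
//     // Show a notice asking the user to upgrade to a WebGPU-capable browser.
//   }
```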