WebLLM

🚀 Exciting AI tool alert! WebLLM lets you run large language models like Llama 2 7B/13B and Mistral 7B directly in your browser, with WebGPU acceleration, for better privacy and lower cost. Try out the different models in the chat demo! #AI #WebLLM #AIAssistant

  • The latest WebLLM brings models like Llama 2 7B/13B, Mistral 7B, and WizardMath to the browser with no server support, using WebGPU for acceleration.
  • Users on Apple Silicon Macs with 64GB or more of memory can run the Llama 2 70B model by downloading Chrome Canary.
  • The project aims to enable the creation of AI assistants with enhanced privacy, powered by open-source efforts like LLaMA and Alpaca.
  • This initiative seeks to simplify AI deployment by running large models directly in the client's browser for cost reduction and personalization.
  • Users can try out the models by selecting one, entering a prompt, and clicking "Send" in the chat demo (a programmatic sketch of this flow follows this list).
  • The initial model download may take a few minutes; subsequent runs are much faster. Llama-7B models require around 6GB of memory and RedPajama-3B around 3GB.
  • The chat demo currently features Llama 2, Mistral-7B, and RedPajama-INCITE-Chat-3B-v1, with support for more models planned.
  • WebGPU, which shipped in Chrome 113, is what makes WebLLM possible and opens the door to native AI in browsers (a quick availability check is sketched at the end of this page).
  • The project emphasizes bringing diversity to the AI ecosystem and tapping into the growing power of client-side hardware.
  • The demo is intended for research purposes and is subject to the model licenses of LLaMA, Vicuna, and RedPajama; potential violations can be flagged for review.
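
For readers who would rather drive the demo flow from code than from the UI, here is a minimal sketch using the @mlc-ai/web-llm package. It assumes the library's published engine API (CreateMLCEngine and the OpenAI-style chat.completions.create method), which may differ from the version behind this demo, and the model ID string is illustrative rather than taken from the demo's model list.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Illustrative model ID; consult the demo's model list for real IDs.
  const modelId = "Llama-2-7b-chat-hf-q4f32_1";

  // The first call downloads and caches the weights (this is the
  // "initial download may take a few minutes" step); later runs reuse
  // the browser cache and start much faster.
  const engine = await CreateMLCEngine(modelId, {
    initProgressCallback: (report) => console.log(report.text),
  });

  // The programmatic equivalent of typing a prompt and clicking "Send".
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```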
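
And since everything above hinges on WebGPU being available, a quick feature check before loading any model can save a multi-gigabyte download. The sketch below uses the standard navigator.gpu entry point; the typings assume the @webgpu/types package is installed.

```typescript
// Assumes @webgpu/types is installed so navigator.gpu is typed.
async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is absent in browsers without WebGPU (pre-Chrome-113).
  if (!("gpu" in navigator)) return false;
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}

hasWebGPU().then((ok) =>
  console.log(ok ? "WebGPU is available." : "WebGPU is not available."),
);
```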