Show HN: WasmGPT – “ChatGPT” in the browser, no WebGPU and no server needed

Uses threaded Emscripten to speed up generation and offload the main loop. No SIMD or other optimizations; it might run faster with #enable-experimental-webassembly-features enabled. Tested in x86 Chrome and Firefox, and in Safari on Apple Silicon. Run it yourself: https://github.com/lxe/ggml/tree/wasm-demo. Thanks to https://github.com/ggerganov/ggml.

Story published at: April 21, 2023 at 04:24AM
Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
I’ve been playing around with https://github.com/zphang/minimal-llama/ and https://github.com/tloen/alpaca-lora/blob/main/finetune.py, and wanted to create a simple UI where you can just paste text, tweak the parameters, and finetune the model quickly using a modern GPU. To prepare the data, simply separate your examples with two blank lines. There’s an inference tab, so you can test how the tuned model behaves. This […]
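The post says training examples are separated by two blank lines. As a rough sketch of what that preprocessing might look like (the function name and regex here are my own assumptions, not the UI’s actual code):

```python
import re

def split_examples(raw_text: str) -> list[str]:
    """Split pasted training text into samples on two (or more) blank lines.

    The two-blank-line separator convention comes from the post; the exact
    parsing below is an illustrative assumption, not the tool's implementation.
    """
    # Two blank lines between examples means three or more consecutive
    # newlines (possibly with stray whitespace on the blank lines).
    parts = re.split(r"\n\s*\n\s*\n", raw_text)
    # Drop empty fragments and trim surrounding whitespace from each sample.
    return [p.strip() for p in parts if p.strip()]
```

For example, `split_examples("first example\n\n\nsecond example")` yields two samples, which could then be tokenized and fed to the finetuning script.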