Show HN: We scaled Git to support 1 TB repos

I’ve been in the MLOps space for ~10 years, and data is still the hardest unsolved problem. Code is versioned using Git, data is stored somewhere else, and context often lives in a third location like Slack or Google Docs. This is why we built XetHub, a platform that enables teams to treat data like code, using Git.

Unlike Git LFS, we don’t just store the files. We use content-defined chunking and Merkle trees to dedupe against everything in history, which allows small changes in large files to be stored compactly. Read more here: https://xethub.com/assets/docs/how-xet-deduplication-works

Today, XetHub works for 1 TB repositories, and we plan to scale to 100 TB in the next year. Our implementation is in Rust (client, cache, and storage), and our web application is written in Go. XetHub includes a GitHub-like web interface that provides automatic CSV summaries and allows custom visualizations using Vega.

Even at 1 TB, we know downloading an entire repository is painful, so we built git-xet mount, which provides a user-mode filesystem view over the repo in seconds.

XetHub is available today on Linux and Mac (Windows coming soon), and we would love your feedback! Read more here:
– https://xetdata.com/blog/2022/10/15/why-xetdata
– https://xetdata.com/blog/2022/12/13/introducing-xethub
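To make the deduplication idea concrete, here is a minimal Python sketch of content-defined chunking with a chunk store shared across all file versions, plus a Merkle root per version. It is not XetHub's implementation, which is written in Rust and not shown in this post. The Gear rolling hash (the technique FastCDC uses), the chunk-size parameters, SHA-256, and the names `chunk_boundaries`, `merkle_root`, and `ChunkStore` are all illustrative assumptions.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not XetHub's actual values).
MIN_CHUNK = 256
MAX_CHUNK = 8192
MASK_BITS = 10  # candidate cut points occur roughly every 2**10 bytes
# Test the HIGH bits: in a Gear hash they depend on the last 64 input bytes.
CUT_MASK = ((1 << MASK_BITS) - 1) << (64 - MASK_BITS)
U64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value. A fixed seed
# means identical content is always cut at identical places.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk_boundaries(data: bytes) -> list[tuple[int, int]]:
    """Split data into (start, end) ranges using a Gear rolling hash."""
    ranges = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left by one per byte ages old bytes out after 64 steps.
        h = ((h << 1) + GEAR[byte]) & U64
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & CUT_MASK) == 0) or length >= MAX_CHUNK:
            ranges.append((start, i + 1))
            start = i + 1
            h = 0
    if start < len(data):
        ranges.append((start, len(data)))
    return ranges


def merkle_root(hashes: list[bytes]) -> bytes:
    """Hash chunk digests pairwise, level by level, up to a single root."""
    if not hashes:
        return hashlib.sha256(b"").digest()
    level = hashes
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the odd node out
        level = [hashlib.sha256(level[k] + level[k + 1]).digest()
                 for k in range(0, len(level), 2)]
    return level[0]


class ChunkStore:
    """Content-addressed store shared by every file version ever added."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}

    def add_file(self, data: bytes) -> tuple[bytes, list[bytes]]:
        """Store new chunks only; return (Merkle root, ordered chunk digests)."""
        digests = []
        for start, end in chunk_boundaries(data):
            piece = data[start:end]
            digest = hashlib.sha256(piece).digest()
            self.chunks.setdefault(digest, piece)  # dedupe against all history
            digests.append(digest)
        return merkle_root(digests), digests

    def read_file(self, digests: list[bytes]) -> bytes:
        """Reassemble a file version from its ordered chunk digests."""
        return b"".join(self.chunks[d] for d in digests)
```

Because each cut point depends only on the bytes just before it, inserting a few bytes into the middle of a large file changes only the chunk or two around the edit. The chunks after it hash to digests the store already holds, so the new version adds little storage, while its Merkle root still differs from the old one.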
Story Published at: December 13, 2022 at 03:14PM