what if model distribution was through bittorrent

+++
date = "2023-03-24"
+++

#decentralization #technology #artificial_intelligence

at the moment, huggingface is one of the main hubs for model distribution. they're basically a "free CDN", and that DOES NOT SCALE. from what i can gather, they don't put any pricing restrictions on downloaders or uploaders. they're a startup, and pretty much by definition they must be hemorrhaging money to drive the user base up, to maybe, in the end, get bought by micro$oft.

AI is not really "the media industry", so there's nothing as strong as the RIAA that can sue everyone out of existence. HOWEVER, this is an unsubstantiated claim. we'll have to wait until openai or meta start to sue everyone out of existence. oh wait, meta's trying. i wonder what happens when we get to real legal battles (like Alpaca being trained with GPT-3-generated data, something the OpenAI ToS doesn't allow, see here, last part)

so i saw this tweet today:

AI needs to be democratized and equally available to all, not just to those who have money.

FREE and OPEN.

Today I'm sharing a little experiment.

Introducing GOAT, a decentralized way to publish and download AI models.

Powered by BitTorrent and Bitcoin.

- cocktail peanut (cocktailpeanut), 2023-03-23T22:47:20+00:00

https://ipfs.io/ipfs/QmYyucgBQVfs9JXZ2MtmkGPAhgUjNgyGE6rcJT1KybQHhp/index.html

here's the rundown of how it works:

maybe this can be done differently?

bitcoin here is serving as a low latency key-value store (from a hash to the yaml file). can't we do this with just torrent itself? what are the drawbacks?
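to make the "key-value store" role concrete, here's a minimal sketch of what any backend (blockchain, DHT, whatever) would have to offer. the class name and the yaml contents are made up for illustration, and this is not how GOAT actually implements it. the only real point: if the key is the hash of the value, the reader can check the value themselves and doesn't have to trust the store.

```python
import hashlib


class HashKeyedStore:
    """toy in-memory stand-in for whatever backend maps hash -> yaml bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        # the key is derived from the content, nobody gets to pick it
        key = hashlib.sha256(blob).hexdigest()
        self._blobs[key] = blob
        return key

    def get(self, key: str) -> bytes:
        blob = self._blobs[key]
        # re-hash on read, so a tampered value is detected by the reader
        if hashlib.sha256(blob).hexdigest() != key:
            raise ValueError("blob does not match its key")
        return blob


store = HashKeyedStore()
# hypothetical yaml contents, the real GOAT format is not shown in this post
yaml_text = b"models:\n  example-model: <torrent info hash goes here>\n"
key = store.put(yaml_text)
print(key)
```

so the drawback question becomes "who runs `put`/`get` and how fast is it", not "who do i trust".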

i don't fully know. i have the feeling that you could just have a single torrent containing all files and folders, and people download only the folders containing the models that they want. if there's a new model to be added to the collection, that becomes a new torrent link, which is also a property of the GOAT system. you'd need to emit a new yaml file with the changed torrents, and somehow distribute the goatid/hash on the clearnet, same way as a torrent.
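a rough sketch of that "one collection, new link on every change" idea. everything here is an assumption: the manifest fields, the placeholder torrent identifiers, and json instead of yaml (only to stay inside the python standard library). the point is just that adding a model produces a new manifest with a new id, and that new id is the thing you'd have to publish somewhere.

```python
import hashlib
import json


def manifest_id(manifest: dict) -> str:
    # sort_keys + fixed separators give the same bytes for the same content
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def add_model(manifest: dict, folder: str, collection_torrent: str) -> dict:
    """return a new manifest; the old one is left untouched."""
    return {
        # adding files changes the collection, so it needs a new torrent
        "torrent": collection_torrent,
        "models": sorted(set(manifest["models"]) | {folder}),
    }


# placeholder identifiers, not real info hashes
v1 = {"torrent": "placeholder-infohash-v1", "models": ["model-a"]}
v2 = add_model(v1, "model-b", "placeholder-infohash-v2")

print(manifest_id(v1))
print(manifest_id(v2))
```

the old id keeps pointing at the old collection, which is nice for reproducibility but means "latest" needs some other channel.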

maybe the whole thing with torrent magnet links is that they get gigantic once you add the tracker metadata. how could we distribute them without needing the blockchain? ipfs, maybe? it's hash addressed, for sure, but ipfs also has the issue of keeping files alive in the network, and they're also bolting cryptocurrency on top, see Filecoin
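to get a feel for the size question, here's a small sketch that builds a magnet link by hand. the info hash and tracker urls are made up. the `xt`, `dn` and `tr` parameters are the standard magnet fields, and as far as i know `tr` is optional (peers can also be found through the DHT), so the bare link is just a prefix plus the hash, and it grows with every tracker you append.

```python
from urllib.parse import quote


def magnet_link(info_hash_hex: str, name: str = "", trackers=()) -> str:
    parts = ["xt=urn:btih:" + info_hash_hex]
    if name:
        parts.append("dn=" + quote(name, safe=""))
    for url in trackers:
        # every tracker is one more percent-encoded "tr" parameter
        parts.append("tr=" + quote(url, safe=""))
    return "magnet:?" + "&".join(parts)


info_hash = "0123456789abcdef0123456789abcdef01234567"  # made-up 40-hex value
bare = magnet_link(info_hash)
with_trackers = magnet_link(
    info_hash,
    name="model-collection",
    trackers=[
        "udp://tracker.example.org:1337/announce",  # hypothetical trackers
        "udp://tracker.example.net:6969/announce",
    ],
)
print(len(bare), len(with_trackers))
```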

i'm not the biggest pro-cryptocurrency person around, but i'm not anti-cryptocurrency either. i am too tired of having to deal with the current financial systems as someone who does not want to share her deadname with every person who wishes they could donate to me (github sponsors and patreon are valid avenues for someone like me, but then we also add the systematic oppression Visa and MasterCard have been doing as of late under the guise of "protecting the youth", and it gets me with bullshit boiling in my throat)

unanswered question: what about unsafe AI systems in a decentralized system? how to nicely update models?