ChatGPT was trained on data scraped from various sources on the internet.

> The model was trained using text databases from the internet. This included a whopping 570GB of data obtained from books, webtexts, Wikipedia, articles and other pieces of writing on the internet. To be even more exact, 300 billion words were fed into the system.

I believe this is unfair to these sources, because ChatGPT drives away their clicks and, in turn, the ad income that would come with them. Scraping data seems fine in contexts where clicks aren’t driven away from the very site the data was scraped from. But in ChatGPT’s case, it seems really unfair to these sources and to the work their authors put in, as people would no longer even attempt to visit them. Could this start breaking the ad-based model of the internet, where a lot of sites rely on ad income to run their servers?
Story Published at: February 5, 2023 at 02:08PM