Happy Friday! This week’s startup, Hugging Face, began as a chatbot designed to make you laugh and smile, hence the emoji. The company has stayed true to its logo: Clem Delangue, Hugging Face’s CEO, wants it to become the “first company to go public with an emoji” instead of a three-letter ticker.
This week’s edition is 1264 words (~5 minutes). Enjoy!
Hugging Face: Open-source ML
Introduction
One year ago, in April 2022, DALL-E 2 was released, reaching 1.5 million users in under 3 months. At the time, this matched Instagram’s record for user growth, signaling to the public that a wave of generative AI was imminent. This was confirmed a few months later when ChatGPT was released, reaching 100 million users in 2 months. ChatGPT and DALL-E 2 were revolutionary because of their public accessibility: what had taken leading data scientists decades to develop could now be used by anyone to generate virtually anything.
"Illustration of Elon Musk checking twitter and screaming in the style of cubism" - Bakz T. Future
Today, the public has even deeper access to models like ChatGPT and DALL-E, as organizations openly publish comparable models and their inner workings.
Hugging Face is an open-source machine learning platform that allows anyone to download, code, and publish ML resources. Known as the emoji company that fosters a community of AI enthusiasts, developers, and enterprises, Hugging Face has opened up the traditionally opaque field of machine learning, allowing the public to collectively push its boundaries.
Product
Transformers
Hugging Face began as an iOS chatbot in 2016 and gained popularity among developers as it released parts of its code to the public. In 2018, Google and OpenAI released transformer-based language models (BERT and GPT), which many developers attempted to reimplement from scratch. Within a few months, Hugging Face released its in-house Transformers library on GitHub. It quickly became a hub of the machine learning community, boosting nonacademic ML research as well as the use of Hugging Face’s open-source software.
Hugging Face’s rise on the back of natural language processing (NLP) is no surprise. Widely used language models contain hundreds of millions to billions of parameters, taking up hundreds of megabytes to many gigabytes of storage. They are trained on vast amounts of text, often filtered from web crawls measured in petabytes (1 petabyte = 1 million gigabytes). Hugging Face gives developers who lack the computing resources to train language models from scratch access to pretrained ones instead.
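To make the storage numbers concrete: a model’s size on disk is roughly its parameter count times the bytes stored per parameter. The sketch below assumes dense 32-bit (4-byte) or 16-bit (2-byte) floating-point weights and ignores file-format overhead; the parameter counts are approximate public figures for BERT-base (about 110 million) and the largest GPT-2 (about 1.5 billion).

```python
# Rough on-disk size of a model: parameters x bytes per parameter.
# Assumption: dense weights, no compression, file-format overhead ignored.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def model_size_gb(num_params: int, dtype: str = "float32") -> float:
    """Approximate storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 10**9


# Approximate public parameter counts.
models = {"BERT-base": 110_000_000, "GPT-2 (largest)": 1_500_000_000}

for name, params in models.items():
    print(
        f"{name}: {model_size_gb(params):.2f} GB in float32, "
        f"{model_size_gb(params, 'float16'):.2f} GB in float16"
    )
```

BERT-base comes out well under a gigabyte, while GPT-2 needs about 6 GB in 32-bit precision, which shows how quickly storage grows with parameter count.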
Today, NLP models are among the most popular on Hugging Face’s platform: its top 5 most-downloaded models are all language models of some form, amassing over 159 million downloads in the last month.
Open-Source Software
Hugging Face’s key service is its online repository, with thousands of models & datasets from the community. These models are free to access and download for all users, shrinking the gap between individual developers and industrial tech labs.
Described as a “GitHub for machine learning”, Hugging Face’s open-source platform has become a go-to website for everything AI-related and has made models such as GPT-2 and BERT far more accessible.
As of April 2023, Hugging Face has over 180,000 publicly available models, ranging from voice replication software to Stable Diffusion, the text-to-image generator.
A dataset used to accurately identify dogs that look like fried chicken/muffins.
Hugging Face also hosts over 30,000 datasets, including OpenWebText (an open recreation of the dataset used to train GPT-2). Members of the community can easily access large amounts of general data for their own models through Hugging Face’s centralized dataset platform.
Community
Hugging Face allows developers to share their own machine learning apps and demos through Spaces, its hosting platform for the AI community. Spaces are free to access, though users hosting a Space are charged hourly if they need computing power beyond the free basic tier.
CEO Clem Delangue attributes the rise of Hugging Face largely to the community, saying that “we are the kind of random French founders and if it wasn't for the community, for the contributors, for the people helping us on the open source, people sharing their models, we wouldn't be where we are today.”
Check out my Apple vs Barn Owl image classifier here!
Market
AI Infrastructure
With the rise of NLP, the AI market has ballooned across several layers. In the model layer, tech giants such as OpenAI, Google, and Microsoft have spearheaded research, releasing new closed-source models with limited public access. The application layer features products such as Asana and ChatGPT, which build on top of software from the model layer.
Hugging Face sits between these two layers, in the infrastructure layer, building the “picks and shovels” for the machine learning community. As the dominant open-source platform for machine learning, it has a unique opportunity to capture a large portion of the estimated $7 trillion AI market.
“When everyone is looking for gold, it's a good time to be in the pick and shovel business.” - Mark Twain
Customer Segments
Hugging Face offers a wide selection of services for enterprise teams; however, unlike most machine learning operations (MLOps) competitors, it started with, and has continued to serve, indie developers.
In recent years, the value created outside academia has skyrocketed, with over 90% of large-scale AI results now coming from non-academic developers. As models improve, the gains between successive closed-source model generations may shrink, letting open-source models close the gap (e.g. the jump from GPT-2 to GPT-3 was far larger than the jump from GPT-5 to GPT-6 is likely to be). Given this, the quality of open-source models (large language models, image-to-text, etc.) may become comparable to that of models from larger tech companies, with the key differentiators being user experience and adoption speed.
Without continual increases in funding for closed-source organizations, open-source models may catch up to, and even surpass, those of big tech. Source: Elad Gil
Business Model
Hugging Face operates on an open-core business model, in which users pay a subscription for exclusive access and features beyond its free resources. For example, paying users currently have access to AutoTrain: given a user-supplied dataset, AutoTrain automatically finds (and trains) the best model for that data. Unlike a freemium model, this approach keeps high-quality models publicly available, with the cost of producing them borne by the users who publish them.
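AutoTrain’s internals aren’t described here, but the general idea of automatic model selection can be sketched in a few lines: fit several candidate models on part of the user’s data, score each on held-out data, and keep the best. The candidates below (a constant baseline, a nearest-neighbour rule, and a least-squares line) and all function names are hypothetical stand-ins chosen for illustration, not AutoTrain’s actual search space.

```python
from statistics import mean
from typing import Callable

Point = tuple[float, float]  # (input x, target y)
Model = Callable[[float], float]


def fit_mean(train: list[Point]) -> Model:
    # Baseline: always predict the average target.
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_nearest(train: list[Point]) -> Model:
    # 1-nearest-neighbour: copy the target of the closest training input.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def fit_line(train: list[Point]) -> Model:
    # Ordinary least-squares line y = a*x + b.
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    var = sum((x - mx) ** 2 for x, _ in train)
    a = sum((x - mx) * (y - my) for x, y in train) / var
    b = my - a * mx
    return lambda x: a * x + b


# Hypothetical candidate pool; a real system would search far larger ones.
CANDIDATES = {"mean": fit_mean, "nearest": fit_nearest, "line": fit_line}


def auto_select(data: list[Point], holdout: int = 2) -> tuple[str, float]:
    """Fit every candidate on the training split and return the name and
    validation mean squared error of the best-scoring one."""
    train, valid = data[:-holdout], data[-holdout:]
    scores = {}
    for name, fit in CANDIDATES.items():
        model = fit(train)
        scores[name] = mean((model(x) - y) ** 2 for x, y in valid)
    best = min(scores, key=scores.get)
    return best, scores[best]


data = [(x, 2 * x + 1) for x in range(10)]  # toy dataset on a straight line
print(auto_select(data))
```

On this toy dataset the least-squares line wins with zero validation error; the point is the loop (fit, score on held-out data, keep the best), not the particular candidates.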
“We found kind of like a good balance where if you're a company actually contributing to the community and to the ecosystem, you're releasing your models in open source, it's always going to be free for you.” - Clem Delangue
Hugging Face has begun to offer enterprise solutions as well, including secure on-premises or private-cloud model hosting and access to its exclusive AutoTrain and inference features.
Traction
To date, Hugging Face has raised over $164M in funding. Its most recent round, a $100M Series C led by Lux Capital in April 2022, valued the company at $1.9B.
Key Opportunities & Risks
Demand for Specific/Hyperlocal Models
Data is a key driver of model quality. Trained on general data alone, even the best models may struggle in specific (sector-focused) and hyperlocal (company-specific) niches. By increasing access to complex models, Hugging Face allows users to fine-tune high-quality models on proprietary datasets.
Source: Stanford Artificial Intelligence Index 2023
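Fine-tuning on a proprietary dataset means starting from weights already learned on general data and continuing training on the niche data, rather than training from scratch. The toy sketch below illustrates that idea with a one-feature linear model and plain gradient descent; the “pretrained” weights, the niche dataset, the learning rate, and the step count are all made up for illustration and have nothing to do with Hugging Face’s own tooling.

```python
Data = list[tuple[float, float]]  # (input x, target y)


def mse(w: float, b: float, data: Data) -> float:
    # Mean squared error of the linear model y = w*x + b.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w: float, b: float, data: Data,
              lr: float = 0.05, steps: int = 1000) -> tuple[float, float]:
    """Continue gradient descent from existing weights (w, b) on new data."""
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * gw
        b -= lr * gb
    return w, b


# Hypothetical "pretrained" weights learned on general data (y = 2x).
w0, b0 = 2.0, 0.0
# Hypothetical small niche dataset where the relationship is shifted (y = 2x + 3).
niche = [(x, 2 * x + 3) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w1, b1 = fine_tune(w0, b0, niche)
print(f"before: MSE={mse(w0, b0, niche):.3f}, after: MSE={mse(w1, b1, niche):.3f}")
```

The general-purpose weights start with an error of 9.0 on the niche data, and a short round of continued training drives it to essentially zero. Real fine-tuning does the same thing with millions or billions of parameters.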
AI Moratorium
Longstanding cultural wariness of AI has surfaced in the form of a proposal to pause advanced AI development. Tech figures such as Elon Musk and Steve Wozniak have called for a 6-month moratorium on training AI systems more powerful than GPT-4, citing fears of unintended risks. Though this proposal has received public and governmental backlash, many still fear the “AI revolution”, with nearly 40% of Americans reportedly “more concerned than excited” about the rise in AI use.
As AI becomes an integral part of daily life, open-source platforms such as Hugging Face will continue to maintain guardrails that steer the community’s efforts toward the benefit of society.
Thank you for reading! If you enjoyed this week’s edition, you might also enjoy Cohere.ai, a company making NLP models more accessible. Feel free to check it out here. See you next week!