Seattle-based XetHub, a startup that makes it easy for businesses to use Git for data management, announced today that it has raised a $7.5 million seed funding round led by Madrona. The basic idea here is to enable developers to work with data in the same way they work with code, including all the collaboration features that a tool like Git allows. The team describes XetHub as a “shared storage platform for data management.”
The company was co-founded by Yucheng Low (CEO), Ajit Banerjee and Rajat Arya, a team with years of experience working with large data platforms. Indeed, Low previously co-founded ML startup Turi, where Arya was the first employee. Apple acquired the company in 2016, allowing Low and Arya to work on different parts of Apple’s ML platform stack, with Arya leading Apple’s data platform team, for example. It was also at Apple that the two met Banerjee, who previously worked at Inktomi, Amazon and Facebook. He also previously founded two startups.
While working on the data platform at Apple, the team realized that there was still a lot of room for improvement in data management.
“It really shouldn’t come as a surprise, but data is much more important than anything else. More important than the model, than anything else,” Low told me. “Managing where you store this data, how you collaborate on this data is really fundamental. However, what we see is that the way we manage data today really feels like how source code was managed 30 years ago, meaning versioning or collaboration is done by copy and paste. Sometimes there’s a more elaborate version of it, but it is ultimately still copy and paste when I want to make sure no one else touches what I’m doing.”
Just as developers have turned to tools like Git to collaborate on their source code, XetHub wants to enable them to use the same familiar primitives for working with data.
“The way we’re thinking about it is, for the first time, we’re really enabling developers to work on data in exactly the same way as code,” Low said. He noted that the team aimed to create a tool that does not merely mimic a Git-like experience but preserves the core Git user experience, including all the integrations developers are familiar with.
Currently, the service can handle repositories with up to 1 TB of data, with plans to expand to 100 TB soon. Few developers would want to clone such a large repository, so a useful feature here is that developers can also mount these repositories and have them behave like a local file system, whether that’s on their laptop or a large GPU cluster. It’s also worth noting that the tool is file-format agnostic.
From a marketing perspective, the team focuses its efforts on AI/ML teams, but of course users can use XetHub to manage all types of data.
XetHub is now publicly available with a free community edition that you can use to manage up to 20 GB of deduplicated storage. Low tells me the company is already in talks with some enterprise clients, but the team isn’t quite ready to name names just yet.
“Yucheng and the exceptional XetHub team have been innovating with machine learning for over a decade, then applying their skills at the most iconic consumer technology company: Apple. XetHub enables developers to work with large datasets, in collaboration with others, to build intelligent and generative applications,” said Matt McIlwain, Managing Director, Madrona. “Developing and deploying these applications is constrained by legacy infrastructure and complex data workflows, and XetHub addresses these pain points from a developer’s point of view.”