Transcript
Hi, Mike Matchett with Small World Big Data. I'm here today talking about, you know, big data in a small world and how we handle that. And it keeps getting more tightly grained. We keep getting more and more data. Data warehouses are growing. And then there's this little thing called AI that wants to make use of it. How do we really leverage all that data in the bestest, fastest way possible? We've got some cutting-edge technologies to talk about today with Yellowbrick. So just hang on, and I think you're going to learn something. Hey, Mark. Welcome back to our show. It's been a little while. Hi, Mike. Nice to be here. Yeah. So, you know, last time we were talking with Yellowbrick, I think we were talking about columnar analytics, and how do we do things in a bigger, faster, cheaper way? Really just sticking to that big data story. But things are evolving. And today we're going to talk a little bit about how that data gets geo-distributed, or pushed partly into the cloud and partly on premises. We're going to talk about some of the AI demands being put on those data streams at the same time, and you have some technologies for that. But first, I want to ask you to tell us a little bit about how people should think about Yellowbrick in this large end of the big data market. How do you position that? What's the elevator pitch? Yeah, absolutely, Mike. So what people should take away from this is that Yellowbrick is a SQL data platform. And you look at our customers as well: they're financial institutions, they're telcos, government, even cloud startups, and they're all modernizing their data warehouse footprint with Yellowbrick. They typically have a lot of private data. And so they want to run Yellowbrick either on premises or in the public cloud, in their own cloud account, or in a hybrid combination of the two.
They want to keep it in their own cloud account because that data, as I said, is sensitive. They want to keep control of it. So the ability to deploy Yellowbrick in their cloud account is critically important, and they come to Yellowbrick from a range of different prior data warehouses, or augment their current capacity with Yellowbrick. They come from AWS Redshift, Teradata, and others as well. So essentially what the customers love about us is that they want faster access and new insights within their data. It drives happier users, a massive ROI, and significant cost savings when they do that. All right. So this is kind of an interesting shift to a newer solution. Some of this has been enabled by building a lot of this stuff from microservices and leveraging Kubernetes, where with some of these older platforms you bought a specific appliance that might have had a lot of specific code for that node that you stuck in your data center. And now we're talking about something that could possibly run anywhere, wherever you need it. So we've kind of unlocked that. So tell us a little bit about where Yellowbrick leads the market in that way. Yeah, you're spot on about the microservices approach. And it's all really enabled by Kubernetes. Kubernetes is essentially the cloud operating system here, and it provides portability. I can run my containerized software anywhere, in any public cloud or even on prem, on an on-prem Red Hat OpenShift deployment if I wished. And it provides elasticity, and these days any modern data platform has to be elastic. You've got to be able to grow and shrink the compute capacity to fit your workloads. And then finally, it provides resilience as well. Things do go wrong in the cloud. Services become unavailable, and Kubernetes has the resilience to deal with that. And so it's been a massive help for us in allowing us to port our software to every public cloud.
So AWS, Azure and Google Cloud, but also running that on prem as well. That's been critically important for us and our customers. Okay. Yeah. Because I think that, conceptually, we are seeing a change. I hate to say sea change, or lake change if you will, on this idea of a warehouse or lakehouse or whatever you want to term it, from being something with a lot of gravity, a petabyte of something sitting in one data center somewhere, to something where the data can be distributed geographically. It can be in multiple locations. The work that needs to be done can be in a bunch of different places. And it's really now more about the data analytics than it is about, as in the past, maybe just the data storage, or thinking of it as a database somewhere. We're now thinking, how do we unlock that and get it out there? So give us some use cases on how people now use this hybrid data warehouse approach to solve some of their real challenges, compared to thinking about it as a database that's locked up somewhere. Yeah, I mean, today, and in a way that's growing and will grow in the future, data sovereignty is of critical importance to so many businesses. I mean, obviously you've got GDPR and CCPA and things like that that have very strict requirements on keeping your data in country. A lot of our customers have global operations. They can't share data from different parts of the world and mix it in one place. So it's critically important for them that they can deploy their data and analytics solution at the right place, at the right time, in the right location, right? So whether that's the cloud of their choice in the region of their choice, or even the data center or colo facility of their choice as well. So the hybrid approach that Yellowbrick has gives them this flexibility.
It makes sure that they have the same user experience running the same workloads in any of these areas, depending on where their data is. So I think that's a really important point, the data sovereignty. Then of course, you've got other things like data latency. If you're running low-latency applications off your database, you want to be running them close to your database as well. And so we have customers that are trying to figure out their way through all of these different considerations. And then you have the more horizontal kind of use cases where, well, you know what? I have my production system running in a Yellowbrick appliance in my own data center. I'm going to do my DR in the cloud, and I'm going to replicate data and back it up into the cloud. In the event of a disaster, I will switch over all my users to running on that temporary instance in the cloud. So that's another often-used use case. Yes. Both availability kind of concerns and that DR kind of concern, that I'm not losing anything, as well as a corruption kind of concern. So you're addressing a lot of those data protection concerns, and the sovereignty concern. One of the things that we haven't really talked about here, though, is cost, right? Which is what it comes down to. When you take this approach, I'm just going to guess that if I'm doing something that's a little bit more software oriented than hardware oriented, I've got some cost opportunities here, some cost efficiency. How does that go? Yeah. And so there are cost efficiencies here. So you can imagine, let's take the Yellowbrick customer running in AWS, for example. They're deploying the Yellowbrick software in their own cloud account, on hardware that they've bought from AWS: EC2 instances and S3 storage that we use.
They're paying for that through their cloud account, right? They're paying Yellowbrick a software licensing fee for the number of vCPUs they buy. We don't mark up any of the infrastructure they're using. So there's a cost efficiency saving here, because a lot of large organizations have discounts with the cloud providers that they can apply to their spend on the hardware and infrastructure they use to run Yellowbrick. So there are savings to be had there. Now, the other side of it, and we're seeing some trend towards it, is repatriating certain workloads back to on prem. Okay. And typically it's not a wholesale reversal of workloads, you know, having moved from on-prem data centers to the cloud and then all the way back the other way. I think customers are getting much more sophisticated about how they manage their costs in the cloud. They're looking for those workloads that maybe are running 24/7, that might be more cost-effectively placed if they purchase the hardware and run that Yellowbrick workload in their own data center. So we're seeing sort of an optimization happening, I think, in the industry around where workloads are placed. I mean, we've long said, and written a bunch of things about, hybrid really allowing you to do the right sizing and the right placing of your infrastructure as well as your data and things. And I think, as people start to explore some of these newer workloads that are a little bit more latency sensitive, or will be increasingly latency sensitive, you have to bring the data to where the work is as well. And that's not necessarily up in somebody's cloud. That might be on prem. That might even be out at a distributed work site. So let's talk about some of those new workloads coming along. I don't think this would be 2025 if we didn't talk about AI in some way.
So let me ask you, do you see people looking at the data warehouse, the data lakehouse, these big analytical bodies of knowledge that they have about what they're doing, and saying, you know, that's what I want to point my LLMs at? That's what I want to use to become more accurate. I want to actually mine that data with a more AI approach to start with. Is that a big demand? Do you see that as the big driver this year? It is, it is. You can imagine a data warehouse contains your enterprise's crown jewels in terms of data, right? That deep history that you can mine for patterns and get more understanding and insight into your customers or whatever applications you're running. So we do, and a lot of our customers come to us and tell us about the AI applications and developments they're doing. And generally speaking, all our customers are working on some kind of intelligent assistant that has knowledge about their business and that can be used to improve the productivity of their own, you know, actuaries in insurance companies, or PMs in hedge funds and financial services, and things like this. And so the way that we interact with this is that we've added capabilities to our product to allow Yellowbrick not only to serve up data warehousing workloads, but also vector similarity searching in retrieval-augmented generation use cases as well. So they can use Yellowbrick to inject additional context into the prompt that that agent will ultimately be executing against an LLM. So that's one area where we're pretty mature. And we're working with customers today on their RAG setups. They're using Yellowbrick. The other area that we have released now is a text-to-SQL copilot in our product. So what that does is it has a full understanding of your databases and schemas.
And when you write a question into Yellowbrick, it will create the SQL that you can run in Yellowbrick to answer that question based on your... So also kind of unlocking you from having to understand the Structured Query Language in your own head. You can just start to query for what you're looking for, and you're using the LLM in this case to help do a better job of what we might have historically thought of as BI. I actually use it myself. I find it a fantastic productivity tool because I can throw in a question and get a starting-point SQL answer that I can verify, check that it's correct, and expand on as well. Just like a lot of engineers today use ChatGPT to generate code, right? They use it, or they should be using it, as a starting point that they thoroughly validate, rather than just slapping it into whatever they're working on. So that's another area. And I think later down the line we're getting more and more sophisticated, to go, well, how can we use an agent-based approach backed by LLMs to analyze the workloads running in Yellowbrick themselves, and surface those up to our end customers to make recommendations like, hey, certain workloads are busting the limits of your capacity here. Certain users are writing really, really shonky queries that they could do with some help with. And so that's one area we're looking at. And in fact, we have customers that want to take that idea and apply it to their own business, and shine a light on their own data in Yellowbrick and surface up patterns. All right. So, if I'm counting properly, we've got a kind of native vector database storage and similarity search concept built right into Yellowbrick itself. So that would help someone who wants to take the data and use it for other GenAI or LLM kinds of purposes. You've got this ability to use LLMs to query Yellowbrick itself.
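As a rough illustration of the "verify it before you run it" habit Mark describes for generated SQL, here is a minimal Python sketch. It is not Yellowbrick's copilot or any Yellowbrick API: it uses Python's built-in sqlite3 module as a stand-in database, and the schema, the function names build_sql_prompt and is_safe_to_run, and the sample queries are all hypothetical. The idea is simply that the schema goes into the prompt as context, and whatever SQL comes back is treated as a draft that has to compile against that schema before anyone runs it.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
"""


def build_sql_prompt(question: str, schema: str) -> str:
    """Put the schema in front of the question so the model has context."""
    return (
        "Translate the question into one read-only SQL query.\n"
        f"Schema:\n{schema.strip()}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def is_safe_to_run(sql: str, schema: str) -> bool:
    """Crude pre-flight check for a generated query.

    Accepts a single SELECT statement that compiles against the schema.
    Nothing is executed for real: EXPLAIN QUERY PLAN only prepares the query.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement (a deliberately blunt test)
        return False
    if not statement.lower().startswith("select"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)  # build empty tables only
        conn.execute(f"EXPLAIN QUERY PLAN {statement}")
        return True
    except sqlite3.Error:  # unknown table or column, syntax error, and so on
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print(build_sql_prompt("What is total order amount by region?", SCHEMA))
    draft = (
        "SELECT region, SUM(amount) FROM orders "
        "JOIN customers ON customers.id = orders.customer_id GROUP BY region;"
    )
    print(is_safe_to_run(draft, SCHEMA))
```

A check like this only catches queries that do not compile or that try to modify data. It says nothing about whether the query answers the question that was asked, so the human review Mark mentions still applies.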
So the user of Yellowbrick, the BI user, or the person who's trying to do research into that data, also has some augmentation available to them through AI. And then the IT person, or whoever's administering this or looking at tuning this, can use AI in an IT ops kind of way to do a better job. And believe it or not, those first two sound like great business opportunities and great levers, right? But I'm an IT guy, and talking to you, I'm like, you know, I'm really interested in how I do a better job of capacity planning, rightsizing, moving my workloads around, architecting. And I ask everybody, Mark, if they're using AI for those jobs yet. And I'm glad to hear it; you are probably one of the first people who've come back and said, yeah, we're working on that. We've got this coming. Because I think that's really the shining thing for a lot of people in 2025: well, let's get smart about what we're doing. Let's do a better job. And then, of course, because you are augmenting that, it doesn't sound like you are forcing someone to use a particular LLM or technology or API, right? You are the data half of that solution, so they can architect. No. In fact, a lot of our customers, because they work in highly sensitive areas with sensitive data, will not use an openly available LLM. Like, they're not using DeepSeek from China. They're not... They actually want to run a local, private LLM in their own data center or in their own cloud account. And so we support that. We have work to do for what we call bring your own LLM to Yellowbrick, where you can bring your own private LLM and have it tightly integrated into the data that's managed by Yellowbrick.
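To make the retrieval-augmented generation idea from this exchange a little more concrete, here is a minimal sketch in Python. It does not use any Yellowbrick or LLM API. The toy rows, the hand-written three-number "embeddings", and the function names cosine_similarity, top_k, and build_rag_prompt are all hypothetical, chosen only to show the shape of the flow: rank stored passages by vector similarity to the question, then inject the best matches into the prompt as context. In a real setup the embeddings would come from an embedding model, the similarity search would presumably run inside the database, and the finished prompt would go to whichever LLM the customer hosts.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=2):
    """Return the k rows whose embeddings are most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_vec, row["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, passages):
    """Inject the retrieved passages into the prompt as extra context."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical rows standing in for a table of text chunks plus embeddings.
ROWS = [
    {"text": "Policy renewals rose in Q3.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "The cafeteria menu changed.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Claims volume fell in Q3.", "embedding": [0.8, 0.3, 0.1]},
]

if __name__ == "__main__":
    query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question
    hits = top_k(query_vec, ROWS, k=2)
    # The resulting prompt would be sent to a privately hosted LLM.
    print(build_rag_prompt("What happened in Q3?", hits))
```

Because the only thing handed to the model is a prompt string, the same retrieval step works regardless of which LLM sits behind it, which is the property that makes a bring-your-own-LLM arrangement plausible.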
I mean, that sounds like a very quick way to build something very smart without having to do a lot of upfront spec work, for a shop that's got this CIO or board-level imperative to do something with AI. It's like, well, you've got a data warehouse. Let's marry that with a decent LLM internally, so we're not exposing anything. We're not taking those kinds of risks. But let's see what we can get out of our data set. Let's mine that better. And that's what our customers are pursuing right now. Absolutely. Yeah, I like that. Right. So it's one thing to write SQL, and it's another thing to just ask it, like, tell me how to do a better job, which we do to improve that. Okay. Do you want to give us any hints about what's coming down the pipe? What should people be looking forward to? Well, I'm very excited about our third-generation hardware platform that runs Kubernetes, runs Red Hat OpenShift. That's coming, in partnership with a vendor that's going to be providing off-the-shelf commodity hardware. So very excited about that this year. And as I mentioned, actually a little bit further down the line, we're going to be releasing what we call a community edition of Yellowbrick, which is a freely downloadable single-node version that runs in a single Docker container. You can run a docker pull, docker run, and it will run on your laptop, and you can use it as a sandbox for learning about Yellowbrick or doing some development work, which is what our customers want to do. They want their DBAs to use Yellowbrick before allowing them any access to the production system; they can play around with Yellowbrick on their laptop. A community edition would be an awesome thing to play around with. I download those things all the time from everybody who has one, just to kick the tires and see what I can do with it. It's a great way to learn about things.
So great stuff coming. If someone wants to get started with Yellowbrick or dive a little deeper, they're a little interested in what we're talking about here, saying, hey, this could be one of the shortest paths to getting something really productive for both our BI and AI, modernized, in our company, where would you point them? Obviously, you've got a website. Are there specific resources or a learning path you'd recommend? There are. I mean, if you hit our website, Yellowbrick.com, you can see all the product documentation on there. You'll find tutorials that will help you set Yellowbrick up in a RAG pipeline if you want to do that. And if you want to get your hands on Yellowbrick, through the website again, you can sign up for a 30-day free trial. Run it up in your own cloud account and get cracking, and play around with Yellowbrick. And if you want to get more serious, then we're happy to get on a call with you to discuss your use cases and help you in that 30-day free trial journey as well. All right. This is great. I'm excited. I can't wait to try it out on my own laptop. Challenge accepted. Run an MPP data warehouse on my laptop, I can't wait. Thank you, Mark, for being here today. Guys, check out Yellowbrick, particularly if you've got some hybrid challenges with your data warehousing today. You've got some cost challenges. You've got an AI opportunity here as well to really leverage. So check out Yellowbrick. Take care, guys. Bye.