View the Video Interview Here: Striim for Real Time Data Streaming
Mike Matchett: Hi. I'm Mike Matchett with Small World Big Data. And today we are going to talk about big data. In fact, we're going to talk about all data, streaming data in particular, and how you really need to make all your data streaming in order to take advantage of this new fast world and create operational-speed applications, which is what everyone is trying to do. Today I've got Steve Wilkes, who is the CTO and co-founder of Striim. That's S-T-R-I-I-M. And they've got a new way to solve the streaming data problem that everyone really should appreciate they have. Welcome, Steve.
Steve Wilkes: Thanks. Thanks a lot.
Mike Matchett: All right. So with Striim, you previously let me know that you were with a company that did change data capture and some replication beforehand. And you saw a couple of things: obviously data needs to be processed faster, and just copying the data from one place to another wasn't getting the job done. So what did you really see that you needed to do to make Striim happen?
Steve Wilkes: So the first thing was that, at GoldenGate, we had these various customer advisory councils, and the customers would often say it would be great if they could actually see the data while it's moving and get value out of it. So when we built the Striim platform, we recognized that we wanted to be able not only to move data from one place to another, but also to do processing, transformation, enrichment, and visualization of that data as it's moving. And that processing is important for a lot of different use cases. Because if you're going from one database to another, which is a lot of GoldenGate's business, then it's like-for-like: the source and target look similar, and you don't need to do a lot of transformation.
Steve Wilkes: But if you want to go from a database to, say, Kafka or a message queue, or write data into big data, then you have to worry about additional things. First, you have to worry about the data formats. Maybe you want to put JSON on Kafka, or write it as Avro into your big data solution. You also have to worry about whether there is enough information in this data to enable downstream applications to make decisions.
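[To illustrate that formatting step, here is a minimal sketch in plain Python, not Striim's own tooling. The event fields and the `to_kafka_payload` helper are hypothetical, and the actual Kafka delivery is elided.]

```python
import json

# A hypothetical change-data-capture event: operation type, table name,
# and the changed column values (mostly IDs, in a normalized schema).
change_event = {
    "op": "UPDATE",
    "table": "customer_order_detail",
    "data": {"order_id": 1001, "customer_id": 42, "item_id": 7, "status_id": 3},
}

def to_kafka_payload(event):
    """Serialize a change event as UTF-8 JSON, ready to publish to a topic."""
    return json.dumps(event).encode("utf-8")

payload = to_kafka_payload(change_event)
```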
Steve Wilkes: So if you're doing change data capture from a well-normalized database, most of what you're going to be getting is IDs. If you have updates to the customer order detail table, you're going to see order ID, customer ID, item ID, status ID, right? That doesn't mean anything to a downstream application. So what you need to do in that case is process that data in memory, as it's moving, before you land it on Kafka or in your cloud solution.
Steve Wilkes: That enrichment requires loading reference data into memory and joining with it. That could be your customer information or your item information: large amounts of reference data, potentially millions of records. But you need to have it in memory and be able to join against it in a very efficient way so that you can maintain the speed of the streams. If you're getting tens or hundreds of thousands of operations a second coming through, you can't afford to go off to a database and ask for reference information for every record. You have to join in memory, and that data has to be in memory.
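[The in-memory enrichment join described above can be sketched roughly like this in Python. The reference tables, field names, and `enrich` function are all hypothetical, standing in for what a streaming platform would do at much larger scale.]

```python
# Hypothetical reference data, loaded into memory once up front so that
# each streaming event can be enriched without a per-record database call.
customers = {42: {"name": "Acme Corp", "region": "EMEA"}}
items = {7: {"description": "Widget", "price": 9.99}}

def enrich(event):
    """Join a CDC event with in-memory reference data by ID lookup."""
    enriched = dict(event)
    enriched["customer"] = customers.get(event["customer_id"])
    enriched["item"] = items.get(event["item_id"])
    return enriched

# A tiny stand-in for a high-volume event stream.
stream = [{"order_id": 1001, "customer_id": 42, "item_id": 7, "status_id": 3}]
enriched_stream = [enrich(e) for e in stream]
```

The point of the dictionary lookups is that each join costs a constant-time hash probe rather than a database round trip, which is what keeps enrichment from throttling the stream.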
Steve Wilkes: So when we built the Striim platform, we wanted to expand on what we'd done before in multiple ways. We wanted to be able to collect data from more than just databases, adding in message queues and files and sensors and all that kind of stuff. We wanted to be able to transform, process, and analyze the data while it's moving. We now support lots of different targets, not just databases but big data, message queues, cloud technologies, all of those things. And we wanted to add new user capabilities to visualize the data: to do analytics, to build dashboards, and to see the data as it's moving. We recognized when we started the company six years ago that this was ambitious, but we've done it, and that's where Striim is today.
Mike Matchett: So when you showed me the pictures here, it really looks like, instead of just taking data from one place and stuffing it in another, or taking the data, transforming it, and putting it somewhere else, you really create a space in the middle, while the stream is running, to make an application happen. And we talked about some use cases like correlation, and maybe event analysis and analytics. We talked about some eventing you could do in the middle. You could do forking.
Mike Matchett: Some machine learning came up in our conversation too. But it really now becomes a situation where someone's data is in motion, and while it's in motion, we want to apply our applications to it. Not necessarily even at the endpoint, which is what most people think of: get something like Kafka, take data here, put it on a message queue, deliver it over there, and then do something down there. You certainly do that, right. But you're seeing this shift to, hey, let's actually build the value of what we're doing while the data is moving.
Steve Wilkes: That's right. You can certainly move the data from one place, put it somewhere, store it, and analyze it afterwards. But the real value is to get real-time insights, and that's where a lot of customers are heading. They may not all be there yet. They might be solving the first problem, which is how to get to the data in a streaming fashion and move it, right. But they're all working toward real-time insights, and they recognize that the only way of doing that is by doing it in memory, at the point the data is moving.
Steve Wilkes: And that's why you can take data sources into the platform and you don't have to put them into just one place. The same data source can be used to deliver into multiple targets and can also be used in analytics. So you can have multiple data flows, multiple pipelines, all coming off the same streaming data, combining different streaming data together, running complex queries and analytics, and visualizing it all in the same platform, without having to piece together lots of different bits of software to achieve a solution.
Mike Matchett: Right. And I know you guys work hard to make the lingua franca, if you will, of what goes on in the middle SQL-based, right? So it makes it very accessible to someone who is used to analytics to filter and aggregate and group things on the streaming data as if it were the final data. The other thing I liked is that, as you're delivering the same consistent data in a real-time way, you can put it in a graph database for graph analysis. You can put it in Kudu for drill-down queries. You can put it into Cassandra. You can put it into just about anything else. And now, in real time, all those other places are updated with the same information, so any downstream applications in the whole business are looking at the same data.
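[To make that filter-aggregate-group idea concrete, here is a rough sketch in plain Python, not Striim's actual SQL dialect, of the kind of query you might run over a window of streaming events. The window contents and field names are invented for illustration.]

```python
from collections import defaultdict

# A hypothetical window of enriched order events from the stream.
window = [
    {"region": "EMEA", "amount": 10.0},
    {"region": "AMER", "amount": 25.0},
    {"region": "EMEA", "amount": 5.0},
]

def total_by_region(events, min_amount=0.0):
    """Roughly: SELECT region, SUM(amount) FROM window
    WHERE amount >= min_amount GROUP BY region."""
    totals = defaultdict(float)
    for e in events:
        if e["amount"] >= min_amount:  # the WHERE filter
            totals[e["region"]] += e["amount"]  # the GROUP BY + SUM
    return dict(totals)

result = total_by_region(window)
```

In a streaming platform, a continuous query like this re-evaluates as the window slides, so the aggregates stay current with the data in motion.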
Steve Wilkes: That's right. And it's actually really great for us that there are so many different technologies today, and that people want to utilize all these different types of analytics, because at its core, integration requires things to integrate with. And the more different types of sources there are, the more different types of targets there are, and the more you want to keep them up to date in real time, the more you need a full, complete streaming analytics platform that supports all your sources and targets and can stream the data.
Mike Matchett: All right. One last question, just for the more IT-oriented audience: what are the enterprise kind of features that Striim brings to the table? Because you're not just connecting something up with open source wizardry, right? You've got a bunch of good things there. And where can we find out more information if we're interested?
Steve Wilkes: So I'll answer the second question first. For more information, go to our website, striim.com. That's S-T-R-I-I-M dot com. There's a lot of information there, and we also have lots of videos you can watch on YouTube to get to those resources. As far as being an enterprise-grade platform: we are an inherently Java-based, clustered server platform that supports full failover and recovery scenarios, with enterprise-grade security built in at the platform level and even for individual data streams. It has the capability to integrate with everything you need.
Steve Wilkes: Because it's clustered, you can add more servers as you need them, so you can handle scale and scale as necessary. And we can deploy our server platform almost anywhere. You can deploy on-premise, on bare metal machines, VMs, et cetera. We have a containerized version of the platform. We also have marketplace offerings in Google Cloud, Azure, and AWS, so you can spin up instances there. So we run anywhere a Java VM is available, and that can all form a cluster on which applications can move data easily from one place to another. The whole premise really is getting your data where you want it, when you want it.
Mike Matchett: Zero data loss, and scalable for high performance and high throughput. I know the cases you've presented include tremendously large companies doing transportation worldwide and everything else, lots of this. I'm afraid we've run out of time today, though, Steve. So we're going to have to have you come back and tell us more about some of the details of what's going on here, because I think this is a brave new world, and it's where everyone is going to need to go. So thank you for being on the show today.
Steve Wilkes: Great. Thanks a lot, Mike. It's been a pleasure.
Mike Matchett: All right. And thank you for watching and stay tuned. Thanks. Bye.
Steve Wilkes: Bye-bye.