An interview and introduction to real-time data streaming for time-sensitive applications such as IoT, with Streamlio.
matchett, streamlio, iot
Check out the full video here: Streamlio for Real Time Data Streaming
Mike Matchett: Hi, I'm Mike Matchett with Small World Big Data, and today we're going to talk about fast data. That's data that's not just big, but moving fast. We're talking about streaming data, real-time data, message streams, message queues, and all that other good stuff in this new land of big data. But if you go to implement a fast data solution today and pull down some of those open source projects, you find yourself assembling two, three, four, maybe even five of them. And you know what? It gets complicated, and you want to hit yourself in the head with a hammer by the time you get anywhere. Today we're going to talk with Jon Bock, who's the Head of Marketing at Streamlio. Streamlio decided to make it simpler for everyone and keep it highly scalable and performant. Welcome, Jon.
Jon Bock: Thanks, Mike. Good to talk to you today.
Mike Matchett: Alright, so Streamlio: you guys are technology guys coming from a whole bunch of big companies like Yahoo and all those companies that make all the fast streaming stuff, right?
Jon Bock: Yeah, our team definitely had that experience of how you build a platform that can handle streaming data and real-time analytics, and do that at large scale.
Mike Matchett: Alright, so when we started talking I was a little bit like, OK, just yet another streaming platform. There's a bunch out there, and there are several open source projects we can all name. But you're actually doing something different. As we're describing this, I don't want to push the analogy too far, but it's like a Spark for machine learning: you're building a Streamlio for applications that want to do message processing or handle message streams. It's kind of its own little platform for streams.
Jon Bock: Exactly, yeah. We really saw the need to provide that kind of execution platform, one that any application needing to access real-time data or streaming data, or really any data as it arrives, could go directly to, and not have to deal with some mishmash of a bunch of different technology pieces.
Mike Matchett: This is in fact something for people who are saying, OK, I've got applications, I've got some ideas, I want to take a big data journey. Maybe I should build a data lake, maybe I should do something else. This is actually one of those alternatives, and one you'd probably recommend: you could build applications that use Streamlio and use streaming data, and you guys are wrapping up the message brokering, the logging, the storage, and you've even got some ability to do replay on streams, all in one little platform package, right?
Jon Bock: Yeah, correct, because the applications really want to access and act on data as soon as it arrives, and in some ways it's much simpler to build an application that can act on that real-time data as it arrives, as opposed to needing to think about where the data resides. Does it reside in the data lake, or is the data lake behind, so that I need to go grab faster information from somewhere else? This really simplifies the way that an application thinks about how it can act on data.
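The idea of stream storage with replay, mentioned in the exchange above, can be illustrated with a minimal sketch. This is not Streamlio's or Apache Pulsar's API; `ReplayableLog`, `append`, and `read_from` are hypothetical names, and an in-memory list stands in for durable storage. It only shows the general pattern of an append-only log that a consumer can re-read from any earlier offset.

```python
from typing import Iterator, List, Tuple


class ReplayableLog:
    """Hypothetical in-memory append-only log; a stand-in for durable stream storage."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, payload: bytes) -> int:
        """Store a message and return the offset it was written at."""
        self._entries.append(payload)
        return len(self._entries) - 1

    def read_from(self, offset: int) -> Iterator[Tuple[int, bytes]]:
        """Replay every stored message, in order, starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        for position in range(offset, len(self._entries)):
            yield position, self._entries[position]


if __name__ == "__main__":
    log = ReplayableLog()
    for event in (b"login", b"click", b"purchase"):
        log.append(event)
    # A consumer that joins late, or needs to reprocess, replays from offset 1.
    for position, payload in log.read_from(1):
        print(position, payload.decode())
```

Because messages are retained rather than discarded on delivery, a new or recovering application can start from an old offset instead of querying a separate store.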
Mike Matchett: And it doesn't prevent you from building a data lake. In fact, we were talking about how it might make sense to start here and then use that to feed and drive the data going into a data lake for a longer-term history database, or a data warehouse if you want to do that BI kind of thing, and some other solutions, right? So this could actually be the first step.
Jon Bock: Yeah in many ways we're providing a platform that can be that first mile for when you first receive the data and want to understand it and act on it. But you can still then have that data move downstream to that data lake, that data warehouse where you might do longer historical analysis or exploration or some complicated model building or things like that.
Mike Matchett: And now we really don't have time to get into all the good things you've added into this that you baked into it: the performance and scalability, the overall simplicity of operation and deployment, the unification of management, and so on but that stuff's all in there, right?
Jon Bock: Yeah, and that's the value add that we bring. The core technology is open source technology, but we've integrated that and then put a layer on top of it that makes it easy to deploy, manage, and use as a single solution.
Mike Matchett: Right. And let's talk about use cases just for a minute or two here. So when we see people saying, hey, I've got event streams coming in from my web analytics, people using my consumer products and I want to correlate them, or they've got a lot of IoT devices or connected devices. Those are the kinds of things people can start to jump to directly. I think you were talking about, today people are moving towards a more practical approach to their big data and saying there's a problem I want to tackle right now rather than a pie in the sky thing, and that's where you guys are aiming.
Jon Bock: Yeah exactly. Because when people see new data sources and they're trying to figure out how to start getting value out of those new data sources, they don't want to take on some huge, massive re-architecture project to change everything. So that's a very common scenario, IoT is a good example. Instead of trying to create some broad multipurpose IoT platform, people are saying what's a fast way for me to start integrating that IoT data that I might have available from sensors, from devices, from different distributed applications? How do I start integrating that data so that I can actually do very practical things with it very quickly? Maybe it's even just simply: I want to start alerting on that, and then you can move toward: I want to start to do analytics on that, and then you can start working toward: I want to create applications that can act on those analytics. It opens up a roadmap for you, but you can start with that immediate value that you need to get.
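The first step described above, simply alerting on IoT data as it arrives, can be sketched in a few lines. This is an illustration only and assumes nothing about Streamlio's actual interfaces: `Reading`, `alerts`, the device names, and the temperature threshold are all hypothetical, and a plain Python list stands in for a live message stream.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    device_id: str        # hypothetical sensor identifier
    temperature_c: float  # hypothetical measurement, in degrees Celsius


def alerts(readings: Iterable[Reading], threshold_c: float) -> Iterator[str]:
    """Yield an alert for each reading above the threshold, one message at a time."""
    for reading in readings:
        if reading.temperature_c > threshold_c:
            yield (
                f"ALERT {reading.device_id}: "
                f"{reading.temperature_c:.1f} C exceeds {threshold_c:.1f} C"
            )


if __name__ == "__main__":
    # A list stands in for the stream; any iterable of readings would work.
    stream = [Reading("pump-1", 61.5), Reading("pump-2", 88.2), Reading("pump-1", 90.0)]
    for message in alerts(stream, threshold_c=80.0):
        print(message)
```

Since `alerts` is a generator, each reading is evaluated as it is consumed, which mirrors acting on data when it arrives rather than after it has landed in a store. Analytics or automated actions could later be layered onto the same stream.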
Mike Matchett: Right, because you're not simply a message bus. You can do some processing in there, you can do some other things. This is a place to really take that streaming data and make it valuable to the enterprise.
Jon Bock: Yeah, it's really the evolution from what I've called dumb pipes connecting data applications, to an intelligent fabric, where that intelligent fabric can not only connect the data, it can also process that data as it is moving and even enable acting on that data.
Mike Matchett: Right. Right. And if we had time, I'd be asking whether, if I sent you some of my processing too, you could do the same thing with streaming that out to where it needs to go, which is great. Tell me a little bit about where someone should start if they want to get into this and look at it. Should they just go to the website and download something? Do you have some educational path? Where should they go?
Jon Bock: Yeah, so on our website, which is just streaml.io, you can find a sandbox that you can download. You can run this on your laptop; it's very easy to install and get set up. There's also a standalone mode if you want to deploy that in the cloud to get started. And there are some tutorials that can walk you through getting familiar with the system once you've done that, or you can certainly just do your research and reading with materials that we provide on the website. And then there are the open source technologies, and there's a community around those that can also give you lower-level details about how those technologies work.
Mike Matchett: And just to recap, that's Pulsar?
Jon Bock: Apache Pulsar is the core technology. There's also Apache BookKeeper. There are some components that we learned from and leveraged from Apache Heron in there.
Mike Matchett: Great. Well thank you very much for being here today Jon. I know we didn't get as deep as we wanted to go, but that's a great first introduction for most folks.
Jon Bock: Great well thank you for inviting us, thanks.
Mike Matchett: And I definitely want to have you back and dive deeper into streaming because, obviously, the world's real time, so we've got to know more about how to do that.
Jon Bock: Life doesn't happen in batch.
Mike Matchett: Not in batch. Take care. Thank you, and come back soon. Bye.
Jon Bock: Thanks.