What's new with MapR?
Transcript from June 29 interview. You can view the full video here: MapR Update June 29, 2018
- Hi, I'm Mike Matchett with Small World Big Data, and I'm back again today with a special guest from MapR. And we're gonna talk about their latest release, which is doing some cool things. What's happening in the big data space? Generally, seems to be, not only are there a lot more projects coming out, and data's getting bigger, but we're also trying to do more with it. We're trying to do very smart things now, we're trying to do things in more real-time, we're trying to do machine learning, we're trying to do AI, we're trying to bring things into an operational timeframe. MapR's helping us get there. Welcome to the show, Jack. Jack is the SVP of Data and Applications for MapR.
- Hey, thank you, Mike. Happy to be here.
- All right, I know we've talked to you previously about some other MapR cool things. Today, you've got a new release coming out. What's in this new release, primarily, what's the main theme?
- So, the main theme, I mean, you did a really good job in that intro, but if you look at companies wanting to take advantage of AI and analytics, the data platform that supports that is increasingly important. 90% of the success in AI is the data logistics. So, this release, a lot of features in there. Number one, how do you make developers more productive in terms of development and deployment of those applications? How do you have those applications fed with the right amount of data? And increasingly, that's a really high volume of a really disparate group of data. So, that puts pressure on, well, I need high performance, but I also need to manage cost. We put a lot of features in there to manage TCO, to reduce the footprint, to take advantage of diverse infrastructures, including on premise and cloud, and how to take the complexity out of that whole data movement, protection, replication across locations.
- All right, I know there's a whole bunch of stuff in here, and it's all really interesting, we could spend hours, I think, going through it. Let's just dive in on a couple things. Let's talk first about this Object Tiering. What is Object Tiering, what are you talking about when you say Object Tiering, and what're objects doing in MapR, anyway? Give us a little background on this.
- Well, if you look at the volume of data here, it's like, I need high performance, but I need to make sure it's as efficient as possible in terms of where it's stored and how it's retained. And then, for some of the data, the deep archive, I will rarely or infrequently access that, but when I need it, I need it, so how will I make that as cost-effective as possible? So, with our platform, you've got a single platform across the data fabric, but the underlying areas where that's stored, the system is managing. So, we've got high performance, we're compressing using erasure coding to make that footprint as small as possible, and then we're taking advantage of object store, either on premise or in the cloud, like through an S3 interface, and storing data there. And making sure that we're managing, not just where it's stored, but how to deal with it on an individual file-by-file basis, so that you're optimizing cost. Because the cost structure in the cloud is different, right, it's based on storage movement as well as retention.
- Right, so we've got some ways so we can move data under the hood from replication to erasure coding to cloud storage. And the user doesn't have to be aware of it, which I think is really, really interesting. You've also done some other things in a couple of areas, which I think are aiming to converge some of the different components one might have, where you've got database things that now let you index JSON, and you've got native search being built in, and you've got augmented Kafka API stream objects. What is MapR aiming for by bringing all this stuff together there? Why are you putting all that in one place?
- Well, if you look at the typical application now, it's not just about one type of data and one type of processing, it's typically a broader use case, right? I need to understand the context of this data as it's arriving, I need to quickly take a real-time action, because it's not about analytics to explain what happened in the past. It's how do I inject intelligence into my customer engagement to drive revenue? Or into my transactions to figure out risk and fraud? Or how do I drive efficiency in my production process? So, that intelligence has to be part of that operation. So, what that requires is this disparate operation on a common set of data. So, when we talk about a data platform for AI and analytics, that's what we're talking about. And squeezing out delays and latency and providing this very robust, powerful set of processing on that data is what it's all about. Not only on premise or in the cloud, or at the edge, but across everywhere.
- Right, so you're really saying, look, you're gonna want to add intelligence to your business processes, if that's what everyone's trying to do, I imagine: bring machine learning, bring advanced analytics, bring AI to the table. You're gonna want a consistent set of data that's accessible by file protocol, by object protocol, tables and streams, and not have 18 copies of your five petabytes of data all over the place, right? You want it in one place, and you want it mastered and so on, so you guys are doing that. And you're bringing it together, and you're also simplifying management, I understand, too. So, you're doing some things in that space, which I thought was really cool. So, this is a general release, and where can we find more information about this?
- At our website, mapr.com, we've got information about these features. And I even neglected to mention, we've got a whole series of security innovations involved in being able to lock down that data at the data level, so regardless of which access method you're using, it's a consistent, secure system.
- Right, because you can't put it in production unless it's manageable, it's secure, and all those good things, which is what you guys are bringing to the table, all in that one place, so that's great. Well, thank you for being on the show today again, Jack, with that update. I look forward to hearing more from you.
- Thank you, Mike!
- And thank you, it's been Mike Matchett from Small World Big Data, and we'll be back soon, thanks.
- All right, bye.