How do you manage heterogeneous data lakes easily?


Summary:                          In this short video learn how to manage heterogeneous data lakes when they span a variety of locations, clouds and workflows.

Notable Quote:                  “…we give you access to the data regardless where it's stored, regardless how it's stored, and regardless what BI tool you use, in less than a second.”


Mike Matchett:                  Hi, I'm Mike Matchett with Small World Big Data and we're gonna talk today a little bit about data, big data, data queries, how do you serve data to the masses of your users? In a minute I'm gonna bring up Bruno Aziza, who's the CMO of AtScale, appropriately named because they really help you tackle this problem, at scale. What are we gonna talk about? We're gonna look at why IT needs to be concerned about how they serve data. How do you take advantage of those data lakes you're building? How do you manage to provide defined data access to the masses of data you've just spent years collating in a variety of places, consistently, to lots of end users across lots of platforms? How do you manage change, and most importantly, how do you handle the shift to the cloud when the platforms are changing on one hand, the data locations are changing on the other hand, and the tools that users are using are changing, maybe even on a third hand that you don't have?

                                                      All right, with that in mind, welcome to the show Bruno.

Bruno Aziza:                          Well thanks for having me Mike.  

Mike Matchett:                  All right. So we've talked at a couple companies before as you've been in this industry. AtScale's kind of unique. Tell me a little bit about where you got the genesis for AtScale, how that came about.

Bruno Aziza:                          So I'll tell you a little bit about AtScale. I'm gonna move to the center of the screen. You can see that's our new logo here so I'm really proud, and the team is very proud of it. And you can come and check us out at But the genesis around AtScale is fairly straightforward. It's a solution, it's a software solution that's designed to help enterprises that are dealing with large amounts of data, and that's basically 100% of companies today, but are unable to provide access and interactive analysis capabilities, if you will, to their business users. Why is that? Well it's because the data platforms they have are disparate and fairly complex. You've got relational databases, you've got Hadoop, and you've got cloud data sources. So all of them have different rules in the way they are being deployed.

                                                      And then there's also the business intelligence tools themselves, which are quite different from one another. Some of them talk SQL, some of them talk MDX, some of them require an API, and so it's a very complex model. So the best way to think about AtScale is as a unifying, simplifying software layer that sits between your data and your business users.

Mike Matchett:                  So I've got lots of people that use Excel, I've got some people that use some sophisticated BI tools. A couple guys on Tableau doing different stuff. I've got some Cube analysis guys. And then on the other side, I've got data lakes that I've been building and I've got ... Used to just be Hadoop but now it's Hadoop, and Spark, and BigQuery, and just all sorts of stuff over there. And so you're really coming and saying, "Look, you can marry those things directly together and have this, like, web of directional relationships. But it's really hard to move or to change or do anything there. If you insert AtScale in the middle there as an abstraction layer between those, everything becomes clean. The things become clean, you have one unified set of models, and then you can change platforms on the back-end and mix and match as economics drive you to one thing or another." Right? That's kind of-

Bruno Aziza:                          That's exactly right Mike. And the reason why this idea of the universal semantic layer works is because if you're trying to do this today without AtScale, it's gonna require a lot of data movement, it's gonna require a lot of rework. I was with a healthcare customer a few days ago in Los Angeles. They were telling me that they were redoing the same thing 13 times. And so using AtScale, they moved from doing it 13 times to just 1 time, so you can imagine the amount of agility you're gonna get. You can imagine the amount of savings you're gonna get, but also more importantly, your business is gonna perform better 'cause now you don't have to go through these data duplication projects and do the same work multiple times.

Mike Matchett:                  And automation is really the key word in IT these days and doing it once, and then having that be leveraged has gotta be really high up there for people. We were also talking, it's not just the complexity, you also offer speed advantage, a performance boost on queries. What is that?

Bruno Aziza:                          That's right. So there are three things that we bring to the enterprise. We call them the three S's. And I think you're talking about S number one, which is speed. And so this idea that you might have terabytes of data stored across multiple layers, right? Relational, Hadoop, and cloud, but your business users, when they query that data, it takes hours for them to get the answer. Even a minute is too much. And so we have this technology that is now on its third generation. We deal with clients that are large enterprises across financial services, healthcare, retail, so think about the Bloombergs of this world, the Home Depots, they have lots of data, they've got lots of concurrent users. And so instead of having to move this data or anything like that, they use technology from us and the particular IP we have.

                                                      Actually I think that's the patent of the adaptive cache right behind me. It's a technology that allows us to essentially give you access to the data regardless where it's stored, regardless how it's stored, and regardless what BI tool you use, in less than a second. And so this is the type of interactive speed that you want. We use all the typical SQL engines that you know of, the Impala, Spark, [inaudible 00:05:12], and so forth. And our engine actually decides which engine to pick for which workload, which is also very useful for IT 'cause without us they'd have to make that determination manually, which is almost impossible.
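The engine-selection idea Bruno describes here can be pictured with a very small routing sketch. This is an illustration only and not AtScale's actual logic: the `Query` fields, the row threshold, the `"aggregate-cache"` label and the rule that small scans go to Impala while large ones go to Spark are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical summary of an incoming BI query."""
    estimated_rows: int    # rows the query is expected to scan
    hits_aggregate: bool   # True if a pre-built aggregate can answer it


# Hypothetical cut-off between "interactive" and "heavy" workloads.
INTERACTIVE_ROW_LIMIT = 10_000_000


def pick_engine(query: Query) -> str:
    """Return the name of the engine that should run the query."""
    if query.hits_aggregate:
        # Cheapest path: answer from an already-computed aggregate.
        return "aggregate-cache"
    if query.estimated_rows < INTERACTIVE_ROW_LIMIT:
        # Assumed rule: smaller scans go to a low-latency engine.
        return "impala"
    # Assumed rule: large scans go to a throughput-oriented engine.
    return "spark"
```

The point of the sketch is only that the routing decision lives in one place, so IT does not have to make it by hand for every workload.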

Mike Matchett:                  All right. And real quickly, let's hit that last S that we haven't hit. So IT guys also have to be concerned about security, access, data, auditing, all that stuff. What do you bring to the table by being able to put that abstraction layer in between there? What can you do from there to help secure data, secure the environment, secure for data governance?

Bruno Aziza:                          Yeah. You're right. So security is extremely important for our enterprise. I mean, anybody that's storing any data these days. You can see Facebook's getting in trouble with what they're doing, and they have all the engineers, so if you're running IT at an organization what makes you believe that you can do it better than Facebook? So hopefully we won't see you on TV. But the idea is that the universal semantic layer AtScale provides has three benefits. Speed, we talked about it. Scale, we talked a little bit about it. We have this technology called the Hybrid Query Service, which essentially lets you access any data from any tool. And this last piece, which is about security, is the idea that once you have multiple and disparate systems, you can't realistically apply governance and security individually. You'd have to rebuild it every time. So we have a lot of assets, but there's two I'll name in particular. One called True Delegation and another one called Perspective, which allow you to manage each person's access so they see only the data they're supposed to see, when they're supposed to see it, across all systems and all tools. It's very powerful. And again, it's unique. We have a patent for that too and you can't get it from anybody else.

Mike Matchett:                  And what I liked was, if I have a single-user ID ... In the old days, right, you logged in as admin on Hadoop and you had access to everything. If you have a single-user ID making a query, and that query on the back-end goes across six systems, you actually are using your True Delegation to preserve their particular access credentials across all six of those systems on behalf of them, so it really takes care of that complexity for you.
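The delegation pattern Mike describes, where one query fans out to several back-end systems and each system sees the end user's own identity rather than a shared admin account, can be sketched as follows. This is a minimal illustration under stated assumptions, not how AtScale's True Delegation is implemented: `Backend`, `allowed_users` and `run_delegated` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical data platform with its own access list."""
    name: str
    allowed_users: frozenset

    def run(self, user: str, sql: str) -> str:
        # Each backend checks the real end user, not a service account.
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not query {self.name}")
        return f"{self.name}: ran {sql!r} as {user}"


def run_delegated(user: str, sql: str, backends: list) -> list:
    """Fan one query out to every backend on behalf of `user`."""
    return [backend.run(user, sql) for backend in backends]
```

With a shared admin login the permission check would always pass. Passing the caller's identity through to every system is what keeps each platform's own rules in force.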

Bruno Aziza:                          There's a few things. I mean, we have an unfair advantage here, which is that we kind of took a completely different approach than the traditional ones we're talking about. In the past, the way you would solve that problem is you would move the data around and you would lock people on one tool, which you can't do today. So our unfair advantage here is that because the universal semantic layer is this virtual software layer that sits there that doesn't move data, that does not install drivers on BI tools, that does not install any intrusion into your data center, so we install on one node and we manage the rest, it allows us to see everything and impact it globally rather than having you change your infrastructure for those specific governance needs.

Mike Matchett:                  And I know we don't have a lot of time left but we were talking earlier and you just mentioned three different use case problems that I really think are key to enumerate here. And I'll let you tell me about them quickly if you can.

Bruno Aziza:                          Yeah. No. So there's three issues. The first one is the typical self-service issue. You have spent millions of dollars, you're doing a great job storing the data, but guess what? Less than 1% actually makes it to your business users. We just completed a maturity survey of more than 5500 respondents, and 60% of the end users are complaining that they're not having self-service access to the data stored in your database. And so that's problem number one, self-service.

                                                      Problem number two is this idea of the data lake itself. And you talked a little bit about that complexity. How do you deal with data when it's stored in Hadoop, how do you deal with it when it's stored in the cloud? We help a lot with that, manage your hybrid data lake, and not just for today, but also because of the unfair advantage we have with our infrastructure, thinking about how it's protecting you for the future.

                                                      And then the third one is the cloud. When you have a system like Google BigQuery and it's serverless and it's completely different. And then Amazon Redshift and you have a multi-cloud strategy, what's your plan? And so AtScale, by providing this universal semantic layer, allows you to build it for one and move it to the other ones. Earlier in the call we were talking about a customer that moved from Analysis Services to Hadoop, and then from Hadoop to the cloud. Now their business users never knew they made the change. They used AtScale the whole time. They used Excel as a front-end, and they never noticed the data platform change because they were protected by the universal semantic layer.
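The "build it for one and move it to the other ones" idea can be pictured as one logical model that is rendered against whichever platform currently holds the data. The sketch below is purely illustrative and is not AtScale's modeling format: `Measure`, `PHYSICAL_TABLES`, the table names and `to_sql` are hypothetical, and real engines differ in SQL dialect far more than this shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """Logical business measure, defined once."""
    name: str        # name the business user sees
    aggregate: str   # SQL aggregate function, e.g. "SUM"
    column: str      # physical column the measure is computed from


# Hypothetical mapping from platform to the physical table location.
PHYSICAL_TABLES = {
    "hadoop": "warehouse.sales",
    "bigquery": "analytics.sales",
}


def to_sql(measure: Measure, dimension: str, backend: str) -> str:
    """Render the same logical question for a specific platform."""
    table = PHYSICAL_TABLES[backend]
    return (
        f"SELECT {dimension}, {measure.aggregate}({measure.column}) "
        f"AS {measure.name} FROM {table} GROUP BY {dimension}"
    )
```

In this picture, moving from Hadoop to the cloud means changing only the mapping entry. The measure definition, and therefore what the Excel user asks for, stays the same.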

Mike Matchett:                  I mean, I like that story because the guys on Excel never changed from using Excel, and they went from Excel using some Microsoft Cubes, to Excel over Hortonworks stuff, and then to Excel over BigQuery, which doesn't even naturally support it. And the guys never even knew about it, right?

Bruno Aziza:                          That's right.

Mike Matchett:                  And I just thought that was an excellent story. So I think we're out of time here today Bruno. I know there's a lot more to talk about in your specific IP. Hope to have you back on. But thanks for explaining what you guys do. I think it's a big value to IT folks and they should take a look at it.

Bruno Aziza:                          Thanks for having me Mike, and you can find out more at

Mike Matchett:                  Thanks. And thank you for watching. This has been Mike Matchett at Small World Big Data, and I hope to have you back soon. Thanks.