Data protection and cloud security can have competing needs that call for a review of governance and security protocols. A discussion with Mike Matchett and Mike Osterman.
data, backup, security, matchett, osterman
Mike Matchett: We're going to talk a little bit today about cloud and data protection, and why people might want to move to the cloud but are holding back for security reasons and still putting infrastructure in data centers. I've heard that cloud has hit 50% penetration now in terms of replacing infrastructure. Everyone wants to get out of the infrastructure operations business. They are quickly moving to service providers for just about everything they can. And as their workforce becomes more distributed and mobile, they're finding that it's just too much work to bring specialization in for those things when they can just have someone else do it, someone who's doing it at scale. But I've talked to a lot of enterprises and they're still holding back, and some are even bringing stuff back on premises, building a private cloud, if you will, or worried about some data that they still don't trust being in the cloud. And so one question I have for you, Mike, as a security guy, is what are they really afraid of? I mean, isn't the cloud more secure these days than your own data center in a lot of ways?
Mike Osterman: Yeah, I think generally it is. I mean, if you look at the major data breaches we've had, whether it's Equifax or Target or Marriott or what have you, in the vast majority of cases it's really the on-premises infrastructure that has failed, or the people managing the on-premises infrastructure. It's really not the cloud. I mean, we haven't seen major data breaches of Amazon or Google or Microsoft in the context of managing other people's cloud applications. Office 365, for example, has had some outages in the past, but in terms of security, I wouldn't say it's bulletproof, but nearly so. I think, though, there's still this mindset out there that it's just safer if things are behind a wall somehow. But I think it's more perception vs. reality.
Mike Matchett: Is there anything to all these new compliance regimes, whether it's HIPAA or GDPR or something, that says in order to really meet those regulations I have to keep the data in my hands on my own site? I mean, obviously there are some GDPR location kinds of issues where it says keep it in the country. I get that one, but keep it on your own site? Is that something people really should be concerned about?
Mike Osterman: Not really. You do have to be concerned about jurisdictional issues because, for example, for countries in the EU, you have to keep data within the EU and so forth. But there's nothing in GDPR that says you have to keep it on premises. Now, one thing you have to do is make sure that all of your cloud and other vendors are GDPR compliant, because if you pass data off to them as a data processor or a controller you are still responsible for what they do with it. So I think it's probably more of a management issue at this point. How do you manage all of these vendors? If you look at the typical large organization, I've seen surveys that say there's something like eleven hundred different cloud applications in use, you know, some sanctioned, some not sanctioned. And I think that gives a lot of I.T. managers pause to really consider what is the safest approach. Going with the cloud is not necessarily less safe than keeping things on premises.
Mike Matchett: Yeah. And I know people with cloud vendors that I've talked to are like, we go through tons and tons of validation checklists. We've got best practices, we train our people, we've got all sorts of security parameters in place to hold the data, and we really don't care about your individual data. So it's less likely that you've got a rogue administrator who's going to do anything to any particular issue, or at least that's what we've been told. But I do wonder sometimes, about people putting data in the cloud, if the problems don't creep in when they don't really consider encryption properly. I know there are stories of people who put things in S3 and sort of rely on the anonymity of that big long URL to protect their data, and it's not really protected. If you just guess or crawl through S3 addresses you can find a lot of interesting stuff. So I'm wondering if that isn't really more the issue...of why people are saying, "hey, let's keep it in-house, because when our users go to use cloud services there's not enough I.T. governance in the way and they're not doing the right things."
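The S3 point above, that a long URL is not access control, can be checked by looking at what the bucket policy actually allows. What follows is only a minimal sketch, assuming the bucket policy has already been exported as a JSON document. The function name `public_statements`, the bucket name `example-bucket`, and the account ID are hypothetical, and the check ignores `Condition` blocks, which can narrow a wildcard principal in practice.

```python
import json


def public_statements(policy_json: str) -> list[dict]:
    """Return the Allow statements whose Principal is the wildcard "*".

    Simplified on purpose: Condition blocks are not evaluated, so a
    flagged statement is a prompt for review, not proof of exposure.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            flagged.append(stmt)
        elif isinstance(principal, dict):
            aws = principal.get("AWS")
            values = aws if isinstance(aws, list) else [aws]
            if "*" in values:
                flagged.append(stmt)
    return flagged


# Hypothetical policy: one statement open to everyone, one scoped to an account.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Sid": "TeamWrite",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
})

for stmt in public_statements(EXAMPLE_POLICY):
    print("Review statement:", stmt.get("Sid"))
```

A check like this would sit in the governance process rather than replace it: it only tells I.T. which buckets to look at.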
Mike Osterman: Yeah I think that's a very good point. The issue is not with the cloud itself, it's in the way the information is managed; the information governance platform or program that organizations have or don't have. I think part of the problem is that a lot of organizations don't really consider information governance very seriously. They don't look at all of the implications of having data that's not encrypted or that they're keeping for too long. Most organizations for example, don't have a good defensible deletion capability. They don't know what they can safely delete. And so they just sort of keep everything and keep building the risk over time because they're maintaining all of this data that, one: they don't really need; two: it's often in locations even if it's on premises where it's not really managed very well...it's not encrypted; it's not tracked; it's not audited and so forth. So it really becomes more of a management issue, if you will, as opposed to just the physical location of where things are stored.
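The "defensible deletion" idea mentioned above can be pictured as a retention schedule applied to an inventory of records. This is a rough sketch under stated assumptions, not a description of any particular product: the `Record` type, the categories, and the retention periods in `RETENTION_DAYS` are all hypothetical, and a real schedule would come from legal and compliance review.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    name: str
    category: str
    created: date
    legal_hold: bool = False


# Hypothetical retention schedule, in days per record category.
RETENTION_DAYS = {
    "invoice": 7 * 365,
    "marketing_email": 2 * 365,
}


def deletion_candidates(records: list[Record], today: date) -> list[Record]:
    """Return records that are past their retention period.

    Records on legal hold, or in a category with no defined retention
    period, are never returned: they are kept for human review.
    """
    candidates = []
    for record in records:
        limit = RETENTION_DAYS.get(record.category)
        if limit is None or record.legal_hold:
            continue
        if today - record.created > timedelta(days=limit):
            candidates.append(record)
    return candidates


inventory = [
    Record("newsletter-2015.eml", "marketing_email", date(2015, 3, 1)),
    Record("invoice-2019.pdf", "invoice", date(2019, 6, 1)),
    Record("dispute-2014.eml", "marketing_email", date(2014, 1, 1), legal_hold=True),
    Record("notes.txt", "uncategorized", date(2010, 1, 1)),
]

for record in deletion_candidates(inventory, today=date(2024, 1, 1)):
    print("Eligible for deletion:", record.name)
```

The design choice worth noting is the default: anything the schedule does not recognize is kept, so the rule only ever deletes what someone has explicitly decided is safe to delete.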
Mike Matchett: I know that, again, I've talked to some secure storage providers who are making big deals about being able to manage and create archives and other interesting compilations of massive amounts of unstructured data. That seems to be the hot thing right now...an "active archive." If you want to keep a couple of petabytes, 200 petabytes, of archival data around, you probably want to make use of the cloud in an elastic and cheap way. And they play up a lot the idea of keeping metadata about the data and actively indexing it so you can go in and run some natural language kinds of queries. So just conversational kinds of queries like, "which data do I have that has social security numbers in it?" and find that data. Or, if it's e-discovery, typing in a fairly straightforward plain-sentence query for discovery and looking across those petabytes of unstructured data to find out where that stuff is and pull it back. Because I do think people get in trouble, like you said, for not getting rid of the data they should be getting rid of, and on the other hand for not keeping the data that they probably want to keep in order to mine it for machine learning and use it as historical model-training data, which actually gets more valuable as you get more of it and as it gets older. So both ends of that problem, I think, are what some of these newer storage companies are aiming at, along with newer archive service companies and newer versions of backup and recovery products that are morphing into that active archive space. So do you see the folks you're talking to get into that level of consideration...of where in the cloud they put things, or where's my data and how do I find it easily?
Mike Osterman: Oh, absolutely, yeah. And people are becoming more cognizant of the need to really track their data, to really understand what's there. In part it's things like privacy regulations. You know, they need to know what's in their data so that they're not accidentally leaking stuff out. They also want to keep track of it for purposes of things like subject access requests, because the GDPR has that requirement in it, as do most privacy regulations, where you have to be able to produce everything you have on an individual. And using traditional processes, that's very cumbersome, very expensive and very time consuming. And so organizations are going to have to get a handle on their data, if only to be able to know what they have on somebody and to be able to extract it efficiently. Also we're seeing a lot more, as you noted, around active archiving, where organizations can go in and not only find out what they have, but really start doing some analysis on the data. We're seeing now the merger of archiving and analytics, so you can go through and get a better understanding of your sales process, for example, than you'd find, you know, just within a typical CRM system. You understand who contacted you and when, how long it took to get back to them, etc. You know, you can do sentiment analysis on customer requests for information and that kind of thing. And we're seeing very few companies do that today. I think we'll see a lot more do it in the future, not only for things like e-mail but also for social media posts, for text messaging, even files and various other types of unstructured data.
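The "which data do I have that has social security numbers in it?" question raised above comes down, at its simplest, to a pattern scan over document text. The sketch below is illustrative only and assumes the documents are already available as plain strings. The function name `find_ssn_documents` and the sample documents are hypothetical, the pattern only covers the dashed `NNN-NN-NNNN` form, and a real active archive would work from an index rather than a linear scan.

```python
import re

# Dashed U.S. SSN format, excluding number ranges that are never issued
# (area 000, 666 or 900-999; group 00; serial 0000).
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)


def find_ssn_documents(documents: dict[str, str]) -> dict[str, list[str]]:
    """Map each document ID to the SSN-like strings found in its text."""
    hits = {}
    for doc_id, text in documents.items():
        matches = SSN_PATTERN.findall(text)
        if matches:
            hits[doc_id] = matches
    return hits


# Hypothetical sample documents.
sample = {
    "hr/onboarding.txt": "Employee SSN: 123-45-6789, start date 2019-04-01.",
    "sales/notes.txt": "Call back at 555-0100 about the renewal.",
    "test/fixtures.txt": "Placeholder value 000-12-3456 is not a real number.",
}

for doc_id, found in find_ssn_documents(sample).items():
    print(doc_id, "contains", len(found), "SSN-like value(s)")
```

The same inventory serves both ends of the problem discussed here: it shows what must be produced for a subject access request and what is a liability if kept unencrypted.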
Mike Matchett: Yeah, I mean, it's curious to see these great capabilities, and there is always a question of adoption and when people are going to start using them and doing these great things. And so, you know, I'll be curious this year going forward. What's the tipping point for people to say, "hey, I have all this unstructured data, I have all these use cases I could line up with that"? What's the last straw that has to fall for me to actually do something with it, using these modern tools, these modern active archives, putting all the content in a searchable meta index and being able to run stuff? And I'm thinking this will be a great year for big data kinds of techniques to come into play and for people to say, "I can use the cloud for parts of this problem. I can use the cloud to do the metadata indexing part. I can use the cloud to do the management part, even if the data store itself is on site or in Glacier or Azure or a couple of other places scattered around." So I'm kind of interested in that. So I guess the last piece that people sometimes talk about is encryption. You mentioned it a little bit earlier, and we have, I know, encryption in flight and encryption at rest. It's always curious to me, though, where's the best place to put the keys for your encryption mechanism if you're building hybrid cloud? Where does that get stored...and is there anything happening in that space about managing the encryption itself and making that safe?
Mike Osterman: Yeah, I think people are becoming more cognizant of it. I mean, you don't want to store your keys in the cloud because now you've got another security problem. You don't necessarily want to store them on premises because now you've got issues of key management and so forth that can be very cumbersome, and that can create its own set of security risks. I'm not sure the issue has really been fully settled yet.
Mike Matchett: I talked to one cool company that's going to put some storage above the clouds. They're going to put it in satellites. They're going to launch a bunch of storage satellites and you'll be able to technically beam data up from your data center on a tight microwave link, right to and from the satellite, and the satellite puts it outside of any national concern, so the jurisdiction problem seems to be sidestepped there. And I'm wondering if that isn't the right place for your encryption keys? Get them above the cloud, get them into space. Right. Low orbit, low Earth orbit.
Mike Osterman: That's really interesting.
Mike Matchett: Yeah, yeah. "Cloud Constellation," I think, is the service. So, look at that. Anyway, I think that's it for this topic today. This has been Mike Matchett with Small World Big Data.
Mike Osterman: And this is Michael Osterman with Osterman Research.
Mike Matchett: Thanks, guys, and we'll be back soon.