[Ed. Note: This week marks The Geek in Review’s 4th Anniversary. We thank you all for listening, subscribing, and telling your colleagues about what you hear. We’d love to hear more from you on what your favorite episodes are or what topics you’d like us to cover. Tweet us at @gebauerm and @glambert with your thoughts. Thank You Listeners!! – GL/MG]

We all know the saying “High Risk, High Reward.” But when it comes to data security, says Peter Baumann, CEO and co-founder of ActiveNav, we can’t derive the value of the data because we just can’t get through the risk. There are three things always facing businesses whenever there is data involved: the protection of the business’s reputation, the costs involved in non-compliance, and the exponential growth of data within the organization. We are so focused on reacting to these three variables that we simply cannot do anything about the value of the data itself.

Peter talks with us about the existing patchwork of regulations around the world, and how it makes it too difficult for businesses and organizations to comply. And while most experts suggested that regulations like GDPR would only govern those with businesses or people in Europe, it’s become the de facto compliance bar for privacy and data security for many businesses. He suggests that the US Government needs to step in and set a clear regulatory path around data privacy and security so that businesses know what the rules are, and the legal industry can better advise their clients on what steps they need to take to be compliant.

We dive deep in this episode and talk about what structured and unstructured data are, and how the existence of “dark data” within a business is what brings the highest risk of all. While doing data assessments on terabytes and even petabytes of data is extremely expensive, data breaches are even more expensive. The goal in Peter’s mind is to get to “zero dark data” so that you can stop worrying completely about the risks and start understanding the value within your data.

Listen on mobile platforms: Apple Podcasts | Spotify
Contact Us

Twitter: @gebauerm or @glambert
Voicemail: 713-487-7270
Email: geekinreviewpodcast@gmail.com
Music: Jerry David DeCicca


Marlene Gebauer 0:00
Welcome to The Geek in Review, the podcast focused on innovative and creative ideas in the legal industry. I’m Marlene Gebauer.

Greg Lambert 0:23
And I’m Greg Lambert. So Marlene, we’ve got a couple of birthdays to celebrate this week. I know yours is coming up a little bit later this week. But believe it or not, I was just curious this morning, because I knew it was coming up, but the pod is four years old today.

Marlene Gebauer 0:42
Isn’t that amazing?

Greg Lambert 0:43
It’s just crazy.

Marlene Gebauer 0:44
I know. I know, I was just talking to somebody this morning about this and was saying that it’s just four years old. And they’re like, really? It’s really been that long? I was like, yeah, it has been, and I feel like we have really grown into this. And as I was saying in my conversation, you know, we’re going to continue to grow, we’re going to continue to look at new things and bring on new types of guests. You know, because we want to keep it fresh for the listeners.

Greg Lambert 1:07
Yep. And I know my, my budget has grown. Or, at least my output budget. So I’ve got all kinds of cool little audio tools. Now. That’s true. Remember,

Marlene Gebauer 1:17
I remember when we started like, what what did we have? We had just like mics, I think. And they weren’t, they weren’t even good mics.

Greg Lambert 1:24
Yeah, they were pretty bad mics that I think I bought for $23. And then we really upgraded, and I think we both had Yetis. Yes, that’s true. Well, that’s true, and now we’re, like, semi-pro. That’s right.

Marlene Gebauer 1:41
That’s right. I mean, we started with Anchor as the platform, and, you know, had a lot of challenges with that. And then we tried Zoom, and also had some challenges with that. And, you know, finally we landed on Riverside, which we’re using now. And it’s a lot of trial and error, trying to figure out, you know, what works for you. But it’s been fun doing that.

Greg Lambert 2:12
I would be afraid to go back and listen to some of those first,

Marlene Gebauer 2:15
you know, I really don’t like I think about it. Like it was good interviews. I mean, it’s like really good content, but I’m like, I don’t think I could do it because I don’t want to listen to

Greg Lambert 2:24
Yeah, ’cause I think I grabbed some, like, free music off the interwebs.

Marlene Gebauer 2:27
Yes. I remember going through that. Like, you’re like, what does this sound like? I’m like, that’s terrible. And then we finally got a couple words like, yeah, that’s pretty good.

Greg Lambert 2:37
But, but I still think we have the best music with The Jerry David DeCicca is

Marlene Gebauer 2:44
No doubt. We were so fortunate that Jerry let us use the music, because I think it really reflects us. And it’s also, like, very different than anything else you’ll hear in a legal innovation podcast.

Greg Lambert 2:59
Yeah, yeah. Because it gives us a nice, Texas based podcast now it’s got it’s got a twang, even though you were in New Jersey when we started. That’s

Marlene Gebauer 3:07
That’s true. But I was already a Jerry fan even then.

Greg Lambert 3:11
Well, thanks, everyone for listening. I think, you know, four years in, and we’re just getting started.

Marlene Gebauer 3:19
Yes, thank you to everyone who you know takes the time to listen to our podcasts on innovation and creativity in the legal industry. It’s really been a great ride so far. This week, we decided to dive deep into information security and how it has consumed so much of our lives, both personal and professional, as we rely so much on the internet and cloud computing and storage.

Greg Lambert 3:41
So we brought in an expert to guide us through the regulatory patchwork of rules across different regions of the world, and even different regions here in the States, in the US. We talk about who within the legal industry are really even the true experts when it comes to structured and unstructured data. And we even talked about dark data. Ooh, sounds like the start of a really good ghost story, but instead it’s just us geeking out. We’d like to welcome Peter Baumann, CEO and co-founder of ActiveNav. Peter, welcome to The Geek in Review. [Peter Baumann: Pleased to be here.] We have talked a number of times on this podcast about privacy legislation and regulations, especially, you know, we throw out those four letters of GDPR. And then we also talked about California’s version that they’ve created on data privacy. So we just thought we would go ahead and do an entire episode and focus on what’s going on here in the US. Who knows, we may or may not sneak in something abroad as well.

Marlene Gebauer 4:50
So in the past couple years, the number of states enacting privacy legislation has really boomed, starting with California and most recently Connecticut, Virginia, Utah, and Colorado. And now Congress is beginning to hold hearings on their own federal legislation, the American Data Privacy and Protection Act. Peter, you’ve been in the information service industry for decades. What do you make of this recent proliferation of legislation?

Peter Baumann 5:15
Well, I think personally, when somebody says “for decades,” it’s a bit scary, but you are right, in that it is now three decades, so I guess I am qualified for that part at least. And I should then also say, I’m not a lawyer, I’m not an attorney. So my view is very much from on-the-ground experience of trying to do this with practitioners in law firms and elsewhere. I think this is fascinating. You know, I’m an entrepreneur; in the last half of my career, I’ve run small businesses, or relatively small businesses. And the last thing we want is red tape, and more administration, more bureaucracy holding us back. But I’m all for privacy regulation. And I was just having a chat earlier on with Greg before the radio cast began. I spent the last nine years in the US, living just outside of Washington, in Virginia. And so I was a European that went over to the States, and I experienced firsthand how there was a different approach to personally identifiable information, private information about myself and my family. And it’s far more lax in the States. It was not unusual to be asked for all kinds of things that I thought were very personal, and that you wouldn’t dream of being asked for, generally speaking, in the UK, in Europe: date of birth, social security number. Now, I appreciate it’s a different system. No one’s ever asked me, outside of a medical situation, what my national health, my social security number is in the UK or anywhere else in Europe, for that matter. And so that was a little bit of a shock. And then I think the other thing that was interesting was when we had GDPR arrive on the 25th of May 2018. I had a lot of my friends, a lot of my work colleagues, saying, “Gosh, you Europeans, you’re, you know, lunatics. Why do you care so much about all this? It doesn’t matter, just share it, just share, it doesn’t matter.” And then of course, breach after breach after breach came through.
And then there’s the OPM breach, when all the biometric data for all our top security clearance folk is out there somewhere with, you know, a bad actor, probably a nation-state bad actor. People started to wake up. And so it’s time to catch up in the US. And therefore, I think it’s a really good thing. I think, generally, we’ve got to lean into it, and we’ve got to make sure this is really successful. You know, there’s a lot of, I call them bad actors, the various individuals, people out there who are literally trying to steal your ID, my ID, and organizations’ IDs, and that of their personnel and their customers. Every day, we all know about it, we all read about it, we all live in fear of it, but it’s literally happening. And most of us have been breached through one or two of the major consumer data breaches that have taken place. So, you know, I think it’s great that these things are coming into place. But there’s clearly a need for a federal law, because it’s getting way too complicated at a state level. And then with individual kind of sectoral requirements, whether it’s the legal sector, or manufacturing, or financial services, there’s just too much burden from a privacy policy perspective. And so we need to kind of raise the bar at a federal level so that it becomes easier for everybody to play on kind of an equal ground, if you like.

Greg Lambert 8:33
Is there something specific that you think? Is it just the unification of data privacy that needs it to be on a federal level? Or is there something else that the federal legislation or regulation can do that the 50 states and Washington, DC can’t?

Peter Baumann 8:53
Well, Greg, I kind of look at it from a customer’s perspective. If you’re a customer, and you’re trading internationally, and you’re trading with a certain sector, what on earth are you supposed to do? You end up going for the highest bar, and currently the highest bar is, God forbid, an EU regulation. And so you have your US, you know, global businesses seeking to comply with GDPR, and then, largely speaking, US state regulations playing catch-up. And obviously, we know that California is kind of either on equal par, or very close, or has overtaken GDPR in parts. But we have to put US businesses on a fair footing with the rest of the world. And I think what organizations are asking for is one regulation they can build all their internal processes, methodologies, and tech stacks to comply with. And that’s why I think that just has to happen, purely from a commercial perspective, even before you get into all the other data privacy rights and personal views on this and stuff. It’s just getting too difficult for businesses and organizations to comply.

Marlene Gebauer 10:08
Yeah, one regulation to rule them all?

Peter Baumann 10:12
I think so. I think so. And you know, I remember in the early days of GDPR, and I’m sure you guys do too, there was a lot of press out there that said, “Oh, the US doesn’t need to comply with this. The US won’t bother complying with this. Oh, it’s only if we have European citizens in certain places.” And we’ve all seen what’s happened: everyone’s got caught. And these things, people travel. I’ve traveled, and you know, what am I covered under? I’m European. I’ve worked in the US. People have got my data. You know, what law do we need to govern Peter Baumann? And there’s millions and millions and millions of people like me, even before you start talking about people crossing states in the US.

Greg Lambert 10:53
Ya know, again, we don’t necessarily like regulation in the US, especially businesses. But what are the risks to companies that don’t prepare for all of these new data protection policies that hopefully are coming out?

Peter Baumann 11:10
It’s probably a multi-part answer, I’m afraid to say, actually, Greg. You’ve got general risks of just not being proactive with your data protection. So if those bad actors, those hackers, if you like, infiltrate your organization’s network undetected via an unstructured data source, there’s nothing stopping them getting hold of confidential client data and matter files. You know, it’s all there. It’s all there for the taking once you’ve broken into those networks, unless those files are appropriately governed, locked down, and encrypted. We know that’s not always the case. And of course, when this happens, it’s going to lead to a bunch of things. You’re going to have a loss of confidentiality and trust. Data protection is paramount to the legal industry. Law firms are either full of money and sensitive private information by their very nature, or they have the keys to the money and private and sensitive information. And so you just think about the kind of reputational risk around that. We impart some of our most sensitive information to law firms, and we empower them, we trust them. And if they’re breached, for being sloppy, or for malpractice, or just for not having the right process in place, it’s not going to look good. So reputation is everything. Second, I think, specifically related to legislation, is the monetary risk. These breaches come with great cost. And that cost is a combination of the privacy cost, so you’re in breach of whatever regulation it is that you’re looking to comply with, and you’re going to be fined. And we all know what the GDPR ones are. But let’s be realistic about this: it’s the US penalties that are going to be harder than GDPR. GDPR is largely focused on large organizations; it hasn’t really demonstrated its teeth yet. We know, given the US is a far more litigious society and economy, that there’s going to be far more litigation around this. And so it’s going to be extremely costly in terms of penalties.
And then finally, you’ve got a general risk: large volumes of unstructured data prevent organizations from just doing their job. And I know both of my colleagues here today, Greg and Marlene, your roots are in information management, knowledge management, information governance, and you can’t run a business if you’re kind of in a swamp, if you can’t see the good data for the bad data, and ultimately, if you can’t derive value. We talk a lot about risk, and this podcast can be about risk. It’s one side of the pinwheel; the other side of the pinwheel is value. And the reason we don’t talk about value, specifically on unstructured data, is because we can’t get through the risk. And so those are the three things coming at you: you’ve got reputation, you’ve got the penalties and costs of either breach or non-compliance, and then ultimately, you can’t really run your business correctly with the exponential growth of data if you’re not doing anything on value.

Marlene Gebauer 14:15
I’m glad you hit on the value, because I think we’re going to touch on that a little bit later in the podcast, the balance between the value and compliance. But, you know, we were talking a little bit earlier about all of the various state takes on privacy. So how should companies, specifically legal teams, you know, whether they’re in-house or outside counsel, how should they prepare to be in compliance with so many unique policies? You know, what steps should they take?

Peter Baumann 14:46
I love that. I know that’s not a setup question. I mean, I love it. And for the same reasons, we say you need to have a bar, the appropriate highest bar in terms of privacy regulation, such as GDPR today. You also need to set the same bar on your data. And ultimately, this is all about the data. And you can’t protect what you don’t know. And so if all organizations, if all legal practices, knew exactly what data they had in their networks and their systems, they could then manage it appropriately through its lifecycle, which means that regardless of what regulation they have to comply with, they’ve got some chance of meeting it. And that’s just not the case today. So the first argument we would always make is: know what you’ve got. To do that, you need to do something called a data inventory. In the past, it was called an audit. Mapping has been used as a term, but that can be a little bit misleading, particularly in the cyber kind of market. And so we talk about a data inventory, and you need to have an inventory of all of your data assets. That’s where I would start.

Greg Lambert 15:57
Talk a little bit more about what you mean by data inventory. I’ve heard it referred to, I think we’ve even had guests on here talking about it, as a single source of truth. And what that is, and, you know, why it is that organizations should take a data inventory? And even kind of broader than that, what kind of repository should they be looking at?

Marlene Gebauer 16:20
And how should they do it? Like, how do you even start?

Peter Baumann 16:23
It’s top secret. It can be overwhelming. And so, you know, you’ll have heard the expressions that we talked about, you know, a single source of truth, a single pane of glass, don’t boil the ocean in one go. I’ve been in this space for so long now, I think those kind of expressions derive from the morass of unstructured data, and you hear them and hear them again. It’s like “80% of your data is unstructured.” These things, you don’t even know if they’re true anymore; just everyone talks about it. By the way, I can talk about that a little bit later. But historically, and by historically I mean over the last 10-15 years, when people said they need to do, you know, a data audit or a data inventory, what they do is they bring in, possibly, their lawyers, you know, so lawyers may use their own privacy advisory counsel to come in and do it. Or they may bring others in. Equally, they might bring consultants in. Those could be information governance consultants and similar, and there’s lots and lots of them out there in the market. And what they do is they generally interview the business. They’ll get subject matter experts, they’ll get some heads, and they’ll look at what the flow of the data is and where the most important data is, and then interview them and then fill out what are commonly known as surveys. It’s not a bad thing, in that it gives you a relatively high-level, quick, and relatively cheap view of what data you’ve got. The problem with it is it’s immediately out of date. The moment that exercise has taken place, the moment the whiteboards have been wiped clean, that data is old. You’re also at the behest of human memory. If someone says to me, “Peter, what are the important documents you’ve worked on, just generally in your job?” Oh, well, I had to write a board report yesterday. I remember that because I was on it late and it was stressful. “What did you do before that?”
Right, I think I did some kind of proposal. I think there’s a proposal. What did you do nine months ago? And what did you do two years ago that was really important and still has IP value to the business? But is it within retention? You haven’t got a clue. And what about the guy that left that you took over from six months ago? What did his data look like? And so those things are, I would say, nearly useless a day past their kind of, you know, production date, if you like. And so for us, a data inventory is actually looking at the data itself. And this is kind of an overwhelming thought. An average organization, a law firm of, say, 500-750 people, is going to have somewhere in the area of probably 30 to 75, maybe more, but 30-75 terabytes of data. If you take an average kind of ratio, you know, gigabytes, the documents, et cetera, kilobytes of documents, and build it up, for every terabyte you’ve got around a million files and documents. So you’re talking about somewhere in the 50 million plus. There’s no way you can do that without technology. But the technology has to look at all of them. And it has to do it very quickly, and it has to go through a triage process so that you’re not wasting time. And within a matter of hours or days, you’ve got an inventory of those data assets. Then where it gets really cool is you then map that against the survey process you went through. And then, for those kind of knowledge management buffs on the call, you start doing clever things like file schemas, file plans, taxonomies. And you really start to understand what your data is. The key thing is, you can keep running the data inventory. And so it’s a single pane of glass that’s always current. And that’s a true inventory. And there’s very, very few people that I think really understand this in the market. And there’s very few organizations that have really done a data inventory on all their assets.
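
[Ed. Note: The idea of an inventory built from the data itself, rather than from interviews, can be sketched in a few lines of Python. This is a minimal illustration only, not ActiveNav’s product or method. It assumes the data sits on a file share reachable as a local path, and the names `InventoryRecord`, `build_inventory`, `summarize_by_extension`, and `estimate_file_count` are hypothetical. The estimate helper simply applies the “around a million files per terabyte” rule of thumb quoted above.]

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class InventoryRecord:
    """Basic metadata for one file; no file content is read at this stage."""
    path: str
    extension: str
    size_bytes: int
    modified_epoch: float


def build_inventory(root: str) -> list[InventoryRecord]:
    """Walk `root` and record metadata for every file that can be stat'ed."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            records.append(InventoryRecord(
                path=full_path,
                extension=os.path.splitext(name)[1].lower(),
                size_bytes=info.st_size,
                modified_epoch=info.st_mtime,
            ))
    return records


def summarize_by_extension(records: list[InventoryRecord]) -> dict[str, tuple[int, int]]:
    """Return {extension: (file_count, total_bytes)} as a first triage view."""
    summary: dict[str, tuple[int, int]] = {}
    for record in records:
        count, total = summary.get(record.extension, (0, 0))
        summary[record.extension] = (count + 1, total + record.size_bytes)
    return summary


def estimate_file_count(terabytes: float, files_per_terabyte: int = 1_000_000) -> int:
    """Rough file-count estimate; the default ratio is the interview's rule of thumb."""
    return int(terabytes * files_per_terabyte)
```

[Because the scan reads only metadata, it can be re-run on a schedule, which is what keeps the inventory current in the sense described above. With the rule of thumb, `estimate_file_count(30)` and `estimate_file_count(75)` give 30 million and 75 million files for the 30-75 terabyte range mentioned.]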

Marlene Gebauer 20:29
What would the cost of something like that be? Because I imagine that’s both in time and money. I imagine those are a couple of the hurdles that prevent organizations from doing exactly what you said.

Peter Baumann 20:45
I think, Marlene, I think until not that long ago, it was quite prohibitive in the absence of a really good compelling event. And those kind of compelling events, normally, in recent times, have been a breach. So once somebody’s been breached, they have no choice but to go in and sort it out. They may have a consent order, and if they don’t, their business is shut down. Therefore, they don’t really care what the cost is.

Marlene Gebauer 21:05
There’s an accident at the crossroads, and we have to put a traffic light in then. That’s

Peter Baumann 21:11
The more common reasons people acted were M&A, divestitures, acquisitions, where you’ve got to split the data. It could also be under some kind of governmental order, depending on the nature of the content, through to simple things like a migration program. So you’re moving from this repository to this repository. We don’t want to fill our shiny new Ferrari up with horrible dirty diesel. And so that’s a good reason to go through the process. What we found in the last year or two, and I’d like to say that we’re kind of at the front of that discussion, is the prices have come down dramatically. So 10, 14 years ago, when we set this business up, you know, a lot of data was two or three terabytes. That was like, “Oh my gosh, how are we going to handle that?” We now have customers in the petabytes. And so that price point has clearly got a Moore’s Law effect to it, and it’s come down dramatically. And it used to be about storage costs. It’s now about storage blobs, and how much does it cost to pull this data back down, or in case of a litigation requirement and stuff. So you’ll pay that off many times over within the first year.

Greg Lambert 22:18
And are you finding companies are much more comfortable with using either cloud or a hybrid instead of doing everything on prem? Because I remember back in the day having to literally mail or FedEx, you know, hard drives from one office to another, because we didn’t trust it or couldn’t handle that much information in the cloud at that time.

Peter Baumann 22:41
Well, Greg, we still experience that. We have some customers that still either put us in a certain room, and we have to install on prem, or they will send us data in a certain way, and we put it in our safe room and do the work locally. So that does happen. But those are very much the exceptions rather than the rule. The rule, because of the nature of this data, has been, until fairly recently, on prem, because people want to respect their security protocols already in place. And also, I think more so than that, it’s a lot of data. And if you’re going to do anything like textual analytics on that data, you’ve got to be really close to it. And so it’s only with the advent of customers putting their data in the cloud that it provides the solution to put your analytics, your algorithms, next to it in the cloud. And so it’s more about the laws of physics and how fast the pipe is than it is actually about a willingness. And I think we all know, we’ve all seen in our personal lives and our professional lives, that there’s very little that hasn’t now moved to the cloud or won’t move to the cloud. And so we’ve basically followed that path. We had a few false starts when we built our cloud product. We launched our cloud product just at the back end of last year, and until then all our revenue and all our customers were on prem. And so now we have some customers that are looking at both solutions, because ultimately they’re going to migrate across. We have some that are only going to cloud. And then we have some, for various reasons, particularly in federal government, who still want to sit within a cloud environment. Within the legal space, it’s a little bit of a mix still. And they’re a bit more conservative, as we know, by nature. So it really depends where their data is, but the Microsoft stack is driving them into the cloud hard. And so often we’re finding that we just piggyback that story.

Marlene Gebauer 24:31
What sort of repositories should organizations consider in terms of housing the data? And the other question is, so you’ve gone through the evaluation and you’ve figured out the valuable versus the not valuable, but what happens to the stuff that’s not valuable?

Peter Baumann 24:46
Okay, so two questions. First one, where is it? Well, it’s unstructured data, right? It’s everywhere. Where you’re going to get the 80/20 gains very quickly is going to be, remarkably, after doing this for 14 years, still in file shares. The server, the file share, as people call them. It’s remarkable, but that’s still a thing. And it demonstrates that people never really completed these exercises. And so there’s a lot of risk there, and you’re going to get quick bang for buck there. Then it starts to become a little bit more nuanced. Because SharePoint became so prolific, and people moved from certain types of enterprise content management systems and thought SharePoint was the solution, and we know SharePoint just made the problem worse, generally speaking, SharePoint is often right at the top of the list. And then it’s very much dependent on the industry or the organization. Within the legal space, you’ll find the Microsoft stack is second, and SharePoint is part of that. But in the broader Microsoft stack, Teams, Exchange: a lot of lawyers, you know, they’re not very good with retention schedules, and so they have an infinite number of emails in their Exchange servers. So that’s a classic use case: go clean that data. Obviously, Teams, Slack, depending on the organization again. And then, in the legal sector, you’re going to have things like, what is it, NetDocs, iManage. And they are obviously put in place to manage the data and ensure that it’s correctly labeled and meets retention schedules, etc., etc. But we all know they’re reliant on the end user submitting that data, and end users don’t like adding metadata. Let’s, you know, face up to it: it doesn’t matter what we do, they don’t want to do it. And so you’ll find a lot of poor quality information, unfortunately, in those kind of proprietary repositories, too.
And so those will be the biggest hits, and obviously cloud, whether it’s Microsoft Azure, 365, Office docs, or whether it’s others, Google, Dropbox, and so on. It’s going to be all over those places. Anyway, you shuffle your unstructured data, the way you go through it. And my gosh, Marlene, we’ve learned this over the years. We went through a lot of painful conversations where we’d find really bad data for customers, and they just wouldn’t delete it and do a defensible destruction, and all the legal teams are caving in. And it would be, it’s a backup of a copy, and it’s outside of retention schedule, and it’s only accessible through an application that’s no longer in your policy, I think WordPerfect or something. “Yeah, but we can’t delete it, we might need it,” or “I don’t want to be the person that deletes it.” So we came up, I’m sure it wasn’t just us, but we came up with workarounds. And one of the workarounds there is you identify that information, and there’s a triage process. We won’t have time to go through it today, but there’s a triage process where you get the easy wins, and you work your way down until you really have to do the text analytics on the final subset of data. But you take those slam dunks, because we’ve done billions and billions and billions of files, and therefore the risk is so low that they’re wrong, and you put those into quarantine. And if nobody screams in a year, you hit the delete button. Where you have resistance, you allow the end user to claw the data back, but with a form they have to fill in and a fee, and no one ever does it. So I’ve been a little bit, you know, flippant in my response.

Peter Baumann 28:33
And then, as you get into the more interesting data, you know, you get into sensitive data, you get into IPR, you get into whatever the business is. In the world of law, obviously, case files, matters, is it on hold, is it off hold, etc., etc. Then you just build different rules and requirements around it and triage the data separately. And obviously, you’ve got to spend a little bit more time, and therefore money, because it’s got more value associated with it.
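
[Ed. Note: The quarantine workaround Peter describes (quarantine the easy wins, let users claw data back, delete if nobody objects within a year, and treat matters on hold separately) can be expressed as a small decision rule. The Python sketch below is an assumption-laden illustration, not a description of any real product: the 365-day period is taken from “if nobody screams in a year,” and `QuarantinedItem`, `reclaim`, and `disposition` are hypothetical names. It decides what should happen to an item but deliberately deletes nothing itself.]

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "a year" is modelled as 365 days.
QUARANTINE_PERIOD = timedelta(days=365)


@dataclass
class QuarantinedItem:
    path: str
    quarantined_on: date
    on_legal_hold: bool = False         # matters on hold are never auto-deleted
    reclaimed_by: Optional[str] = None  # set when an end user claws the data back


def reclaim(item: QuarantinedItem, user: str) -> None:
    """Record that a user has asked for the item to be restored."""
    item.reclaimed_by = user


def disposition(item: QuarantinedItem, today: date) -> str:
    """Return 'retain', 'restore', 'delete', or 'wait' for one quarantined item."""
    if item.on_legal_hold:
        return "retain"
    if item.reclaimed_by is not None:
        return "restore"
    if today - item.quarantined_on >= QUARANTINE_PERIOD:
        return "delete"
    return "wait"
```

[Checking legal hold first is a design choice in this sketch: it means a hold always wins over both the reclaim request and the expiry of the quarantine period.]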

Marlene Gebauer 29:01
So I know you have spoken about dark data. Tell us a bit about dark data. What is its relationship to unstructured data? You know, what sort of risks does dark data pose to organizations, and how should we reduce that risk?

Peter Baumann 29:16
Now, I’m glad you bring that up, because I should have mentioned it at the front of the call. So our strapline is “zero dark data,” first of all. And what do we mean by that? We want to take our customers to a position of zero dark data. And that comes right back to the front of our conversation. If you have no dark data in your organization, oh my gosh, you’re in a good place. You’re going to meet all those regulatory requirements. If you are hacked, you’ve got a smaller footprint; you know, the threat attack footprint is far, far smaller. And you’re much more likely to have the right process, lifecycle management, retention schedules, etc. on it. So dark data, it could be within structured, but we’re all focused on unstructured; it’s largely unstructured data. It’s all those things that you just don’t want out there. A really simple example is things like password files. And so in some of the most well-known breaches, the bad actor gets into a network, and this is publicly available through the Ponemon Institute, and is on average around 300 days in the network before they’re discovered. They’re not looking for your iPad and the flat-screen TV; they’re looking for the juicy stuff. They’re looking for your crown jewels, and the crown jewels mean different things to different organizations. One of them, which is dark data, is password files that haven’t been correctly disposed of or locked down. And not for malicious reasons. Somebody is moving a load of data; the IT guy just put it there, “delete it later,” forgot to delete it. Excel spreadsheet, Word document, stuffed with passwords. They find those files and, hey presto. And, you know, there’s no shortage of examples of where that’s happened. And so for us, a password file would be a good example of dark data that’s out there, never mind PII and all the other kinds of stuff.
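
[Ed. Note: As a rough illustration of hunting for the forgotten password files Peter mentions, the Python sketch below flags files whose name or plain-text content looks credential-related. Everything here is an assumption made for illustration: the patterns in `SUSPECT_NAME` and `SUSPECT_CONTENT`, the `TEXT_EXTENSIONS` list, and the function name `find_candidate_password_files` are hypothetical, and a real scan would need far richer rules. In particular, this sketch does not open Excel or Word files; it can only catch those by filename.]

```python
from __future__ import annotations

import os
import re

# Hypothetical patterns, for illustration only.
SUSPECT_NAME = re.compile(r"(password|passwd|credential|secret)", re.IGNORECASE)
SUSPECT_CONTENT = re.compile(r"(password|passwd|pwd)\s*[:=]\s*\S+", re.IGNORECASE)
TEXT_EXTENSIONS = {".txt", ".csv", ".ini", ".cfg", ".log"}


def find_candidate_password_files(root: str, max_bytes: int = 1_000_000) -> list[tuple[str, str]]:
    """Return (path, reason) pairs, where reason is 'filename' or 'content'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if SUSPECT_NAME.search(name):
                hits.append((path, "filename"))
                continue
            if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
                continue  # binary formats are out of scope for this sketch
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    sample = handle.read(max_bytes)  # cap how much of each file is read
            except OSError:
                continue  # unreadable file: skip it
            if SUSPECT_CONTENT.search(sample):
                hits.append((path, "content"))
    return hits
```

[The results are candidates for human review, not confirmed findings; a filename such as “password_policy.docx” would be flagged even though it holds no credentials.]
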
So as far as unstructured versus structured, well, you know, when I get asked this, I always do get a bit flippant. I say, well, it’s kind of in the name, stupid. It’s remarkable how most of the market is still focused on spending a lot of dollars and a lot of time sorting the structured data out. It’s what they know, thank you, Greg. Why is that? Well, it’s because it’s what they know, because they know what to do with it, because it’s structured, and it’s where they feel they get the biggest bang for the buck. Now, in fairness, there’s a view that they’ve probably got some of their most valuable information there, because they’ve gone to the energy and effort and cost to put it into a structured database, for example. So the expectation is it’s going to be quite valuable. But that’s not always the case. It’s largely, I think, and I do have a self-interest here, because the other problem, the elephant in the room, is so great, and they literally don’t know what to do and how to start with it. And they don’t think there are easy ways to deal with it. And so, let’s just focus on structured for now. And I know I’ll get a lot of angry responses to that. But that’s our experience. Structured data largely sits in a database environment. It’s going to have some columns and some labels; think Oracle, think, you know, SQL Server, etc. And you should have the ability, through your own SQL queries and search capability, to find most of that data. That’s the beauty of being structured. Unstructured has no obvious framework associated with it. And as per our earlier conversation, it can sit in all kinds of different repositories. Some of them may be semi-structured, like Salesforce or some of those content management systems. But even there, they’re largely unstructured, because they’re free text files. And so I like to think of unstructured as being more about data that’s associated with human conversations.
And certainly, from a textual point of view, you know, it’s the customer support desk. It’s the litigation file. It’s the Slack chat. It’s where you have a human interaction. And of course, you can then add to it rich media files: video, audio, imagery, etc. That’s also largely unstructured data. We talked a little earlier about the percentages. Forever, we’ve been saying 80% of an organization’s data is unstructured. And I was at the RSA Conference last week, and I was with a data analyst, whom I won’t mention; it would be unfair, I think, at this point. But I asked, is that still real? Where does it actually come from? Is it just one of those, what’s the word I’m looking for, like an old wives’ tale? You know, it’s just gone round and round in the market, and everyone now believes it. And the individual said to me, well, we’ve actually just recently done some research. An urban legend or myth, that’s right, that’s what I was looking for. And it’s actually much higher than...

Marlene Gebauer 34:02
It must have gotten more since then, right? Yeah.

Peter Baumann 34:05
That’s high. And so what we don’t know, and I put a plea out to market analysts and researchers, is this: what’s the correlation or propensity of being hacked in your unstructured versus your structured data, and the risk of the dangerous stuff being found in one of those datasets? I don’t think enough research has been done on that yet. But we know that most of the dark data, the unknown data, is in unstructured. So the guess is it’s there, but there’s no empirical evidence to support that just yet.
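
As a rough illustration of the forgotten password file Peter describes, here is a minimal Python sketch that walks a file share and flags office-type files whose names hint at stored credentials. The name patterns, the list of file extensions, and the function name find_suspect_files are all assumptions made for this example, not anything ActiveNav or the episode specifies, and a real data inventory tool would also have to inspect file contents rather than names alone.

```python
import re
from pathlib import Path

# Hypothetical name patterns; these are illustrative assumptions only.
SUSPECT_NAME = re.compile(r"(passw(or)?d|credential|secret)", re.IGNORECASE)
# Hypothetical set of "office" file types worth flagging by name.
SUSPECT_SUFFIXES = {".txt", ".csv", ".xls", ".xlsx", ".doc", ".docx"}


def find_suspect_files(root):
    """Return sorted paths under root whose names hint at stored credentials."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Only the file name is checked; contents are not opened here.
        if path.suffix.lower() in SUSPECT_SUFFIXES and SUSPECT_NAME.search(path.name):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_suspect_files("."):
        print(hit)
```

A name-only scan like this will miss renamed files and will flag harmless ones, which is one reason a content-level inventory is the harder and more useful exercise.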

Greg Lambert 34:37
So what about data protection assessments? How is it that, say, a legal department should identify their company’s data risk?

Peter Baumann 34:46
Yeah, I think GDPR is a good way to try and answer this, Greg. When you talk about a data impact assessment, a data protection impact assessment is a DPIA, to try to make our lives easier with a short acronym. And I’ll give you an example. If you’re following GDPR rules to the letter, when you create a new project that’s likely to involve some high-risk sensitive data, PII, then you need to go through a DPIA process. What is that? I’ll just rattle through some of the main bullets, if you like. So, who in your organization is going to be involved in that process? The specific risky data that they’ll be handling. The rationale for why they need this data. And I love this one, the rationale. Why do you need that? Well, because it’s there. Now, that’s not a good enough reason. Because if it’s there, and you merge it with other data, you can start to create identities, of course. So the rationale is really important. Which of your business processes will it be part of, and the specific risks to the rights and freedoms of the data subjects, you know, associated with the risky data? And then obviously, what tools you’re going to use, etc. And so then, who’s responsible for that? Well, one of the things GDPR did, rightly or wrongly, was it created 50,000 DPOs. It created this data protection officer function, which the US has, I think, largely been frowning on. Do we need one? Not sure. Well, we have one in Europe, and under the regulations in the US, largely, they’re falling under the CPO. If I may, again, I’m not an attorney, but they’re largely privacy attorneys, and they’ve largely fallen under the CPO function. But there are DPOs in the US as well, and plenty of them. And so that person is an expert when it comes to complying with GDPR. And so again, you know, how on earth do you deal with all this stuff? Well, you’ve got to know what the data is that you’ve got.
And you’ve got to do, effectively, a data inventory. So every time you do one of these exercises, projects involving a lot of sensitive information, personally identifiable information, you need to have at source what the data is about, and it needs to be in the correct places. And so this is a headache. And it comes back to our earlier point: start at ground zero, 101. Understand what the data is, and then all these other processes should be a little easier. Does that help, Marlene?
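
The items Peter rattles through for a DPIA can be written down as a simple record, which makes it easy to see which ones are still unanswered for a given project. The sketch below is only an illustration under assumptions: the class name DpiaRecord, its field names, and the missing_items helper are invented for this example, and it is not a legal template or a complete statement of what GDPR requires.

```python
from dataclasses import dataclass, field


@dataclass
class DpiaRecord:
    """One project's answers to the DPIA items listed in the conversation."""
    project: str
    people_involved: list          # who in the organization takes part
    risky_data: list               # the specific risky data being handled
    rationale: str                 # why the data is needed at all
    business_processes: list       # which processes the data will be part of
    risks_to_data_subjects: list   # risks to rights and freedoms
    tools: list = field(default_factory=list)  # tools that will be used


def missing_items(record):
    """Return the names of items that are still empty, in field order."""
    return [name for name, value in vars(record).items() if not value]


if __name__ == "__main__":
    # Hypothetical project used purely as sample input.
    draft = DpiaRecord(
        project="Customer support archive migration",
        people_involved=["records manager", "privacy counsel"],
        risky_data=["support tickets containing PII"],
        rationale="",
        business_processes=["customer support"],
        risks_to_data_subjects=["re-identification when merged with CRM data"],
    )
    print(missing_items(draft))
```

An empty rationale shows up in the list of missing items, which mirrors the point made above that "because it’s there" is not a good enough reason to hold the data.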

Marlene Gebauer 37:21
Yes. And in fact, it’s interesting, you touched upon the evaluation of the need for a CPO. So it segues nicely into my next question. You know, I’m sure you’ve heard, often IT teams want to control all things security related. But this is really a concern for legal teams as well. So what do you see as the role of the legal department versus IT or outside consultants in privacy compliance?

Peter Baumann 37:49
So it’s, it’s forcing people to work together across the organization,

Greg Lambert 37:56
once we found our problem.

Peter Baumann 38:01
But as we also know, not quickly enough. It’s also creating new silos and power battles in most organizations. Maybe if I start there first. I think one of the interesting things is, and my respect to any records managers on the podcast, who really understands the data in your average, you know, Western organization? A records manager. They’ve got decades and decades, centuries, of experience in how to manage data. Largely it was physical data, and it moved into the world of digital and became totally out of control. But those are the people that really understand data, and they can make the biggest gains on the data. Who are the people driving this discussion? It’s a mixture of legal, as in legal counsel, general counsel, CISO, security, privacy, compliance, a little bit of IT. And how often do you see records managers in the same room when they’re devising the plan and strategy? It’s very rare. Yeah. It’s very, very, very rare. And it’s really sad. You know, the first half of our life at ActiveNav was really dealing with records managers, you know, experts in the field of data, thanks to the data scientists, you know, they understand how data works. In the last few years, it’s largely been dealing with people who have a lot of power and authority and are, you know, trying to meet certain rules, regulations, requirements, but with very little understanding of the data and how the users and the business use that data. And so I kind of start there, because it demonstrates that we’re not there yet. We’ve got to get those people together. And when you do get them together, it’s fantastic. You’ve got the right discussion in the room, and there’s a chance of success.

Greg Lambert 39:58
And Peter, so, you know, you’ve talked about the CPOs and the DPOs, these officer-level jobs that are out there. And then you talk about all of the intelligence that’s in your records department. In this area, are those two completely different entities within an organization? Because I can tell you, you know, having been an old records manager myself, there’s the technology part of it, and then there’s the record keeping and the process part of it, and very rarely do those two cross. So how do you advise that you take the intelligence from your records folks and apply it to these, you know, high-level jobs of DPO/CPO?

Peter Baumann 40:50
So it’s not always easy, Greg. I guess we’re blessed, because we’ve been doing it long enough that the software actually does some of it. It does that transition, because we’ve already got all that knowledge and experience from the records groups on retention schedules, for example. That’s kind of inherently built into our thinking and our software, which means that if they’re not in the room, it comes up as a feature or a discussion or a capability. However, you still need to make sure that you’re applying the appropriate policies and retention schedules to that data. So I think it’s fair to say that it generates frustration. And it needs to change for the market really to get a grip of this. They need to bring people in that understand the data and have been managing and worrying about just the data and how users interact with it. And then what’s wonderful, of course, for that community, for the records manager, is they now have people who care and are interested at an executive level, all the way up to, you know, board members and non-execs. There’ll be people responsible for ESG, governance, cyber, and they now have a voice, and they should have a voice at the top table. So I think it’s for both parties to recognize the value of each other and the relevant knowledge they have. And what’s remarkable, to really show how crazy it is sometimes, is we can do deals on either side of a business. You know, I’m not saying we do two deals in the same business. But one day we could do a deal with, you know, Customer A on the records side, and the next day with Customer B purely on the privacy, compliance, cyber side, and the two should meet. And that’s just crazy, because the benefits clearly cross. Now, it’s a little bit more complex and nuanced than that, because both parties have familiarity with different technology and tools.
And so if you’re talking to a cyber person, they’re going to be very familiar with cyber tools that will reach into some discovery capability. If you talk to records, they’ll be very familiar with records and data classification tools that by default have discovery, but won’t have some of the cyber components. And so you can see why sometimes that gets a little bit difficult, because customers don’t want 50 tools doing similar things. They’re trying to find out which one is the right solution, purpose-built to solve their problem. And that is a genuine issue that I think the market faces.

Greg Lambert 43:27
Yeah, one of the things that you had mentioned earlier is still kind of sticking in my brain right now. And that is to understand why it is that you’re creating the data. And it almost reminds me of a guest we had on a couple of weeks ago that talked about court forms, and that, you know, there’s this process. But if you don’t know what you’re doing, you’re going to be creating things or exposing things that you don’t have to expose. And it may actually put you at a higher risk, just because you’re following the steps. But you don’t know why it is that you’re doing it. You’re not asking the right questions there. So I think that’s something that a lot of us don’t think about. And I think that the chiefs come with a certain knowledge about why things are and aren’t exposed. So do the records folks. So, you know, the more collaboration you have on this the better, especially when you talk about the why. Why would you expose that information or create that information?

Peter Baumann 44:29
I think that’s a great question, isn’t it? I think you’re right, Greg, it’s the question to ask. Well, why do you need that information? And do you really need it? And that...

Greg Lambert 44:38
helped because the checklist says we need to? We’ve always done it this way,

Marlene Gebauer 44:42
Because I might need it someday, like to drill. Yeah, I hate that one. How do you explain the relationship between innovation and information security? Do you see them in opposition to one another, or do they work in tandem? And I’m thinking, you know, in terms of both data as well as any other types of innovation. I mean, if you’re bringing in new technology, for example, or a new workflow process, do they work together? Or are they kind of at each other?

Peter Baumann 45:15
I think generally they need to work together, because, particularly if they work together early on in a program, they’ve got some chance of innovating and changing and making things better. I think the problem is, you’ve got a bit of an oil and water issue going on there. Because one is just worried about the perimeters and security and locking stuff down. And the other one is worried about the nuance of the data and meeting certain rules and policies and regulations. And so by default, you’ve got a slightly different interest coming into that discussion. Now, I’m an eternal optimist, which is one of my flaws, if you like. We’re very early in the world of data privacy, we’re quite early in the world of data security. You know, when we started this business 14 years ago, I never thought we would be doing this thing called post-breach data analysis, discovery, the long tail of post-breach, to meet a privacy regulation. Back then, you know, it was more about storage savings, because they were very expensive. And so you can see how the market has changed. So we’re very early in our journey, at the toddler stage, really, in the maturity of the market. And what’s happening is we as individuals are learning about the risks of privacy and ID theft and all the rest, and that then filters into the workplace. So now we have everybody kind of aware of it. And we’re all going through security training on a very regular basis to avoid ransomware and phishing and what have you. So everyone’s becoming more educated, which I think bubbles up eventually to these strategic-type discussions that I think you’re referring to, Marlene, where there are opportunities to put these smart people together and reinvent the way that we lock the doors down or keep the doors sufficiently open. And so I think it’s beginning to happen. We see it because those people are now around the same table.
I just saw it last week at RSA. You know, what was always a cyber conference over in San Francisco is now probably 20% privacy, and they weren’t in the room four years ago. And now they’re there. So there’s a recognition: it’s the data, stupid, because we’re going to get hacked. And so there is a relationship developing there. So I think maybe that answers your question. I think the reality is that it doesn’t happen that often. But I think it will more in the future.

Greg Lambert 47:40
Peter, we usually ask all of our guests what we call our crystal ball question. And so we’re going to ask you to pull out your own crystal ball and kind of peer into the future for us, say the next two to five years. And let us know what you’re seeing as far as data security and what you think the future holds for us.

Peter Baumann 48:02
Yeah, and again, at the risk of being self-serving, I think it has to be this: the world of structured data is beginning to get its act together. For reasons we discussed earlier, I already thought it had its act together. But apparently it didn’t. And it’s had to go through a whole, you know, reinvention that’s largely either underway, happened, or is about to complete and be maintained. Looking out, until we control our unstructured data, we’re never going to solve this problem. We’re always going to be breached, always going to be hacked; there’s always going to be somebody smarter than the guys trying to keep them out. And as we just discussed, 80 to 90% of that data is unstructured. Your dark data predominantly resides in unstructured. And so we have to be moving the market towards solving that problem. And what I love, and where I really get excited, is back to that pinwheel. Once you’ve got the data under control, you can start to leverage the data to actually drive what you care about. If you’re a charity, if you’re a law firm, you know, adding value back to your customers, giving money back to the actual charitable cause rather than burning it up in management costs. If you’re a bank, you know, new financial instruments that you didn’t realize the market was asking you for, because you never managed to analyze the data correctly. And so I think that’s the real crystal ball. That’s the innovation. And you know, I think, hopefully, I’ll be long retired, because it will happen quickly, and I’ll be retired. But it may still be 10 or 20 years out. But ultimately, it’s no dark data in your systems. And if it is there, it’s immediately found, because you have the right tools to catch it. And then it’s leveraging the data for the reason you have the data, and nothing else.

Greg Lambert 49:47
All right. Well, Peter Baumann from ActiveNav, thank you very much for coming in and doing this whole interview, a whole show on data security. It’s something that we’ve been wanting to do for a while. Thanks, Peter. Thank you. Well, we definitely put the geek in The Geek in Review this week, and so extra geeky. I mean, you look at IT departments anymore, and it’s almost like, it used to be take care of the computers, take care of the network. Now it’s almost like a third, or maybe half, of their job is security. So, I mean, this is just the nature of the beast now.

Marlene Gebauer 50:27
Yeah, pretty much every decision you make, you know, has security concerns in it. I mean, if we’re looking at buying, you know, a resource, there’s always security that we have to go through and steps you have to check to make sure that that’s going to be in line with what you need to do. But it touches all of us, you know, whether you’re in that specific space or not. We all have security training every year about what to do and what not to do. This, I think, is a critical conversation, and very, very timely.

Greg Lambert 51:01
Yeah, I agree. So thanks again to Peter Baumann, CEO and co-founder at ActiveNav, for taking the time to geek out with us and talk security.

Marlene Gebauer 51:11
And of course, thanks to all of you for geeking out with us and taking the time to listen to The Geek in Review podcast. If you enjoy the show, share it with a colleague. We’d love to hear from you. So reach out to us on social media. I can be found at @gebauerm on Twitter,

Greg Lambert 51:25
And I can be reached @glambert on Twitter.

Marlene Gebauer 51:29
Or you can leave us a voicemail on The Geek in Review Hotline at 713-487-7270. And as always, the music you hear is from Jerry David DeCicca. Thank you, Jerry.

Greg Lambert 51:40
Jerry was geeking out with us too, I’m sure.

Marlene Gebauer 51:42
He was.

Greg Lambert 51:43
All right. Marlene, I will talk to you later.

Marlene Gebauer 51:46
Okay, bye bye.