In order to measure what matters, it is important to have the data available to help. Sarah Lin is the Information Architect & Digital Librarian at RStudio, PBC, and is also a law librarian. RStudio wanted someone to help them manage their digital morass and to Marie Kondo their digital information. Is there anyone better than a law librarian with some tech skills to do just that?
Sarah discusses what the R programming language does, and how she got interested in the profession of statistical computing. While some may not see a direct link between being a law librarian and an R programmer, there are actually a number of skills librarians possess which make them well suited for data analytics. One skill is our ability to understand, clean, and organize information. RStudio’s Chief Scientist, Hadley Wickham, created the Tidyverse, which helps with data-cleaning tasks. And there are also resources like to help organize. Throw in a law librarian to have it all make sense and tell a story and you have a fantastic combination of skills and tools. And we should mention that it is free open-source software.

Listen on mobile platforms: Apple Podcasts | Overcast | Spotify
To learn more about the R language check out:
Information Inspirations
Roy Sexton from Clark Hill lays out what law firm marketing does as opposed to what law firm business development does in the latest episode of Steve Fretzin’s Be That Lawyer. Roy’s advice of the “Rule of Three” when it comes to promoting yourself and your marketing products makes this a must-listen episode.
Adam Smith, Esq. covers the new initiative by our friend Phil Flora and Leopard Solutions on ranking law firms by their vitality and resilience, not just once a year, but in real-time.
Feeling the effects of COVID, the election, the environment, or the hundred other stressors in your life? Maybe take Prof. Eric Janssen’s advice and put down your phone and go for a walk.
Did you know there was a Pirate who was a 17th Century Anthony Bourdain? Marlene teaches Greg about this culinary outlaw and also teaches him about breadfruit.

Listen, Subscribe, Comment
Please take the time to rate and review us on Apple Podcasts. Contact us anytime by tweeting us at @gebauerm or @glambert. Or, you can call The Geek in Review hotline at 713-487-7270 and leave us a message. You can email us at As always, the great music you hear on the podcast is from Jerry David DeCicca.

Marlene Gebauer:  Welcome to The Geek in Review, the podcast focused on innovative and creative ideas in the legal industry. I’m Marlene Gebauer.

Greg Lambert:  And I’m Greg Lambert.

Marlene Gebauer:  Well, Greg, I’m enjoying pumpkin spice everything and baseball. And I hope to get out and do some cool weather kayaking soon.

Greg Lambert:  I don’t know about being on the water while it’s cool, but you know, to each their own. And I will tell you that so far, I think I’ve avoided pretty much anything that’s pumpkin spice flavored, so, well, that’s...

Marlene Gebauer:  Yeah, that’s amazing.

Greg Lambert:  It is true so far, so far.

Marlene Gebauer:  We got a few more weeks.

Greg Lambert:  We have a great conversation coming up with RStudio’s Sarah Lin. But first, let’s go ahead and jump into this week’s information inspirations.

Greg Lambert:  Alright, Marlene, so Steve Fretzin has this great podcast on business development coaching in the legal industry. And it’s called Be That Lawyer. Not “don’t be this lawyer,” but be that lawyer.

Marlene Gebauer:  Be that lawyer!

Greg Lambert:  Exactly. Well, this week, he has Roy Sexton join him from Clark Hill in Detroit. And for anyone who doesn’t know Roy, I really encourage you to at least follow him on LinkedIn. If you’re really energetic, you’ll follow him on Facebook and all the other ones as well. Roy is just a really great person who’s always willing to share his ideas with others, and is also a pretty good sounding board if you ever have a crazy idea that you want to run by an expert before moving forward. So on the show, Roy does this great job of differentiating law firm marketing and law firm business development responsibilities, which, you know, some people don’t know that those are two different tasks. But they are. One of the topics that Roy hits on Steve’s show is what he calls the Rule of Three. And I found this really interesting. So it’s designed to help lawyers understand how to promote themselves and their work. But quite frankly, it really applies well beyond just lawyers. So, you know, that’s just one bit of wisdom that’s shared on this Be That Lawyer episode. So go check it out and learn more from an expert.

Marlene Gebauer:  Well, Greg, if you’re weary of looking at COVID Health indices, there’s a new

Greg Lambert:  We all are.

Marlene Gebauer:  We all are. There’s a new law firm health index out, published by Leopard Solutions. Now, you may remember our friend Phil Flora from Leopard was on the podcast a couple of weeks ago talking about their new law firm diversity tool. It seems Leopard has taken a very different approach to law firm rankings than traditional reports. It’s an evidence-based evaluation and looks at data sets that are very different from things like gross revenue. Leopard claims its metrics provide a measure of a firm’s vitality and resilience, data sets that go to the heart of a firm’s operational and financial hygiene. So what kind of examples? Well, first of all, the past five-year record of growth or decline in RPL, average lawyer tenure at a firm, and growth or decline in lawyer headcount. Another point of interest is that the Leopard index is dynamic. So no waiting around until the new scorecard comes out next year.

Greg Lambert:  So you don’t have to wait till April, May to get the information. That’s good. That actually sounds really, really interesting.

Marlene Gebauer:  Well, I know they’re going to be doing webinars about it. So I would check their website to see what the schedule is for that.

Greg Lambert:  Okay. Well, tying in with your “we’re tired of looking at COVID metrics,” I don’t know if you’re seeing this, but I’m really seeing it in both my personal and professional life. You know, people are getting stressed out right now.

Marlene Gebauer:  Really? Yeah, you’re kidding. That’s like the understatement of 2020.

Greg Lambert:  And I think part of it right now is not just COVID. But you know, we’re so close to the election that people are being, you know, ultra-cautious and are very, very stressed. But

Marlene Gebauer:  election, environmental, you know, everything,

Greg Lambert:  You name it, we’re stressed out about it. So, to counter that, I saw this great LinkedIn post from Eric Janssen from the business school at Western University in Canada. He was in the middle of a Zoom class, and he made all of his students pull out their phones, place them on the table at their home, or wherever they were taking the Zoom class, and move the camera so it showed that the phone was there. And then he made them all get up and, safely, go take a 10-minute walk outside. And I just thought it was a great idea. You know, I would love to do this with my team. Maybe the next time we have a Zoom call, pull this out and go, “Okay, everyone, pull your phones out, turn them off, put them here, put the camera on them. Now, go away for 10 minutes, leave your phone behind, don’t jump on the computer, go do something.” So sometimes we really need to put self-care on our calendars. Put it in there so that we make sure that we allocate some time for ourselves, both for our mental and our physical well-being. So thanks to Professor Janssen for this idea. I think it’s great.

Marlene Gebauer:  Yeah, I love the idea of the whole walk. It’s a great thing, you know, just for exercise and also just to clear your head. So my last one is focused on linguistics, and I know Talk Like a Pirate Day has passed. But did you know that a pirate was the first person to pen a recipe for guacamole in English?

Greg Lambert:   Hmm. Okay.

Marlene Gebauer:  Well, he also gave us the English words for tortilla, soy sauce, and breadfruit.

Greg Lambert:  Before you go on, I need to know, because I don’t know what breadfruit is.

Marlene Gebauer:  What? It’s the big giant fruit that you see in the grocery store. It’s huge. And it’s kind of like a greenish-yellow and

Greg Lambert:  It’s not dragon fruit?

Marlene Gebauer:  No, dragon fruit is cute and pink.

Greg Lambert:  Well, I learned something today.

Marlene Gebauer:  There you go. All right, okay. William Dampier began his pirate life in 1679 in the Caribbean. He was arrested in Spain and wrote a bestseller called A New Voyage Round the World, with him playing the role of Anthony Bourdain, eating all around the world. His talk of breadfruit sold the British on it, and the ill-fated voyage to bring some back to England became the basis for the novel Mutiny on the Bounty. Dampier was eventually given a commission in the Navy, which he subsequently lost due to mismanagement of his ship, you know, not too unlike a pirate. But that’s the way it goes. And that wraps up this week’s information inspirations.

Greg Lambert:  Well, if you know what you’re doing data can help you tell a story.

Marlene Gebauer:  You know, even if you don’t know what you’re doing, data can help you tell a story.

Greg Lambert:  It may not be the right story. But

Marlene Gebauer:  It’ll be a story though.

Greg Lambert:  But it can help you identify trends, and it can help departments show their value to their firms. Of course, the problem with data is finding the best tools to help you tell that story. But today’s guest works for RStudio, which makes one of those tools that can help you do just that.


Greg Lambert:  Sarah Lin is the information architect and digital librarian at RStudio, PBC, which stands for public benefit corporation, and she worked for many years as a law librarian at Reed Smith. Recently, she conducted a webinar with AALL about data science and what it means for law librarians.

Marlene Gebauer:  Sarah, welcome to The Geek in Review.

Sarah Lin:  Thank you, Greg and Marlene, it’s great to be with you today.

Greg Lambert:  So we’ve talked data science on the show before, but let’s start off with a little explainer for everyone. Can you just let us know what we mean when we say data science?

Sarah Lin:  Sure. So I’ve actually seen data science described in a few different ways. But I conceive of it as three, three parts, three interrelated parts. The first is tidy data, then statistics and coding. So let me elaborate on those.

Greg Lambert:  Yeah, tell us what tidy means.

Sarah Lin:  So tidy is actually very much an R thing. One of our sort of prolific R package creators is from New Zealand, and they use the word tidy down there to mean clean data, data that you’ve wrangled, data that you’ve managed. All of those terms mean the same thing. But he’s sort of a rock star in the R world. And so we refer to things as tidy data.

Greg Lambert:  Yeah, I have to say, with my wife, if I hear the word tidy, that means that I didn’t clean something, I tidied something. So

Sarah Lin:  And it’s a similar thing when you’ve got data that’s absolutely a mess. There’s a lot of cleaning that you need to do before you can do any analysis. So yeah, tidy data is all the things that you have to do before you can get started with your analysis. And then you’ve got statistics, which is pretty self-explanatory. But I do want to emphasize that statistical analysis can actually get quite complex. And I don’t think that that’s necessarily a bad thing. I actually think that helps us tell better stories, more targeted stories, by doing a better analysis. And then the third part is the coding. In data science, you can do it in either the R programming language or Python. Professional data science teams use both languages, so it isn’t really that one is better than the other, but they do have different histories. Python comes from the computer science world. Many folks use Python now; I’m sure lots of listeners are familiar with it, and it can be used for a lot of other things. R is a statistical programming language. It was created by and for statisticians, specifically for statistical analysis. The history is that Bell Labs created the S programming language, and R is the open-source version of that.
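
To make the tidying step concrete, here is a minimal sketch in Python, the other language mentioned above (in R, the Tidyverse packages cover the same ground). The spreadsheet contents, the column names, and the `tidy` helper are all hypothetical and were not discussed in the episode; the sketch only shows the kind of work involved: splitting a combined cell, trimming stray spaces, normalizing capitalization, and turning text into numbers.

```python
import csv
import io

# Hypothetical messy export: stray spaces, inconsistent capitalization,
# a combined "attorney / office" cell, and a missing value.
RAW = """Attorney / Office,Requests , Month
 jane smith / Chicago ,12,2020-09
JOHN DOE / houston,,2020-09
 jane smith / Chicago , 7 ,2020-10
"""


def tidy(raw_text):
    """Return one clean dict per row: split cells, trim, and type values."""
    reader = csv.reader(io.StringIO(raw_text))
    next(reader)  # skip the messy header; clean names are assigned below
    rows = []
    for attorney_office, requests, month in reader:
        # Split the combined cell into two columns and normalize the case.
        attorney, office = (part.strip().title()
                            for part in attorney_office.split("/"))
        requests = requests.strip()
        rows.append({
            "attorney": attorney,
            "office": office,
            # Keep a missing count as None instead of guessing a number.
            "requests": int(requests) if requests else None,
            "month": month.strip(),
        })
    return rows


TIDY_ROWS = tidy(RAW)
for row in TIDY_ROWS:
    print(row)
```

Once every row has the same columns and types, the statistics and coding steps have something reliable to work with.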

Marlene Gebauer:  Tell us a little bit about what RStudio does.

Sarah Lin:  Yeah, so RStudio. Well, it is actually a little bit confusing sometimes, because our product is also called RStudio, like the company. I’ll talk about the company. We have a really unique role, I think, in the R programming community, as we make the open-source software that the vast majority of R programmers use. Data scientists, of course, are the primary user group. We also sell commercial offerings to a number of corporate clients that have a need for data management. So you can think about large financial companies, insurance companies, social media companies. We take the profits from the corporate side of the business and use those to pay for the open-source software developers, and we call that a virtuous cycle, so those profits really come back and feed into the open-source community. A lot of, I would even say most, open-source projects don’t have paid software developers who are fixing bugs, developing new code libraries, expanding, you know, creating new software.

Greg Lambert:  So let’s kind of get back to you then. How did you get interested in data science?

Sarah Lin:  Well, life can take you funny places sometimes. And that’s, I think, the best way to sum up my transition. So I was working at a Silicon Valley startup last summer and was part of the first wave of layoffs; my whole team was cut. And so I was looking for a new position, one where I could still work as I had in that position, 100% remotely, and have a flexible schedule, so I could be home with my kids when they got out of school, back in the day when we missed each other. And the ad for RStudio said that they wanted someone to help them manage their digital morass and to Marie Kondo their digital information. And I thought,

Greg Lambert:  Wow, that is, that’s quite an advertisement,

Sarah Lin:  Isn’t it? It was great. It really caught my eye. I was like, “Well, I have to apply for this.” I mean, my favorite thing is to make order out of chaos. So it seemed like a good fit. And the folks there are just such wonderful people, so kind, that it just felt like a good fit.

Greg Lambert:  So what are some of the jobs you do?

Sarah Lin:  So I’m really focused on managing access to information, which I feel is sort of a general knowledge management thing. Internally, I’m focused on staff, working on things like our wiki or Google Drive file management. And then for our external users, well, actually, sometimes our own staff are using some of our external sites, but we’ve got our paying customers. And then we do serve a role in the general open-source R community, which is, say, a few million people who are coming to our online presence to learn more about R, and so they need to find information as well. So on that end of things, I’m looking at all of our different web properties and trying to standardize and make the information more findable.

Marlene Gebauer:  Obviously, you were originally a law librarian. I think you kind of talked about this a bit on the AALL webinar. But what are some of the traits that you think law librarians and information professionals have that make them well suited to do data science?

Sarah Lin:  I actually think there’s a lot. I gave a presentation much earlier this year that was directed specifically to technical services librarians, because that was my role when I was at Reed Smith, doing technical services.

Greg Lambert:  I will tell you Technical Services librarians are well underappreciated. So it’s, yes. I mean, I always say that you know, they’re the original database architects.

Sarah Lin:  Absolutely. Absolutely. That presentation was about how tech services librarians can be data scientists, because it was a tech services conference, but I don’t think it’s limited to them. I think the application is really, really broad. For law librarians and other legal information professionals, part of the reason that I defined data science in those three categories is that I think we already have a certain level of expertise in those three areas. When you look to learn a new skill like data science, which can seem really, really big, it’s really helpful to start from a place where you already have a certain level of proficiency. As examples of that, I’m sure all of us have been working with wonky spreadsheets for years, decades even, forever. So that process of needing to tidy data, why you need to do it, the importance of formatting columns, splitting cells, formatting the cell values, all of that stuff you’re really familiar with already. And then on the statistical side, we’ve been reporting, again, since time began. There’s always been data collection. Data visualizations have certainly grown in popularity, but we’ve already been creating them and consuming them for some time. And then on the coding aspect, I mentioned this in the webinar: I had this real breakthrough once I finally realized that MARC is a language that you use to talk to computers. And that’s what coding is. That was a huge breakthrough and made me feel confident, because coding can be hard. In some ways, it’s really similar to MARC, where you’ve got this syntax, you’ve got punctuation, and you have to have certain things exactly right. 
And I think that can turn people off if they don’t have to use it for their job. But you just have to learn it. Those are the rules. If you forget the period, the whole thing will blow up. And that’s exactly what happens when you’re coding. So in some ways, you know, what makes us well suited? Well, we’re already halfway there. It’s not too far, I think, to really round out that skill set. But I also wanted to mention that legal information professionals, particularly law firm librarians and corporate librarians, have had to justify their value for a long time. And they’re really practiced at having to do that. So data science can become, for them, a way to sharpen those tools. I’ve worked in a for-profit environment for, gosh, I think my first professional position was in an academic library, and then I went for-profit and haven’t gone back. But I love the focus. I wasn’t well suited, I think, to academic life, because I wanted to be able to serve my users. And I wanted to be able to use the data that we had to change our policies, not because it matched a national standard, but because it was what our users needed. And I like that flexibility in a corporate or law firm environment. But I think because we’ve had that focus on the bottom line, the data collection isn’t just something routine that you do every week, every month, every year. In a corporate environment, you are using that data for a very particular purpose. You have to use it very well, and you have to be ready for the purpose to change on a dime and be able to reanalyze and go with that.

Greg Lambert:  Let me ask you about one of the traits, I think it was the second one, and that was the statistical information. I can immediately see talking to some librarians, and I wouldn’t even say just librarians, almost anyone in the legal field, and as soon as you say statistics, they would say, “Look, there was a reason I went into law, and that was so I didn’t have to take the statistics class.” So how do you address that issue?

Sarah Lin:  Um, I may have said something similar in a past life. I just have to be real with people: I don’t think that flies anymore. Especially in a law firm environment, you can’t ignore the numbers. The numbers will affect whether you have a job or not, right? You can’t keep your head in the sand for that. And indeed, the more you know, the better you can do your job. And I’ll mention again what I said in the webinar: anybody who’s learned about the law, that’s a hard thing. That is a really hard thing. So if you’ve already done that hard thing, you can do this. Someone asked me recently, “Well, did you already know how to program when you started at RStudio?” And the answer is, I knew nothing. Absolutely nothing. I started in September, and at the beginning of December I did a machine learning workshop. I spent two months relearning linear algebra. And it was hard. But being able to understand how research databases work, to understand machine learning, I think it’s changed my life. I’m definitely really excited about it. And just to know how that actually works is really valuable.

Greg Lambert:  Well, speaking of machine learning, I know there’s a lot of buzz words around this. And so you know, we were talking data science. But, you know, there’s a number of things that people have heard these buzzwords over the past decade, things like big data and machine learning, artificial intelligence, what’s the difference between data science and these other words that they may have heard?

Sarah Lin:  That’s a very, very good question. We’ll start with big data. There is a library book, published I think about 2017, that pretty much equates data science with big data. And that’s really misleading, I think, and also makes it really inaccessible. You think, well, I don’t have a processor to do millions of lines of data, I don’t have millions of lines of data. So I think it’s really important to understand that data science and big data are not the same thing at all. We talked about the need for tidy data in data science. The amount of tidy data that you have can vary. When you have data that cannot fit on your computer, that’s when you’ve entered the realm of big data. So the bottom line is that you can use big data as an input for data science, but that’s not the only thing you can use. Last month’s usage statistics, last year’s: it doesn’t have to be big.

Greg Lambert:  Okay. What about things like artificial intelligence?

Sarah Lin:  So I have a little bit of a beef with the prevalence of the term AI in law library publishing this year, because I think it’s just bandied about. It’s like the way my son uses the word “like”: it’s everywhere. Artificial intelligence is an umbrella term. It can mean a lot of different things, and there’s a bunch of technologies that fit under it. I think the ones that are really relevant to us are machine learning, text mining, and natural language processing, but there are others. So I think it’s really crucial to know what type of AI technology you’re talking about. So machine learning, just to answer that question: machine learning is data science. It’s tidy, big data that gets processed in code, using statistical analysis methods. The purpose of machine learning is to make predictions, whether you’re predicting a classification or predicting a number. And accuracy is key: an important bit is that the predictions are accurate to a certain confidence level. So for me, data science is the foundation of machine learning, as well as text mining and natural language processing.

Marlene Gebauer:  Where do you think data science can have the most impact in a law firm or the legal industry?

Sarah Lin:  Well, I think, I mean, it’s a little bit what I alluded to with learning how to do machine learning: the utility of the understanding you get by having done some data science is really impactful for everyone at all levels. We talk a lot, I think, about the legal industry being behind the times technologically, and I think data science falls under this umbrella a little bit. I listened to the recent episode you did with David Kamien, and a lot of the things that he was talking about are underpinned by data science. And so I was thinking how much more understandable that would have been to listeners who had done some data science and knew really concretely what it took to do those things. I think that most law firm clients, if they are of any size, have data scientists on staff. Data science skills are growing in law firms, I think, but there’s still a lot of room to catch up, particularly with the law librarians. And on a very practical level, the data tidying skills that you learn from doing data science can really enable legal professionals to join datasets from different software applications that can’t talk to each other. I’m thinking particularly of data about what attorneys spend their time on, which is in a number of applications, maybe from the library to the finance department and everything in between. So being able to join those data sets together, do an analysis, and see a really holistic picture has a lot of potential. And then I have a couple of other examples of what I think are really practical, impactful skills beyond tidy data, because there’s a lot of things you can do with data science. 
So I’m thinking about making visualizations, doing more statistical analysis, as we mentioned, and learning about those AI technologies that underpin our databases. Librarians can do those things just for their own work, but they can also do them to assist with the research that they’re doing for clients and attorneys, and add value there.
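
The idea of joining data from applications that can’t talk to each other can be sketched in a few lines of Python. Everything here is a made-up illustration rather than anything from the episode: the two record lists, the `attorney_id` key, and the `join_on_attorney` helper are assumptions, and a real project would more likely use a data-frame library in R or Python.

```python
from collections import defaultdict

# Hypothetical exports from two systems that share only an attorney ID.
library_requests = [
    {"attorney_id": "A100", "research_hours": 3.5},
    {"attorney_id": "A200", "research_hours": 1.0},
    {"attorney_id": "A100", "research_hours": 2.0},
]
billing_records = [
    {"attorney_id": "A100", "billed_hours": 120.0},
    {"attorney_id": "A300", "billed_hours": 95.0},
]


def join_on_attorney(requests, billing):
    """Full outer join of the two exports on attorney_id."""
    # Total the library's research hours per attorney first.
    research = defaultdict(float)
    for record in requests:
        research[record["attorney_id"]] += record["research_hours"]
    billed = {b["attorney_id"]: b["billed_hours"] for b in billing}

    joined = []
    # Keep attorneys that appear in either system; gaps show up as None.
    for attorney_id in sorted(set(research) | set(billed)):
        joined.append({
            "attorney_id": attorney_id,
            "research_hours": research.get(attorney_id),
            "billed_hours": billed.get(attorney_id),
        })
    return joined


JOINED = join_on_attorney(library_requests, billing_records)
for row in JOINED:
    print(row)
```

Keeping unmatched attorneys with a `None` value, instead of dropping them, is a deliberate choice in this sketch: the gaps are often where the interesting questions are.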

Greg Lambert:  Yeah, and that may play into the next question I want to ask. If you were still back in a law firm right now, what’s some of the fun data that you would like to play with in order to tell a story about what that data tells us?

Sarah Lin:  Yeah. So when I was at Reed Smith, I was involved in the monthly reporting process. And I was never really 100% satisfied with it, but I didn’t know what else we could do. Given what I know now, if I were back in that seat again, I can see that when the purpose of those reports is to show our value, essentially, to the profitability of the firm, what was missing was the profit numbers. I didn’t have any firm numbers to compare to. So, you know, does it matter that ILL requests were up last month, year over year? I don’t think so. And don’t get me started on counting things that don’t need to be counted.

Greg Lambert:  Count how many times somebody walked into the library?

Sarah Lin:  Right? Well, that would be interesting now. But I’m thinking, you know, we did a lot of that reporting on researcher output by type, money spent on various contracts, but at no point were we bringing in that profit data, and

Greg Lambert:  So instead of reporting on outputs, it would be more on, say, impact. One of the things that we’ve kind of preached here is that it’s better to be seen as being part of the revenue side of the firm than the cost side of the firm. So it seems like if you could find ways to use the data to show, you know, this is how we’re impacting revenue at the firm, then you’re golden at that point. So sorry, I didn’t mean to answer my own question here. But how would you convince the powers that be at the firm to understand how important it is to understand the data that you have, maybe even to have data scientists on staff now, to help them help you secure better outcomes for their clients?

Sarah Lin:  I have a number of thoughts about the powers that be. I think the best way, honestly, to start is to show off your data science skills and prove that you know what you’re talking about. Going back to some of the data that you might want to collect, I talked at the AALL webinar about business development time. If you could get from the firm the number of new matters opened and new clients signed, ideally there would be a statistical correlation, a very strong, say, inarguable defense that these things are related. They go together, and if you break one, you’re going to break the other, which I think makes your argument that this is essential. And I do want to mention, because I know it’s fallen out of favor and is controversial, that I really think it’s important to track librarians’ time. Now, tracking it isn’t the same as billing for it. And no, it’s not going to be billed for, right? That part isn’t up for discussion anymore. But track the time. You might not be able to use the timekeeper software; you might have to do it on your own, in something like your reference tracking software. Being able to track what your researchers are doing by client, matter, and attorney number enables you to compare apples to apples when you look at some of these other sources of how attorneys are spending their time, or information on time spent for clients. So I think it’s really important to be tracking that time, however you need to do it. If you don’t have the data, specifically about how time is being spent by your staff, it’s so easy to justify it as being overhead, because there’s no evidence to show that there’s any kind of relationship. 
So going back to your question, which was how do you convince the powers that be, I think you’ve got to have the data, and you’ve got to show off what you can do with what you already have. Once you’ve proved your mettle, then I think you’re able to get access to more data sources. I’m not sure, in every firm, that if you walked up and said, “I need access to the data warehouse,” that’s going to come. But if you’ve shown what you can do... You basically need to dazzle them with the data that you have, maybe by doing something like making an interactive dashboard of spending. So you’re not recreating this every month; you’re making one dashboard that’s automatically updated as you get new data. And the powers that be can look at that anytime they want and know that they’ve got, well, maybe up-to-the-week data on spend. And I don’t think the idea of collecting data is controversial. I think pretty much folks are on board with that; they know they need to collect data. Alluding to what you said before about being seen as a cost, being seen as overhead, the point that can be new is using data, collecting new data, and documenting the stuff that you’re already doing, which enables you to make a strong argument. I’ve noticed this, too, with the data in my own reporting at RStudio: when someone is given the raw data, it’s like they believe it more. When they see it in the interactive, when it’s not just a static graph and they can say, “Oh, well, what happens if we look at last month or last year?”, they ask their own questions and get them answered in real time, and they’re converts. They’re real believers that what you’re telling them is fact.
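
The statistical correlation mentioned in this answer, between time spent on business development research and new matters opened, could be checked with a Pearson correlation coefficient. The sketch below is in Python and uses invented monthly figures; the variable names and the `pearson` helper are assumptions for illustration, not numbers or methods from the episode.

```python
from math import sqrt

# Hypothetical monthly figures: hours of business development research
# logged by the library, and new matters the firm opened that month.
bd_research_hours = [10, 14, 9, 20, 17, 25]
new_matters_opened = [3, 4, 2, 6, 5, 8]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series never varies")
    return covariance / (spread_x * spread_y)


R_VALUE = pearson(bd_research_hours, new_matters_opened)
print(f"correlation between research hours and new matters: {R_VALUE:.2f}")
```

A value close to 1 means the two series rise and fall together. On its own that does not prove the research caused the new matters, but it is the kind of evidence that makes "overhead" a harder label to apply.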

Greg Lambert:  Yeah, I try to tell my people it’s very important that we measure what matters, and who it matters for. It’s not necessarily what matters for us, but rather what matters for the people that we’re trying to tell that story to, or convince that we need one more researcher or a data scientist to help us with that.

Marlene Gebauer:  So how or where would someone start on a path toward making an impact with data science at their job?

Sarah Lin:  I’ve got sort of a two-fold suggestion for that. The first thing that I think folks need to do is to think back to those three aspects of data science, data tidying, statistics, and coding, and figure out which of those skills, if improved, is going to make the biggest impact for you, where you’re going to get the most bang for your buck. And once you’ve got that area, then pick an output, something you’re going to create or produce, that’s either going to be totally essential for your job, or it’s going to be really, really fun, and you’re really invested in it. So the example that I had during the webinar was Ryan Timpe. He’s a data scientist at Lego, and he gave a presentation at the RStudio conference on learning R through humorous side projects. That will get you excited, 20 minutes of fun things. But, you know, every sport has statisticians in R who are doing that thing. So find something that’s super fun. I mentioned that I learned text mining through the Jane Austen R package, because I like Jane Austen, but there’s an Office package that will give you the transcripts of the TV show The Office. So if that’s more in line with what you’d like, then go that way. But going back to work, just my own experience: I have worked with Google Analytics data for five or six years, at all the places I’ve worked in that time. So I’m used to Google Analytics data, and I’m used to doing the data analysis. But now that I’m at RStudio, I have to produce my reports in R. Some of my co-workers say that we make our own dog food. I have to do that. And so it’s been imperative for me that I learn the different packages that are related to reporting and the Shiny dashboards that are interactive, because that’s what the powers that be wanted.
That’s what my coworkers are expecting, you know, if I give them usage data about their website. But I do want to mention, if you’re looking for information on some of these specific skills, just to give people something to start with. Aside from Ryan Timpe’s video, which is really amazing. I nearly died laughing when the Golden Girls text mining thing came up. He has some stuff on Legos too, and more dinosaur stuff. It’s great. Anyway, if you want to focus on your data management or your coding skills, the organization The Carpentries has workshops that are really wonderful for beginners. They’re often more available, just by nature, to academic librarians, so if there’s not one by you, the lessons are online for free. You could work through the R for Social Scientists curriculum in about three hours yourself. So it’s definitely doable. And then, if you’re looking at those interactive dashboards that I’ve mentioned, they’re extremely popular. We have a website called And that is a place for general R users to post their dashboards for folks to see. There’s also, I think, about 80 templates from my coworkers who made the Shiny package. So if you want to see what’s possible, that’s some really great stuff. And then, okay, statistical literacy, statistics, is the other bit. There are a lot of general-audience statistics books out there. I think that is a good place to start. Not a statistics textbook. Don’t do that. But you can also look on Twitter: the #rstats and #TidyTuesday hashtags will bring up just hundreds of thousands of hits. Tidy Tuesday is an interesting one. Folks post on Tuesday the tidy data sets that they’ve been working on in the last week, and they span the fun to the essential research every single week. So that’s a great way to see what’s possible with statistical analysis.

Greg Lambert:  Very interesting. And of course, the best thing about it is, as someone who uses R myself, how much does it cost me to use R?

Sarah Lin:  Nothing.

Greg Lambert:  That’s what we like to hear.

Sarah Lin:  That’s a great point, though, that we should underscore. So the software, RStudio, is free to download. Most of my co-workers have written books on how to do data science in R, and as a practice for our company, those books are available for free. There’s tons of information about getting started, like on And within the R community, there’s chat and local chapters for folks to virtually meet up and see what other folks are doing. So lots of stuff, no cost.

Greg Lambert:  All right. Well, I think that’s it. Free is where we want to post. Sarah Lin, I appreciate you coming in and taking the time to talk with us today. It’s been fun.

Marlene Gebauer:  Thanks, Sarah. We really enjoyed this conversation. This has been a really good discussion.

Sarah Lin:  Thanks, Greg. Thanks, Marlene.

Post Interview

Greg Lambert:  All right, Marlene, I don’t know if you could tell. But I was geeking out just a little bit with Sarah on the data analytics talk. And

Marlene Gebauer:  I’m so glad to hear that, you know, it’s good.

Greg Lambert:  So it was interesting that she mentioned MARC, which for the non-librarians out there stands for Machine-Readable Cataloging. You know, it’s the data structure that we’ve used in the library field for decades to build our library catalogs, and the information that’s behind them, to make it searchable and to pull all of that information together. So before there were data scientists, and before there were IT departments, the computers were in the libraries. So while some of us may wonder why a librarian would be a data scientist, I think for those of us that have worked in tech services, this is no surprise.

Marlene Gebauer:  And it’s not just for library techies. She references the interview that we did with David Kamien, talking about his ideas for producing results based on datasets, articles, and other text-based resources through the use of metadata. The data scientists and tools like R or Python help with those bigger, more unstructured data sets as well.

Greg Lambert:  Yeah. So, you know, I’ve enjoyed using RStudio myself over the past few years. Sarah sent us an email after this with a number of links that we’re going to place on the 3 Geeks website. So if you’re interested in starting to use products like R to help you crunch some of the data, go check out those links and dive in. It’s free, and it really is a lot of fun to learn a new skill like this. All right. Well, thanks again to Sarah Lin from RStudio for joining us today.

Marlene Gebauer:  Thank you, Sarah. Before we go, we want to remind listeners to take the time to subscribe on Apple Podcasts, Spotify, or wherever you listen to podcasts. Rate and review us as well.

Greg Lambert:  Yeah, it’s been a while since someone rated us so

Marlene Gebauer:  I know we’re not feeling the love.

Greg Lambert:  Yeah. So give us some love here. I just want to hear how many people actually listen to the end of the show.

Marlene Gebauer:  If you have comments about today’s show or suggestions for a future show, you can reach us on Twitter at @gebauerm or @glambert. Or you can call the Geek in Review hotline at 713-487-7270 or email us at And as always, the music you hear is from Jerry David DeCicca. Thank you, Jerry.

Greg Lambert:  Thanks, Jerry. All right, Marlene, I will talk to you later.

Marlene Gebauer:  I’m off to get some more pumpkin spice whatever.

Marlene Gebauer:  Pod Dog

Greg Lambert:  Pod dog