Video: Office Hours: Lock It Down. Secure Data Access with ABAC, RBAC, and Fine‑Grained Controls | Duration: 1986s | Summary: Office Hours: Lock It Down. Secure Data Access with ABAC, RBAC, and Fine‑Grained Controls | Chapters: Introduction to Starburst (23.135s), RBAC Demo Introduction (232s), Column-Level Access Control (369.565s), Row-Level Access Control (554.82s), Attribute-Based Access Controls (765.745s), Q&A Session (1380.4s), Role Filter Configuration (1553.67s), Closing Remarks and Questions (1670.09s), User Filtering Implementation (1985.388s), Closing Remarks (1985.388s)
Transcript for "Office Hours: Lock It Down. Secure Data Access with ABAC, RBAC, and Fine‑Grained Controls":
Hey, everyone. It's Lester Martin, your friendly developer advocate from Starburst. What are we gonna do today? This will be our second ever, or second reboot of, this office hours routine we've got going. And if you joined us last time, or if you didn't, what we're doing here is securing about thirty minutes of your time. I'm gonna spend ten to twenty minutes on some kind of recent, interesting demonstration, always a live demo so things can go wrong. And then we wanna open the mic and have good old classic professor office hours where any questions can come up, not just questions on what we're doing today. So feel free to ask those. I will be peeking over at the Q&A at times, but I do wanna hammer through my demo first. So if I don't notice you, realize I will notice you, and you'll even have a chance to come off mute and ask verbally if you want to instead of just typing, or even join on video and that kind of stuff. Quincy is in the background. She's gonna help me out, keep me honest, make sure I don't forget to share my screen or something fun like that. Alright. With that said, I should have stepped forward, because that's a really boring slide, and put up the pretty picture. So you've got two pictures of me on screen there. If you're interested in connecting with me, that QR code will find me. And if you miss it, that's okay. There's only so many Lester Martins on LinkedIn, so you'll find me. There are a handful of us interesting fellows out there. Okay. I wanna mention one more time before jumping into it: as I said at the beginning, it's our second one of these. Quincy and I are laying a whole bunch out. What we really wanna do is make sure we're showing demonstrations of things that people want to see, not just things that we want to show. So in absence of that, well, we show things that we think are interesting, but absolutely, please, please share. Oh, yeah. From. Hey.
How you doing, buddy? Please do share those, and you can share them a variety of ways. DevRel at starburst.io is one way to mention it. Find me on LinkedIn. A lot of you might already have my email address directly. Feel free to do that, etcetera. Okay. So today, the topic was secure access and that kind of stuff. I could show you a hundred slides about what Starburst is, but this slide shows it all at once: in general, we're built on Trino. We're a massively parallel processing engine that really focuses on federated query access. We can connect to lots and lots of data sources, even put those together in a federated query and join across them. And there are more things that we do at Starburst that aren't just the core engine, and the ones we're gonna focus on today are around there, some of the enterprise platform pieces. Security, little bits of governance here. There's more tools than we have time to show, but we'll focus on classical security stuff. Access controls, RBAC, ABAC, row level, column level, masking, filtering, all that kind of fun stuff. And tags as well, to do attribute-based access controls. I like to think that we do a subset today: we do tag-based access controls, or tag-based attribute access controls, if you're familiar with that. If you're not, we'll see it in just a minute. We're not gonna get into it today, but I would do myself and this company a disservice if I didn't mention what we're doing in AI. We're doing a lot of cool stuff in AI, lots of webinars that Quincy and I will try to point you to. But to do AI elements, we do have to integrate with models, and we treat models just like another asset that comes under management as well. And we can create all the things you're seeing today.
We can create them on models, including other actions: not just who could use, say, the Lester 4.9 and Lester 5.1 models, but how much they can use them. Sometimes we want folks to have access to stuff, but we don't wanna run a giant bill up. Okay. More importantly, it's four after, and I haven't started the demo. And that's what this session's about, demonstration. So, again, this is the rough order that I'll tackle my demos in. I'll probably come back to this a few times, but let's run over here, and I'm gonna make sure my table's alive. So I'm gonna do some RBAC. We'll do table and column level role-based access controls. What you see on the screen is a quick table here. It's called the customer table. I just did some quick reads on it. Yep. Neat, nice stuff. It's a bank, my cloud bank customers, where I have that information. And what do I wanna do? I wanna create a marketing team and then ultimately give them access to that table. I have a lot of administrative rights, so I'm gonna go over here into roles, and I'm gonna find out that I don't have a role. So I'm gonna create one, US marketing. M-K-T-G, US marketing. That's my thing. Nice description about them. And I will just add myself to US marketing, because we do have a nice way in Starburst to toggle roles and see things as that role. We could create separate users, but that'll take a little time. So let's do this. I'm back in my query editor, and I'm just gonna toggle my role. I'm gonna go into the US marketing role. And to be honest, I noticed already I don't have access to this, but let's verify. I'm gonna try to run under my cloud. I don't even see the schema, so no wonder it's gonna blow up. "Schema must be set." I can't see the schema, so I should have typed it all the way in. I think it's called mycloud.bank.customer.
So that's the fully qualified name, but it's gonna tell me, as you would expect here, the relation is not found or not allowed: the customer table. I don't have access. I can't see customers. Let's fix that first and foremost. So let me go back to my administrator role, go back to my roles and privileges, find my US marketing, and then I'm just gonna do a good old kind of thing. Give a privilege on my cloud, my cloud bank, and let's give them access to the whole scope of that data. There's only one table. That's okay. And what do we wanna do? We want to allow only select from tables. Boom. That ought to give us the right. So once that's saved, I'll just toggle back over to my query editor. I'm gonna switch to that marketing role one more time. And good news, I can see it on the left, so I probably can see it on the right. Yay. And I'll go ahead and toggle this down so I don't have to qualify everything. Alright. So marketing can now see the table. Role-based access, pretty normal stuff. Granted them some read access, verified it worked. But I realized along the journey that there is some data I wanna start to think about. So the limited way of taking care of things like PII might be the old classic column level rules. Like, maybe I don't want this group to see the phone number. Okay. I'm gonna tackle that, as it says here, as a classical column level access control. So I'll go back into my admin user here. And, again, if you guys are asking questions, I'm not looking. I'm gonna keep pushing, and then I'm gonna have plenty of time for questions, because I will keep it moving. Went back into my roles and privileges, back in my marketing team, looked at my privileges. I'm gonna add another privilege. This time, I'm gonna do the same thing: my cloud. Oops. My cloud, bank. I'm gonna go all the way down. Bank, customer, and I'm gonna pick phone number. I don't want them to access it. So I'm gonna deny a select from that column.
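The allow-then-deny steps just described can be modeled in a few lines. This is a minimal sketch, not Starburst's implementation or API: the `Privilege` class, the `can_select` helper, the column name `phone_number`, and the "deny wins over allow" resolution are all assumptions chosen to mirror what the demo shows on screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    """Hypothetical model of one privilege row attached to a role."""
    effect: str        # "allow" or "deny"
    action: str        # e.g. "select"
    catalog: str
    schema: str
    table: str = "*"   # "*" means every table in the schema
    column: str = "*"  # "*" means every column in the table


def _matches(pattern: str, value: str) -> bool:
    return pattern == "*" or pattern == value


def can_select(privileges, catalog, schema, table, column) -> bool:
    """Assumed semantics: any matching deny wins; otherwise an allow is required."""
    hits = [
        p for p in privileges
        if p.action == "select"
        and _matches(p.catalog, catalog)
        and _matches(p.schema, schema)
        and _matches(p.table, table)
        and _matches(p.column, column)
    ]
    if any(p.effect == "deny" for p in hits):
        return False
    return any(p.effect == "allow" for p in hits)


# The two privileges added in the demo (names are illustrative).
us_marketing = [
    Privilege("allow", "select", "mycloud", "bank"),
    Privilege("deny", "select", "mycloud", "bank", "customer", "phone_number"),
]
```

With these two entries, `can_select(us_marketing, "mycloud", "bank", "customer", "last_name")` is true while the same call for `phone_number` is false, and a role with no privileges at all sees nothing, which matches the "relation not found or not allowed" error from the start of the demo.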
Deny, select, add that privilege. Again, these are straight classic RBAC: catalogs, schemas, tables, columns, all that kind of good stuff. Let's verify that we no longer can access that by becoming our US marketing role one more time. And I'm gonna run it yet again with phone number right there on line 20, and it should blow up. It did. Can't select, blah blah blah. The short and sweet is that US marketing doesn't have select privileges on phone number. Okay. That's what we wanted. So how does my customer usually deal with that? They just take phone number out of the list. Boom. There we go. I have it without phone number. Now there are some other cool things we can do with that, which I'm gonna come right back to in a minute, because that was kinda PII. There might be a better way to solve that, but I wanna make sure you know we do have column level access.
And if you talk about column level access, it doesn't hurt to talk about row level access. Now row level access, let me just speed it up: we're gonna solve that as a filter, as opposed to, you know, how do I tell this table only this, only that? So we'll define that. Our problem is this. Our team is called US marketing, and I'm noticing they can see Canadian customers. Let's solve this. I don't want them to see Canadian customers. In fact, they still can. Right? Yep. They can see Canadian as well. Let's fix that. How might we fix it? Well, the way we're gonna fix it, the best way, is we're gonna go create a row filter. Now I went ahead and set this up. I went ahead and created a row filter, hit the create button. And when you hit the create button, it just drops you into a form that looks a lot like this, and you can make up a name. Call that name US only. Allow access to only see US. And row level filters are really snippets of SQL.
So you're gonna write some SQL that, once applied, won't blow up. So there is some testing in row level filters to make sure it works, or you might have an invalid SQL statement. But, arguably, this is what the filter is gonna say: there's a field called country, it equals US, that's the filter. We can do different things. We can use "in", "includes", whatever. I'm gonna show you how that works here. Alright. So I created that already, and then I went back into the role of US marketing, and I'm gonna add a policy now. So this policy is gonna be called US only. I'm just gonna go ahead and call both of them US only, so in case you see them in two different places: they're two different things. This is the policy here. I can write a description, put in how long it's good for, etcetera, but all looks good. And we wanna say it's on that my cloud catalog and the bank schema. And I'm gonna lock this one down to that customer table, because this one I know has that field. So that's my scope. It's one table, so it's no big deal; if I had more than one table I could try different things, but this is gonna come into play in a minute when we talk about attribute-based. So right now, I'm gonna say that's what I'm looking for, this specific asset here. So I'm gonna create a policy on that. And what is my policy? I just wanna apply a row filter, and there he is: US only. And I'm just gonna add that. Create me a policy that lets you only see US rows. So let's go look and make sure that works. I think it should work. So I'm back in my query editor. I'm just gonna test it right away on Canada and see what happens. Finds no records. Let's look at US and Canada like we saw before, and we should see only US rows. So you can play it out and verify that works. So cool. We did some row level filters. So table, column level, row level type stuff.
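The idea that a row filter is a SQL snippet baked into every query for that role can be sketched with Python's built-in `sqlite3`. This is only an illustration under stated assumptions: the `apply_row_filter` helper, the sample rows, the `'CA'` country code, and the subquery-wrapping approach are made up for the example and are not how the Starburst engine is documented to rewrite queries.

```python
import sqlite3

# The filter body from the demo: a boolean SQL expression over the table's columns.
ROW_FILTER = "country = 'US'"


def apply_row_filter(table: str, row_filter: str) -> str:
    """Wrap the table in a filtered subquery so the user's query never sees other rows."""
    return f"(SELECT * FROM {table} WHERE {row_filter}) AS {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Jessica", "US"), ("Ryan", "US"), ("Claire", "CA")],  # illustrative rows
)


def run_as_marketing(where: str):
    """Run a query the way the filtered role would see it."""
    source = apply_row_filter("customer", ROW_FILTER)
    sql = f"SELECT first_name, country FROM {source} WHERE {where} ORDER BY first_name"
    return conn.execute(sql).fetchall()
```

Asking for Canadian customers through `run_as_marketing("country = 'CA'")` finds no records, and asking for both countries returns only the US rows, which is the same pair of checks run in the query editor.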
But now let's talk about stuff I think is a little more interesting, especially if you think about that phone number. What else could we have done to tackle that? Well, over in our world, we can talk about starting to tag data, putting some kind of labels on it. And that can be done by humans and/or AI. We are not gonna go into that demo today, but we could ask our engine to proactively look and see if it can tag things for us and then offer that back to us to review. So I've already created a few tags. I have a PII parent with some children: SSN, phone number, date of birth. I'm not gonna use the phone number one because we've already done it a different way, but there's date of birth and Social Security number. And then there might be things that are not quite PII, or ones that in conjunction with a few other things might be PII, so I created one called personal. Now what we have to do is actually go apply that. And as I said, you could do this by hand, that's the way I'm gonna do it, or we could do it with some AI tools, or a combination of the above. So by hand, I drilled down and found the customer table. Now what do I wanna do? I wanna find those PII things. I'm putting my governance hat on. I thought I saw first name, last name, date of birth. I wanna add a tag to that one. And guess what? That tag is the one we know: date of birth. It's added. And I think there was also one called, what was the other one? Social Security number. There. So SSN. Let's go ahead and tag that appropriately. SSN. And then, if you heard me right, I said I created another tag called personal. So last name is one of those fields that's not PII all by itself, but in combination with a few things, it can be. So I decided in my terminology, in my world, we'll tag those as personal. We tagged them. Now what we have to do is build a policy that understands those tags and interprets them at runtime.
So I'm gonna do very much the same thing as before. I'm going back to my roles and privileges, find my marketing team, back to that policy. There's that US only policy. Let's add a new policy, and this one is deny PII. We don't want this marketing team to see any of our PII information. So let's lock it down again. Catalog will be my cloud. Schema is gonna be bank. And, actually, I'm gonna say all tables. There's only customer at the moment, but you could say everything in this schema, or everything in this catalog, if you really want everything. You can say for this role, no matter what data we have, wherever it is, if it's got a PII tag, they can't have it. Okay. Actually, I missed the big step, and I better do it right now. Here's the problem: it has to know how to link that, so we have what we call an expression. We're gonna type exactly what you see there. We're gonna say, hey, look for things that have a tag, PII dot splat, anything. That's our rule. We can get very specific, but I wanna catch everything under the PII umbrella. So we found that. And what do we wanna do with it? Well, we could do a variety of things. We could apply filters, and we're gonna see masks in a minute. But, arguably, all we want to do is deny select on anything it finds. So any columns in this situation, we don't want them to have read access to. So let's verify. We saved it. We switched back over to my marketing team. We went into the query editor, and they are on this one right here. Right? They can run US and Canada. Boom. Can't select date of birth and Social Security number. You don't have select privileges. So was it that I specifically picked those columns? I did specifically tag those columns, but it does let me say, universally, anything with that tag. This is the kind of rule I want to do.
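The "PII dot splat" rule, where the policy targets a tag pattern rather than named columns, can be sketched as follows. Treat it as a conceptual model only: the tag names (`pii.ssn`, `pii.date_of_birth`), the column names, and the glob-style matching via `fnmatchcase` are assumptions for illustration, not the product's actual expression syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical tags applied by hand to the customer table's columns.
COLUMN_TAGS = {
    "first_name": set(),
    "last_name": {"personal"},
    "date_of_birth": {"pii.date_of_birth"},
    "ssn": {"pii.ssn"},
    "country": set(),
}


def denied_columns(column_tags, tag_pattern):
    """Columns carrying at least one tag that matches the policy's pattern."""
    return sorted(
        column
        for column, tags in column_tags.items()
        if any(fnmatchcase(tag, tag_pattern) for tag in tags)
    )


def check_select(requested, column_tags, tag_pattern="pii.*"):
    """Raise if the query touches a denied column; otherwise return the column list."""
    blocked = [c for c in requested if c in denied_columns(column_tags, tag_pattern)]
    if blocked:
        raise PermissionError(f"no select privilege on: {', '.join(blocked)}")
    return requested
```

The policy never names `ssn` or `date_of_birth`; tagging another column under the PII parent later would deny it too without touching the rule, which is the decoupling the demo is after.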
You could apply tons and tons of rules if that's what you want to do. So how do we fix that one? That one gets pretty easy. Let's just take those two fields out of our query, and then, you know, there we go. Just like we fixed phone number. So I showed you a couple of ways to handle PII. Now you could go one more and be somewhere in the middle and say, well, maybe I want them to see some stuff. I wanna use tags to build a rule, but then decide what to do.
So we're gonna implement one more thing called masking, column level masking, and we'll do it on that last name, this Wilkins one that I call personal because it's not PII all by itself, but Jessica Wilkins with the date of birth or something would be PII. So let's just decide the marketing team can see a little bit of their name, but nothing else. To do that, I need to go back to my admin user and build a column mask. Now you probably saw that for filters, there were none provided. For masks, we give you a handful to get you started: mask things, make hashes, all kinds of fun stuff, show first this, first that. I went ahead and made one up from scratch, and I fully implemented it to show you what this is really about. So you would create a mask. I'm editing, and I called it first two characters. And you have to say, what am I gonna send? What kind of field? Okay, I'm gonna be sending a VARCHAR to this. And then some kind of description: show only the first two characters followed by underscores. And what do you put here? You put whatever SQL will do what you want it to do. So this is catching it behind the scenes. It's seeing, oh, I found it, this is masked, let me apply this. So it's gonna bake this into the query, kinda like the row filter. So just like before, you do need to make sure this works. Do a little testing with this to really work it. But at heart it is this: you're handed the field, called column.
So I just cast column to VARCHAR, just to be a hundred percent sure in case they send me an integer or something: turn it into a VARCHAR. I run a substring on that which says, start at one, give me two characters, so the first two letters, and then I wrap that with a concatenate that says put some underscores afterwards. So the name doesn't show in full; it's the first two letters, then underscore underscore underscore, something like that. Alright. So I build the mask, and then I have to apply the mask. It is really just like we saw with the other one. This is an attribute-based rule we're doing here. So we're gonna go back to marketing, go back to policies, and let's add one more policy. This policy is, I think I called it, yeah, trim personal. That's what we call it: trim the personal information. Okay? Come on in. And we're gonna trim personal. You know what? I don't usually do it this way, but we'll just do everything in that entire catalog. If it's marked personal, then slap that on there. And you have to apply, as I said before, something that says what we're talking about. So: has tag personal. And this is where, you know, maybe personal has 16 different tags. Describe them all, with and/or, this and this and that, whatever makes sense. Ours is a simple, simple example. So when you find those things, do what? For anything... nope. Doesn't like something. Oh, I gotta tell it: all schemas. Yeah. Boom. And I guess I don't have to set anything there. All schemas. Yes. And like I showed before, could we apply row filters with this? Sure. Can we apply privileges with this? Sure. We're gonna just tackle masks here. So you can mix and match the heck out of this stuff, but it's gonna say, hey, anything in the my cloud catalog, any schema, any table, that meets the rule, and the rule was it has a tag called personal, then do what? Let's do the first two characters, and I think that'll be enough.
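The mask described above (cast to VARCHAR, take the substring starting at position one for two characters, concatenate underscores) reads roughly like `concat(substring(cast(column AS varchar), 1, 2), '___')` in SQL; that expression is a reconstruction from the spoken description, not a copy of what was on screen. A small Python sketch of the same logic, plus the "apply it to anything tagged personal" policy, is below. The function names, the three-underscore filler, and the None handling are assumptions.

```python
def first_two_characters(value, filler="___"):
    """Keep the first two characters of the value and hide the rest.

    Mirrors the described steps: cast to text, substring(1, 2), then concat.
    None is passed through unchanged, on the assumption that a SQL NULL stays NULL.
    """
    if value is None:
        return None
    text = str(value)          # cast(column AS varchar): integers become text too
    return text[:2] + filler   # substring(..., 1, 2) followed by underscores


def apply_mask_policy(row, column_tags, tag="personal", mask=first_two_characters):
    """Mask every column whose tag set contains the policy's tag; leave the rest alone."""
    return {
        column: mask(value) if tag in column_tags.get(column, set()) else value
        for column, value in row.items()
    }


# Illustrative tagging and row; only last_name carries the personal tag.
column_tags = {"last_name": {"personal"}}
row = {"first_name": "Jessica", "last_name": "Wilkins", "country": "US"}
masked = apply_mask_policy(row, column_tags)
```

Here `masked["last_name"]` comes back as `Wi___` while the other columns are untouched. Moving the `personal` tag to a different column changes which column is masked without editing the policy or the mask.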
Create the policy, and let's just go back and verify. So switch back over to my marketing team, run that same query one more time, cross our fingers, and hope for the last name. Oh, I applied it. Oh, silly me. I applied it to first name instead of last name. I won't go back and fix that, but that would have been the answer. Or in fact, maybe all of a sudden a lot of stuff is personal. You know, maybe the city's personal. Maybe other things are personal. We can go back and plug that rule in, and then we get the J-E underscore and the Wilkins underscore. I guess just for fun, I'm gonna fix it. I said I wouldn't, but sometimes I do that. I say I'm not gonna fix that, and I do. Real quick. Fix it. We go back in there. We're gonna look at the bank. We wanna look at our table called customer, and I'm actually gonna take this one off. So, manage the tag, delete personal there, and then we add it where I said I wanted it, last name. There it is. And personal. So this is decoupling the rule from what the rule is firing on. Pretty good stuff if you ask me. If I did that right, I'll run that same query again. Well, I'm also on the screen that that user couldn't see. That's okay. Let's try it again and see if it switches between first name and last name. Can't select, oh, relation not found. Oh, that was fun. What did I do wrong here? Oh, that user. I wanted the marketing user. It's all working like it should. I just have some operator error. There it is. Jessica something, Ryan something. And to be honest, about two minutes later than I wanted to be, I did finish the demo I wanted to show today. So: classic RBAC, role-based access controls applied to tables, schemas, and catalogs as well; column level rules; row level rules; and we used attributes to create tag-based or attribute-based access controls, which were just classic, you know, don't show this column or don't show this table.
It can be all the normal kind of style, RBAC with ABAC, or you can play with other cool stuff like the column masking that I also showed. The good news is that takes us to some questions and answers. And two things before I jump over there. I think Quincy is letting you know that if you wanna raise your hand and come on screen with us, you can do that. In the meantime, I will assume that everything's in the chat. So I'm looking over there to see what's in there. Hello. Hello. Hello. Hello. I'm from Atlanta. I didn't mention that. Charlotte, Poland. I know, Thora, good to see you again. Bangalore. Been there twice. What framework is being used in the back end for data governance? Now, Shreedhar, data governance means a lot of things to a lot of people, so it doesn't hurt to qualify it a little bit. But I'll say this: data governance, for a lot of people, just means security or access control. So what you're seeing is what we call BIAC, built-in access control. It's our own engine. We built it to lessen our dependency on Ranger, Apache Ranger, that's out there. So we built our own. We still can work with Ranger. We can work with Privacera, Immuta, possibly a couple of others I'm forgetting off the top of my head. So you can outboard this if you have your own access management system you want to use, and there's good reasons sometimes to do that, and there's good reasons sometimes to go, you know what, this is gonna be the access control. I didn't show it, I pointed at one table, but since we can federate across lots of things, you can set lots of rules up. And connection by connection, you may say, yeah, use what Starburst, what the Trino engine itself, knows as the policy management, and then just let Trino and Starburst have a super user to get to the real stuff and trust that it'll take care of it. So this was our own, but we have some other options that we can use, Shreedhar.
Do we have the policy feature in enterprise edition? I have to go back and look at the road map and see where we're at. We just released four eighty. A hundred percent, the intention is full-on parity between these two platforms. Our goal, believe it or not, and we're working diligently to get there as soon as we can, is to not have Starburst Enterprise and not have Starburst Galaxy. How are we gonna do that? You're gonna use a front end called Starburst portal. It looks a lot like Starburst Galaxy, where you can see lots and lots of clusters. So if it's not there today, Akhil, it is coming, and we can catch up afterwards and make sure you have a better answer than "it's coming fast", or maybe some of it might have made it in there already. Can we add a row filter dynamically based on role names or something in Starburst? I don't think so, not based on the role, but you could create a role that's very comprehensive. You know, you're right: you have 50 roles, and you wanna set up that row filter policy for all of them. Setting up the row filter for 50 things or 20 things, no, you don't want to. But what you could do is create a role that all these other roles are also in, bake that row filter into that role, and then just make sure all the other roles are members of it. Then that process of, what do they call it, least or most restrictive access, or whatever the thing is, would get you kinda what you want in one place. And it wouldn't be dynamic in itself. It would be dynamic by putting whatever role you want into that generic filtering role. So I would call it a way to intelligently configure that. And it is inherently dynamic in that sense, but I wanna make sure I didn't oversell what you could do there, Akhil.
Todd, "can we add row filters dynamically"... oh, that was Todd's question. Apologies. So thank you for that one. Alright. Bernard had one. Raise your hand button. Oh, I couldn't see it. Any plan to get ABAC... yep. Robin, we were just talking about that. We're bringing ABAC, with intensity, into Starburst Enterprise, and such. And it's going past that. We're trying to not even have to call it Enterprise and Galaxy. You just have choices, truly hybrid. You can have clusters on prem, clusters in the cloud, clusters in different clouds. Yeah. OPA for access control? Sure. I think the answer is yes, but we're gonna have to sync on that one. So please, please reach out to me. Oh, yeah. Great. Quincy was asking if there are other topics you wanna see in these office hours. We want them. No one came on stage. I'm a little sad. If anyone wants to before we go, just come on stage and say hi and tell me whether this was a good use or a bad use of your time. You're always welcome to do that. And then: a user can see only data belonging to his or her name. I don't know if I know how you would declare that data belongs to his or her name, but let's think about that one. If you hang around, maybe come on and you can explain that a little better. A topic that would be great, from Bertrand, is how tools like Tableau send multiple information and Starburst filters data; he has a use case for it. Alright. Bertrand, I'm gonna look at that afterwards today, and if I misunderstand the question or the topic, I will probably reach out to you directly and make sure I have it. With that said, I think the questions have slowed down. What I would encourage folks to do is just go to our starburst.io site, and we should put a nice link on the side there.
But under resources, we have this nice little section called event calendar, upcoming activities and stuff. And Quincy and I are about to load it up with a whole bunch more things, but there's quite a bit of good stuff in here, including, next week, a managed Iceberg pipelines webinar, a follow-up to one we did recently. There are some more new features in there. It's really cool stuff. I'll be doing a migrating from Apache Hive to Iceberg workshop. That's our other normal thing that we do, and it's more like ninety minutes. You'll have a hands-on lab. You'll have an environment, and you can do it then or you can do it later, that kind of fun stuff. And a lot of other good stuff: MCP, our own AI agent, lots and lots of things coming at you. But, Tron, the main issue was the ABAC. What you see here, the fact that I can actually see more than one cluster, is a Galaxy thing today. Those that know Starburst Enterprise, you know, you go to your console and you see that one cluster's definition. In Galaxy, we have a control plane, basically, that can see lots of stuff, and it's wizard driven and that kind of thing. So there are some things that are not the same, but the BIAC core is all the same kind of setup once it's configured, and the ABAC, the attribute-based part with those policies, is the piece that I do not believe has quite made it to SEP, Starburst Enterprise Platform, GA yet. But, again, reach out to me if you need the exact dates if we don't have them out there. I see one last question. I'm gonna take a glance at it and see if I have time. Data is in an Oracle DB on prem and Starburst is on cloud. Starburst is used to query the Oracle DB, which contains the column. Can I apply the column mask on it? You can, Ola, if you're using the access controls here. Basically, that connection to Oracle, under the covers, we set it up with a pretty powerful user.
So we're expecting that queries that come into Starburst hit an access policy engine that then declares what's going on. So absolutely, yes. Regardless of whether it is or isn't marked as PII in Oracle, we're gonna use a pretty powerful user, and then we'll notice, oh, the policy says that's PII, so go ahead and mask it before the user sees it. The consequence, of course, is that if you set up a bunch of rules in Oracle already, then you might be asking, how do I either duplicate or unwind those? Because the user that's coming across will be a super user unless we do what we call credential pass-through. But when you do credential pass-through, for that setup, we don't wanna run two different access policy systems on that connection to Oracle; instead you say, okay, it's Fred, send Fred down to Oracle and see what he's allowed to do. So you have a lot of options, and that's one of the key terms that we like to use around here: optionality. It's a whole lot cooler than flexibility and configurability. You have all those things, but you have a lot of options. Sometimes too many for some people. That's okay. We have a lot of opinions too. So if the options are too many, ask us for an opinion. Alright. I think we're gonna end it on that. I would say, please, please, please, if you had a good time, let us know. If you have some more topics, let us know; I'll go back and read the one that was in there. See you next week, I hope, for the workshop, migrating Apache Hive to Apache Iceberg, a hands-on lab. And with that, I bid everyone a beautiful day, morning, evening, whatever it is in your neck of the woods. Thanks now. Bye bye.