Video: Query Plan Analysis Deep‑Dive with Starburst | Duration: 5379s | Summary: Query Plan Analysis Deep‑Dive with Starburst | Chapters: Welcome and Introduction (2s), Workshop Resources Setup (172s), Workshop Setup (291s), Galaxy Account Setup (467s), Creating Data Catalogs (661s), S3 Setup Configuration (799s), Tables and Troubleshooting (988s), Catalog Setup Demo (1165s), Troubleshooting Setup (1348s), IAM Best Practices (1381s), Workshop Query Setup (1423s), Query Plan Exploration (1678s), Query Execution Analysis (1982s), Parallel Processing Splits (2113s), Query Statistics Analysis (2319s), Splits and Files (2718s), File Splits Logic (2858s), Learning Journey Discussion (2937s), Iceberg Table Conversion (3022s), Iceberg Splits (3210s), Column Projection Benefits (3478s), Query Plan Analysis (3768s), Multi-Stage Queries (4172s), Multistage Query Sorting (4747s), Query Grouping Analysis (5042s), Partial Aggregates (5086s), Broadcast vs Partition Joins (5282s), Broadcast vs Partition Joins (5493s), Closing and Q&A (5766s), Workshop Introduction (6009s), Setup and Configuration (6282s), Live Troubleshooting Setup (6806s), Troubleshooting Setup Issues (7354s), Workshop Setup Demo (7432s), Query Setup Troubleshooting (7552s), Query Plan Tools (7686s), Query Plan Examples (7938s), Parallel Processing Engines (8062s), Stages and Parallelization (8182s), Query Statistics (8326s), Query Statistics Analysis (8445s), Splits and Files (8725s), File Splits Analysis (8807s), Learning Journey Discussion (8939s), Iceberg Splits Configuration (9030s), File Size Limitations (9335s), Scale Factors Optimization (9486s), Column Projection Optimization (9595s), Query Stage Analysis (9777s), Predicate Push Down (9907s), Multi-Stage Query Processing (10179s), Parallel Data Operations (10355s), Broadcast Joins (10537s), Parallel Query Sorting (10816s), Partial Aggregates (11072s), Final Aggregate Merge (11197s), Join Operations Demo (11290s), Broadcast Join Optimization (11413s), Query Plan Optimization (11600s), Closing Remarks (11774s), Q&A and Closing (11936s)
Transcript for "Query Plan Analysis Deep‑Dive with Starburst": Hey, everybody. I'm Lester Martin, your friendly neighborhood developer advocate from here from, Starburst or Starburst Academy or Starburst anything you wanna call it. And, we're gonna do another one of these workshop, webinars. This is a series that, Quincy, myself, our boss, and some others had some great ideas a while back. And we've well, this is, I think, number three, of these were started just recently. And the goal here is to have a a ninety minute window, but the, you know, I don't wanna bore you with ninety minutes of presentation and all that kind of stuff. What we wanna do is have a, a hands on workshop, a hands on activity that you can do. You can either do here with me live, or if you're watching this on demand or something or if you're watching it today, and at any point you go, wow. I think I just wanna listen. Just listen. If at any point during the day you wanna ask some questions, especially for those that are here with me live, Quincy in the background is probably, excuse me, gonna tell us what you have to do, but you pretty much there's a button that says, hey. I wanna go on stage. Feel free to do that. You'll come on audio, video, or just the audio if you like that only, and we absolutely have the chat. Now, when I look over here, that's me checking the chat out. So you see me doing that. That's what I'm looking. Quincy saying yes. Feel free. La la la. And, all that good stuff. And I want you to be, open and comfortable here doing that. And for those, again, who are watching this after the fact sometime, don't hesitate to use this email address you see down toward the bottom. Let me go into slideshow mode just so you can see some of this stuff. Just because you're not here live doesn't mean you can't ask some questions. A variety of ways to do that. One, worst case easy, best case, worst case solution is that email at the bottom of there, devrel@starburst.io, and you're gonna find me. You'll find a few others too, and they might jump up in there, try to help you before me. But for for the most part, I'm gonna, try to help you and get you rolling. So, again, I see on the chat people are are mentioning where they're from, and I'm from. I'm from Atlanta, Georgia. Good old USA. Not the not the nation of Georgia. I've had a few folks, that I've met online who, when I say Georgia, go, oh, I live near Georgia or I live in Georgia. Well, I'm not that Georgia, but, hey. And I see lots of other cool places and school flags and that kind of stuff. So, again, I'm welcoming you, everybody here. Going back up a slide, I'm welcoming you to the what? Our webinar on Triennial Query Plan Analysis, Starburst workshop, blah blah, yada yada yada. And there's old commander bun bun for us too. Alright. So you saw enough about me. This material, these slides, are in the session. So if you go look under by the chat, there should be a a tab that says docs, and there are a few things there. So let's do some housekeeping. I'm gonna hit docs myself. Number one, it says, it'll take you to the one that says the three node query fundamentals, webshop, workshop. And I hate to admit it. I click on that. That is gonna take you to here. Let me drag it back over here. It's gonna take you to a Lester page. Now the truth is Lester page is just a bunch of wrappers, of some Starburst pages. Maybe we should, fix that at some point, but here we go. 
And I encourage you: if you haven't seen some of this material — or, on the flip side, if you're already a pretty seasoned person on query plans in tools like Hive and Spark and others — maybe that foundational material isn't all that necessary for you, at least not the first one. The first one is just concepts: the things that happen, the words and terms and visualizations, and you'll see some slides from it in a minute. But the next two are pretty exhaustive. We go into great detail; there's lots of good content there, and I'm really happy and proud of what we did. So if you get lost today, go back and visit that material, take a look at it, poke around, and use today as a natural extension of it. Now that said, I know a high percentage of you probably never saw that, and of that high percentage, you can probably spell "query plan" but maybe haven't thought about it much more than that. Some of today is going to go pretty fast, but I'm going to try to keep that in mind and make sure folks can follow along. What else is in there? Let me go back to this one. There's a presentation link — that's this deck as a PDF, with all the links and QR codes in it. There's an instructions doc. The instructions take you over to our good friend GitHub: I have a project called DevRel, inside DevRel I have workshops, and inside workshops I have today's query plan material. It takes you right there. What's nice about this — and I want us to get started on it — is that if you plan on doing the labs with me today, it's time to get going. That means you need to hit the word "instructions" and see whether you have a Starburst Galaxy environment up and running. If you don't, it's all spelled out — there are a lot of words here, but what you need to do is go to our sign-up page while I'm talking. Get started, sign up, give us a few pieces of information, and you'll be up and running. After that, to be honest, all we're really going to do is create a catalog to Amazon S3, to an S3 bucket. I'll give you lots of links, and I'm going to skim through it in just a minute for those of you who are fast doers, but all the information is in those docs for you to look at later or if you're watching this on demand. And if you get lost, there it is again: devrel@starburst.io. I do want to call out what it says here: the workshop facilitator will provide you with some S3 bucket credentials. The credentials embedded in the lab itself are no longer good — in fact, we're stopping storing them that way, but I haven't changed that text yet. So there's a second doc that says "credentials." Let me find it over here — pretty sure I have it open. Yeah, it looks like this: more links, more words. I'll show you what it amounts to; it's all about twenty clicks, pretty intuitive stuff, especially if you've seen our Galaxy interface and done any data drill-down, queries, setup, all that good stuff.
Ultimately, there are some keys in there that you need. They're going to be good for a little while — not forever, a little while. There are cautionary notes along the way if you click those links: this information gets zapped pretty regularly. Really, the trick is that if you ever want to use this later, you need to come back and take three or four minutes that day to set it back up. The easiest way to explain it is that you'll end up with a metastore that thinks you have all these tables living on this S3 bucket, but I'm wiping that S3 bucket pretty regularly so you don't run my bill up or put bad things in it. So you have that information, and I'm going to run through it live here in just a moment. What else do we have? Instructions, credentials, presentation, and the foundation link with the videos. That's it. Alright, let's come back over here. I see a whole bunch of chat showing — let me toggle over there; mostly where people are from, good stuff, I'll check it in a bit. The main setup is: let's get going and get set up. What I want you to be able to do is get to this landing page right here. It looks funny because I haven't zoomed way in to help with the video, but this is what you see when you're logged into your Galaxy environment. And how do you get there? You go to that link, or just search for "Starburst Galaxy sign up" — either way will get you there, I'm pretty sure — or go to starburst.io and hit Product, then Galaxy. Start for free and fill in a few pieces of information. When you do this — I'm just going to glance at the lab and show you — back in the setup information it says there's a tutorial for this. Like I said, the tutorial is a little over-engineered; it's actually pretty simple. You go to that page and say you want to create an account: first name, last name. One important thing: it sends you a confirmation code, so hop over to your email and come right back. And a really cool thing is you get to pick your domain. Mine is lester.galaxy.starburst.io; put whatever you want as long as it's available — it's all yours. You put in a password, pretty normal stuff. To be honest, after that it drops you into the user experience. It'll have a query ready to go, connected to some data, and you can run it and go, great. Now, if you're following the lab either now or later, you can click through from here — there's a lot of stuff, but you've already done the hard part, so you don't necessarily have to do all these additional things. Oh, there it is — the credentials are buried in there too. Anywhere you see those embedded credentials, they may or may not work. Once you get into this thing, really just make sure you jump down to the statement down here about creating another cluster — name it per the tutorial's AWS us-east-1 convention and use that one for this workshop. Again, I'm here to answer questions. This is pretty boilerplate.
There's a lot on that page, but really: first name, last name, email address, pick a domain and a password, and you're in. Once you have that set up, the instructions really just say you also need a Trino environment — and any Trino environment works. If you have Starburst Enterprise set up, or you want to run Trino yourself in a Docker instance, great, you're on your own there. Galaxy is the easiest for today, and the instructions are written for it. Then you need to set up an AWS S3 catalog. That's the other piece I said is pretty easy too. You log in — and I'm going to show you on screen in just a second — and you say, hey, I want a new catalog. In fact, I'm going to stop right here and show you what that looks like. So, in Starburst Galaxy — and yes, I see some good questions, I'll come back to those — you go to data catalogs. Let me zoom back in, 125 percent. You say Create catalog, you pick Amazon S3, and you pick a name. I'm going to tell you to use tmp_cat — temporary catalog — because my instructions use that name, but you can call it whatever you want. You need some credentials; my credentials doc gives you those two values, and again, they're good for a little while. Then just leave the table format as Iceberg, and I'll give you the bucket name and the folder I want. Now, let me glance at that and point something out: when you put in that folder, I'm asking you to put projects slash your name. I'm not asking you to literally type the words "your-name"; I'm asking you to replace it with your actual name — first name, last name, whatever — just some unique value so we don't step on each other and don't see each other's stuff. There are a couple of little caveats for those that have been all over the place: this is a us-east-1 bucket, so you need a cluster in us-east-1. The instructions will get you there. Now, instead of saving this one like the instructions tell you to, I'm going to show you the one I already set up. It's called tmp_cat. Let me just look at Edit — and again, I have it zoomed in, so it's kind of kludgy. Temp catalog, a description that says something like "remember to delete this every day," the two keys from AWS, the tutorials bucket, and the directory projects/lester-martin. Turn these two toggles on, and that's it. There's a big button that says Test connection; it should go all green. If it doesn't, let us know and we can stop and figure it out. Alright — one more thing and then I'll circle back to the chat. For those of you hoping to do this live, please do it now, and do raise questions, because in a few minutes I want to get started with the actual activities, and this does take a few minutes to set up if you haven't done it before. So for those that are setting up, maybe drop a message in the chat — "I'm at this point," or "still working on it," or "a couple more minutes" — just give me some insight.
You don't have to say you're all done, because I know not everyone is going to do all this live. Alright, one other thing: once you have all that set up, the good news is the instructions get kind of interesting at that point. They say, great, we're all set up, now it's time to do the query plan stuff. In practice there's really one more important thing you have to do, and it's setting up an S3 location privilege. I'm not going to play this little video I made just yesterday, but I'll show you briefly so you can see it. You basically go in under Access, under Roles — the video walks you through this — go into the account admin role, look at your privileges, and add a privilege for a location. I give you the location to paste in there. Once you put it in, there's a box that says approve, or "yes, I want to access that" — check it. All of that is in the video. If you're doing this live, this is the one part that's a little bumpy, but it lets us get hold of a lot of data really fast instead of trying to build too much data on the fly. It takes a couple of minutes; if you know what you're doing it can take about thirty seconds, or a couple of minutes while following the video. And if you're doing this live, I don't mind if you just say something and I'll go in and show you the exact steps. This setup bit is part of the workshop. Once you do get it set up, now or later, the instructions say: open up two tabs, grab this setup SQL, and put it into your query editor right over here. I put it right here, and I say go ahead and run it. It's going to take about six minutes. Why? Because I'm pointing at a lot of data, and we're building some data too. So kick that off; it'll run in the background. Just highlight it all — Ctrl+A — and that will run everything. I've already done all that. The other part says come over here and grab the activities, so everything you need is right there. Copy all of it — you can grab the raw view, that's the easy way out of GitHub — and I've already put all of that over here. One of the earliest things it says is something like, hey, make sure you're in the right schema — that's what the other little setup script created for us — so toggle over to it. Then I say show me the tables. You're only going to see three, which is okay; we're going to use some other tables you have access to as well, but three of them specifically help us out. If all goes well — my cluster is not running, so it's going to spin up — it will come back and say, yep, I found three tables, you're good to go. The other statements in there were just to force the cluster to start. By default, these free clusters that don't cost you anything will shut down after about five minutes of non-use. And you're welcome to play with the Galaxy cluster: with that initial setup you get $500 of credits, so go nuts.
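Quick aside while that spins up, mostly for the on-demand folks: the little sanity check I just described boils down to a couple of statements like these. tmp_cat and query_plans are simply the names from my instructions, so if you called yours something else, swap those in.

-- point the session at the workshop catalog and schema that the setup SQL created
USE tmp_cat.query_plans;
-- should come back with exactly three tables if the seed script finished cleanly
SHOW TABLES;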
If you want to make a bigger cluster, just know that the bigger the cluster, the more it costs — basically, a node for an hour is a credit. So this is a really tiny cluster of one node, and it costs next to nothing. If you have a cluster of four, it's roughly four credits an hour — they have some magic in there, and it's all documented on the site. Nonetheless, I've got three tables. That's what I was trying to show: show tables, one, two, three. Those three are what should show up. Alright, at this point I'm going to take a moment and look over at the chat and see if we have any problems. "Invalid key" — that's a good one. Okay, whoever has the invalid key: are you using the key right here in the credentials doc, or the key that's in the lab? The lab's key is going to be bad; you need to use this key. And the one about Burst Bank — don't worry about that one, that's in the second lab, the "create a Galaxy data lake" one. "None of the keys are working" — well, we're changing that; we've been putting them in the clear and we're getting in trouble for it. Let me know if it's working. I know it was working last night, and it's working right now because I'm using it, but there could be a problem getting it typed in and that kind of stuff. Alright, back to the chat. Quincy — hello, Liverpool; ask us some questions; Bucks County; Georgia. Quincy had some comments there. "Please share the blog URL" — I think Quincy got that for you; it's in the links. "I have to drop, thanks" — understood; I hope you'll come back and watch it later. There was a question about whether there's a video search — we did look at that out of curiosity a couple of years ago and didn't get a lot of excitement from the market. We're focused on partner certifications right now, but it's always a concept for us to be thinking about, and I'll share your comments with the person who owns the Academy education team. Okay — "best way to deploy your own in your own AWS account?" So, BG, two things. We have Starburst Galaxy, which is software as a service, and we have Starburst Enterprise, which you can run anywhere you want. The answer is probably to run Starburst Enterprise and set it up using Kubernetes. Reach out to us and we'll make sure you have an account team, an account rep, a solutions person, someone to give you a jump start. We also have a lot of that in academy.starburst.io — courses where you bring your own account and it walks you through the when, where, and why. The second part is that at some point we'll go public with, what do they call it, BYOC — bring your own cloud. Our software as a service today runs everything through our control plane; you can do some things in your account, but very soon, other than the control plane, we're going to let the clusters themselves live in your account. So that's coming for the as-a-service product, and it's definitely there today for the Enterprise product. Hope that helps. Katie, let us know — still having troubles? Katie, type it one more time; if you're still having trouble, I'll do one from scratch so you can watch me do it.
You're sure that didn't work when testing the connection? Okay, just a sec. I'm going to do it live so you can see it. I've got this on the left here; let's just do this one time. Katie, if you want to watch, just watch me first, and then at least we'll know we're doing the same things. Let me run a query first because I want to keep this cluster alive. Alright, I'm going to go to Catalogs — let me make a little room — and hit the Create catalog button, which is what we want you to do. This is for you, Katie; I hope you're watching. I pick Amazon S3, and I'm going to call it "nonsense" because I just need a different name. I give it the AWS access key, and we'll see if it works right now, because you're right, there might be something I missed. Boom, there's that, and then the long secret key. These aren't going to last forever, but they should be working today; we'll find out. So I pasted those in, I leave the format as Iceberg, I leave the metastore as the Galaxy default, and in the bucket field I put the starburst-tutorials bucket the instructions call for — I didn't highlight that one because it's the same as the other catalog — and the directory is projects slash your name, so I'll just put fred-jones. Turn these two toggles on, and here's the magic: Katie, I hit the button and it went green. So the keys are working. That doesn't mean there isn't a problem I could help you with — I hear you — so keep trying and we'll see what's going on. If you end up just watching and you're still around, hang on at the end and we'll bring you on live and debug this one. I just don't want to slow down; I've already talked a lot and I want to get into the lab. I apologize, but I do hope we can get you going. Quincy, if someone came on screen, would they be able to share their desktop? Yeah — Katie, let's try that at the end. "Can we not use role-based access?" You can, yes, you can do all those things; I'm just trying to do the lowest common denominator here. In fact, what I'd rather you do is bring your own bucket and bring your own cross-account role — all of that is in our docs. I'm just trying to make sure there's something that works for everyone. So absolutely: in general, don't use key pairs; there are more sophisticated IAM approaches, and you're right on the money — setting up a cross-account role is probably the best answer. Yes, bring your own bucket; do it the right way. Okay, I'm going to move on now. I think we're there, and if we're not, we'll see what happens anyway. Alright, back in our doc: once all that setup is done, the rest says, well, run that seed SQL. You need to kick this off, as I promised. Those statements took about six minutes total — this one takes about a minute and a half, and this one takes about four and a half, roughly, on the free cluster. And then it says open up the activities and just follow along. So for those here or watching on demand, I hope you fast-forwarded to get here if you were already comfortable with what was going on.
Those that are here, one way or the other: let's do something. I'm going to jump over to the workshop query plans doc. I've bumped the zoom up to 125 percent; I could go bigger, but then I can't see anything. Alright, here we go. It does make you check a few things, like: did you remember to use your first name and last name, and all that kind of stuff. It says run this, which forces the cluster to spin up. I'm using the naming conventions that are in those tutorials — aws-us-east-1-free — so call yours whatever you like, but note there's a default cluster that comes with the account, probably called free-cluster. The problem with free-cluster is it gets assigned arbitrarily wherever it wants; right now it's in US East 2, Ohio, instead of US East 1, North Virginia, if you know the region numbers and all that silly stuff. That shouldn't really be a problem, but we're doing all this on the cheap, so we want our cluster resource and our bucket resource in the same place on Amazon, so we don't get in trouble and don't spend lots of money. You absolutely can go cross-region and cross-account, and you can run a query that spans S3 and GCS and Azure Data Lake — yeah, that's what Trino does — we're just not getting into all that today. Alright, so I had a few things: run these few commands. Again, they're in the doc; I hope you're running them, and if not, you're watching, great. All I'm doing is this: we offer a feature called result set caching — a cool feature, but we don't want it for what we're doing today, so one statement turns it off for the session — and then I'm just picking the tmp_cat catalog and the query_plans schema that we built in the other SQL script. It's right here. Then, again, we should see three tables — one, two, there they are, the logs and daily tables, the three you see scrolling at the very bottom. We're set up. Now, I have to say this right now, and I'll say it a couple more times: don't panic about timings, because we're using a really tiny server. We're running the entire Starburst Galaxy Trino cluster on one node — a cheap, cheap node that has one worker, and that same worker is also the coordinator. So don't expect miracles, but we will see improvements in everything we do, for sure. Also, on the size of data: we're talking roughly twenty gigs to play with, just to give us a ballpark. Alright, so what did I do? I said, hey, let's do example zero. Why don't you highlight, or just put your cursor on, this line 41 and hit run. And — oops — I think I have an error: tmp_cat.query_plans doesn't know those tables. You know what? I did it: I left all those setup pieces out. Okay, that's fine. What I wanted to do — and I realize now I had a whole bunch of setup here and threw it out with the bathwater — was use the orders and customers tables. We can fix this briefly, on the fly, and I'll update the notes next time. I think if I say something like USE tpch.tiny first — so I'm going to switch to that up here — then I think this query will run. If it does, I'll add that to the chat and say, run this first. Okay, I've made copies of all these tables.
There's a catalog called tpch — it's a test data generator, right here — and I just pointed at it. I originally had all of those set up over here as local Iceberg tables, but nonetheless, I missed that piece. Let me add that in the chat for query zero: execute that USE tpch.tiny first. Okay, and I'm going to write myself a note: example zero broken. If you're watching this on demand, it will likely work without that little USE tpch.tiny. Mainly, with query one, I don't really want to tear the query apart — you can see six, seven, eight tables joining — I want to show you some of the places you can find the query plan. Where are they? Well, one way is that when the query finishes, you can hit this eyeball, "show query details." Within the query details there are a couple of views. You can go to Stages and see a list of what they call the stages, if you're familiar with that — if you're not, we'll revisit it briefly in a second — laid out top to bottom. There's also a visual version of the query plan itself. While I love that one, we'll probably use it very little today because I want to focus on some very specific values, but at some point it paints a pretty picture. There it is. You can go full screen and zoom in. Again, if you don't know what we're looking at right now, that's not the end goal; the goal is to show you that the tool can tell you: I did a table scan of region, I broadcast that, then I filtered the nation table and joined those, and so on and so on. It did a lot of stuff, and there's a visual version of it. Then, of course, there's the good old-fashioned textual version under Advanced, called Query plan — just hit "click to see more." If this stuff is totally new to you, this is where those three videos are going to help; and if it's still totally new, stick with me, because I'm going to point you at certain areas that will help anyway. There's our fragment zero. That's the last thing — we read query plans from the bottom to the top. The easy way to see how complex a plan is: go to the bottom and look for a number like Fragment 8. Then you know this is a nine-stage job, because it works 8, 7, 6, 5, 4, 3, 2, 1, and ends at 0. So that's a couple of ways. What else can you do? Some of these we're going to do live in a second: instead of looking at it all in the tool — especially for those doing this in raw Trino — you could simply go in here and do things like this. Remove these comments; my notes down here say, okay, we looked at the query plan in the UI, now we're going to run it with EXPLAIN and EXPLAIN ANALYZE. EXPLAIN — putting the word EXPLAIN in front of your query — simply says: go figure out the query plan and show me the query plan. It should look a lot like what we saw a minute ago, but this is just logical; it's what it will do — a bunch of broadcast joins, in this case. I know I'm not explaining all of that yet; we'll get there.
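For anyone running this at home, the whole difference we're about to talk through is literally one keyword. I'm sketching it against a TPCH table here since that's what query zero touches; any SELECT you can run works the same way.

-- EXPLAIN only produces the plan; the query is never executed, so it comes back almost instantly
EXPLAIN SELECT * FROM tpch.tiny.orders;

-- EXPLAIN ANALYZE actually runs the query, discards the result rows,
-- and hands back the same plan annotated with real CPU time, row counts, and data sizes
EXPLAIN ANALYZE SELECT * FROM tpch.tiny.orders;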
But, yeah, that's one way. Now, that one happened really fast because it wasn't executed; EXPLAIN only plans — make a plan, don't run the plan. If you say EXPLAIN ANALYZE, it actually runs the query behind the scenes and, when it's done, gives us what they call — what I call — a full explain plan, or an analyzed explain plan; same thing. If you look, it's the same plan shape and the same estimates, but it fills in details about what really happened: how much CPU, how much data, how many rows. That's the kind of stuff we'll be looking at after query zero. That was query zero, just to get our hands wet. I'm going to go back — if you ran that other statement, paste this and toggle back to tmp_cat.query_plans for example one onward. Again, this will all be sorted out for you on-demand viewers. Let's look at my example number one and start doing something interesting — we're thirty minutes into a ninety-minute session; let's do something. Let's do something super easy. I'm going to highlight this and run example one. What is the command? Like I said a minute ago, it's an EXPLAIN on a table called logs_daily_rollup_work. If you don't know what logs_daily_rollup_work is, you can just run a query on it so we see it once: select all from, boom, boom, boom. It's server-log-type information: an event time, an IP address, an application, a PID, and then just what happened. This comes from a little generator I created a while back — it's a little GitHub project. There's a lot better stuff nowadays with AI to generate test data, but it's something I did and still use. Alright, so there's the table. And again, the EXPLAIN — what is it going to tell us? It's going to say: to do a full table scan, I'm going to look at every row and every column in this table. You know what I think we need to do before we look any further? We should take a moment and look at my slides. Let me pop into this, because I want to show you a couple of things. We got our environment set up — that was important, a big deal — and now let's talk about the engine. What we really want to establish is this: if you look at the picture on the right and you already know, "oh, Lester, I see that's a join — a partitioned, or hash, join on two tables called orders and customers to get some kind of result," then you already know how to do a lot of cool stuff, because you learned it here, or you learned it in Hive, or Spark, or in Trino, or you watched those videos I mentioned. Those videos will ultimately help you understand what that picture means. I put about ten or fifteen slides here; we're not going to exercise all of them, but I want you to see them because they're the kinds of things that, if you don't understand them, you can surgically jump into those videos. I'm going to rattle through this really quickly. Parallel processing engines — Hive, Trino, Spark — do what? They run against a big, giant dataset — and I mean a logical dataset; it might be a bunch of files, not just one file — and they work on chunks of it at once. We use the word splits.
There are other terms — blocks, partitions — used by other engines, but it's the same thing. There's a dataset; it has 200 splits, or maybe 200 files, and if we have a big enough cluster we can work on all 200 at once. That's the key to parallelization. Same concept here: there are 200 splits, each with some number of records — say 200,000 as an example — and if it takes about twenty seconds to work on split one, and we can work on all of them at the same time, it only takes about twenty seconds to get through everything. That's what's happening in the engine, especially in the source-reading parts of it. Then, what happens in a cluster? A little Trino cluster terminology: there's the coordinator, the brain, and then a bunch of workers. Once the engine figures out how many splits there are — and the dataset, the table, drives how many splits; the bigger it is, the more splits, generally speaking — as a heuristic, a complete rule of thumb and not an exact truth, you might say a file equals a split. So I've been saying 200 files, which means I need to process about 200 pieces. To my point earlier, if I can only work on 100 at a time and I have 200 pieces, each taking twenty seconds, then in fairness it's going to take about forty seconds, because I work on about 100 and then I need to work on the other 100. Parallelization works awesome when we do everything at once, but if we can't — if we don't have a big enough cluster — it takes twice as long, or four times as long. That's the premise behind what people have always said: just add more nodes. That works until the dataset is smaller than the cluster, but there you go. Alright. The splits are handled on the machines called workers, inside a process called a task, and that unit of parallelization is called a stage — or you may hear the term fragment. Again, I'm not trying to teach all of this here; I'm reminding you that you either already know it or it would be beneficial to learn it. And I would say, if you're in this world — even if you're not in Trino, but in Spark only or Hive only — these fundamentals are critical, and I encourage everyone to learn them. I have a lot of content if you'd like to reach out and need some help. Okay. The good news is stages do a lot of work, but to be honest, sometimes one stage is not enough. You'll see that when you do a SELECT * FROM a table, that's a single stage — there's nothing else to do but work on it in parallel. But if you do other things — group bys, order bys, joins — then we need multiple stages, and the data has to move from one stage, which is a unit of parallelization, to another. I know that sounds scary. Stages move data between themselves with what they call exchanges. So before we look at this multi-stage picture right here, let's do a couple of the single-stage ones, and then we'll come back to it. How about that? Alright, back where we work, the query editor. This was it: EXPLAIN, select all from the logs rollup table. And that's easy — this is a single stage. There it is right there, Fragment 0: a stage that's doing what? A table scan of a table, logs_daily_rollup_work.
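And just to put the back-of-the-envelope split math from a minute ago in one place — this is only the rule-of-thumb arithmetic, not anything the engine reports:

-- 200 splits at ~20 seconds each, but only 100 split slots in the cluster:
-- two waves of work, so roughly 40 seconds of wall clock instead of 20
SELECT ceil(200.0 / 100) * 20 AS approx_wall_clock_seconds;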
Since I said select all from, the plan says: this is what I'm looking for, those are the columns it found. In fact, it calls them out here — here's what they are specifically, data types and everything. And on the way out, it gives the same thing, just echoing the full set of parameters in and out. You'll see that query plans read a lot like programming: what comes in, what goes out, and what happens in the middle. In the middle, this one is not doing a lot — it's just doing a scan and passing it on. One thing to notice is the estimates: a bunch of question marks. I'll show you what that's about in a second. I think that was the main stuff I wanted to show, so let's look back at it. A couple of things: the output tells you it's Trino, it tells you it's a single stage — that's why there weren't any more — and it starts at zero; zero is the last one to run. I showed you it was a full scan, in and out, reading all the columns and outputting them all, and I mentioned those statistics. Why are there no statistics, and what are statistics? Statistics have been around a long time — as long as relational databases have. They tell you, column by column, what's going on: how many distinct values, what's the count, often a low mark and a high mark, a minimum value and a maximum value. The bad news is this plan says no, no, no, no — which means there are no statistics. (Quick chat answer: yes, it's going to be ninety minutes, Rich, and I'll let Quincy verify that and put it in the chat. Apologies if it was set up as sixty; it's meant to be ninety. You're right.) And while the missing stats, as my comments say over here, probably won't hurt us a lot, here's what I mean. I'm going to run another one called SHOW STATS FOR logs_daily_rollup_stat. It's the exact same table, but one I calculated statistics on, and this is what we should see. Process IDs look like process IDs: the lowest is 99, the highest is 5998, which might make some sense if you know anything about Linux or Unix process IDs. There are some timestamps. Some columns don't get much, some just don't have an answer. And then a really important one is the row at the bottom, where the column name is null — that one answers: how many rows are there? It looks like about 100,000,000 — 100.7 million rows, something like that. Okay, cool. What I'm going to do for example two is just say: do the same EXPLAIN again on that other table. To calculate the stats I could have run ANALYZE on it, but as you saw in my setup, that took about a minute and a half, so I just did it for you in the background; I didn't want to slow us down. And notice in the output the only difference is that there are some estimates filled in right here. Where do they come from? From a combination of things: the stats say there are that many rows — that's the number we saw — and then it says, oh, this is about 20 gig.
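For the on-demand folks, the two statements behind what I just showed are these — logs_daily_rollup_stat is simply the copy I pre-computed statistics on, so point them at your own table:

-- compute (or refresh) table and column statistics; this is the step I ran for you in the background
ANALYZE logs_daily_rollup_stat;

-- column-by-column stats, plus the bottom row (column = NULL) that carries the overall row count
SHOW STATS FOR logs_daily_rollup_stat;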
Where does that 20 gig come from? It looks at the row count and the data types and makes a pretty educated guess. These are not real, measured values; it's what the engine thinks the data probably is — probably about 20 gigs. Everything else in the plan is exactly the same, so we'll keep going. And again, I'll look for questions — actually, I'm going to pause and look for questions. There's the ninety-minute one. Yep, yep, okay, good. Alright, let's keep going. Now I'm going to do the same thing — EXPLAIN — but I'm going to sneak in the word ANALYZE. Remember, this is the one that says: don't just figure out a plan; figure out the plan, run the plan, and when you're done running it, don't give me the results — give me the plan back, filled in. Another way to do that in Starburst is just to run the query itself — lines 127 and 128 here — because we keep track of query history. Then, as I showed you earlier by clicking that eyeball, when it's finished, or even while it's running, there's a live version of the same information, and you'll get what I'm about to show you in the output at the bottom. Now, this is taking longer. What did I say earlier? A couple of things: it's 100 million rows — not very big; 20 gig — not very big; but the problem is I've got a teeny tiny cluster. There we go — I've said it a second time now, and it's telling me, I'm working on it, give me a second. It is truly reading everything, all the files. I haven't told you how many files yet, but we're about to see all of that. Hopefully in about two minutes total we'll look in the output and see things like this: it took about two minutes of clock time and, in fact, about two minutes of CPU time. We'll notice there are estimates, but there are actuals as well, and the actuals are pretty much the same numbers — actually the exact same row count, because I computed the stats a little while ago and, at that point, they were perfectly current. And then we'll see that the data footprint was even smaller than estimated: it wasn't really 20 gig, it was about 4.73 gig. You might ask why. If we did a SHOW CREATE TABLE on this, we'd see it's a Hive external table, and the file format is Apache ORC, which just means it's encoded and compressed, like Apache Parquet. So instead of being logically about 20 gig, this dataset, through encoding and compression, comes out at about 4.7. And then the most interesting thing is we're going to see the splits. Alright, let's see all that stuff — speed, stats, split counts. Okay, it ran. What did I tell you? There's one stage, same stuff we saw before: a full table scan of all the fields; there are our estimates again; but then these lines got filled in with real details. Working downward: this one says it took about that much CPU time — okay, sounds about right. To be honest, it also means it was waiting here and there, but that's another story for maybe a little bit later.
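If you want to confirm that file-format detail yourself, it's a one-liner — on my copy it reports a Hive table stored as ORC; yours will show whatever you created it with:

-- the full table definition, including the storage format, for the Hive external table
SHOW CREATE TABLE logs_daily_rollup_work;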
Back in the plan, it actually gives you exact details: hey, I read that many rows, I actually read 4.73 gig. Remember, it thought it was 20 gig, but it's 4.7 — that's okay — and it gives you a lot more details. One I'll call out is this: splits, 205. There were 205 splits. Now, does anyone remember in chat what drove a split, what a split was, and all that? Remember, splits were something like this, and I said the number of files is the number of splits, as a heuristic — I think I said that. So maybe there are 205 files? Let's go find out. We're going to run a quick query right here. This is a Hive table, and Hive tables have hidden fields, metadata fields, like $path, $file_size, and $last_modified. What do they tell you? They tell you which file each row came from. So I looked across all the distinct paths, and it says there are 105 files behind the scenes here, and the average file size is about 48 meg. If you know about small files, I'll let you answer the question on the side: is 48.4 meg a small file or not? That's actually what I'm working on in a book right now — I just finished a chapter that covers it. Nonetheless: there are 105 files, but there were 205 splits. Now you might notice that 105 times two is 210, and you might ask the bigger question: well, Lester said the number of files equals the number of splits. I gave you a link here to something I published a while back about what's really happening with Hive splits in Trino, and the short answer is this: the Hive connector starts out with about 200 initial splits — no matter what, it tries to parallelize out of the gate, which means it assumes the table is kind of big anyway. That gets us closer to why it's not 105. After that, it asks what split size it should target — somewhere between 32 and 64 meg. Long story short, it works through all of that, and if you run the math with the numbers we have, our 105 files roughly doubled — about 210 by the rough math, 205 in this particular run — and it makes a little bit of sense if you walk it through. Alright, and that behavior depends on Hive versus Iceberg versus whatever else. So what do we want to do now? We're still playing with single-stage queries; I'm going to do one or two more, then we'll get into the multi-stage looks. The questions are drying up, so either it's no fun, or too hard, or too easy — don't hesitate to say which; no one's going to hold you to it, and I'm not going to feel hurt. This is an advanced conversation today, and as I said earlier, I think I did a bad job of making it crystal clear on the registration page that it would behoove you to learn a little of the background first. But if you're here, hang with me and know there are other resources. This is a long journey. I'll be honest: I've done this for twelve years, since Apache Hive and Java MapReduce, and it wasn't something I grasped immediately.
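Before I keep going down memory lane, let me drop a sketch of that file-counting query in here for the on-demand folks. The quoting matters because of the dollar signs; these are the Hive connector's hidden metadata columns:

-- "$path" and "$file_size" are hidden columns the Hive connector exposes on every row;
-- collapsing to distinct files gives the file count and the average physical file size
SELECT
  count(*)                            AS file_count,
  avg(file_size) / (1024.0 * 1024.0)  AS avg_file_size_mb
FROM (
  SELECT DISTINCT "$path" AS file_path, "$file_size" AS file_size
  FROM logs_daily_rollup_work
) AS files_behind_table;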
Anyway — about four or five years ago, when I came to Starburst and really started looking at Trino, a lot of knowledge I already had really clicked, because I think Trino does a really good job of showing these query plans and making them kind of easy. So if I make this look silly or easy: it is silly and easy, to a point, but it takes a little time to get there. These are all materials you can use on your journey, and I'd love to help you along the way. Alright. Here's what I want to do next: what if we did another table? And I hope my cluster didn't die while I was talking — there we go, still alive. I have another table called logs_iceberg, right here. If I open it up next to the one we were just looking at, it's the same thing — the exact same number of records and rows. If you really want to see where it came from, back in the setup script, it's right here: I did a CTAS. It says create me a table called logs_iceberg, make it Iceberg — I went ahead and used the Iceberg v3 spec, for those following all that fun stuff — and read everything from that Hive external table. So I rewrote everything. I rewrote 105 work files into — what? I rewrote them into eight files that average 569 meg each. Yeah, eight icebergs — they are big icebergs; good one, Rich. You saw earlier that we had 205 splits and the files were about 45 to 48 meg, so that's a little different. And the question nobody answered was whether 40-something meg is — well, we never really say "big file," we say small files; we look out for small files. That's a whole other conversation, and I'm sure you've heard the phrase. To me, anything under about 100 meg is technically probably too small. 40-something meg is not terrible, but it's not big, so it's not great. This 560-ish meg is actually really good — and there's a deeper conversation I'll point you to at the end when we talk about next steps, if you're interested. But this is good: I've got eight files that are big. So let's do the same thing — that EXPLAIN ANALYZE. Like before, it's actually going to read the whole darn thing, hold the answers, and throw the results into the bit bucket, because we don't really want the results, we want the plan. In fact, people sometimes do this when they want to exercise a cluster for benchmarking and don't want to be penalized by the return sets: they have JMeter or whatever tool run a bunch of these EXPLAIN ANALYZEs over some complex queries, because they just want a quick answer back that says how long it took to run. The good news is it's going to finish here, and it's going to finish a little faster. I'll go ahead and lean in and tell you what we'll see in the query plan. Suraj, I'm going to answer your question in one second — it's a good one. It's going to run a little faster in clock time — about one and a half minutes instead of a little over two. Okay, small cluster, blah blah. CPU: nearly half as much. That's good. And we're going to see that it had a lot fewer splits.
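While that's finishing up, the CTAS from the setup script is essentially the sketch below — I'm leaving out the table property that pins the Iceberg spec version, since the plain form is all you need to reproduce the effect:

-- read the whole Hive external table and rewrite it as an Iceberg table
-- (the catalog's default format), which lands as a handful of large files
CREATE TABLE logs_iceberg AS
SELECT * FROM logs_daily_rollup_work;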
Back to the run: it says, oh, I got 40 splits. Not 210 — 40. And now you might say: Lester said a file equals a split, and you had eight files, so why are there 40 splits? Well, while we're here, why don't we go ahead and command-click that link — there we go. Hmm, "number of splits in the Hive connector" — I was supposed to put the Iceberg link there; did I put the wrong one? That's okay, we can get there. Let me make another note for myself: iceberg splits link. I also have a blog post — this one didn't make it into our resources — about the number of splits with Iceberg. Let me just search the word "split." There we go. It asks: how does the Iceberg split logic work? The long story short is that it's pretty simple. It works on the notion of a split size, and this is fairly universal because it's a standard and engines broadly work this way: what's the size of data I should call one split when I read? By default — again, I'm writing a book, there's a lot of nuance here, but let's keep it simple — 128 meg is roughly the watermark it uses. So if you know those files were 500-something meg, you divide 500-and-something megabytes by 128 and you get about 4.4, which means each of those files became about five splits. Five splits times eight files is 40, which gets us to the magic number of 40 splits. The size of the dataset drives the number of splits. I'm going to pause there, because there was a question about a limitation on file size. So, Rada: yes and no. These engines — not just Trino — come from the lineage of Java MapReduce, probably the first engine to implement the Google MapReduce paper, and the theory is that we should be able to run anything; arguably, size doesn't matter. I will say that Trino, the engine we work on, is designed to be fast. It favors performance over reliability, and part of how it does that is it likes to keep things in memory. That might sound like: oh, I read two terabytes of data, so I need two terabytes of memory. That's not exactly true, because it's always trying to hold on to just enough to answer the question. So you can run some cool queries on some ludicrously large tables, even in this tiny cluster — it just might take a while. If you asked for every row and every column from a table that's 60 terabytes, or 600 exabytes, you'd still probably be okay, because "in memory" doesn't mean it has to get it all before it gives you anything: it's implemented internally as a streaming model, so it can start handing you answers as fast as it can. So the short answer is there isn't a file-size limitation in Trino. Some scenarios could cause an out-of-memory exception. How do you solve that in Trino? You either revisit your query, you revisit your data, or you turn on what we call fault-tolerant execution mode, which starts using something like object storage for intermediate data. Then we get the resiliency that Java MapReduce and Hive and Spark arguably have, which says: I'll go slower because I'm going to write intermediate data between stages to disk, and as long as the disk doesn't run out, I'll eventually finish. We can do that too, but by heart, that's not our default setting.
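If you want to sanity-check the file math on the Iceberg side yourself, the connector exposes a "$files" metadata table — the table name here is mine, the suffix is the Iceberg connector's:

-- one row per data file in the Iceberg table;
-- ~570 MB files against a ~128 MB split target works out to about five splits per file
SELECT file_path, file_size_in_bytes / (1024.0 * 1024.0) AS size_mb
FROM "logs_iceberg$files";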
That's something that showed up about five years ago. So I didn't answer it perfectly: Trino has limitations on memory, but memory does not mean the size of your table has to fit in the size of your memory. That's not what it means, but that is technically the answer. Again, if you hang around at the end and want to go into it more, we can gladly do that. Okay. So again, a couple of things went a little faster. It was 1:43 instead of two-something, a little faster, and a little less CPU time. But arguably the main thing was it had fewer splits. And while my notes say something like "it finished a bit faster," the truth is this model of having fewer, larger files instead of many smaller files is going to win out at size and scale. Both of these tables are about two gig of data, or actually more like 500 meg as Parquet, so it's pretty tiny and this is a pretty small cluster. I do have datasets I could show you, but not here, because I wanted you to be able to reproduce this and I didn't want to make you generate all that. If you really want to play, that TPC-H catalog lets you load that data into your own catalog. Those SFs are called scale factors, and it just means the output of this entire eight-table schema should be about a hundred gig, or one gig, whatever that number is when you read it as gigabytes. It's a deterministic data generator, so it always produces the same answer, and those aren't the only numbers. If scale factor 10 is too big or you don't see the one you want, you can make up scale factor 7 or scale factor 22. So good stuff in there, just whispering little nuggets. A bigger engine and bigger data will show it even better, but nonetheless. Okay, let me see. Cool. Alright. I want to get to the multi-stage stuff, but let's do a couple more quick things. Guess what? We'd rather you do projection. We would rather you not ask for all columns. These are columnar file stores. They may be hundreds of columns wide, but you may only need six or 10 or 12 or 30 of those hundreds of columns. And my server shut down, so it's starting back up; we're going to pay a penalty here while that comes up. But if we do something like this, the only thing I did is I swapped out the select star, the splat, for five of my eight or nine columns. And here's what we're going to notice in the query plan: yes, I only grabbed those values, but we're also going to see a significant improvement in performance. It says forty-one now because it hasn't done anything yet, but it's going to take less than twenty seconds as opposed to close to two minutes. Same exact query. Instead of burning a minute and a half of combined CPU time under the covers, it's going to burn thirteen seconds instead of one-forty-something. And instead of two and a half minutes of physical input time... there are more numbers you can play with. These are all significantly smaller, just by asking for a bit over half the columns, because I'm leaving out a couple of big columns, like one called comment that has a bunch of nonsense in it. And in fact, I want this to finish at eighteen seconds, so we'll run it again now that this cluster's up, so you don't think I'm making that up. So that's good. That's a huge thing.
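As a rough sketch of that projection change (the column names here are placeholders, not the workshop table's real schema), the only difference between the slow and fast versions is the select list:

```sql
-- Full-width read: every column of every row comes back from S3.
EXPLAIN ANALYZE SELECT * FROM logs_iceberg;

-- Projected read: a columnar format like Parquet only fetches the requested columns,
-- so far less physical input is read and the query finishes in a fraction of the time.
EXPLAIN ANALYZE
SELECT event_time, app_name, severity, host, message_id   -- five of the ~nine columns
FROM logs_iceberg;
```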
And you might ask, Lester, we're an hour into a ninety-minute conversation, and we've talked a lot about query plans and a lot about that first stage. Guess what? There are going to be two places that take a lot of time in your queries. One of them, at size and scale, will always be that source stage where you read the data. It will always be a big deal. So all those principles you may or may not know about data lakes apply. Concepts like file formats: use columnar formats like Parquet instead of JSON. File sizes: don't have a billion 52K files or a million 15-meg files; have a hundred thousand 256-meg files. Getting fewer, bigger files makes these engines work better. Longer conversation, but they work better. Then what's the last one? Oh, a beautiful thing called partitioning, which I purposely pulled out at the last minute. If you watch those videos, we show a lot about partition pruning, and I'm glad to talk about it; you can run some examples even without access to this data, but at size and scale it's harder to reproduce. Alright, so there we go, it finished. The point I'm making is this: almost always, if you open up a query plan like this one, at the very, very top it's going to say how long the query ran. Clock time, twenty seconds, right? And then you might say, okay, what's going on? We've been looking at one stage, but each stage tells you things like how much time it worked. It didn't show the clock time in here, but it showed this: I burned 14 percent. Now what I'm looking for, and there it is, is that percentage figure at the lowest level. There will be a bunch of stages, as you'll see in a second, and every stage reports something like this. It's actually reported at what they call a fragment, an element of parallelization inside the plan. But you can quickly open something up, search for that percentage, and walk through stage by stage to see which one took the most resources. Well, this query had one stage, and it took 100% of the resources, but very likely, for most of your queries, a huge percentage, often more than 50%, sometimes eighty or ninety-plus percent of the work, is in that first stage. I'm harping on it because it's the place you often need to go. In fact, that's what you do, and I'll run a bigger query and show you in a second: you race to, okay, there are 20 stages. Do I want to look at them all, or do I want to find the long pole in the tent, who's burning a lot of time? Because that's the area to start looking at a little harder and decide, can I do something, or is that the best it's going to be? Do I need to change something? Alright, so I rambled a little there, which is my specialty, for those that know me, right, Quincy? Ramble, ramble, ramble. Alright. So, projection: that was significantly better. Significantly better. And I guess I left off one point, and I don't know why I took it off here, but let's make another point: projection reduces the amount of data read. If I look really close one more time, I should have called out the fact that, let's see if it's in there.
This. I read all the rows, still a hundred million rows. And you remember the input for those hundred million rows was some big number... five gig? I think it's five gig. Logically, yeah, there it is, it was about five gig, 4.09 according to the estimates. Nonetheless, what I'm trying to say is that by only asking for those columns and using a columnar file format, this is Parquet, it only read those columns from S3. It didn't have to read the whole row like we saw before. So please use something like Iceberg or Delta Lake, and if you're still on Hive, use ORC or Parquet. They are columnar stores, and they will make your life and your queries run a lot better. Okay, what else can we do? There's a concept called predicate pushdown. Yay, fun stuff. What's a predicate pushdown? Same query, but I added a where clause: where the app name is in CRM or ERP. There are 26 distinct app names in there. This is going to run, and we're going to see it move from a table scan to a scan filter. I'm going to try to look at these side by side. Let's go to the top... down here at the bottom, this node became a ScanFilter instead of a TableScan. It's the same table, but as I'm highlighting down here, it's saying, look, I took your where clause and made it a predicate. And what happens is that the columnar store in those files, in fact within those files there's a concept called a row group, like a file within a file, knows the min and max of its values. The bad news is every one of those files, every one of those little row groups, has all 26 app names. So I'm not really getting value out of this one right now, because everything probably contains CRM or ERP, and so it had to read everything. It still read, down here, a hundred million rows, and just those columns. If that data had already been sorted or partitioned on those values, it could read even less, but that's neither here nor there at the moment. It's not any faster, because everything had to be read, which is what it's saying there, but it depends on your data. Then I guess the other big thing is the estimated number of records. Oh, this is a kind of cool one. It made an estimate right here. It said, hey, there are a hundred million rows, but I'm going to find seven-point-something million; about 7,700,000 rows is what the query planner, Trino's cost-based optimizer, thought it would find. And in fact it output almost exactly 7.7 million; it's only a little bit off. And you might ask, one, how did it figure that out, and two, how come it was so darn good? Well, this is a data generator, and I purposely did not weight those application names like ERP and CRM; they're very evenly distributed, not skewed. The estimate is based on looking at statistics over all those files, and for the most part the values looked uniform from A to Z. So it basically said, I don't know, it's probably two twenty-sixths of the data, because there are 26 distinct values, I know how many rows there are, and it looks about average, so give me the basic math. And it came out really close, because my generator was purposely built that way. Maybe I rambled on that one, so I'll move on. The estimates are good, the actuals are good. That's kind of cool stuff. Alright.
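Here's that predicate-pushdown step as a hedged sketch; the non-app-name column names are made up for illustration, as before:

```sql
-- Adding a predicate turns the TableScan into a ScanFilterProject node and lets the
-- engine consult Parquet row-group min/max statistics. In this dataset every row group
-- contains all 26 app names, so nothing actually gets skipped, but the mechanism is the
-- same, and sorted or partitioned data would let whole row groups or files be pruned.
EXPLAIN ANALYZE
SELECT event_time, app_name, severity, host, message_id
FROM logs_iceberg
WHERE app_name IN ('CRM', 'ERP');
```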
I think I'm going to take a moment, take a deep breath, and toggle back over to my slides. Questions are quiet, and I don't know a hundred percent how to read that. Hopefully it's been fun, and hopefully folks are hanging around. I know it's deep; for those still with me, you have my salute, thank you for hanging in. This is not day-one stuff. If it is day one for you, hang with us and listen, because that's how you figure things out, by hearing it a bunch of times. Use this as a resource, come back to it on your journey, and maybe share it with others some time. Alright. So the truth is, queries are not single-stage. Unless you literally did "select a few columns from a table" and that's it, they're almost never going to be single-stage queries. You're going to do an order by, you're going to do a group by, you're going to do a join. You can do other stuff too, but let's just talk about those three briefly. And again, I'm not going to walk you through all of this material and explain everything, but if you remember, right here I said a stage is the unit of parallelization, the way we can work on a dataset in parallel. The reality is that to do something else, you have to move all that data into another logical dataset, which is itself in pieces and chunks. So you have to shuffle, or exchange; there are a lot of terms different engines have used over the years, but it generally means rethinking where the data lives. I might go from 200 splits to another dataset with 200 splits, but the data that was in split one will now be in different places, and so on and so forth. Without going into great detail, generically that's what this slide is saying: hey, this stage had four data files and made a logical dataset here. I keep using the word logical because these things rarely get fully loaded and persisted as one giant in-memory dataset. An exception might be Spark, where at times you say things like "pin that in memory because I have a reason to." But for the most part, engines don't want to hold on to all that data. They want to read it and throw it away if they can, or move it along, and so on. That data has to get moved; it has to get shuffled, or exchanged is the term we use. Why? Because now I have another multi-split dataset that's no longer tied to the actual data source. It's part of the processing, and I do something in parallel against it. If I'm not done, I do it again, and again, and again. These are the stages you might have heard about before. So let's just talk about three of them real fast, and then we'll go look at them. One is an order by. Select these two columns, order by those two columns. And this is, again, covered in the bigger series, but if I've read a bunch of splits, the bad news about sorting is that sorting is hard to do in parallel. So this is a simplified model that says, well, guess what, there's another stage that sorts, and everybody has to send their data to it. We're going to exchange it, but we're going to exchange it down to a single place. That sounds painful, because if I've got billions of rows, it suggests I have to have enough memory to sort all of it in memory. I'm going to show you in a second that's not exactly what happens.
We have a way to do some partial sorting in parallel and then a final sort-merge at the end, and that will scale to a ludicrous level, so don't worry about it. But it is multiple stages. Again, I have longer sessions that walk through each of these. A group by: a group by says I'm reading this 20-split logical dataset, 20 files or something, and I'm grouping by column five. Oops, I said column five and was trying to click on it. It just means the engine has to make sure that, within split one, any rows with one distinct value of column five get sent to a specific split in the next dataset, and rows with another distinct value of column five go somewhere else. It doesn't matter where they go as long as it's consistent: the whole first stage can send every row with value A to one place, every row with value B to another place, and so on. Why? So that when the next stage runs, it can say, cool, I can compute the max of column three, and so on and so forth, for my slice. Again, longer conversations; if you didn't know what all that meant, those videos I'm pointing to will help. And then joining. Joining has a solution, and I think I'll just go to the picture. This is the one at the very top where I said, hey, if you can look at that and go, oh, that means I'm joining orders and customers together, then you know what it means. I can't cover all of that for you right now, but at a high level, this says that at the bottom there are two different stages running: one reading orders, one reading customers. And if I'm joining on the customer ID, it just means they both need to send their data. The customers are in splits one, two, three, and you see down here that customer X has to go to one particular split up here; customer Y needs to go to one as well, and it doesn't matter which one, it just has to be consistent. And the orders, which will hopefully have a bunch of orders for each customer, will make sure each order goes to the same logical split, based on the customer ID, that the customer record went to. Why? So when it's all done, that stage can go, cool: I have a chunk of all the orders, and I have a distinct set of customers in here. And if I've got customer seven in my split, then I also know my slice has all of that particular customer's orders, so I can do whatever I need to do to join them without having to go look somewhere else. This is how we start doing things in parallel: we create a stage that does the exact same work, but on a chunk of the data. Okay, again, longer sessions if all that makes no sense. And then the other one is called a broadcast join. Those have been around a long, long time; I call it a map-side join, and there are a couple of other terms, but broadcast join. A broadcast join starts from the fact that you really only join two tables at a time. We might write a query that has eight tables in it, but, and I'll show you in a second, to be honest you're never joining more than two things at a time. You just say A and B join, then that result joins with C, and then that joins with D. You might do something like that, as in the sketch below.
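To make that pairwise idea concrete, here's a small sketch against the built-in TPC-H catalog (assuming it's mounted as tpch, which is an assumption about your setup); even though the SQL names three tables, the planner executes it as two binary joins:

```sql
SELECT c.name, count(*) AS line_items
FROM tpch.sf1.customer AS c
JOIN tpch.sf1.orders   AS o ON o.custkey  = c.custkey    -- join #1: customer with orders
JOIN tpch.sf1.lineitem AS l ON l.orderkey = o.orderkey   -- join #2: that result with lineitem
GROUP BY c.name;
```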
A broadcast join says, what if one of the tables is small enough to fit into memory on a worker? If so, the engine can very quickly run a stage that reads that customer table and then broadcasts it, pushes its contents into memory on all of the machines in the cluster. Then we can just read the bigger dataset from disk, from the data lake, from Teradata, wherever we're coming from, and as I get each row, I already have that customer information in memory. Maybe the example here is that we're looking through orders and trying to identify whether an order belongs to a gold customer, and we only know that from the customer record. Or maybe we're trying to do a max order or an average order per customer, something of that nature. Alright, again, I can't explain it all in the short time we have. And as promised, I'll go right back to the output. Look at the time; we're in good shape, we'll finish. This slide just says, this is really what happens. Someone says, hey, let's join orders and customers by customer ID; that's the join stage in the middle. We read everything from the bottom up, and then they say, let's do a group by on maybe their medallion level or customer preference level, gold customers, silver customers, all that kind of stuff. And then from there I want to do an average; that's the aggregate coming out of the group by. And then I might want to sort by who has the highest average, maybe by the customers' country or their province, like states in the US, something along those lines. That's what happens in a query, visualization-wise. Alright. So let's do this. For those that want to hang with me, we'll go look at it some more. For those that are running away because you've been wanting to run away, I'll show you the next steps, and then I'll get back into the hands-on stuff. What are the next steps? I gave you a couple of links, like the Dev Center; again, these links are in that PDF that will get mailed to you from Quincy, the system will generate all that. I gave you that video series right there on query plan analysis. And, shameless self-promotion, I am working on a book; you can see the title over there. We're in the early releases now, we're pushing early release one, we've got two more after that, and then a little later this year we'll release the book. The artifacts are at the top and bottom, in the links, to get you to GitHub where all of that lives. So I'm going to go back, and we're going to do some multi-stage queries. Why not? We'll do sorting, grouping, joining. Sorting, grouping, joining. Alright. I know my system has shut down again by this point while I was talking; I should have quickly come back over and started it, so we're going to burn a good thirty seconds while it wakes itself up. Not a serverless solution today, at least not the one I have here; it actually spins a little instance up. But what did I do while I was at it? It's the same query as before, select a few columns where the app is CRM, and then I added an order by. I just don't want to complicate it.
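For reference, the sorting query being spun up is roughly this shape; the column list and sort keys here are my guesses at the workshop query, not a copy of it:

```sql
-- Same projection and CRM predicate as before, now with an ORDER BY tacked on,
-- which is what introduces the extra sort stages in the plan discussed next.
EXPLAIN
SELECT event_time, app_name, severity, host, message_id
FROM logs_iceberg
WHERE app_name = 'CRM'
ORDER BY app_name, event_time;
```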
Now, while it's spinning up, we can look over here. You might say, Lester, you said an order by has to have a stage to read the data, then it has to send or exchange that data to another stage, a sorting stage, and then we're done. I'm going to show you that in practice, and there we go, in practice it actually adds two stages to help us sort, because we want to sort in parallel. Let's see what we've got. Here's our bottom fragment; we read bottom up. We did a read of our logs iceberg, there's our predicate pushdown, all the stuff you saw before. Then it says, guess what, I need to send this data. That's what the output partitioning is: how am I going to send it to another stage? There are a few options. Round robin is what it is here, just send rows around the receivers, walking a list; but there's also hashing, there's single, all that good stuff. We're going to send it out to a bunch of places. Why? There's another fragment, and it says, look, I'm getting my data from... so the bottom one said, I'm getting my data from a source, and this one says, I'm getting my data from a round robin. It's not reading from a table; it's a logical dataset coming from somewhere else. Where is it coming from? It says my source is actually a fragment, and fragment means stage. It says, I'm coming from number two, and these are the values I'm getting. Well, that's what it said down here: two said I'm sending round robin, and I'm sending these values. There's a lot of echoing of what am I sending, what am I receiving. Then, working bottom-up within the stage, it says, okay, I've got the data, what am I doing? I'm doing a partial sort by the app name. What it's doing is keeping track of its own slice, because this stage runs in parallel. If we've got 50 slices of this, that's 50 individual pieces each keeping an in-memory sort going. This is how we start our sort: we have a bunch of independent sorts, and we don't worry about the comprehensive sort, we just build sorted lists. It might build, say, ten of those. At the end of that, it says, guess what, I'm going to output this, not round robin, not hashed, but to a single place. Where is that dataset going? Well, there's this fragment, and it's receiving data into itself as a single place; it came from number one, the output of the partial sort. And what is it really going to do? All it does is a merge. So picture my two hands as two already-sorted datasets: one, two, three, four over here, and two, four, six, eight over there. It just walks them: here's a one, do you have a one? No. Okay, here's a two, do you have a two? Yep, two. Okay, what's next? If you've ever done a sort-merge of two or more datasets that were already sorted, merging several sorted lists, that's exactly what's happening. That is super fast. Super duper, uber duper fast. If I actually ran this instead of just doing the explain output, you'd see it takes no time at all. In fact... yeah, I didn't run it, but you can get rid of the explain and run it, or do explain analyze, and you'll see it happens super fast. I want to kind of get to the finish line here, so apologies.
What else is going to happen? Well, you might do something like this. It's actually something different: I'm grouping by app name, looking at a window of time, about three or four days out of a thirty-five day window, and I just want to know how many events, how many log messages, were created per app name in that time window. Okay? So let's go look at that one. It all starts at the bottom; you've got to start by reading data, so there's always going to be that source stage on the single table. And it looks a little different. Instead of a table scan, or a scan filter, we've got a ScanFilterProject. That just means we don't want everything: we're scanning with a filter, and we're only projecting certain values. That's all cool and neat, but really, what does it actually need? It just needs that one field, app name, logically, because you're just trying to group by it and count. So if it couldn't care less about the rest, the dataset it actually reads ought to be pretty darn small, not just in the estimates but in the actual data read. And what it's going to do is a partial aggregate. So if there are, what did we say, 210 or 40 or 50 splits of this data, that means each one independently builds the count aggregate, 50 times over. Each one will have all 26 of the app names and its own counts for them. Then, when they're done, it says, why don't we output my data based on a hash of the app name? That's the code word for saying, let's make sure that all of us send our ERP rows to the same split in the next stage, and all of us send CRM to whatever other split. It doesn't care where, and we don't really care where, but send it somewhere consistent, and we hope it comes out nicely distributed. That way we can have this next stage, which says, look, I got my information from number three, and I got it as a hash because it was output as a hash. And really what it says is, guess what, I've got a pretty easy job now: I just need to do a final aggregate. I have n, say 50, partial results that look like what I'm trying to produce, but the first one knows there are 20 ERPs, another one knows there are 40, and another knows there are 60, so it can roll all those up. It's a tiny bit more sophisticated than the sort merge, but not much, if you think about it: grab all those lists of 26, total them up, and send them on. And to be honest, how does it send them on? Well, we added a sort, so it uses this and this, which is going to look like what I described a minute ago. It does a round robin with that. You might say, why do a round robin when there are only a few rows? But it just works the same way every time, big or small, so you have one piece of code. And then it does the same thing we saw before: a quick partial sort, then it sends those partial sorts to a merge, the sort-merge element, and that gives us the answer. So, not a master class, not today. Maybe we need a second one of these and go deeper, but I'll wait for feedback at the end of this session to see what makes sense. Alright. So we did the aggregation.
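A minimal sketch of the grouping query just walked through; the timestamp column name and the date range are assumptions, and the rest follows the narration:

```sql
-- Reads only the columns it needs (app_name plus the timestamp for the filter),
-- builds partial counts per split, hash-exchanges on app_name, does the final
-- aggregate, and then runs the small partial-sort-plus-merge at the end.
EXPLAIN ANALYZE
SELECT app_name, count(*) AS event_count
FROM logs_iceberg
WHERE event_time BETWEEN TIMESTAMP '2024-03-01 00:00:00'
                     AND TIMESTAMP '2024-03-04 23:59:59'
GROUP BY app_name
ORDER BY event_count DESC;
```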
I've got two minutes, and that's not enough time, so hang around with me if you can, or watch this video after the fact if you have to run. I'm going to show you two more things. Actually, I've got a whole bunch of them, three or four more things, so let me just show them quickly, and then we'll come back for those who want to hang around. This one says: show me what happens if I do a join on two datasets, an orders and a customers. I did it on scale factor 100, and I went back over here, opened this up, and it said, guess what, I did a source read of my customers and a source read of my orders, and both of those did an output partitioning based on a hash of the customer key. So this is that notion I described: orders and customers are both independently read, and each independently sends its information somewhere else. Where else? It's this other fragment that's receiving the hashed data, and it's actually getting it from both places. It gets the first one... I think it should be the smaller one. Let's make sure. Yeah, the engine figures that out, and this is the smaller one, so it's getting the customers first, and it hangs on to those. And then it basically walks the other one, the bigger one, the orders, number one right here, and guess what? It just does the join. And I had no sorting, no grouping, or anything after that; I just wanted to show the basic join. Two things come together, send it on. I'm gonna show