Video: Office Hours: Automated Iceberg Table Maintenance | Duration: 2337s | Summary: Office Hours: Automated Iceberg Table Maintenance | Chapters: Welcome & Introduction (0:02), Office Hour Format (0:22), Session Introduction (1:21), Metadata Tables (3:32), Performance Testing Results (11:23), Maintenance Best Practices (23:01), Q&A Session (24:49), Maintenance Order Strategy (32:20), Scaling & Support (33:42), Closing and Resources (35:49)
Transcript for "Office Hours: Automated Iceberg Table Maintenance":
Hey, everybody. It's Lester Martin again, your confident yet trusty and stable developer advocate here at Starburst. And for those who know me, maybe a little bit of a goofball. But we're gonna do another episode of our Office Hours webinar. And for anyone not familiar with the Office Hours webinars, while I'm talking, I might just put my connection-before-content screen up. See if I can make that happen. Yeah. There she goes. What we do in the office hours, we keep it to about thirty minutes total. It's a quick session. Right now, we're doing only once a month, but we anticipate maybe doing it once a week as we keep building up an audience. What I'm going to do is talk about a topic. The topic today will be Iceberg table maintenance, or even automated Iceberg table maintenance. My goal is to keep the presentation and demo to no more than twenty minutes, preferably ten to fifteen if we're lucky, or even shorter sometimes. Why? Because the other half of this is to give folks a chance to ask some good questions about the topic today, you know, Iceberg, automated table maintenance, or anything. It's kind of an ask-me-anything. It's office hours, like with the old professor back at the uni or something. So let's treat it that way. Get your questions ready. In the meantime, like Quincy put in the chat, feel free to just shout out where you're from. I'm from Atlanta, Georgia here in the United States. It's a beautiful, what is it, Wednesday? Thursday? Thursday. And let's jump on in and talk about Iceberg table maintenance. Alright. There was my connection-before-content slide if you didn't see it. I think this presentation is available to you up here. Where are we in the whole concept of Starburst?
Well, you know, I actually have kind of the core Starburst data platform graphic you see there in the background, and I really zoned in on data connectors today as much as anything else. Why? Because we connect to all kinds of different data sources. We sit on, we're built on top of, the Trino query engine. Trino has a connector architecture, and one of those connectors is an Iceberg one, so that's the one we're using. And then I put on the left a little snapshot of a teeny bit of what we call ADA, because it's pretty important to us today. Pretty important to us holistically, but nothing we're going to talk a lot about today. And then at the bottom, I show you Starburst Enterprise and Starburst Galaxy just to make the point that we do have the install-your-own-software-your-way option, in the cloud, on prem, wherever (that's Starburst Enterprise), or use our hosted software-as-a-service model, Starburst Galaxy, the quickest way to get up and running in Starburst. I'll use that. And since we're going to talk about Iceberg table maintenance, I'm not going to explain this slide in great detail. We have a lot of materials here that we could present to you. You saw that devrel at Starburst is a great place to send me some questions or comments or thoughts later, or find me on LinkedIn or whatever. And I'm writing a book with a tech guy, on O'Reilly, about performance optimizations on Iceberg, so that's another place. I think we're about to do an early release anytime now. So there's a lot of good documentation describing all this. What the slide really says is there's a bunch of different files that live in this metadata folder underneath your table's folder: metadata files, manifest lists, manifest files. And then, just like the old days with Hive, we have data files. We just kinda put them in another subfolder now, under data. So that all lives together.
Now, we could always go back and forth and look at those files on your object store or HDFS or whatever. I actually did a lot of that, not when I first learned Iceberg four or five years ago, but two, three, maybe three or four years ago, when I was really digging in deep, and wow, it can get pretty gnarly pretty quick. I'm not gonna lie to you. Doesn't mean anything's wrong with it; it's just a lot to hold in your head to see it all. So what all the engines do (including Trino; I believe they all do this, Spark does, Trino does) is offer you ways to use your language of choice, your compute engine of choice, to say: would you look into that metadata and tell me the relevant information that I really care about? So we have these metadata tables. That's how we see them in Starburst, in Trino, like this: select all from the table name, then a dollar sign, then the metadata table's name, like $snapshots. And I'm gonna show you some of those. Now, these on the right are the old leftovers from Hive. Those are metadata columns. They're still valuable and useful, and there's an example; we'll do some of this in a second when we get to the demo. I just wanna make that big point: metadata tables, instead of looking at the actual JSON and Avro files, is my recommendation. There's a shameless plug for some other materials I have if you're really, really, really into Iceberg metadata. Again, you can download this off the docs. There'll be links in there to all that fun stuff. And then the demo, after we talk about a few things and try some things, will absolutely set up some automated configuration. So why don't we jump on in? The good or the bad is I don't see any hard questions yet in the chat. So I'm already sad. You know, I wanted some hard questions in there.
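To make that concrete, here's the shape of those two styles of metadata access in Trino SQL (the table name cust_one matches the demo table; adjust catalog and schema to yours):

```sql
-- Metadata table: one row per snapshot the table still tracks.
SELECT committed_at, snapshot_id, parent_id, operation
FROM "cust_one$snapshots"
ORDER BY committed_at;

-- The Hive-style metadata columns still work too, e.g. which
-- physical file each row lives in:
SELECT "$path", count(*) AS row_count
FROM cust_one
GROUP BY "$path";
```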
Not gonna lie to you, because I love to answer questions, but I actually don't mind getting stumped, because that's how we all learn something new. If someone asks me a question I just don't have a good answer for, sometimes I might have to take it away, or we might save it to the very, very end and find it out together interactively with the few folks who are hanging around. Okay. So what I did in the background here, I'm spinning up my free cluster here on Starburst Galaxy. I only have this at about 125% zoom, you know, because I wanna be able to see a number of things. I do get that it might be tiny if you're seeing it; hopefully, at least it's high resolution. And again, plug for devrel at starburst dot io as a way to reach out and contact me later. Alright. So I think that's gonna start up here. What I ended up doing, I have a little cluster called free cluster. I got this S3, kind of, Great Lakes connector; I can create all kinds of tables in there, including Iceberg. I have a schema set up called table maintenance, in which there is a little table I already pre-set up. It's called cust_one. Cust_one was just a quick, you know, I built it as a CTAS from the TPC-H customer table. And then first, I just dropped maybe 150,000 records in there. Now I'm not so sure why my cluster doesn't seem to be wanting to start, so let me just see. Oh, there it is. Mhmm. Okay. Well, we need it up here. So hopefully it'll be up in just a few more seconds, when I need it the most. And then what I did with that cust_one table, I just started playing with it. You know, added some more records, did some updates on the market segment, things like that, and then I added a couple million more records from a different scale factor. I'm not gonna walk through all this, but I did all kinds of fun stuff, including changing values that were partition key values, just to make a lot of metadata and make a big mess.
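The setup described above, seeding from TPC-H and then churning the table to generate metadata, could be sketched roughly like this (the exact statements weren't shown on screen, so treat the names and scale factors as illustrative):

```sql
-- Seed the table from the built-in TPC-H connector.
CREATE TABLE cust_one AS
SELECT * FROM tpch.sf1.customer;

-- Append a couple million more rows from a different scale factor.
INSERT INTO cust_one
SELECT * FROM tpch.sf10.customer;

-- Rewrite some market segment values so merge-on-read delete files
-- pile up alongside the data files.
UPDATE cust_one
SET mktsegment = 'MACHINERY'
WHERE mktsegment = 'BUILDING';
```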
Then, ultimately, after I did this and that and kept running and running, I ended up with something like, let's see here, select all from cust_one. I'll do both of these queries very quick. Ended up with, again, a pretty small table that has records that, for the most part, look like TPC-H, if you're familiar with that. If not, no big deal; it's a little data generator used for benchmarking. There's some customers with fake names and addresses and phone numbers and balances and so on and so forth. And my table has roughly 18 million records. Tiny table, but enough to get us started. And, you know, think about the performance gains you're gonna see with maintenance, compound them, put them in the real world with your datasets, and go, wow, you know what? I bet I'm in trouble. Alright. What do I have in there? I have a bunch of snapshots. Here I am querying that $snapshots table I promised exists. Here's a bunch of those in there; I think there's 22 different snapshots. I've done a bunch of things just to set the stage here. But if I really, really want to know the current snapshot, some people think, oh, I'd go to the bottom and that snapshot is the current snapshot. That is not really true. It is probably true if you've never done branching and tagging. If you haven't created a branch, you haven't added a tag, you haven't done any rollbacks, it very likely will be the last one. But there is a metadata table called $refs. And the easiest way to say what's the most current version: look in there and look for a type called branch and then the name main, because that's the trunk. Main's snapshot is the current snapshot. There could be lots of other snapshots on branches and tags, or just in existence. But I need to know that because, well, I needed it because I wanted to roll back to it and make sure I was in good shape. What else is in there?
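Finding the current snapshot via $refs, as described, looks like this in Trino SQL:

```sql
-- The BRANCH named 'main' is the trunk; its snapshot_id is the
-- current snapshot. Other rows would be tags or side branches.
SELECT name, type, snapshot_id
FROM "cust_one$refs"
WHERE type = 'BRANCH' AND name = 'main';
```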
I'm gonna take a look and make sure that's what I thought it was. I will note it ends in 8972. Good job. I don't need to roll back to it; I think I've got it back in the right state here. Now, if you looked at that snapshot back in the $snapshots table, you'll see a reference to the manifest list, the file, the actual Avro, and all that good stuff. But then from that, you might go look in that file and say, tell me the manifest files. These are the kind of last mile before they touch the data files, and there are, 50, I'm sorry, 22 of those. And each one of those references some number of files, as you see here. So, a manifest file is the bottom of the three metadata layers: metadata file, manifest list, manifest file. There's one manifest list per snapshot, and the manifest list usually references many manifest files, and those reference physical files. That's what's happening here. They're referencing all this. And if you counted up how many files each one lists, you'd find out how many files you had. I didn't make the query do that, but they show you how many rows. And you're gonna see this, and I'll mention it in a minute: there's this notion of data files and delete files, because those that know, know that you don't really change the files. You create new files, and you have concepts like merge-on-read or copy-on-write. We're implementing that merge-on-read strategy, which means if you do an update, what you really do is you write a deletion marker and then you add the record net-new again. And you say, ah, the new snapshot has this big file, this little delete file, and this little insert file. Alright. So that said, let's look at the files. I think there are 400, yeah, 422 of those files. And I went ahead and sorted them from small to large. So the smallest one is two K. Eek. And I'm scrolling and scrolling; I'm halfway through and I'm still at, like, six K. That's not awesome. These are some small files.
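The manifest-level and file-level views being walked through here are just two more metadata tables. A sketch (the column picks are illustrative):

```sql
-- One row per manifest file reachable from the current snapshot.
SELECT path, added_data_files_count, added_rows_count
FROM "cust_one$manifests";

-- The physical files themselves, smallest first: this is the view
-- that exposed all the 2 KB stragglers.
SELECT file_path, record_count, file_size_in_bytes
FROM "cust_one$files"
ORDER BY file_size_in_bytes;
```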
They need some work. Now, if I get way toward the bottom, then good, they start to get a little bigger. They're still pretty darn small; twenty meg looks like the biggest. But I wanna make a point. I'm just gonna do a group by on that column called content. And content is zero or one today, at least. Zero means this is a data file. Classic data; anything loaded in there, etcetera. Content of one is a delete file. This is the one that has those deletion markers. Now, I went ahead and did this with v2, even though Iceberg v3 is supported in the Iceberg connector and all that good stuff, because I knew v2 creates more of those nasty delete files. Version three has a thing called the binary deletion vector, and it helps with this problem. It has fewer delete files; it'll have fewer of these delta files that we need to merge on read. But there we go. I'm in a situation where I've got 90 files of data with an average size of about 12 meg; we saw a 20 meg one. Small, but not terribly small. And then I've got 332 delete files averaging about 11 K. That would be realistic for a scenario where I was changing bits and pieces all over the place. So this is not unheard of. Again, super small, but it lets us set the stage. Alright. So let's do something. I want to run a query. I'm gonna do a very, very simple query on that cust_one table, and I'm forcing it to do a table scan. There's not enough detail in these WHERE clauses to do a really good predicate pushdown, and I want all the columns. I'm just forcing it to do some work. So it ran, you know, found 128 rows for those criteria out of the 18 million. But I wanted to take a minute and say, okay, what did that mean for the engine? We're not gonna go into the query plan, because I trimmed the data I'm looking for down. I only want one value that I care about, and that is the CPU time.
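The data-versus-delete-file breakdown shown here is a simple aggregate over $files, where content 0 means data file and 1 means (position) delete file:

```sql
SELECT content,
       count(*)                AS file_count,
       avg(file_size_in_bytes) AS avg_size_bytes,
       sum(record_count)       AS total_records
FROM "cust_one$files"
GROUP BY content;
```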
How much of your, you know, processing capability across this cluster (it's a very tiny cluster, one machine) did it burn across that time? Ten seconds of CPU time. Okay? So benchmark, it's a number, something to compare. We could look at a whole lot of other metrics, wait times and clock time and all this good stuff, but just look at CPU time. Alright. Ten seconds. I said that's about right; I was expecting about ten seconds. Now, I'm going to do this manually first, and then we'll go back and do the other activity. This is probably a great time to go look at this. I'm gonna purposely pull up the Trino Iceberg connector docs. We have a Starburst one on top of that with a few other features, but the Trino one will definitely call out the fact that, well, there's those metadata tables, all documented here if you ever need to know all about them. And then there should be a nice, here we go, alter table execute. Here is a list of those maintenance activities. I'm gonna scroll in, maybe, on the right. Things like optimize. Optimize is our compaction: read those files and try to rewrite them. Anything that's too small, read them, combine them together, including those merge-on-read things, the delete files and the adds; roll all that together and make some better files. That's an important one. And it all depends, you know, on your use case and your scenario and all that good stuff, but it could be as easy as once a day, or it could be something you do all the time. Those manifests you saw, there's ways to kinda rewrite all those so they're better and clump together. Another important thing is this notion of snapshots. We don't have enough time today unless there's questions about it, but the more snapshots you have, the more space you're gonna hang on to. Don't think of snapshots as necessarily affecting performance.
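The maintenance commands in that ALTER TABLE EXECUTE list look like this in the Trino Iceberg connector (the retention windows here are illustrative; check the connector docs for defaults and minimums):

```sql
-- Compaction: rewrite small files, folding merge-on-read delete
-- files back into the data files.
ALTER TABLE cust_one EXECUTE optimize;

-- Snapshot expiry: drop snapshots older than the retention window
-- to reclaim storage.
ALTER TABLE cust_one EXECUTE expire_snapshots(retention_threshold => '7d');

-- Orphan cleanup: the exhaustive sweep for files no snapshot
-- references anymore.
ALTER TABLE cust_one EXECUTE remove_orphan_files(retention_threshold => '7d');
```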
There is a way it could affect performance, but, arguably, it's mostly affecting storage. So periodically, we do need to purge off some of the snapshots. And then, you know, there could be some leftover things that hang around. There could be what we call orphan files. Maybe something blew up in the middle of an update and it wasn't totally able to clean itself up. This is a not rare, but not constant, thing where you say, hey, let's just go really do an exhaustive look out there. Alright. So I'm gonna focus on the compaction one right here. I think I got it all queued up. I'm gonna do just a plain vanilla: hey, look out there, anything you find, clean it up and rewrite it. And you can even go look at this. This is just a job. Let me hit it and see what's going on here. There it is: alter table cust_one, execute, optimize. It's just this classic job doing its thing, reading stuff and moving on. But really, what I'm most curious about is how many files are left when it's done. So let's let it finish, and then we'll go run those same queries from before. And we'll just hone in on the files themselves, since it's almost done. Grab my microphone and tap dance. Hey, everybody. Again, maybe I should've spun up a giant cluster, but then it would be overkill for the data I have. Small data, small cluster. But it is finishing up. Alright. Again, no questions yet in the chat. And nobody put where they're from. Feel free to let us know what part of the big wide ball that we all live on you're at. Alright. So I'm gonna look at that same thing again. I'm gonna say, tell me what's up with the current files. And if you remember, we had 422 of those. Now we have 12 of those. So compaction read all those in and rewrote them. Now, the good news or the bad news: are they gigantic or not yet?
A few are a good size, 180 meg, but there's still some smaller ones. This is because of the problem I have and the size of the cluster I have, etcetera. There's nothing wrong with that; next time compaction runs, it might go back and say, these are still candidates, these are still small enough; these are big enough, they're gonna be left alone. That's the default threshold from Trino. There they are. Oh my goodness, Andre was, like, typing all those questions up in his notepad. Instead of one at a time, he's like, I'm just gonna send them all at once. Thank you, Andre. I appreciate it. I'm gonna come back to those; I'm almost done. So there we go. I've got a bunch of different ones, but 12 files instead of 422, that's a good thing. Now, in practice, you're probably talking about, you know, hundreds of thousands, tens of millions, maybe hundreds of millions of files and all that kind of good stuff. So there are a lot of things happening in this space, and there are a lot of ways to be intelligent with partitioning and all these other things. Again, feel free to keep checking in with us. We're about to do an early release of our performance optimization with Iceberg book, where we're talking about all these topics and how you make it best, but we still have some pretty good blogs and videos out there on this stuff. But there we go. There's 12 files, averaging about a million and a half records each, average size 82 meg. Alright. We're getting better. We're getting closer to at least 100, 128 meg, 150 meg, 200 meg, whatever number, and I feel good about that. So let's go ahead and run that query again. Same query as before, fewer files, and what we hope is it actually runs a little bit faster. Now, again, clock time is not the best measure, but I can tell it did go a little faster. It says six seconds. Now I'm gonna go in and pick; I think I was expecting the CPU time to burn about six and a half seconds.
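The threshold mentioned above is tunable; by default, Trino's optimize only rewrites files below a size cutoff (100MB at the time of writing), so already-big files get left alone. You can pass your own:

```sql
-- Only files smaller than the threshold are read and rewritten;
-- anything already at or above it is left untouched.
ALTER TABLE cust_one EXECUTE optimize(file_size_threshold => '128MB');
```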
Remember, we saw the other one at about ten seconds. And we'll just look; you know, everything looks good here. Everything's fast. The big work was in executing, and it took about seven seconds. I bet if it ran again, it'd be smaller, or close, and that kinda good stuff. And I did a previous run where I was getting, like, three times better, but the data was so small it wasn't even fair to use it. And I didn't wanna set the stage that every time you run compaction you'd get 300, 600% better performance. Now, if you've never run compaction and you've got a gigantic table, yes. But when you introduce compaction and the other maintenance jobs in your workflow, automated via a tool like ours or automated in your own data pipelines, you're looking for there not to be such a swing, because you're trying to stay ahead of the performance issues and problems and concerns. Alright. So again, I only got, like, 30% better. And you would say, wow, you went from 422 files to 12 files; 422 files is still nothing, you know, for a table. So what would be more fun is you do some exhaustive testing: sizes, scale, create a scenario. But net net, I think we're in good shape there. Alright. So let me finish up my demo with this. What would I recommend you do? Well, we got a nice little thing called data maintenance over here. You just create a data maintenance job. And I'm just gonna drill into my catalog, which was called my cloud, and I'm gonna drill into the schema called, it was called, something like, yeah. Nope. Nope. What did I call it? Table maintenance. And you can just say everything, but I'm trying to make the point it's this table. There we go: cust_one. And then I'm gonna tell it what I wanna do. Absolutely, I wanna fire off compactions. I wanna fire off some of those other steps.
I didn't even mention it regenerates the statistics and that kind of stuff. There's that remove orphan files, delete orphan files step. And when I do snapshot expiration, I'm gonna say, by default, hey, any snapshots a week old, throw them away. That may or may not be your right answer. And then you have to run it somewhere, as somebody. I got a lot of admin rights here, so we'll just run it to the world. This is Lester's setup, and we will just say, hey, run it. Let's try it daily at three in the morning here in eastern time, and save it. Boom. We're done, you know. And we have reporting. We can come in here and see what's going on, what's all set up, these are the things, etcetera. Now, if I really got in there and said, did you do anything? Has it run anything? It actually says it was completed. I don't know what that was from. What I'm gonna do is, I'm getting triggered now, I'm gonna force it to run now. It's probably gonna run pretty fast because, once it does execute, there isn't a whole lot to do. It's gonna read those files and say, hey, these are in good enough shape. They may actually get rewritten, since we have that notion of, like, a bunch of 20 megs and that kind of stuff. But I think we're done. April, what is it, April 16. Okay. Now I guess it's running; I thought it was all done. You know, if in doubt, you can always go to Query Insights and see what's actually out there running, active queries, and see if it's scheduled, if it's running or what. It says they're all done. And there we go. I see them. There we are. At least I see that one: alter, execute, etcetera. So I'm going to go back into that data maintenance view one more time and drill down. Yep. There's the completed. Hopefully, I see one that says, yeah, April 16. There we go. Yay. Then I go back up one. Are there any errors, and that kind of good stuff, to kinda walk through? Here we go.
Schedules, errors. No errors. Nothing bad. Lots of good reasons why this is important to do. And I would say look for maintenance to become more and more table stakes. But I still put a little bit of a cautionary tale, personally, on the largest, most intense size-and-scale, most-accessed tables; those may, today at least, be best handled differently. Maybe you have a thousand tables, or if you had a 100 tables, we're probably talking about two tables, not the other 98. For the things that are working great for the most part, let's just let these tools do that for us. But the tools today, including ours, are very schedule oriented. Now, look for tools like ours to get more intelligent and be proactive about when it's right to do things. That's when these automated maintenance jobs will make even more sense. But today, at size and scale, it may make a little bit of sense to do it a little bit more in-stream, in the workflows that you're firing off. You might know enough about the characteristics of your data. Okay. I think that's my demo. What I might do is shift over and look at the questions that I have in there. I'll put my logo stuff on the screen here just for fun. I'm gonna have something on the screen, or I could turn it off. And I will read the questions. Let me see. I know Andre had this power one, and a few other people. Alright. Andre, you made the longest one, so I'll go last on you. I'll read a couple of them. Vitali, this is our Q&A time, and I think Quincy probably told you, you're welcome to come off mic. You know, we rarely do this except for things like this. In fact, somebody was complaining at a webinar yesterday: why can't I talk? Then when we turn it on, nobody wants to come on and talk. So please come on and talk. We love that. But if not, I'll read your text. Optimize commands are heavy on cluster resources, so they take time to complete.
Can the table be accessed during maintenance, especially optimize, or is it locked? I forgot to turn off my little notice here. Let me just kill it real fast before it starts doing everything else. Sorry about that, gang. Usually, you'd turn these things off. Turn it off today. Turn my focus on. The question is really saying, hey, it's heavy. It can be. And can I run queries while it's running? Absolutely. And here's why. A couple of things. If it's so heavy and it's impacting other jobs, you might actually say, run the maintenance on another cluster. That was part of that automation; I said, hey, where do I run this at? The reason why you can run queries just fine while something like that's happening is because everything is based on a read-committed isolation model. So when someone runs a query, even though (here, look at this picture) another snapshot is being built, until the very end, till it does this optimistic swap, this ACID commit, to say, okay, this is the new version, the current version is the current version. So 10 people can be building a new version and a 100 people can be querying the same table, and they'll get whatever the table is at launch time. So no, isolation's not an issue there, Vitali. The only thing I would say is, if it's so intense, yes, could it impact the performance of those queries? It could. And if it's that intense, maybe it doesn't need to run that often. Or, if it is, maybe offload it onto a more maintenance kind of cluster. So hopefully that answers it. What do we recommend to reduce downtime for tables? What would be the recommendation to reduce downtimes for tables in maintenance? Don't have downtime for tables in maintenance. That's my recommendation. I'm saying you should be able to do this without it, and probably the caveat is what I said: if it's a compute burden, either bump your cluster up or have an isolated cluster that's the only one doing that compute.
Not locked according to documentation during maintenance. Eric, you are right. It is not locked. I used the phrase optimistic locking. If you remember your database fundamentals, there's optimistic and pessimistic. Pessimistic is that true data lock. An optimistic lock is this: it says, hey, there's a pointer that knows about this file, and someone knows about this one over here and the one on the left (circling my mouse; I'm pointing with my fingers, I know you can see it). And then it says, hey, I'm the new guy, I'm ready to go. It says: update the pointer, as long as the old pointer I started from is still the right one. Now, that might sound scary, but it shouldn't scare you; it's not that it won't work. It works perfectly, and it works great when only one person's changing things. What does it do when the snapshots keep rolling in? Well, it could say, well, I'm not valid, and just bail. But what it really will do is try to rebaseline. It'll actually go, okay, great, what is the current snapshot? And then it'll look at itself and say, will what I've done work with that? And if so, it'll build a new snapshot. And again, right before it commits, it'll ask: is the current snapshot still the one I found a few seconds ago? So it is, I wouldn't call it a highly concurrent change model, but it has concurrency controls that will prevent you from stepping over each other. Alright. I didn't look at the tally, Eric. Let me go up, since I made Andre wait a little while. He's got a bunch of questions. I'm gonna scan them and see if I can summarize. Oh, you've got a lot of questions. We might have to hang around and just talk, Andre. Alright. Recommendation for Iceberg tables when they're being populated by streaming jobs, and they act as data sources to other streaming jobs. These streaming jobs have to skip over snapshots. Wow.
So the question, I think, at the end of all that was: only batch and micro-batch jobs are... I don't know if I followed that 100%. I know what you're saying. You're loading the Iceberg table from a streaming job, and that is one of those cases where, you know, how often we purge the snapshots, how often we have to come back, is gonna be very important. A stream has to skip over snapshots. I think the answer is yes to your question one, Andre; I might have to come back and save that one, see if that works. How to reduce the size of the all-manifests metadata when it's 100K-plus records, or all-snapshots? I think some of that is, well, there's a property for how many snapshots. This one's probably more about the metadata file, not necessarily the manifests. That metadata file keeps a whole bunch of snapshots, and you can purge that. Each new metadata file, that's what you see on the right here, keeps track of a whole bunch of snapshots, including what it knows is the current one. It could keep track of fewer than all of them. That's one way to make this file not ridiculously large. But I think your question is more about the size of the all-manifests metadata. Yeah. Wow, you're killing me here. Killing me, Smalls. Andre and I are gonna loop back and close some of these out, and maybe even, you know, post some stuff on our forum site, etcetera, with these answers, because I think that's what we just have to do. And, on Iceberg 1.10: are you experiencing issues with the rewrite position delete files procedure failing? I don't know that answer. I don't have access to whether we're having that bug or not. I haven't heard anyone saying that in our situation, so I'm not sure. Number four, I'm just gonna read all five of them. Is it true that we have data files please position leads for five years to the working?
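One way to keep the metadata file itself from growing without bound (snapshot count itself is handled by snapshot expiry) is a pair of standard Iceberg table properties. Trino doesn't necessarily let you set arbitrary Iceberg properties, so this sketch uses Spark SQL syntax, and the values are illustrative:

```sql
ALTER TABLE cust_one SET TBLPROPERTIES (
  -- cap how many previous metadata file versions each new
  -- metadata file keeps track of
  'write.metadata.previous-versions-max' = '10',
  -- clean up old metadata files as commits happen
  'write.metadata.delete-after-commit.enabled' = 'true'
);
```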
I don't know. This is the person, whoever this is, that does that; I said give me some great questions. Andre, what we're gonna do on those is, I'm gonna work with you; we're gonna make sure those are getting out there. We'll probably put them in the forum. If you're not familiar with the forum site, I'm just gonna show everyone how to get there in case you wanna get ahead of it. You just go to Starburst, and under resources, we have a thing called forum. There it is, right there. This is our formal Q&A site. Feel free to add a thing: hit new topic, you know, put in the category. I'll make sure that myself or others look at it. Maybe for you, Andre, I might post them for you and that kind of stuff, and we can see if we can get them all to good answers. And then, order of maintenance. A long time ago, I think there was some debate on this, but I believe expiring snapshots, either way, doesn't help or hurt the newest stuff. You know, that's the gotcha: we're going from 422 files to 12 files, and the snapshot deletion isn't as important, because the things you just did are still there. So I used to say, for sure, expire the snapshots first and then do the rewrite. But I have learned that where you put expire snapshots is not as critical as everything else. I would definitely rewrite the data files, rewrite the manifests after the data files, and then I would periodically, you know, slow down that orphan files one. In fact, you saw mine when I checked them all. What I might do is make two jobs for the orphan files. I've realized, for me, that it doesn't yield much; it doesn't really delete a lot, because there's not a lot of problems, and it's pretty exhaustive to figure it out. I might make a second schedule that says, hey...
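The recommended ordering above, as a sketch of what the frequent job versus the occasional job might run (retention values illustrative):

```sql
-- Frequent job: rewrite the data files first (a manifest rewrite,
-- where your engine offers one, would follow the data rewrite).
ALTER TABLE cust_one EXECUTE optimize;

-- Snapshot expiry: its placement in the sequence matters less
-- than running it at all.
ALTER TABLE cust_one EXECUTE expire_snapshots(retention_threshold => '7d');

-- Separate, less frequent job (e.g. weekly): the exhaustive
-- orphan-file sweep, since it rarely finds much but costs a lot.
ALTER TABLE cust_one EXECUTE remove_orphan_files(retention_threshold => '7d');
```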
For everything in this schema, only do that once a week, or once a whatever seems to be working out for us. All good questions; we're going to put them in there. How do we scale heavy maintenance jobs for Iceberg? I think the answer I gave earlier, Amit, is the right answer, at least in the Trino world — it may be different in Spark. If you're at the point where these maintenance jobs are resource intensive, scaling the cluster up makes some sense, but consider having a dedicated cluster — it can be very small — and say, hey, I'm going to target my maintenance activity there. Now, that's different today: I'm thinking about how easy that is in Galaxy. Obviously, in Enterprise that's a different story today, because not everyone in Enterprise is using that control-plane concept the way we do in Galaxy. It's pretty much there; it's just not rolled out to everyone yet. You might only have one cluster for production — that might be all you have — so we're going to have to figure out how to spread it out. And to your question, Amit: a lot of the work is happening on the coordinator, but the actual writing and reading is still going to happen out on the workers themselves. Alright — an issue with the UI, where queries are running, [inaudible]. Okay, listen — the good news is it sounds like we've got a real support problem there on that one. I will sync up with you, and we'll see how to raise it. Alright. So I read Vitaly's, I read Eric's, Andre — I love it. I'll go to the bottom. This is Eric: secondly, it says Starburst scaling works well. It sure does.
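One way to target that activity — and to see whether compaction is even worth the cluster time before you run it — is to check file counts and sizes through the `$files` metadata table. A sketch, with hypothetical table name and threshold:

```sql
-- Count small data files to decide whether optimize is worth running
SELECT count(*) AS small_files,
       sum(file_size_in_bytes) / 1e6 AS total_mb
FROM "orders$files"
WHERE file_size_in_bytes < 64 * 1024 * 1024;  -- files under 64 MB
```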
Either Starburst Enterprise, where at the end of the day you're really configuring Kubernetes to help you with that, or, GUI-wise, Starburst Galaxy, where we're still doing the same thing under the covers — we're just managing it for you from the UI itself. Alright. Because there are a few more questions, Andre, I'm not going to lose these questions. I'm six minutes over, so I'm going to stop. I encourage everyone — I'm going to put the link up real quick for anyone still around in the chat. Here's the place I would love you to help me. Hopefully I put the right place in there. Yep: the Starburst Community Forum. This is a place I would love you to use, not just right now, but as you go and have questions. You have other venues — you've got all your classics, you've got the Trino Slack community as well — but especially for things that aren't Trino. That automated maintenance stuff? That's not Trino; that's Starburst. Bring those here. Of course, do all the other things you're doing: if you have an account team, if you've got a support contract, leverage all those things. But don't hesitate — at the start, at the finish, or anywhere in the middle — to come here to the forum. I will wiki-gnome the heck out of it. I won't let something just sit there. Maybe we won't get you the perfect answer, but I won't let it sit and wait; I'll make sure someone takes a look at it if I can't resolve it. And with that, Andre, I'm going to reach out to you directly, since you've got the biggest questions. And I think there was at least one other one above, with the UI, that we can probably capture here — or decide to just handle through support. Okay. So we're only seven minutes late. That's great. This was great, I really mean it.
I thought no one was going to ask a question, and in fact I was stumped on a number of them. Like I promised, I want to solve those and figure them out, so I'll keep closing the loop, and I'm hoping the ones I did try to respond to were appropriate for you. If they were not: that email address, this forum site, or reach out to me on LinkedIn. This is my job, to help you — and not just my job, it's my joy. So please, please reach out, and we'll see you next time. We've got a workshop coming up. I don't have all my calls to action on the screen; that's okay. If you go to that same Resources site, there's an Events page. We're doing this once a month, and we're also doing one called Workshops, where we build out a hands-on lab, I give it to everyone, and you go through the steps with me. We do it live together, and then you can also go back and watch it — if you don't have time to do it live with us, you can pick it back up in the video and do it that way. I thank my backstage helper here, Quincy, and I thank everyone for taking the time to show up. On that note, I'm going to sign off, and we'll see y'all next time. And like I said, if you've still got an open question and I don't reach out to you very soon, reach out to me directly. Don't hesitate — I'll help you out and get you going. Thanks, everybody. Have a beautiful day.