
Did you know there’s a way to stripe large files across multiple nodes in a NetApp ONTAP cluster now? NetApp TME John Lantz drops by to drop some knowledge on Advanced Capacity Balancing!
Finding the podcast
You can find this week’s episode on Soundcloud here:
There’s also an RSS feed on YouTube using the new podcast feature, found here:
You can also find the Tech ONTAP Podcast on:
Transcription
The following transcript was generated using Descript’s speech to text service and then further edited. As it is AI generated, YMMV.
Episode 402 – NetApp FlexGroup: Advanced Capacity Balancing
===
Justin Parisi: This week on the Tech ONTAP Podcast, John Lantz joins us to talk all about granular data distribution.
[Intro music]
Justin Parisi: Hello and welcome to the Tech ONTAP Podcast. My name is Justin Parisi. I’m here in the basement of my house, and with me today I have a special guest to talk to us all about granular data distribution. John Lantz is here. So John, what do you do here at NetApp and how do we reach you?
John Lantz: Hey Justin, email is the easiest way. I’m a principal technical marketing engineer; I do FlexGroups, I do FabricPool, I do ONTAP S3, tons of overlap between all of those. And yeah, I wanted to talk about granular data distribution, slash advanced capacity balancing, depending on how you’re accessing it, whether that’s through System Manager or through the CLI.
Yeah, we call it different things, but it’s a really cool new feature in 9.16.
Justin Parisi: Yeah, originally I thought GDD was gosh darn data. It is not.
John Lantz: It is not, yeah. It’s gosh darn interesting, I’ll give you that.
Justin Parisi: Yeah, gosh darn data. So let’s talk about gosh darn data and what it is and what you would use it for.
John Lantz: Sure. So, it only works in a FlexGroup volume. If you’re not familiar, a FlexGroup is the big container inside of ONTAP. Classically we had a FlexVol. And the old school flexible volumes, they went up to 100 terabytes.
In recent times, they go up to 300 terabytes. But we have workloads that are way past that. And so, probably a decade ago now, I forget when we did the very first release of FlexGroups, but they’ve been around for a while, we had the idea of: what if, under the covers at the storage layer, we stitched a bunch of these volumes together and presented it as one single namespace? And that’s a FlexGroup volume.
And they scale: instead of 100 terabytes or 300 terabytes, what about 60 petabytes? It really is as much hardware as you can put in your cluster, we can get a FlexGroup out of it. And so that’s where granular data distribution, or advanced capacity balancing, lives.
And it solves a really unique kind of edge case problem, but it opens up a lot of interesting possibilities. We have these massive capacity FlexGroups, and the neat thing is you get tons of performance out of them as well, because it’s spreading all those files across the entire cluster: multiple nodes, multiple aggregates, multiple LIFs, and multiple cores at the node level.
So you’ve got lots more horsepower than a classic volume. So what we’re solving for was: all those little member vols underneath still have their own limits, whether that’s 100 terabytes or 300 terabytes.
And you could end up with one large file that could fill that up. Say, you have a 300 terabyte volume and you’re at 290 terabytes. It’s pretty full. Not the FlexGroup volume, but an individual member vol inside of a FlexGroup.
So the FlexGroup still has petabytes and petabytes of capacity. All of a sudden you wanted to write a 16 terabyte file into that member volume. You’d have a problem because you only had 10 terabytes left in this example. And you’d be stuck. Kind of an edge case but it absolutely hits some customers.
I’m using a big one, like 16 terabytes, in this example, which is uncommon. Probably more common, what I saw far more frequently than big giant files coming in, was a pre-existing file, like a log file. You’re capturing logs every day, week, hour, whatever the time frame is, and those things grow. And over years, they get to 16 terabytes in size, and you’d end up with that same problem: what happens when one of those individual members of that giant FlexGroup gets full?
And prior to granular data distribution or advanced capacity balancing, that would be a problem, because that member vol would fill up and max out at 100%, and if the FlexGroup wants to write data there, that’s a problem. Even though you’ve got petabytes of capacity on the system, because that one member was full, you could run into issues. Good news: with 9.16, that is not an issue anymore, because the new feature, granular data distribution or advanced capacity balancing, if you click the checkbox in System Manager, is basically striping, for lack of a better term. I definitely want to be clear: I think of stripes as 64 KB, or maybe big stripes, 256 KB or something, for other systems out there that do striping. That is not how it works in ONTAP. ONTAP is really big stripes. They’re 10 gigs a pop.
So, if your files never cross 10 gigs, you’ll never take advantage of advanced capacity balancing. Personally, I’d say you still turn it on because it gives you peace of mind. For customers that do have big files, what we’re doing is basically writing the first ten gigs to one volume, then we’re going to write the next 10 gigs to another volume, et cetera, et cetera. So it’s really large stripes across the system. Like a lot of things in FlexGroup, the initial ask or deliverable here was: fix this problem. But with advanced capacity balancing, what we’re seeing is that a single file read can sometimes go faster than it would in the classic FlexGroup, because now it’s been spread across, and just like a FlexGroup spreads across your environment, a single file read isn’t trapped to a single member, a single aggregate, a single node. That single file read can get spread across the entire cluster as well.
So if you have hundred-gig or terabyte-size files, you might experience even better performance using advanced capacity balancing. It’s early days, but that’s kind of what we’re seeing right now for the first few folks that have started enabling it. But initially the core purpose is to prevent that problem of just not having enough capacity in a single member anymore to write a big file, and it totally solves that. And now we’re seeing some neat benefits that are coming along for the ride.
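To make the placement behavior John describes a little more concrete, here is a minimal conceptual sketch in Python. It is not ONTAP code and not how the feature is actually implemented: the member-volume names, the fixed 10 GiB chunk size, and the simple round-robin order are all assumptions for illustration, and the real heuristics also weigh free capacity on each member.

```python
# Conceptual sketch only: carve one large file into very large chunks and
# spread them across FlexGroup member volumes. Names and the round-robin
# policy are illustrative assumptions, not ONTAP internals.

GiB = 1024 ** 3
STRIPE_SIZE = 10 * GiB  # "really big stripes... 10 gigs a pop"

def place_file(file_size_bytes, members):
    """Return a dict mapping member volume name -> bytes of this file placed there."""
    placement = {m: 0 for m in members}
    offset = 0
    i = 0
    while offset < file_size_bytes:
        chunk = min(STRIPE_SIZE, file_size_bytes - offset)
        placement[members[i % len(members)]] += chunk
        offset += chunk
        i += 1
    return placement

if __name__ == "__main__":
    # A 16 TiB file no longer has to fit inside one nearly full member:
    members = [f"fg__{n:04d}" for n in range(8)]  # hypothetical member names
    for member, used in place_file(16 * 1024 * GiB, members).items():
        print(f"{member}: {used / GiB:.0f} GiB")
```

The point of the sketch is simply that capacity and read work for one file end up distributed across many members (and therefore aggregates and nodes), rather than trapped in whichever member the file happened to land on.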
Justin Parisi: It’s not just about avoiding that out-of-space issue, ’cause that’s pretty uncommon, right? I think the better use case here is keeping a more even balance in the FlexGroup itself, because now files that might have thrown things off are being spread across multiple volumes, and you’re not seeing the weird discrepancies there.
John Lantz: Yes, and I think that helps storage admins that are really on top of stuff, so they’ll have their spreadsheet that has all the members in their system, and they’re monitoring the balance. I feel like the FlexGroup balance issue, people read into that more than I think it deserves, honestly. It’s really rare for the balancing to hit performance. So here’s an example where it would: say you just have three giant files, let’s use movies for example, and for whatever reason, all those files are in the same member vol of a FlexGroup that has 10,000 movies in it. The three most popular movies this week are all in that one member, so everybody’s reading them from that one place. That’s more than a capacity issue at that point. That’s a hot workload. So you’re not getting the performance of the entire FlexGroup because those three files are all in the same place. That’s where advanced capacity balancing would absolutely change that. That paradigm really doesn’t exist in a post-9.16 granular data distribution world, because anything over 10 gigs is going to get split across multiple members, so it gets put on different aggregates, different nodes, etc. And so you harness that horsepower, making those single file reads go a little bit faster.
So yeah to your point there are use cases when it absolutely helps performance too and it’s not just about that capacity play.
Justin Parisi: Yeah, and I wasn’t referring to imbalances in terms of the constituents themselves necessarily, but constituents live on physical disk. And if an aggregate starts to fill up and you don’t have enough space to honor a write, having that balance across multiple constituents alleviates that quite a bit.
So it’s more about your physical capacity and where it lives versus the virtual capacity underneath. That’s what I was mainly referring to when it comes to the data distribution imbalances, because that’s where most of the problems came in. It was not necessarily performance because a volume was too full; it’s more about, oh, this aggregate’s filling up and I can’t move this data very easily.
What do I do? So having GDD does that for you.
John Lantz: Yeah. Great point. Pre-9.16, you would have to do a vol move. And we do our best to make vol moves of individual member vols relatively easy, but that’s still work that an admin has to come in and do.
And so this alleviates that issue. You’re not gonna see those scenarios so often. I mean, I guess technically you’d still see them when you add a new shelf or something, and all the new members that I add from that new shelf, they start at zero capacity.
So you’ll still see those capacity plays just through growth and adding disks, but for the disks that you got it spreads it out much better than it did in the past for those big files where things get a little lopsided under the covers.
Justin Parisi: So how does this handle locking? Do locks happen at the N-blade where the request is received, or does it have to disperse those locks across all the file parts?
John Lantz: It happens up at the file, just like normal. So your NFS or SMB locks, they work identically to the way they did before. And we just handle it under the covers. Obviously it’s going to all the members, but the core inode for that file is still preserving the lock just like it always has.
Justin Parisi: So, what sort of things can it not do? What do we have limitations on? When should I maybe avoid using GDD? Because I know that it’s not enabled by default, if I’m understanding it correctly.
John Lantz: Yeah, yeah. You’re totally right. So, first release, not enabled by default. I think the big one, at least today, and this isn’t always going to be the case, is probably virtualization. Virtualization makes a lot of use of copy offload, and that’s not supported right now. And because of that, we won’t let you enable GDD or Advanced Capacity Balancing if we see that copy offload is already enabled on your SVM. Again, that’s just for this initial release. I think that’s something that we’re going to have supported in the very near future, but yeah, right now, offloaded data transfers, or copy offload, just isn’t supported when you do GDD. You can use Storage vMotion and other things to move data around, but just not those yet. I did want to bring up that S3 has some implications here as well.
So if you’re not aware, S3 does multipart objects. It’s pretty common in the world of object. Files, not so much. But in ONTAP, we have something that’s really cool. We treat S3, even though S3 is clearly object, as if it’s a NAS protocol, so NFS and SMB can work together in the same volume, and you can have clients using one protocol or clients using the other, and they read, write, and delete the exact same data.
A couple releases ago, we started doing that with S3 as well. So you could be in a NAS volume that speaks NFS or SMB, and it also speaks S3. The catch being, we had some limitations under the covers, because there are kind of two flavors of S3 in ONTAP. We have the native S3, which is true object and does everything you expect from an object store. Then you have the multiprotocol one, which for the vast majority of S3 actions is going to look like an object, but under the covers it’s really a file, and so multipart objects were always a roadblock for the multiprotocol one.
It totally works as normal in native S3 in ONTAP, but for multiprotocol, you couldn’t have these multipart objects, which is bad if your workload needs five-gig or greater files slash objects, because that’s how you do it. So, good news with advanced capacity balancing: you click that box now, and all of a sudden, multipart works just like normal, and under the covers, we’re, you know, you can’t see my air quotes, but we’re splitting up that file into a multipart file and putting it across different members. So we’re harnessing our new capabilities even outside of the NAS world and enabling multipart objects inside of a NAS volume that speaks multiple protocols, including S3. So yeah, just a neat little add-on. It’s not just your NAS clients that benefit from it, your S3 clients benefit as well.
The only gotcha is it has to be in a FlexGroup. There’s no concept of advanced capacity balancing in a classic FlexVol. So, provided you’re in a FlexGroup volume, now you can do multipart objects in your multiprotocol volume where before you couldn’t.
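For a concrete picture of the client side, here is a hedged sketch of a standard S3 multipart upload with Python and boto3. The endpoint URL, bucket, key, credentials, and part size are placeholders, not ONTAP-specific values; the point is that the client calls are plain S3 and unchanged, and per the episode a multiprotocol NAS bucket on a FlexGroup with advanced capacity balancing enabled can now accept them.

```python
# Sketch of an ordinary S3 multipart upload (boto3). All names below are
# placeholder assumptions: endpoint, bucket, key, credentials, part size.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.svm1.example.com",  # hypothetical ONTAP S3 endpoint
    aws_access_key_id="REPLACE_ME",
    aws_secret_access_key="REPLACE_ME",
)

bucket, key = "nas-bucket", "datasets/big-file.bin"
part_size = 64 * 1024 * 1024  # 64 MiB parts (S3 requires >= 5 MiB except the last part)

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open("big-file.bin", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        resp = s3.upload_part(
            Bucket=bucket, Key=key,
            PartNumber=part_number, UploadId=mpu["UploadId"],
            Body=chunk,
        )
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": parts},
)
```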
Justin Parisi: Well, you couldn’t do S3 in a FlexVol anyway. It was always a FlexGroup, right?
John Lantz: In the multiprotocol version, you could do it in a classic volume even.
Justin Parisi: So you could have a FlexVol that has NFS data, and then you could expose that FlexVol to S3?
John Lantz: Yeah, yeah. Technically you could go a level deeper than that if you wanted, and you could have like a directory inside the volume and call that a bucket to S3.
Justin Parisi: Wow. More granular than I thought it was. Yeah, yeah. Interesting.
John Lantz: But again, to take advantage of multipart objects in the multiprotocol ones, it has to be a FlexGroup. So yeah, to your point, for this feature, you want a FlexGroup. But yeah, if I’m just doing put, get, list, delete, I could do it in a classic volume.
Justin Parisi: Are there any features that don’t work with GDD? For example, SnapLock or, I don’t know, SnapMirror or anything like that?
John Lantz: Your snapshot copies: once you go GDD, you can’t go back. So once you enable it, you can’t disable it. So say you’re on granular data distribution, or advanced capacity balancing, in 9.16, but you had old snapshots. That’s going to be a problem. You can’t roll back to them once you move over.
Justin Parisi: It’s just single file restore, I bet, right?
John Lantz: Yeah, yeah, so if you wanted to go backwards, you really have to do a full-on rollback. You know, to restore snapshots from the old way, you’ve got to do a complete rollback, ’cause once you go GDD, all your snapshots, et cetera, will be GDD-enabled moving forward.
Justin Parisi: And it’s not just that. If you have files that have already been written across other constituents, I don’t think there’s a way back from that either.
John Lantz: Correct. It doesn’t know what to do. If I went back to like 9.14 or something where GDD doesn’t exist, what do I do? This doesn’t work.
Justin Parisi: It would block you anyway. It would say GDD is enabled. You can’t go back. We would not let you roll back. I hope we would not let you roll back. I would imagine there’s a check there just like we have with any other thing.
So, what sort of use cases work best with GDD? Like, industries or workloads?
John Lantz: Sure, so we talked about movies earlier, where the files get bigger as the quality gets better. Everybody has 4K Blu-rays nowadays, but 8K is coming. I think the 4K ones are 30-plus GB, but I think they’re up to 200 gigs per hour when you get to 8K. The density of the data increases so much. That’s one place where it’s good to split those up and balance that out, especially for that scenario we were talking about, where if I have clients that are touching the same file, or a bunch of different files that all live in the same member, that can create a hotspot in the cluster. Advanced capacity balancing fixes that. It’s grabbing those files from all over the cluster and spreading that workload around.
The other one where I see large files is EDA, electronic design automation, you know, semiconductors basically. It’s not as fun as watching a movie, but probably more important to the modern world, I think, and those file sizes are increasing dramatically too.
A lot of EDA use cases, especially in the early days, they’re going to use a FlexClone or something, and recreate those files over and over and over throughout this cluster, and then hit them really, really hard with lots of workloads. Pre-GDD, that meant these large files were scattered throughout the FlexGroup, each one sitting whole inside a single member. Now I’m spreading out these large files so that they can harness the power of this big cluster, because we’re not talking about a two-node HA pair anymore; these are the folks that have 24 nodes in their cluster.
So you have all this extra performance. Why bottleneck it when you don’t have to? You don’t have the capacity bottlenecks, and you have better performance allocation.
Less sexy than the cool technology of EDA or fun entertainment stuff: everybody captures log files, right? It’s the least fun thing, just writing down what’s happened. It used to be I’d have a legal hold for like 7 years or 10 years. There are customers now that are at 30-plus years, so they’re going to have to hang on to this for the life of the cluster, easily. Those files just get bigger over time, and this is really not a performance play, but it absolutely prevents those weird edge cases where my file, over many, many years, just got too big to fit inside the member vol. That problem pretty much goes away with advanced capacity balancing. Not as cool as 2nm chips or 8K movies, but, just the boring log file.
Justin Parisi: Yeah, this is also going to be a good place to play anywhere you read a large file, so oil and gas, seismic, genomics, AI/ML, medical imaging. So a lot of different use cases where you’re dealing with reads of larger files. And generally, I would imagine streaming reads, not so much finding a specific section of a file, like a database; spreading that across this probably wouldn’t be as great, but image files and that sort of thing would really be where you’re looking.
John Lantz: And remember, our definition of a large file is totally different for Customer 1 and Customer 2. This was your logic back in the day, and it still holds true today: basically, if it consumes more than 1 percent of that underlying member vol, that counts as large. So if I have a 300 terabyte member vol, or a 1 terabyte member vol, what counts as large is going to be dramatically different for those two volumes, and this solves the problem for both. We’ll start using Advanced Capacity Balancing all the way down there. If you have a quote-unquote large file, which counts as basically greater than 1 percent of that member vol, we’ll split it up for you.
Justin Parisi: So it’s a percentage and not straight up 10 gigs.
John Lantz: Yeah, yeah. Technically we won’t go smaller than one. We do have a cutoff where we’re not going to make four megabyte files or something. We won’t stripe unless it’s at least over one gig. Say I was like 1.5 gig. Well, the first gig would get written to one member, and then the next .5 would get written to a different member. So you’re still splitting it up, spreading out capacity and load across the cluster that way.
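Putting John’s description into a rough sketch: the cutoff for what counts as “large” scales with the member volume, roughly 1 percent of its size, with a floor of one gig. The exact internal heuristics aren’t spelled out in the episode (the 10 gig figure earlier presumably corresponds to a particular member size), so treat the numbers below as illustrative assumptions only.

```python
# Illustrative only: approximate the "large file" cutoff described above as
# max(1 GiB, ~1% of the member volume size). Not ONTAP internals.

GiB = 1024 ** 3
TiB = 1024 ** 4

def approx_stripe_size(member_size_bytes):
    """~1% of the member volume, but never smaller than 1 GiB."""
    return max(GiB, member_size_bytes // 100)

def is_large(file_size_bytes, member_size_bytes):
    """A file bigger than one stripe is a candidate for being split up."""
    return file_size_bytes > approx_stripe_size(member_size_bytes)

# John's 1.5 gig example on a small member: the first ~1 GiB lands on one
# member, the remaining ~0.5 GiB on another.
print(is_large(int(1.5 * GiB), 100 * GiB))   # True  (stripe ~1 GiB)
# On a 300 TiB member, that same 1.5 gig file is nowhere near "large":
print(is_large(int(1.5 * GiB), 300 * TiB))   # False (stripe ~3 TiB)
```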
Justin Parisi: So if I grow my volumes, then that changes the way that distributes, and that could potentially change how things perform.
John Lantz: Yeah. So internally, a lot of us are saying, Hey, this is like asymmetric striping because depending on what member it lands on, the heuristics really change a bit.
Justin Parisi: Yeah. I guess my concern is, I’m used to performance of one gig stripes, right?
And then I grow my volume and now I’m getting five-gig stripes. So you mentioned that having multiple stripes can be potentially beneficial for performance. Is it so drastic that a customer might notice that difference of one gig to five gig, or is it really not that big of a difference to work with?
John Lantz: So, number one, they probably won’t notice. In the real world, where it’s not always just a single file read, it’s lots of workload hitting the cluster all at the same time, you’re probably not going to notice.
Justin Parisi: And this is faster for reads because writes aren’t going to be faster because they’re still going to operate the same as they would normally.
John Lantz: Yeah, yeah. They work the same as always. ONTAP doesn’t know how big a file is going to be when a write is coming in, not until it stops writing. It’s going to just keep writing to that initial member vol like it normally does until it crosses that threshold that says, oh, this is enough for this one volume, time to start writing on the next volume. So yeah, the writes work the same as always, you don’t see a performance gain on the write, and technically, that lives in cache before it even hits disk. But it’s the read where you see things go faster.
Justin Parisi: All right, John, well, thanks so much for joining us and talking to us all about granular data distribution or my favorite, gosh darn data. Feel free to use it.
John Lantz: Advanced Capacity Balancing.
Justin Parisi: Advanced Capacity Balancing. So again, if we wanted to reach you to learn more about this or to get other questions answered or if there’s a TR out there or something how do we reach you?
John Lantz: Funny enough, we’re trying to deprecate the TRs. So we’re trying to move everything over to the doc site. You can give feedback on the doc site, and all of that is going to get to me if it’s any of the topics on FlexGroup or FabricPool or S3. That’s a good way. Otherwise, just reach out to your sales teams; they should be your first point of contact, and they’ll get in touch with me, and then we can have a chat and jump on a call.
Justin Parisi: So you don’t want to give out your personal phone number, John?
John Lantz: No.
Justin Parisi: For a good time with data, call John.
John Lantz: It’s written on a couple walls here and there.
Justin Parisi: What about any NGs for GDD?
John Lantz: Not for GDD specifically, but the FlexGroup NGs are always there. So yeah, happy to help out. I’m monitoring those constantly all the time.
Justin Parisi: It’s like the FlexGroup’s Batman.
John Lantz: Yeah.
Justin Parisi: All right. Well, John, thanks so much for joining us today and talking to us all about GDD, or Advanced Data Balancing. Capacity Balancing. Capacity Balancing. Happy to be here. Thank you. All right, that music tells me it’s time to go. If you’d like to get in touch with us, send us an email to podcast@netapp.com or send us a tweet @netapp. As always, if you’d like to subscribe, find us on iTunes, Spotify, Google Play, iHeartRadio, SoundCloud, Stitcher, or via techontappodcast.com. If you liked the show today, leave us a review. On behalf of the entire Tech ONTAP podcast team, I’d like to thank John Lantz for joining us today. As always, thanks for listening. [Outro music]