Tech ONTAP Podcast Episode 401 – NetApp FlexCache (Spring 2025)


NetApp FlexCache is a way to squeeze the most performance possible out of your NFS and SMB workloads by doing two things:

– Adding multiple endpoints for read access without needing to copy the entirety of a dataset

– Offloading read workloads from volumes that also need to perform writes

FlexCache can run on the same ONTAP cluster as your origin volume, another cluster in your on-prem datacenter, or in the cloud.

Finding the podcast

You can find this week’s episode on Soundcloud here:

There’s also an RSS feed on YouTube using the new podcast feature, found here:

You can also find the Tech ONTAP Podcast on:

Transcription

The following transcript was generated using Descript’s speech to text service and then further edited. As it is AI generated, YMMV.

Episode 401 – FlexCache
===

Justin Parisi: This week on the Tech ONTAP podcast, Elliott Ecton stops by to talk to us all about FlexCache in ONTAP.

Podcast intro/outro: [Intro]

Justin Parisi: Hello and welcome to the Tech ONTAP podcast. My name is Justin Parisi. I’m here in the basement of my house and with me today I have a special guest to talk to us all about FlexCache and ONTAP. So to do that we have Elliott Ecton. Elliott, tell us what you do here at NetApp, and how do we reach you?

Elliott Ecton: Hey, Justin. I am the technical marketing engineer for NAS and FlexCache, of course, since that’s what we’re talking about today.

If you want to reach me, it’s just Elliott.Ecton@netapp.com. I don’t have any socials. I avoid the socials.

Justin Parisi: You avoid the socials. Probably for the best.

All right, so we’re here to talk about FlexCache, not politics, ’cause that’s what we do. Okay. We talk about tech. We are the Tech ONTAP podcast. So, Elliott, let’s start off with FlexCache in general. Give us the 10,000 foot view of what FlexCache is and maybe talk about why it became FlexCache.

Elliott Ecton: So high level, FlexCache is a volume that you can deploy anywhere in the world that you have ONTAP. It’s not a replica of an origin volume. It’s a caching play that’s a hundred percent consistent, current, coherent at all times. You can use it for performance reasons like hot file remediation or geographically dispersed workforces to allow people to access their data locally and get better performance, and it’s a hybrid cloud play as well.

Justin Parisi: Right. So let’s talk about how it keeps things up to date. As far as I know, it’s not necessarily exactly a mirror, because a mirror would say it’s everything in that volume. While you have access to everything in the volume at any given time, it’s not always in that cache. So one of the benefits of FlexCache is that you may have a hundred petabytes on-prem, and then you can create smaller caches that maybe take on those small files or directories, and they don’t have to be giant volumes, right? They can be smaller volumes. So it’s a sparse cache.

Elliott Ecton: Yeah. Yep. Yep. You’re right. So essentially, when you stand up a FlexCache, you can enumerate that cache and immediately it’s gonna look exactly like the origin volume, right? But there’s actually no data in there. Only the metadata is replicated.

So FlexCache operates on a pull-only basis. We never push data. I like to say we send data just in time, not just in case. So we pull on demand, and then we’ll keep that in the caches as long as the data remains valid.
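For folks who want to see what that looks like in practice, here’s a rough sketch of standing up a cache from the ONTAP CLI. The SVM, volume, and aggregate names here are hypothetical, and the available options vary by ONTAP release:

```
# Minimal sketch of creating a FlexCache volume (hypothetical names).
# The cache starts out sparse: it looks like the origin, but holds only
# metadata until blocks are actually read through it.
volume flexcache create -vserver cache_svm -volume projects_cache \
  -origin-vserver origin_svm -origin-volume projects \
  -aggr-list aggr1 -size 100GB -junction-path /projects_cache
```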

Justin Parisi: And that’s great for the hybrid cloud use case because, as you know, it can get very expensive in the cloud to keep data there.

So a sparse cache, one that doesn’t keep all the data there and only pulls it when you need it, is going to be cost efficient in a cloud use case.

Elliott Ecton: And to get stuff out, if your main data set lives in there, you get lower egress charges as well.

Justin Parisi: So the data that’s pulled into these FlexCaches, how secure is it when it transfers over to those caches?

Elliott Ecton: Yeah. So it’s extremely secure. It goes over TLS 1.3 encryption. And it’s not a NAS protocol that’s crossing the wire, it’s a proprietary call. So even if someone somehow broke the TLS encryption, they’d still have to figure out the proprietary protocol.

So TLS encrypted and proprietary protocol makes it pretty dang secure.

Justin Parisi: Yeah, that’s great. So as far as the files themselves go, I can cache partial pieces of the files. I don’t have to cache the whole file. Is that accurate?

Elliott Ecton: Yeah, that’s correct. So say you had your origin somewhere, say it’s up in the cloud and you just need the last few lines of a log file, right?

You can just do a tail -n, and it’s only going to pull those blocks. We work at the 4K block level in ONTAP. So it’s only going to pull the 4K blocks necessary to serve the last few lines of that file, leaving the rest of it up at the origin. So, yeah, you can have a partially populated file.
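As a concrete illustration of that partial-read behavior, assuming a hypothetical cache LIF and junction path:

```
# Mount the FlexCache volume like any other NFS export (hypothetical names).
mount -t nfs cache-lif:/projects_cache /mnt/cache

# Only the 4K blocks backing the last 20 lines are pulled from the origin;
# the rest of the file stays uncached.
tail -n 20 /mnt/cache/logs/build.log
```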

Justin Parisi: And that’s great for other use cases, like a database where you’re doing a query of a certain row in a table. You don’t want to pull the whole database over just to do that.

Elliott Ecton: Yep. It speeds up performance, limits the storage you require where the cache lives, and obviously there are the benefits with cloud egress charges and the like.

Justin Parisi: So talk to me more about this performance gain. Why is it improving my performance?

Elliott Ecton: Well, let’s start with a geodispersed workforce.

You get performance there because obviously the closer you are to your data, the quicker you’re going to access it. The only thing we can’t change or speed up is the speed of light. So we have to get the data closer. So that’s one performance play.

But that’s definitely in the geo-dispersed bucket, right? The other performance play is when you actually have caches of an origin in the same cluster as the origin. And this is a really cool use case. It’s kind of like our old load sharing mirrors. You remember those, right?

Justin Parisi: Yeah, I remember those.

Elliott Ecton: You have control over them, and they can be writable as well. You basically have FlexCaches. I call them high density FlexCaches, because a FlexCache is generally going to span all the nodes in the cluster by default, if you don’t do some special CLI configuration. So with a high density FlexCache, you actually make it only span one or two nodes in the cluster, and then you add in more FlexCaches to go across all the nodes. Well, now you’ve added that many more volume affinities and CPUs in the IO path to help speed things up.

Justin Parisi: You’re also able to leverage more networking as well, right? I mean, you’re not taxing a single node’s networking.

Elliott Ecton: Yeah, absolutely. I mean, you should always be load balancing your NAS front end, but it has even more benefit with the FlexCache because everything’s direct then, right?

If you get it down to a one node FlexCache and every node has a FlexCache, there’s no east/west traffic, and it’s all direct traffic through a volume affinity that’s not shared by every other front end connection.

Justin Parisi: Yeah, you don’t have to traverse the cluster network. You don’t get that extra added latency, so it’s going to give you better performance as well.

So tell me about why you see people using a FlexCache. What sort of use cases are they leveraging to make a FlexCache viable for their environments?

Elliott Ecton: So the biggest use case that we’ve seen so far is EDA. They call it Design Anywhere: you have your compute servers in the cloud, while your job scheduling, all your tools and everything are on prem. And then when the job kicks off, it’s only going to pull the libraries into the cloud that it needs. At the same time, you stand up an origin in the cloud, so you have high performance writes, because it’s an origin. It’s just like writing to a regular volume. Then once the job’s done, if you were to copy the whole workspace down, all the scratch and everything, it would be a ton of data. But they usually only need a little bit of that. So once again, it’s faster to pull down just the data you need; you don’t have to lift and shift everything from the results down…

Justin Parisi: It also can be used for binaries, right? The EDA binaries, where they have to load a specific application and it’s basically all reads. If that’s localized and you have multiple instances of it that aren’t actually taking up space, that can help out immensely.

Elliott Ecton: Yeah, for sure. It’s been a pretty big game changer.

The EDA industry is really, really latching on to FlexCache. In fact, the EDA industry is actually kind of why FlexCache was reintroduced. As you know, we had it back in the 7-mode days. But it didn’t make the jump from 7-mode to clustered ONTAP, right? So when we reintroduced it in 9.5, a big driver of that was the EDA industry missing FlexCache.

Justin Parisi: You mentioned that when we write, it’ll basically go back to the origin and that’ll be reflected.

But now we have something called WriteBack with FlexCache. So tell me the difference between what WriteBack does and how the classic FlexCache writes happened.

Elliott Ecton: Yeah, for sure. So, when we introduced FlexCache again in 9.5, we implemented a write around technology. Essentially, you still mount the cache, right?

I’m just going to use NFS as an example. You would just mount the FlexCache volume like you would any other volume, and you can read and write to it. The writes, though, when they hit the cache, are actually intercepted and turned into that proprietary call I was talking about earlier when we were talking about security, and that gets forwarded to the origin.

And then the origin is going to commit it to disk, respond to the cache, and then the cache can respond to the client. So for writes, you’re always getting the full round trip time of the WAN. Your reads could be accelerated, but writes would always be WAN latency. Now, in 9.15.1, we introduced WriteBack. It supports both NFS and SMB, and essentially the difference now is, when you write to the cache, it commits the writes to stable storage at the cache and then immediately acknowledges the client. It’s not good for every single workload, though. So if you do want to use WriteBack, make sure you test your production workload and see what works for you.

It’s definitely not a replacement for WriteAround.
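As a heavily hedged sketch, write-back is enabled when you create the cache; note that the option name below is an assumption, so verify it against the FlexCache write-back documentation for your release:

```
# Hypothetical sketch: a write-back FlexCache (ONTAP 9.15.1 or later).
# NOTE: -is-writeback-enabled is an assumed option name; confirm in the docs.
volume flexcache create -vserver cache_svm -volume projects_wb_cache \
  -origin-vserver origin_svm -origin-volume projects \
  -aggr-list aggr1 -size 100GB -is-writeback-enabled true
```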

Justin Parisi: No, no, it’s basically specific use cases, I would imagine. Yeah. Can you think of any specific things that it can do better than you would see with WriteAround?

Elliott Ecton: Yeah, we’ve seen it work pretty well in the video editing world. And then anything that’s heavy, heavy writes. And when I say heavy writes, I don’t necessarily mean the size of the file, because you can do a small write to a big file, right? I’m talking about how many writes there are between the open and the close of the file, essentially. That’s a good use case for it.

Justin Parisi: So basically like a streaming workload.

Elliott Ecton: Yeah, you got it.

Justin Parisi: When I’m dealing with a FlexCache volume and I’ve got ACLs, how does it handle that? When I create the cache, does it retain the ACLs I’ve set on the origin? And if I want to change the ACLs on the destination, how does that work?

Elliott Ecton: Yeah, everything’s 100 percent consistent, coherent, and current, right? That’s our main design philosophy with FlexCache, and that includes ACLs. So when you create a cache, it inherits everything from the origin, including all the ACLs. So the ACLs are consistent with the origin 100 percent of the time. You can change ACLs at the origin, but you can also change them at the cache. In write around mode, obviously those changes are going to have to go to the origin to be committed.

But in write back mode, most of your ACLs can be set at the cache and then asynchronously flushed back. There are a few exceptions, things that actually have to be forwarded to the origin, and that can sometimes cause an eviction at the cache, but you can set the ACLs at the cache.

Justin Parisi: Okay, so in a situation where I have ACLs and there’s, say, an LDAP involved, at my destination I’m probably going to have to have the same name services to recognize those ACLs, right?

Elliott Ecton: Yeah, I’m actually glad you brought that up. That’s something that people overlook a lot of times. Absolutely, you gotta have some kind of replicated name service, centralized name service.

Obviously with CIFS, you have Active Directory trusts or something. You have to be able to resolve that SID. And then with NFS, your LDAP lookups, you don’t want UID collisions or anything like that.

Justin Parisi: And we want those NFS v4 ACLs to translate as well, because I assume this supports NFS v4.

Elliott Ecton: It does, it does. v4, but not 4.2. And here’s the worst part about it. It doesn’t break until a certain call. So people will think it works.

It’ll mount fine, but I have a feeling the reason why is because they don’t want to handle xattrs. 4.2 is, I call it, a feature release of 4.1, right? So it’s an extension, and it has a lot of features like extended attributes and things like that, that engineering has to really think about, how to remain POSIX compliant and all that. It’s a more difficult ask, and nobody’s asked for it yet. So it’s not that we’re not going to do it. We just haven’t done it yet.

Justin Parisi: So where specifically would it break for me?

Elliott Ecton: So if you were to mount with NFS version 4.2, it’ll actually mount up fine. It’s when you do your first operation that you’ll get an error.

Justin Parisi: But what if I want the error?

What if I like errors?

Elliott Ecton: Well, then 4.2 is for you.

Justin Parisi: Awesome.

Elliott Ecton: Excellent.

Justin Parisi: Yeah. So basically the takeaway here is, when you’re mounting your FlexCache, you’ve got to specify the NFS version, because it will negotiate to the highest supported version, and NFSv4.2 is the highest supported version.

Elliott Ecton: Yeah, and you can’t disable 4.1 without disabling 4.2 or vice versa, right?

Justin Parisi: Yep, yep, it’s tied together. They didn’t give 4.2 its own option to disable it.

Elliott Ecton: Which goes back to, it’s kind of a feature release of 4.1.

Justin Parisi: I mean, there are no special performance enhancements or anything.

It’s all just basically security stuff like MAC labels and xattrs and that sort of thing. I think there’s also some stuff for databases, like SQL databases, where it can handle sparse file operations a little better. Those options you have to enable or disable manually. But overall, yeah, you want to try to avoid 4.2 with FlexCache.

Elliott Ecton: Yep, for the time being. And of course, you know, like I said, it’s on the roadmap.
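Until then, pinning the NFS version on the client side keeps a mount from negotiating up to 4.2; a minimal example, with a hypothetical LIF and junction path:

```
# Pin the protocol version so the client doesn't negotiate up to NFSv4.2.
mount -t nfs -o vers=4.1 cache-lif:/projects_cache /mnt/cache
```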

Justin Parisi: On the roadmap. All right. So one thing I know you can do with FlexCache is something called pre-population. The reason why you would do that is, when a FlexCache is across a WAN and you have to populate the data that you’re asking for, it takes a little bit of time. So if you already know what you want, say an EDA library, you can go ahead and run this pre-population command and populate that. So tell me more about how that works, how you would do that, and what sort of impact that has.

Elliott Ecton: Sure. There are a couple ways to pre-populate.

We have the native built-in pre-populate commands, which you can also trigger via REST API or during FlexCache creation in System Manager. But basically what it is, is you just tell it a top level directory that you want to start populating from, and then it’ll recursively, at a very low priority, start pushing that data from the origin to the cache without it actually being requested at the cache, because, remember I said, it’s a pull-only technology.

So that’s why we have to have the pre-population. That one is not built for speed. It was intentionally built as a low priority. I always joke around that the lowest priority in ONTAP is to self destruct, and one right above that is pre-population of FlexCache.
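The native command Elliott is describing looks roughly like this; the volume names and path are hypothetical, and exact parameters vary a bit by release:

```
# Recursively warm the cache from a top-level directory, at low priority.
volume flexcache prepopulate start -cache-vserver cache_svm \
  -cache-volume projects_cache -path-list /libs/common
```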

Justin Parisi: Oh my god. So how long would that take?

And is there a way for me to change the priority of that?

Elliott Ecton: You cannot nice it or anything like that. Nope. So if speed is of importance… well, to answer your question about how long it would take: I don’t know.

Justin Parisi: It depends.

Elliott Ecton: It depends on what’s going on in the cluster and how far away the cache is from the origin. When speed is important, we use XCP. XCP, as you know, is an awesome synthetic NAS client, and it’s highly threaded. So you can actually just install it on a client, preferably an NFS client, because XCP NFS is more feature rich. And then you can just point it at the volume, and you can get highly granular on what XCP matches.

And then you can just say, okay, here’s my filter for the files I want, do an MD5 checksum, and essentially it’s a dump to null on the client. But while you’re doing that, you’re reading the data through the cache, populating it.
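As a hedged sketch of that trick, with an invented filter and export path (check the XCP docs for the exact filter syntax in your version):

```
# Read every matched file through the cache to pull its blocks in.
# '-md5' forces a full read of each file's data (effectively a dump to null).
xcp scan -match "fnm('*.lib')" -md5 cache-lif:/projects_cache
```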

Justin Parisi: Okay. So basically it’s a false read, but you’re doing it before the application has to do it.

Elliott Ecton: Yep. You got it.

Justin Parisi: So it sounds like it can be kind of cumbersome to do this. When would it be best to pre-populate? Does it really make enough of a difference to take that step?

Elliott Ecton: The only time that it’s absolutely required that you pre populate is if whoever is accessing that file first absolutely cannot afford to have the WAN latency on the first read.

That’s it. Now, if you’ve got a guy who’s raising tickets all the time, he just likes to complain, maybe pre-populate his home directory. I don’t know. But for jobs that aren’t super latency sensitive, they’ll get the data in. And then every time they access it after that, it’s fast.

It’s just that very first access.

Justin Parisi: Okay. So let’s talk about the hot files you mentioned earlier. Now, first let’s define what a hot file is. And let’s talk about how FlexCache can help with those.

Elliott Ecton: Yeah, for sure. I’m going to use rendering in the M&E market as an example.

So when you’re rendering, a hot file would be something that all the compute farm needs. They’re all hitting this one file all the time, and it becomes a hot spot. Before SSDs, it would be a bottleneck at the disk. Now it’s actually a CPU bottleneck. So that’s what a hot file is. There are two bottlenecks in an ONTAP cluster for that: like I said, the CPU, and, if you’re still using a FAS, it could be the disk. And then the other thing that adds latency is east/west traffic. Essentially, you want to get rid of both of those.

East/west, less so, but you definitely don’t want to be bottlenecking on a single affinity because that file lives on a single volume. So, when you deploy FlexCache, say, from System Manager, it’s going to try to put at least one, if not multiple, constituents on every node in the cluster, and the FlexCache logically spans the entire cluster. So the problem with that is, say you distribute your NAS clients across all the front end, say we have a four node cluster. You have all the clients distributed nice and evenly across the front end, and your files are nicely distributed in your FlexCache.

Well, if this hot file resides on node 1, the node 2, 3, and 4 NAS connections are going to have east/west traffic to get to it, right? And the bigger bottleneck: they’re all going to have to go through node 1’s volume affinity to get to it. So they’re all going to pile up on that one volume affinity.

So a FlexCache just deployed like that is not going to help you with hotspots. The way you’ve got to do it, and I’m actually getting ready to publish some documentation on this, is you create what I’m calling high density FlexCache arrays, or HDFAs. And the goal here is to take a FlexCache and condense it down to as few nodes as the capacity requirements allow, with the big goal being to get a FlexCache onto a single node. In our four node example, you’d have a FlexCache on node one, a FlexCache on node two, a FlexCache on node three, and a FlexCache on node four. And these are all FlexCaches of the same origin in that same cluster. So now, when you distribute your NAS traffic across it, you will actually have no east/west traffic, because you’re guaranteed a local copy of it.

And you’ve also increased the volume affinities. Now there’s no east/west traffic and each node has its own volume affinity to service that file.
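A sketch of what that HDFA layout might look like from the CLI on a four node cluster; the trick is pinning each cache to a single node’s aggregate with -aggr-list (all names hypothetical):

```
# One single-node FlexCache per node, all caching the same origin volume.
volume flexcache create -vserver cache_svm -volume hot_cache_n1 \
  -origin-vserver origin_svm -origin-volume textures \
  -aggr-list aggr_node1 -size 100GB
volume flexcache create -vserver cache_svm -volume hot_cache_n2 \
  -origin-vserver origin_svm -origin-volume textures \
  -aggr-list aggr_node2 -size 100GB
# ...repeat for nodes 3 and 4, then spread client mounts across all nodes.
```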

Justin Parisi: You’re also adding more throughput. Because if you go to a single node, you’re limited to whatever the throughput is for that node. So if a node can do 200 megs a second for a file, which I’m just kind of estimating, then if everyone’s hitting that same file, you’re probably not going to get much more performance out of that.

But if you do 200 megs a second on 12 nodes, then that’s 12 times 200 megs a second.

Elliott Ecton: Absolutely. So your latency is down, your throughput’s up, your CPU wait time’s down, so it’s a win-win. And, you gotta remember, this isn’t like you have to have a one-for-one of the origin. Say you have a one terabyte origin.

It’s not like you have to create four one-terabyte caches, because they’re sparse. You could have a hundred gig cache, a hundred gig cache, a hundred gig cache, and a hundred gig cache, because we see that about 10 percent is usually what the active data set is in a given origin.

Justin Parisi: It also depends on the file size too, right?

You don’t want to have a cache that’s smaller than the file.

Elliott Ecton: That’s very true. Very true. Yes. Maybe I should have used…

Justin Parisi: Constantly evicting. Oh, this is not better, Elliott. Why did you make it so much worse?

Elliott Ecton: Maybe I shouldn’t have used one terabyte as an example.

Justin Parisi: In your use case, you mentioned rendering, right?

And really what that comes down to is the little artifacts they use to make a shot in a movie. Like the hair, or the water, or the trees. The texture files. Everybody uses the same stuff. If they’re all hitting the same folder, because they all have to work on the same shot, then spreading that out across nodes is going to absolutely help that rendering.

Elliott Ecton: 100%. Yep. So doing just a regular FlexCache is not going to benefit you much, other than you now have access to the origin and one other copy of it, right?

Justin Parisi: Have you seen much in the way of using FlexCache for, say, medical imaging, where doctors’ offices may be remote from the hospitals they service and they all reference the same images for patients? And maybe they need to localize that stuff.

Elliott Ecton: Yeah, so a lot of times, actually, you know, they take the images in one location, and then the radiologist, or whoever it is that specializes in that kind of image, is at a satellite location. And so they only need read access to it. They can immediately get a copy of what was taken over at site A and read it at site B. So, yeah, it’s very good for that kind of stuff.

Justin Parisi: And what about SnapMirror? Let’s say I have a SnapMirror relationship involved with my origin. Does it affect how the FlexCache works? And if it doesn’t, can I actually use it to SnapMirror FlexCaches?

Elliott Ecton: So I’ll answer the second question first. No, you cannot SnapMirror a FlexCache. Remember, it’s a sparse volume.

Justin Parisi: So the better answer is I should not be snapmirroring FlexCaches.

That’s the better answer. Do not snapmirror FlexCaches. Even if we let you do it, don’t do it.

Elliott Ecton: Yeah. Don’t.

Justin Parisi: And the reason why is you’re not trying to DR your cache, right? But I guess somebody would wanna do that. But anyway, continue.

Elliott Ecton: Yeah. Yeah. You know. Yeah. We just won’t go there.

Justin Parisi: I know it’s been asked, I’ve seen it asked.

Elliott Ecton: Yeah. Oh yeah. Oh yeah. There’s been all kinds of asks. And I’ll get to the first question you asked too, but real quick, one thing about FlexCache: we have so many Flex products, right?

FlexVol, FlexGroup, flex this, flex that, right? FlexCache, I think, is the best use of the term flex, because of people’s imaginations. When I go and talk to customers, it’s, can I do this with it? And it’s like, wow, I hadn’t even thought of that. You know, the flexibility of FlexCache appears to be limited only by the imagination of the storage admin.

Now, not all of those ideas are good ideas that will work with FlexCache, but it definitely gets the imagination going. But let me go back to your snapshot question, because obviously we still have to be able to take snapshots at the origin, correct? With write around, there’s really nothing to worry about, because all the writes are committed to the origin anyway. But in write back, we had this conundrum where, when a snapshot happens, we could have dirty data outstanding that hasn’t been flushed back to the origin yet. And for a snapshot to be consistent, all the data has to be back. So for us to make sure, once again, we keep that hundred percent consistent paradigm we designed around, we have to quiesce all IO at the caches until all that dirty data is flushed back to the origin before the snapshot can be taken. So, if you have too much outstanding dirty data, that can make the snapshots time out. So if you have a very heavy snapshot schedule, it may be best to avoid write back for the time being.

Justin Parisi: Yeah. ’Cause you’re trying to constantly keep up, and it’s just really hard to do that with the snapshot schedules.

Elliott Ecton: Yeah, you don’t want to quiesce your writes every five minutes globally.

Justin Parisi: So I know that a while back we did some work on making a global locking system. Tell me about how that interplays with the file system. And I think it’s specifically mostly for SMB, but I would imagine NFS v4.1 is there as well. So talk about that.

Elliott Ecton: Yeah. The real differentiator is stateful versus non, right? So NFS version three versus everything else, essentially. FlexCache, from the end user’s perspective, is going to behave just as if you were not using FlexCache at all. So an SMB lock in cache one will be honored by a cache at a different location, and the origin orchestrates all this stuff through read/write lock delegations, so you don’t have to worry about two writers when there shouldn’t be two writers. Obviously with NFS version three, even if FlexCache wasn’t in play, there can always be two writers, because it’s advisory locking, not mandatory locking. So that’s still a thing with FlexCache. But you don’t have to worry about data corruption due to multiple writers at the same time or anything like that. Once again, a hundred percent consistency. We make sure that everything is honored as far as locks go.

Justin Parisi: So how does that impact performance with the locking coordination?

Elliott Ecton: So obviously the origin is having to talk and orchestrate between the caches, right?

So when you have a conflicting call to a file that’s being accessed at a different cache, the origin has to recall that delegation. So you might see a little pause at the beginning of a write or something like that while it’s getting the delegation from the old cache and giving it to the new cache. And obviously we’re not gonna pull the delegation and make a client stop writing if they’re not done writing. But that’s normal locking behavior.

Justin Parisi: All right. So if I’m trying to stand up a FlexCache, what should I not be doing to make sure I have the best success?

Elliott Ecton: Yeah, so you want some worst practices? I can give you some worst practices. The biggest one is, you’ve got to understand that FlexCache actually uses the same TCP ports that SnapMirror does. SnapMirror can be a bandwidth hog, so a lot of times your network team will throttle TCP ports 11105 and 11104.

Well, FlexCache uses the same TCP ports, and we don’t have any way for the networking team to differentiate it. There are no CoS values or anything like that. So if your network team is throttling SnapMirror on the network, guess what you’re going to do? You’re obviously going to throttle your FlexCache, and FlexCache is somewhat of a performance play, even if it’s just geo distributed, right?

So you definitely don’t want that. Just go ask them nicely to unthrottle those ports, and keep your promise by throttling SnapMirror at the cluster level instead.
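If you do take the throttle into ONTAP, one hedged example is throttling per relationship; the destination path here is hypothetical, the value is in KB/s, and some releases also offer cluster-wide replication throttles:

```
# Cap one SnapMirror relationship at roughly 50 MB/s (51200 KB/s).
snapmirror modify -destination-path dr_svm:projects_dr -throttle 51200
```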

Justin Parisi: Is there a way to change the FlexCache ports? Or is that just kind of hard coded?

Elliott Ecton: It’s hard coded right now. Some other worst practices: you hit on the name service one, that’s a big one. Not replicating your name services, so that the SVMs at both the cache and the origin are able to resolve the same thing, right?

Justin Parisi: What about number of FlexCaches? Should I stop at a certain number of FlexCaches? Do I have a limit there?

Elliott Ecton: Each origin can have up to a hundred FlexCaches, but only 10 of those can be write back. So you can have a hundred write around, or you can have up to 10 write back. And then that means you’d only have 90 write around. So a hundred is the max limit, no matter what the combination is.

But the limit for write back is 10.

Justin Parisi: Is that a hard limit where we just basically say no more? No more for you.

Elliott Ecton: It is a hard limit right now, yeah. We’re trying to walk before we run. Obviously, we do as much testing as we can in house, but when you release something like this into the wild, a lot more stuff starts rearing its head, so we’re trying to walk before we run. So that brings up another worst practice. Don’t just assume that you need write back. Most customers, and rightfully so, are like, well, why wouldn’t I just put everything in write back? You know, it makes sense. On the surface, it seems like that should be what you do, right? But there’s quite a bit more overhead to write back, and it’s got some of those caveats, some that we already hit on, like the snapshots and stuff like that. And the other thing is, you may be very surprised by the performance you get in write around mode.

That’s because we have some optimizations on the intercluster network that make it function and perform faster than if you were running a NAS protocol that far away. We have multiple TCP streams that allow you to have more bytes in flight, because you’re not limited to a 7.8 megabyte window on the other side. You’re limited to 12x 7.8 megabytes. So you can get your bytes in flight up, which helps you get your throughput up. And also, the fact that it’s a proprietary protocol means you’re not going to get that NAS overhead across the WAN. So always test write around, and only if it’s performing subpar, or it’s not acceptable, then go to write back. You should default to write around just because of the simplicity of it.
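To put rough numbers on the bytes-in-flight point: window-limited throughput is approximately window size divided by round-trip time. Assuming a 50 ms WAN RTT purely for illustration:

```
# Single 7.8 MB window over a 50 ms RTT: ~156 MB/s ceiling.
echo "7.8 / 0.050" | bc
# Twelve parallel streams: ~1872 MB/s ceiling.
echo "12 * 7.8 / 0.050" | bc
```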

Justin Parisi: Okay, that’s a good tip, because I think people hear “write back” and go, oh yeah, I really, really want that, and then they try it and they’re like, man, maybe not.

Elliott Ecton: Yeah, and also the biggest takeaway for the people listening to this is, every workload is different. Everything is workload dependent, right? How it’s going to perform. FlexCache even more so, just because there are so many things in play, right? You don’t just have client to storage; you also have the cache to the origin, and then you have the origin and all the other caches.

So there’s a lot more variables. Make sure you test your production workload or as close to it as you can in a non production environment.

Justin Parisi: How does this play with a static mount path? Like, say an application always needs to access /mount, but my FlexCaches might be anywhere in the world. Can I use the same mount point for all these FlexCaches? If I’ve got an automounter that basically calls the same path every time, am I going to have to go change all those automounters?

Elliott Ecton: You can use autofs for NFS. That would be just fine. Now keep in mind, autofs gets stuck on whatever IP it resolved to.

If for some reason you needed to go to a different mount point, the client just wouldn’t go re-look up a new IP from your autofs map. But for SMB, Sites and Services plus DFS is a great solution.
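On the autofs side, a minimal hypothetical indirect map that keeps the application path stable while pointing at whichever cache is local might look like this:

```
# /etc/auto.master (hypothetical paths)
/mnt/projects  /etc/auto.projects

# /etc/auto.projects: map the 'data' key to the site-local cache LIF
data  -fstype=nfs,vers=4.1  site-local-lif:/projects_cache
```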

Justin Parisi: So this could basically act like a DFS, the FlexCache.

Elliott Ecton: Yeah, absolutely. You don’t have to do any of the DFS replication or anything like that. Just make sure your Sites and Services are set up, and then no matter where you are, just access the DFS namespace and it’ll direct you to where you’re supposed to be.

And then if something goes down, DFS does its thing. It takes it out of the list of targets, right?

Justin Parisi: So, yeah. Awesome. Any other worst practices for us?

Elliott Ecton: I’m sure there are some. Can’t think of them off the top of my head.

Justin Parisi: Worst practice number 10, listen to Elliott.

Elliott Ecton: Yes, don’t listen to any… everything you just heard. Terrible practice, this whole…

Justin Parisi: Like, this whole 45 minutes you’ve wasted yourself. Yeah, goal achieved. That’s right. All right, cool. So are there any places I can find information about FlexCache?

Elliott Ecton: Yeah, our write back documentation is on docs.netapp.com. As you know, but probably most don’t, we’re moving from traditional TRs to content integrated into docs.netapp.com. But our old TR for FlexCache is still out there, TR-4743. You can get that for the historical information on FlexCache, and the write back information is at docs.netapp.com.

Justin Parisi: All right. Excellent. And again, if we wanted to reach you or reach a FlexCache DL, how do we do that?

Elliott Ecton: ng-flexcache-info@netapp.com.

Justin Parisi: Alright, excellent. Well Elliott, thanks again for joining us and talking to us all about FlexCache in ONTAP.

Alright, that music tells me it’s time to go. If you’d like to get in touch with us, send us an email to podcast@netapp.com or send us a tweet @NetApp. As always, if you’d like to subscribe, find us on iTunes, Spotify, Google Play, iHeartRadio, SoundCloud, Stitcher, or via techontappodcast.com. If you liked the show today, leave us a review. On behalf of the entire Tech ONTAP podcast team, I’d like to thank Elliott Ecton for joining us today. As always, thanks for listening.

Podcast intro/outro: [Outro]
