Good morning, everybody. Welcome to re:Invent. How's everybody doing today? Ok, great. Excellent.

I'm Mark, and Mary is sitting here in the front row; she's gonna join us in a little bit. I run our worldwide storage go-to-market for AWS, and Mary is our senior go-to-market specialist for Amazon S3.

We're gonna tell you a little bit about storage cost optimization, best practices. How many of you use S3? Ok, great. And how many of you use FSx? Ok, quite a few. Great. And EFS? Ok, good, good, good.

I'm also gonna do a little bit just to kind of get everybody going - where's everybody from? I've been talking to people in the audience and it's a tee up to the next slide. But I've talked to folks from Ukraine. I've talked to folks from Michigan. I've talked to some folks from China, Japan, India.

Anybody here from Australia or New Zealand? Ok, we got a couple. Great. Ok, anybody here from India? Great. Ok, good, good, good. Well, I wanted to get a sense of folks because one of the things that I wanted to open up with is it looks like we have a little bit of an international audience here and I count Michigan kind of as international, almost Canada, because I grew up there.

So almost a...so love Canada. My first job I had all of Canada as a territory. So I got to know everything from Halifax, Nova Scotia - any Canadians here? Oh great, I love Canada - so Halifax all the way to Surrey Memorial over by Vancouver and everything in between.

But since we have this international audience, I wanted to explain a saying that might be in Canada too, about “cats and dogs” - you know how it's “raining cats and dogs.” And what that means is there's just a lot of rain pouring down. It's very unique to the US.

But I'm on the East Coast now and it doesn't rain cats and dogs. California where I used to live, now it rains cats and dogs. So we have a lot of just, like there's a lot of rain.

We also have a lot, go with me here on this, go with me - there's a lot of Amazon AWS services that we have and we're not going to talk about all the cats and dogs of all our services because we've just got too many. But we are going to talk about a few.

We're also not going to talk about actual cats or actual dogs and optimizing them. So that's not what we're going to talk about today. And by the way, the cute one on the left is Mary's cat - that's Mimi. And I've got Harry, who's 7 years old, and Bailey, who's our 10 month old on the right. But we will not optimize them today - that's not what we're going to be talking about.

Instead, you will hopefully learn a little bit. I was talking to a few folks in the back room - look, we really appreciate your time. We know this is valuable. You've flown from all over the world. If you take away at least one thing from today's session that you can apply back in your environment today in AWS, then hopefully it's been successful and we really, really truly appreciate the investment that you've made.

So I want you to first - you know, first we're going to be talking about optimizing storage costs and efficiencies by moving to the cloud in the first place, because yes, this session is about optimizing costs when you're in the cloud. And you're like, how do I optimize S3? How do I optimize S3? But if you step back a little bit, you first have to think about what are you doing and how do you optimize your costs before you even move to the cloud?

And so we work with a lot of customers who are migrating on-premises infrastructure into the cloud. And often people are thinking three steps down the road about, ok great, I've already moved it. Now how am I optimizing it? Some of the best savings you can get are by architecting and thinking about up front what your application is, making sure that you're optimizing storage cost as your applications land.

And then we're going to go through some tools that you can use today. Mary's going to talk about S3 Storage Lens and some other tools that we'll talk about during the session. And finally, we'll give you customer examples of how they've benefited from both moving to AWS, and then also after they've migrated, optimizing the storage once in the cloud.

And again, hopefully there are a few nuggets that you take away from this. I'll leave this up here - there's a QR code you can scan. ESG, the Enterprise Strategy Group, conducted a study last year on migrating infrastructure from on-premises data centers to AWS, and what they saw was that customers reduced storage costs by up to 69% by moving from on-prem into AWS.

And what's interesting, if you dig into this report, is that overall cost savings - meaning overall shifting from on-premises into the cloud - was more like 65, 66, 67%, but storage was 69%. So customers achieved even more cost savings on storage by moving to the cloud, according to ESG and the interviews that they did earlier this year.

And so why is that? Well, when you look at cloud pricing, and I talk to customers all over the world every single day, and it's tempting for you to just, as we go back to basics, just think about basically the per-gig cost. I get this all the time - it's like, hey, I went to Best Buy or I went somewhere and I bought a hard drive and it's only x dollars per gig. How come this is x?

Well, you know, when you think about it, you have to think about everything in the cloud. You think about, first of all, if you build out on-premises, you've got a petabyte of raw storage, then of that, maybe 800 terabytes is available after RAID, formatting, and file systems.

And then what you have to do is buy ahead, right? So anticipating what you're going to buy, we often see cost - I was talking to a customer, they said they were pretty efficient, they bought about 90 days ahead. But again, a few years ago, people were buying maybe up to a year ahead. And so that's costing essentially you're paying for that storage that you're not using and it's underutilized.

And then finally, the actual data usage might be 400 terabytes. What we see on average is around 60-65% utilization across the board for on-prem storage - your mileage may vary, and if you're really great, you're probably going to do better than that. But that's kind of an average that we've seen with on-premises storage.

And this goes for all of our services where you pay as you grow, like S3, EFS - you're only paying for the data usage at the time that you're using it, right? So it's an important kind of TCO element and cost. And it isn't really a cost savings as much as thinking about what are your costs when you're comparing on-premises to the cloud and migrations - it's more than the per-gig cost. Hopefully that makes sense.

So once you've moved to the cloud, you're going to get some savings. What customers find is really about 20% at the beginning, just by migrating to the cloud - and that's just lifting and shifting. Typically we'll see people move to, say, EC2, S3, and EBS, and then they'll optimize their existing infrastructure and applications a bit, also using S3 storage classes like S3 Intelligent-Tiering or S3 Glacier Instant Retrieval.

And by doing all of that, and also using additional volume types, we see another 20-30% savings. Finally, in the last part of the migration, customers are really thinking about using advanced services - in the storage world, managed services like FSx, but also serverless architectures - to really get efficiencies not only in cost but, as we'll see in a few minutes on a customer slide, in time to market.

So by using serverless architectures like Lambda, you can achieve instead of taking weeks or months to generate applications or drive application updates, you can reduce that to days.

So let me talk about a few customers who have migrated and now we're talking about all-up migration from on-premises into the cloud.

So Sysco - they're a large food services provider. They market and distribute food to restaurants, healthcare, and education facilities, and they're doing that in over 90 countries. What they did is they took all their different siloed data centers and all that storage - we see this as a common pattern, where you have many data centers, or even within one data center, silos of storage - and what do you get? You get a lot of duplication of data, right? And you don't get optimal utilization, like we showed in the first slide - and that just multiplies over and over again by the amount of storage you have in each data center, as well as across multiple data centers.

So by moving to Amazon S3 and Amazon S3 Glacier, they reduced their storage cost by 40%. And this is a number from Sysco.

A second example is FINRA, who's a great customer of ours. If you haven't seen it, there was an article earlier this year - I want to say in March - in CIO Magazine with the CIO of FINRA. FINRA moved about 90% of their data to AWS to capture, analyze, and offer different services on, and to store, their daily influx of 37 billion records.

And what they calculated is that running in AWS versus their own private data centers made it 40% less expensive - this is actually from the article in CIO Magazine. And then more recently, by using S3 - not to steal Mary's thunder - but S3 Intelligent-Tiering, they reduced their annual storage spend by another 35%.

So, FICO - who knows who FICO is in the US? Everybody. I know, I just bought a house, so I know. I got it and I wasn't declined, so I'm very happy about that. They are the largest provider of consumer credit scores, and they produce scores and financial information for 95% of the largest financial institutions in the US.

They migrated and moved MyFICO.com, which maybe you've seen, and also their Decision Management Suite, their DMS, to AWS. And what they figure is that by moving their DMS solution to AWS, they reduced their development cycles - going from months to under a day - by using Lambda and other serverless architectures, while still meeting their stringent security and compliance requirements, as you can imagine.

And finally, GE Healthcare - GE Healthcare and GE in general are large AWS customers. They chose AWS for their GE Health Cloud and they have literally millions of IoT devices, including over 500,000 GE imaging systems, that are feeding data and storing data.

And one of the things they did was an internal study, and GE showed that up to 35% of patient cases were misdiagnosed, which they attributed partly to a lack of access to images, data, and records. So by having it all available, allowing physicians to collaborate in the GE Health Cloud, they believe that they can reduce that rate of misdiagnosis.

And finally, they did another study - GE Healthcare - and according to GE Healthcare, offering better interoperability between the systems could save the healthcare ecosystem - meaning all of the hospitals and all of the doctors - over $30 billion per year in cost savings.

So now I'm going to talk a little bit about our storage portfolio - this is an important slide. First, you've got data protection; you've got the storage portfolio itself; you have data at work, which is all the facilities and services that use storage, like our analytics or machine learning, which is obviously top of mind for folks; and then getting data in and out of AWS.

So we have things like AWS DataSync, the Snow Family, and the Transfer Family. We also have analytics solutions that are pipeline solutions, like Amazon Managed Streaming for Apache Kafka, to get data into and out of AWS storage. And this is probably one of the more important things - it's one of the key areas where we can help customers, and I talk to customers about this all the time: choosing the right storage to begin with.

It sounds counterintuitive, because many of our customers ask how they can optimize data that's already in AWS, in EBS or S3, and how they can reduce their costs. We get that a lot. But one of the important decisions is making the right storage choice up front and making sure it meets your application and workload requirements. And it can be super counterintuitive, because a storage service with a higher cost per gig might actually be lower cost in terms of TCO.

As an example of that - is anybody doing ML training or analytics workloads? A few folks. OK. So one thing: if you have a slower storage tier and an expensive GPU, and you're not getting the data into the GPU fast enough, then your effective cost of compute is much higher. What you want to do is make sure your GPU utilization is as saturated as it can be. If you pick slower, maybe less expensive storage, you might save a little on the storage usage cost, but basically you're paying for it in the compute.

So it's something counterintuitive - just a little nugget: think about the cost of the whole system, right? We have the right storage solution for just about any application. We have EBS for databases and high performance. For unstructured data, we offer a range of file services including EFS and FSx. And then for massive scalability and availability of data anywhere in the world, with high durability, we have Amazon S3.

So matching your app requirements to the right storage is the first and one of the most important things you can take away from the session. We don't have time to go through all the options today - all the cats and dogs - so I'm not going to go through every one, just a few of the cats and a few of the dogs. Let me walk through a few places where we see customers making the wrong, potentially costly choices. This is from our own experience, what we see every day.

The first example is high-performance, low-latency workloads with millions of reads and writes per hour. Normally what we see is customers defaulting to EBS for this use case - but let's double-click on that. Don't default to EBS gp2. How many of you are using EBS gp3? OK. If you're not, you probably should be, because for basically the same capabilities you're spending 20% less on gp3. We introduced it a few years ago, and customers are saving today by moving from gp2 to gp3 for high-performance workloads.

We introduced EBS io2 Block Express, and we've made it available for additional instance types. Is anybody using io1 EBS? OK. So think about it from durability, cost, throughput - all the different dimensions - and also just volume capacity. Think about io2 Block Express if you're on io2 or io1, and maybe we can talk afterwards about why not. A second area we see with lift-and-shift workloads is a focus on getting to the cloud pretty quickly, which usually means new workloads land on EBS and S3.

So, how many of you have participated in an AWS storage assessment? OK. Come and see me afterwards and we'll get you set up. Basically, it's a free service where we go and look at your on-premises footprint and then do a mapping of that footprint to the best choice - or maybe a few choices - of what the equivalent might be in AWS. And you might be surprised at what the recommendations are.

For example, if you have on-premises NAS arrays, normally you might just move the data over and dump it onto S3. But if you have file system semantics - POSIX requirements, etc. - then you might have put it onto, say, EBS and layered a file system on top of it. But maybe you didn't even consider FSx. If you need, for example, multi-AZ - because you want the availability and durability profile of multiple Availability Zones - then to do that on EBS, you're going to have to set a file system up on EBS and somehow replicate it to another AZ. Why not just let FSx, which is a managed service available for NetApp ONTAP, Lustre, OpenZFS, and Windows File Server, do the work for you?

So again, it's not always obvious up front if you're only looking at the cost per gig. The point here is to double-click: we've introduced dozens of new storage offerings in the last 5 to 10 years, so make sure you're double-clicking on what the latest options are. And finally, make sure that you're looking at the overall cost.

The second dimension: don't just look at usage, don't look at the per-gig price only - also look at API costs. I deal a lot with customers with video applications who say, "Wow, my API costs are really high" - they have lots of really small files and super high API costs. So don't necessarily just put it on S3; think about which tier of storage to use, and if you have file system requirements, look at our managed file systems, or otherwise look at EBS, which doesn't have API costs. Think about it, because if you just automatically default to the lowest cost-per-gig storage tier, you might find the API costs are super high.

The other little nugget - and these are little nuggets I'm trying to pass along to you - is that file size matters. File size matters in general: for those of you who have been in storage for a while, you all know about inode limits and all kinds of other limitations, and small files are not great. Even more so when we see a lot of customers going, "Hey, great, look at S3 Glacier Flexible Retrieval, look at Glacier Deep Archive - it's really, really inexpensive."

Well, if you have lots of small files, then migrating those millions - or sometimes I've seen billions - of files and tiering them down into those storage classes can actually increase your costs; there can be higher lifecycle costs associated with that. So one best practice is aggregating those smaller files into a larger object, and we've seen that reduce the cost of those lifecycle transitions or retrievals by 10 to 100x.

So again, these little things. The takeaways from this: think before you move, think about your application, don't always think about just usage and per-gig costs - think about the other dimensions as well, and also think about file size. Hopefully that resonates. I see every day that customers don't think about these things, and then it's like, "Whoa, I got a huge bill for this." So think about it up front. We're all here to help you as well - get together with your storage solutions architect or your AWS generalist SA.

OK, so that's a couple of tips and tricks to hopefully get you started. Let's dive a little bit into Amazon EBS, our block storage offering. We offer up to 256,000 IOPS with io2 Block Express, and the offerings are really easy to use, highly reliable, scalable, secure, and super cost effective.

For example, io2 Block Express offers 256,000 IOPS and up to 4,000 megabytes per second of throughput per volume, so it's pretty exciting. We've also been gradually adding the instance types supported - for a while it was more limited, and now we've added additional instance types for io2 Block Express. We also introduced persistent reservations, so for those of you familiar with clustered applications, you can now run high-performance clustered applications on a high-performance block storage tier like EBS.

So we're very excited about the innovations with EBS. And again, it might be counterintuitive, but if you can have one EBS volume with larger capacity and higher durability, like io2 Block Express, it might be more cost effective than, say, multiple io1 volumes or other volumes. So that's just a few of the examples with EBS.

gp3 volumes provide the IOPS performance most workloads need. I recommend you default to gp3 unless you have a specific reason to stay on gp2, say mid-migration. The number one thing you can do to shave costs is use gp3 - it's basically 20% less than gp2.

As I mentioned, EBS io2 Block Express offers up to 256,000 IOPS and 4,000 megabytes per second. That's 4x more IOPS, with up to 100x higher durability, all at the same price as io1. So, not to pick on io1, but I think we've really made great strides - please, please, please think about io2 Block Express.

The other thing I often find people don't think about - because everybody defaults to the gp volumes - are the st1 and sc1 volumes. Consider st1 where you have, say, data and analytics workloads that need sustained high throughput and consistent sequential reads, and sc1 for colder, pre-archival workloads. Those again are a fraction of the cost of the SSD volumes.

And again, what I see over and over, especially with migrations: you're in a hurry to get the data in, so you dump it onto gp volumes. But if you just spend a little more time up front and say, you know, 20 or 30% of this could go to st1 or sc1 HDD volumes, and then think a little bit further about gp3, you can save 20, 30, 40, 50% right in the architecture phase. It's going to be a lot harder to do that later, because once you have your application up and running and live, you have to go back and revisit it - and maybe by then you don't want to move it.

So I think that's why it's really important to think about the architecture up front. As I said, people generally start with gp2 volumes, but now more recently gp3. Let me give you a great cost example here, and I'll build it out for you. If you have a one terabyte gp3 volume configured at 16,000 IOPS, it costs you around $65. The same configuration on gp2 would cost you $513 per month, because you would have to over-provision capacity to get that IOPS.
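If it helps to see the arithmetic, here's a minimal Python sketch of that comparison. The prices are assumptions - roughly the published us-east-1 list prices (gp3 at $0.08 per GiB-month with 3,000 IOPS included and $0.005 per extra provisioned IOPS-month, gp2 at $0.10 per GiB-month earning 3 IOPS per GiB) - and they vary by region, so the totals won't exactly match the figures above; the point is the shape of the comparison, where gp2 forces you to over-provision capacity just to buy IOPS.

```python
# Rough monthly cost comparison for a 1 TiB volume that needs 16,000 IOPS.
# Prices below are assumptions (approximate us-east-1 list prices); check the
# EBS pricing page for your region before relying on the numbers.

GP3_GB_MONTH = 0.08      # $/GiB-month
GP3_FREE_IOPS = 3000     # baseline IOPS included with every gp3 volume
GP3_IOPS_MONTH = 0.005   # $/provisioned IOPS-month above the baseline

GP2_GB_MONTH = 0.10      # $/GiB-month
GP2_IOPS_PER_GB = 3      # gp2 IOPS scale with size (up to 16,000)


def gp3_monthly_cost(size_gib: int, iops: int) -> float:
    extra_iops = max(0, iops - GP3_FREE_IOPS)
    return size_gib * GP3_GB_MONTH + extra_iops * GP3_IOPS_MONTH


def gp2_monthly_cost(size_gib: int, iops: int) -> float:
    # On gp2 the only way to get more IOPS is to provision more capacity,
    # so we over-provision storage until the IOPS target is met.
    required_gib = max(size_gib, -(-iops // GP2_IOPS_PER_GB))  # ceiling division
    return required_gib * GP2_GB_MONTH


if __name__ == "__main__":
    size, iops = 1024, 16000
    print(f"gp3: ${gp3_monthly_cost(size, iops):,.2f}/month")
    print(f"gp2: ${gp2_monthly_cost(size, iops):,.2f}/month (over-provisioned for IOPS)")
```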

So again, don't just think about the usage - think about all the other dimensions of the pricing as well. Super important. Take the time, take another minute, and think about the other dimensions. The cool thing is we also introduced Elastic Volumes. Previously, you basically had to allocate a volume provisioned at some capacity that anticipated future growth. With Elastic Volumes, you can provision closer to your actual storage usage and expand your volumes as your needs change, without any downtime.

How many of you are using Elastic Volumes today? OK. Think about it - this is another case where you're not allocating the full volume up front, you're allocating a portion of it and then using Elastic Volumes to scale, again without any downtime. So you can have lower priced volumes for less demanding workloads, and when demands pick up, you can also use Elastic Volumes to move to higher capability volume types.
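As a rough illustration of what that looks like through the API - a hedged sketch using boto3 with a hypothetical volume ID and target values, not a prescription - a single ModifyVolume call grows the volume and switches it to gp3 while it stays attached and in use:

```python
# Minimal sketch: grow a volume and switch it to gp3 in place with Elastic
# Volumes, no detach or downtime required. The volume ID and targets are
# placeholders.
import boto3

ec2 = boto3.client("ec2")

ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # hypothetical volume ID
    VolumeType="gp3",                  # e.g. migrate gp2 -> gp3
    Size=200,                          # GiB; volumes can only grow, never shrink
    Iops=6000,                         # gp3 lets you set IOPS independently of size
    Throughput=250,                    # MiB/s
)

# The modification runs in the background; you can poll its progress.
state = ec2.describe_volumes_modifications(
    VolumeIds=["vol-0123456789abcdef0"]
)["VolumesModifications"][0]
print(state["ModificationState"], state.get("Progress"))
```

Note that after growing a volume you still need to extend the file system inside the operating system to use the new space.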

So, last area - one of the things we see a lot is EBS snapshots. They're great; everybody using EBS should be using snapshots. They're point-in-time copies of your data, stored behind the scenes in S3 buckets managed for you. Snapshots can be used to instantiate multiple new volumes, expand the size of a volume, or move volumes across Availability Zones.

You've got two tiers: a standard tier, where snapshots are incremental, and an archive tier, where snapshots are stored as full copies. The trade-off is that the archive tier is 75% lower cost, but restore times are a bit longer and there's a 90-day minimum retention period, as well as an associated charge for the restore. So think it through. But from a cost savings perspective, Snapshots Archive is worth a look - I talk to a lot of customers about their bills, and when they double-click on the EBS bill, often it's the snapshots.
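For reference, here's a minimal boto3 sketch of moving an existing snapshot into the archive tier and temporarily restoring it later; the snapshot ID and restore window are placeholders.

```python
# Minimal sketch: archive an EBS snapshot, then temporarily restore it when
# you need to create a volume from it again.
import boto3

ec2 = boto3.client("ec2")

# Archive tier: ~75% lower storage cost, stored as a full snapshot, with a
# 90-day minimum retention and slower restores.
ec2.modify_snapshot_tier(
    SnapshotId="snap-0123456789abcdef0",  # hypothetical snapshot ID
    StorageTier="archive",
)

# Later: bring it back to the standard tier for 7 days before using it.
ec2.restore_snapshot_tier(
    SnapshotId="snap-0123456789abcdef0",
    TemporaryRestoreDays=7,
)
```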

So take a look and see if you can use EBS Snapshots Archive. Johnson & Johnson deployed Snapshots Archive, and they estimate that they archived over 100,000 snapshots and saved a million dollars annually.

So let me summarize and wrap up the cost efficiencies for EBS. First of all, EBS provides multiple volume types. Focus on the gp2 to gp3 migration - 20% right at the top. With Elastic Volumes, don't over-provision; take advantage of them.

And I saw the hands raised - not a lot of you are using Elastic Volumes today. You should be. And then EBS snapshots and Snapshots Archive - we also have something called Data Lifecycle Manager to move snapshots through the different tiers for you.
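As a hedged example of what an Amazon Data Lifecycle Manager policy can look like via boto3 - the role ARN, tag values, schedule, and retention below are hypothetical, and the snapshot archive-tiering options have their own settings not shown here:

```python
# Minimal sketch: a DLM policy that snapshots every volume tagged
# backup=true once a day and keeps the last 7 snapshots.
import boto3

dlm = boto3.client("dlm")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily snapshots for volumes tagged backup=true",
    State="ENABLED",
    PolicyDetails={
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "true"}],
        "Schedules": [
            {
                "Name": "daily-7-day-retention",
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": 7},
                "CopyTags": True,
            }
        ],
    },
)
```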

There are additional capabilities and snapshots. I know data protection is, you know, not the super exciting stuff and everything, but like you can actually get a lot, a lot of efficiencies while protecting your data, but at a much lower cost.

OK. So we talked about block storage; now let's talk about file. We have a wide range of file services: Amazon EFS, focused on cloud native workloads; Amazon File Cache, for when you want to burst to the cloud for compute; and then a range of services under the name Amazon FSx, offering flavors for Windows File Server, NetApp ONTAP, OpenZFS, and Lustre for different types of applications.

And again, this goes back to thinking about which application you have. What I often see is customers running things themselves - are any of you running your own Lustre clusters on EBS today? Good. A couple of years ago we had a lot of people doing that. How many people are using FSx for Lustre today, though? A few. OK, just a few, great.

We see this as a really common pattern for, for example, machine learning model training: storing the data in S3 and hydrating it into FSx for Lustre. Then, because of the high performance and the ability to have multiple instances mount that Lustre file system, you can saturate the GPU - back to that first point.

So it's a really good pattern, especially as you're considering machine learning training: taking advantage of those multiple mount points and parallelizing that throughput from Lustre into your GPUs.

So again, we have a range of solutions available across every dimension. Let me just talk about some cost savings you should be thinking about when you think about file systems. If you're just dumping a file system on EBS, these are some of the things you get by thinking a little further about how you can use FSx or our managed file system offerings.

First of all, deduplication and compression. It depends on the data - anybody who's been in storage knows some data is super dedupable and some isn't dedupable at all - but on average, and again your mileage may vary, you can see up to 60% cost savings depending on the use case. Take home directories, for example, which often have a lot of cat pictures along with a lot of text files.

Well, I have dog pictures and Mary's got cat pictures. But you can save maybe 25 to 30% for those home directories, and sometimes, if they're text heavy, 50% or more. As data ages - and data is typically stored on SSD - using our lifecycle management and capacity pooling, you can move the data from SSD into lower cost hard drive storage, right?

If you deleted the data, that might break the application. But using tiering and capacity pools, you can reduce the cost while still maintaining the application and user experience. Within FSx for ONTAP, we have two tiers, SSD and capacity pool, and EFS has Standard and One Zone storage classes plus an Infrequent Access tier.

And we introduced a new tier, which I'll talk about in a second, to further reduce your costs in EFS. So, a couple of examples of customers who have saved money using our file services. First, we have customers like Epiq, a global legal services company tasked with reducing storage costs on their eDiscovery platform. They reduced costs by 30% using FSx for Windows File Server, by getting better utilization with FSx versus what they were doing on premises.

AA needed a cloud file storage solution that enabled their global team to collaborate and share project data, and they used FSx for NetApp ONTAP. How many of you are NetApp users on prem today? So it's like-for-like functionality - all the workflows and Snap features that make NetApp great are available in FSx for ONTAP in a fully managed environment.

They estimated that they were able to reduce their storage costs by 50% and increase productivity by 30% by deploying their solution across 17 sites globally. So think about a managed service, especially if you're running NetApp or Windows on prem. Johnson & Johnson was able to save 30% on their genomics workflows by deploying lifecycle management, and Rev was able to move from self-managed file systems to FSx for OpenZFS.

They reduced their storage and operational costs by 30%. So there are real-world savings here. Now, diving into EFS for a bit, because I saw a lot of raised hands for EFS: there are a couple of storage classes - Standard, Standard-Infrequent Access, and One Zone-Infrequent Access.

All the Infrequent Access classes have higher latency - double-digit milliseconds, say - compared to their counterparts, and there's an additional data access charge for Infrequent Access. But by leveraging that plus intelligent tiering, you can see effective costs reduced by up to 80 to 90% by using these tiers.

Again, back to the earlier point: this works if you know the data is going to become colder over time. If you have super hot workloads and you move them to these tiers, you're probably going to get the opposite result. So it's really important to know your hot workloads and your cold workloads and take advantage of the tiers as appropriate.

So just yesterday - did everybody see the announcement? I know there are a few, and there are going to be a lot more this week - we announced the EFS Archive tier, our lowest cost storage class for file data. So there's EFS Standard, which we just talked about, EFS Infrequent Access, and now EFS Archive for rarely accessed data - typically you tier data to Infrequent Access after 30 days and to the Archive tier after 90 days.

Similar to the equivalent tiers on the S3 side, you can save costs. The EFS Archive tier is complementary to the other tiers, and it's designed to be the file storage analog of Amazon S3's coldest instantly accessible storage class, S3 Glacier Instant Retrieval. So it's kind of the EFS twin of that.

EFS Archive pricing is aligned with S3 Glacier Instant Retrieval for storage cost at rest - and again it will vary, but in public regions that's 4/10 of a cent per gig per month - and customers can still access their archived data in a single shared EFS file system, right alongside the Standard and IA tiers. So it's pretty exciting, because we see that by using this new Archive tier, customers can now automatically save up to 84%.
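If you manage EFS with boto3, the lifecycle configuration for those transitions is a single call; this is a minimal sketch with a placeholder file system ID and the 30/90-day thresholds mentioned above.

```python
# Minimal sketch: files not read for 30 days move to EFS Infrequent Access,
# after 90 days to the Archive class, and a file moves back to Standard on
# its first access.
import boto3

efs = boto3.client("efs")

efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",  # hypothetical file system ID
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_30_DAYS"},
        {"TransitionToArchive": "AFTER_90_DAYS"},
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ],
)
```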

So T-Mobile is a great example of a customer. A few of you might be using them or have heard of them. They're based in Bellevue, Washington, and they're one of the largest telcos - I think the third largest wireless carrier in the US. And they wanted to modernize their IT infrastructure to support better customer experiences.

They were previously using open source solutions configured on EC2, but that required a lot of operational overhead. So they chose Amazon EFS for its ability to easily and quickly create elastic storage for workloads, and they realized 70% cost savings.

Some of the benefits - soft benefits as well as hard benefits - are that EFS doesn't require changing existing applications, but allows access through a standard file system interface. EFS helped T-Mobile meet its top goals of reliability, security, and cost optimization.

And with AWS, T-Mobile can address storage needs on demand, providing application development teams around the world with fully managed, on-demand file systems without the operational burden.

So for FSx, a little double-click on this as well: tiering is the key takeaway. Tier from SSD to HDD in FSx for NetApp ONTAP - I saw a couple of NetApp customers out there. The automated, built-in tiering within FSx for NetApp ONTAP can reduce costs even further. With something like an 80/20 rule, you can save 30, 40, 50% of your cost by moving data to the capacity pool.
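For those automating this, here's a hedged boto3 sketch of setting the tiering policy on an FSx for ONTAP volume; the volume ID, policy name, and cooling period are example values to tune for your workload.

```python
# Minimal sketch: let cold blocks on an FSx for NetApp ONTAP volume move
# automatically from SSD to the lower-cost capacity pool after 31 days
# without access.
import boto3

fsx = boto3.client("fsx")

fsx.update_volume(
    VolumeId="fsvol-0123456789abcdef0",  # hypothetical ONTAP volume ID
    OntapConfiguration={
        "TieringPolicy": {
            "Name": "AUTO",        # tier cold data in the active file system and snapshots
            "CoolingPeriod": 31,   # days without access before a block is tiered
        }
    },
)
```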

Arcesium is a great customer using FSx for ONTAP for exactly that. They offer advanced data operations and analytics capabilities for financial services, and they're used by some of the world's most sophisticated asset managers and financial services organizations.

With FSx for NetApp ONTAP, FlexClone enabled Arcesium to create near-instantaneous, space-efficient database copies for system refresh purposes. The outcomes were a 53% improvement in storage efficiency, five times faster database refreshes, and 80% savings in refresh costs.

So these are some of the things to think through in this first part: thinking about managed file services and picking the right type of managed service and file system for your application requirements can reduce your cost - TCO up front, but also ongoing operational cost.

And so with that, I'm going to turn it over to Mary to bring it on home with Amazon S3. Again, how many people use S3? OK. So I'm sure this will be a crowd pleaser, and we'll turn it over to Mary.

So intelligent tiering is literally optimizing every object individually. The most optimized scenario is going to be that you put data directly into intelligent tiering. You can literally specify it in your put request.

So objects are always going to start in the frequent access tier. If an object isn't accessed for 30 days, it will automatically transition to the infrequent access tier. If it's not accessed for 60 days, it will automatically move to the archive instant tier. 90 days to the archive tier. 180 days to the deep archive tier.

Now you opt into the archive tiers, you can choose one or both. You don't have to use them though. And you have some flexibility in terms of how many days intelligent tiering will wait before transitioning objects to those two tiers.
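To make that concrete, here's a minimal boto3 sketch - bucket, key, and configuration name are placeholders - that writes an object directly into Intelligent-Tiering in the put request and opts the bucket into both archive tiers at the default thresholds.

```python
# Minimal sketch: put an object straight into S3 Intelligent-Tiering (no
# lifecycle rule needed), then enable the optional archive tiers for the bucket.
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-bucket",
    Key="logs/2023/11/28/app.log",
    Body=b"...",
    StorageClass="INTELLIGENT_TIERING",
)

# Opt in to the Archive Access / Deep Archive Access tiers (both optional).
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-bucket",
    Id="archive-tiers",
    IntelligentTieringConfiguration={
        "Id": "archive-tiers",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```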

With intelligent tiering, there's no charge for the transition of data to the colder tiers, there's no charge for the retrieval of data, there's no minimum retention period, and there's no minimum billable object size - with one exception.

There is a small monitoring fee that's unique to intelligent tiering and it's charged per 1000 objects. And this is how we cover the cost to constantly monitor objects and automate that tier process. We don't charge the monitoring fee for objects that are smaller than 128 kilobytes. And that's because those objects aren't going to auto tier, they'll always stay in the frequent access tier. And since you don't benefit from the auto tier, we're not going to charge the fee.

You can use intelligent tiering as the default storage class for virtually any workload. The frequent, infrequent, and archive instant access tiers have the same millisecond retrieval performance as S3 Standard, and the archive and deep archive tiers have the same retrieval performance as Glacier Flexible Retrieval and Glacier Deep Archive respectively.

So since the launch of intelligent tiering in 2018, our customers have saved over $2 billion just by letting it automatically optimize your storage class architecture. We get really excited about this number because customers tell us that they take these savings and they reinvest it in their business and it helps them grow and they can deliver new experiences to their customers.

The second category of workload has data with a predictable and stable access pattern. So if you have a great deal of confidence, you understand your access patterns, and you want to get more prescriptive about what data moves, where it moves to, and when it moves, you should use S3 lifecycle policies.

A lifecycle configuration is a set of policies or rules that define actions that S3 applies to a group of objects. And there's two types of actions that you can accomplish with a lifecycle policy. Transition actions will move data to a colder storage tier. Expiration actions are going to delete data.

A well-tailored lifecycle policy can absolutely be more cost effective than intelligent tiering, because what you're doing is shortening the amount of time it takes to get to those colder storage classes with their lower storage cost.

Lifecycle policies are triggered based on the age of an object, as opposed to intelligent tiering, where transitions are triggered based on the last access date. So those are two very different things. Again, lifecycle policies are a great way to save money - but only if you understand your access patterns.

With lifecycle policies you pay for the transition and that's going to vary based on the storage class that you're transitioning to. The transition charges increase with each colder class of storage and they're charged per 1000 objects.

There are filters that are gonna help you use lifecycle policies more efficiently. You can establish life cycle rules and then filter them based on object tags or object prefixes. And this is gonna enable you to apply really fine grained life cycle management.

You can also set a filter that's only gonna allow objects of a certain size to transition. Object size is very important. Mark mentioned it and we're gonna talk more about it in a second.

If you have versioning enabled, it's possible that the percentage of your storage that's dedicated to noncurrent versions can grow out of control. So you can use a life cycle policy and expire noncurrent versions and you can do that based on their age or based on the number of noncurrent versions that you want to maintain.

You can also set a lifecycle policy that will expire incomplete multipart uploads and delete markers, so that you don't have to worry about paying to store those.
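Pulling those pieces together, here's a hedged boto3 sketch of a lifecycle configuration; the bucket, prefix, day counts, and storage classes are example values you would tune to your own access patterns.

```python
# Minimal sketch: transition only objects over 1 MiB under a prefix, expire
# old noncurrent versions, and clean up incomplete multipart uploads and
# orphaned delete markers.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-large-raw-data",
                "Status": "Enabled",
                # Only objects under this prefix AND larger than 1 MiB transition.
                "Filter": {
                    "And": {"Prefix": "raw/", "ObjectSizeGreaterThan": 1024 * 1024}
                },
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},       # Flexible Retrieval
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            },
            {
                "ID": "housekeeping",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # whole bucket
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": 30,
                    "NewerNoncurrentVersions": 3,  # keep only the 3 newest noncurrent versions
                },
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            },
        ]
    },
)
```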

I mentioned the importance of object size. As object size increases for a fixed amount of storage, lifecycle transition costs and request costs will decrease. That's because transition and request costs are charged per 1,000 objects.

So when you act on a fixed amount of storage or a fixed amount of data that's stored as lots of small objects versus fewer larger objects, your transition and your request costs will be higher.

So the rule of thumb is that anything 128 kilobytes or smaller is a small object in S3. But if you're going to implement a lifecycle transition, you should determine the number of objects that are going to move, and you should do the math.

As an example: if you're going to transition objects to Glacier Instant Retrieval, we recommend that objects be at least 256 kilobytes or larger; for Glacier Flexible Retrieval and Glacier Deep Archive, one megabyte or larger. That's because the transition costs to those storage classes are the highest.
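Here's the kind of math this implies, as a small Python sketch. The per-1,000-request transition price is an assumption for illustration, so plug in the current number from the S3 pricing page for your region and target storage class; the point is that the charge scales with object count, not bytes.

```python
# Rough arithmetic: lifecycle transition charges scale with object count.
TRANSITION_PER_1000 = 0.05   # assumed $ per 1,000 lifecycle transition requests


def transition_cost(total_gib: float, avg_object_mib: float) -> float:
    objects = (total_gib * 1024) / avg_object_mib
    return objects / 1000 * TRANSITION_PER_1000


# 10 TiB stored as 64 KiB objects vs. the same data batched into 64 MiB objects.
small = transition_cost(10 * 1024, 64 / 1024)   # ~167.8 million objects
large = transition_cost(10 * 1024, 64)          # ~164 thousand objects
print(f"small objects: ${small:,.2f}  batched: ${large:,.2f}  ratio: {small / large:,.0f}x")
```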

We recommend that customers batch small objects into larger objects if they can, and AWS Labs offers a few options that might help. The first is the Amazon S3 tar tool, available on GitHub. If you're using EMR, AWS Glue will do the object compaction for you; there's a blueprint available, also on GitHub. And if you're using Kinesis Firehose to stream data into S3, the buffer size is configurable between one megabyte and 128 megabytes, so we would recommend that you choose a larger buffer size.
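As a simplified illustration of the batching idea (the amazon-s3-tar-tool and Glue blueprints are the production-grade routes), here's a hand-rolled boto3 sketch with placeholder bucket and prefix names.

```python
# Minimal sketch: pull the small objects under a prefix, pack them into one
# tarball, and upload that single larger object to an archive-friendly class.
import io
import tarfile
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX = "example-bucket", "tiny-logs/2023/11/"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            info = tarfile.TarInfo(name=obj["Key"])
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))

buf.seek(0)
s3.put_object(
    Bucket=BUCKET,
    Key="archives/tiny-logs-2023-11.tar.gz",
    Body=buf.getvalue(),
    StorageClass="GLACIER",  # one PUT instead of millions of lifecycle transitions
)
```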

We also offer that filter again. So if you apply that filter, you can prevent objects from moving based on their size. And so that's gonna help lower your life cycle transition costs and achieve your optimization objectives.

The S3 cost model favors larger objects. This is a great story. Illumina is a terrific customer of ours. Their innovative sequencing and array technologies are fueling really groundbreaking advancements in life science research, genomics, and molecular diagnostics.

After only three months of using S3 Intelligent Tiering and lifecycle policies, they were able to quantify an 89% reduction in carbon emissions and a 60% reduction in storage costs. So really impactful results.

I want to talk to you about S3 Storage Lens in some detail. But first let me give you a quick overview of the S3 insights portfolio. These services are gonna help you better understand your storage from an organization wide consolidated view down to a very granular object level view.

Storage Lens is what we recommend as your first stop in gaining visibility into your storage because it has the broadest view, it's integrated with AWS Organizations. But once you've determined that there's a particular bucket of interest, we have other services that will help you dive deeper.

So CloudWatch provides centralized monitoring with the ability to set alarms based on metric thresholds that you choose. You decide what you want.

S3 Inventory is going to give you a manifest of all the objects in a bucket and then the metadata associated with those objects and then S3 Server Access Logs is gonna give you detailed request logging.

So as an example, you could use S3 Inventory with Amazon Athena and it would provide you with an analysis at the object level. Athena uses standard SQL expressions and it delivers results in seconds. It's commonly used for ad hoc data discovery.

So let's say you have all your data in S3 Intelligent Tiering and you wanna understand the distribution of data across those various access tiers. You could use this first SQL query and you get that list from S3 Inventory.

Or maybe you want to know the size distribution of objects in a bucket. You could use the second SQL expression and you could group objects by size.

Or perhaps you see in Storage Lens that not all the objects in a bucket are encrypted. And you want to get a list of the objects that are not encrypted. You could use this third SQL query against inventory and you get that data.
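The specific queries were shown on the slides, but queries along these lines would do it - written here as SQL strings you could run with Athena (for example via boto3's start_query_execution), assuming an Athena table named inventory built from an S3 Inventory report that includes the size, Intelligent-Tiering access tier, and encryption status fields; your table and column names may differ.

```python
# Illustrative Athena SQL over an S3 Inventory table named "inventory".

QUERY_TIER_DISTRIBUTION = """
SELECT intelligent_tiering_access_tier, COUNT(*) AS objects, SUM(size) AS bytes
FROM inventory
GROUP BY intelligent_tiering_access_tier
"""

QUERY_SIZE_DISTRIBUTION = """
SELECT CASE
         WHEN size <= 128 * 1024 THEN 'under 128 KB'
         WHEN size <= 1024 * 1024 THEN '128 KB - 1 MB'
         ELSE 'over 1 MB'
       END AS size_bucket,
       COUNT(*) AS objects
FROM inventory
GROUP BY 1
"""

QUERY_UNENCRYPTED = """
SELECT key
FROM inventory
WHERE encryption_status = 'NOT-SSE'
"""
```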

S3 Inventory with Amazon Athena is a great combination to help you make data driven decisions.

So, back to Storage Lens. It provides more than 60 metrics in a highly visual and interactive dashboard, and they're going to help you identify improvement opportunities - of course in cost optimization, but also data protection, access control, performance, and more.

I have a colleague who always says you can't change what you don't measure. Within the context of S3 cost optimization, S3 Storage Lens is the tool we recommend to continuously monitor your entire S3 footprint and to watch the optimization actions you take over time, to make sure they're returning what you expected from them.

There's a set of metrics that are free to all S3 customers and we refer to those as the free tier and then there's 35 additional metrics that are available if you'll upgrade and pay for them. And we refer to those as the advanced metrics and those are of course going to give you advanced capabilities.

So you're gonna get more history for better trending and analysis. You'll get 15 months of data as opposed to 14 days, you'll get deeper granularity. So you can look at data at the prefix level as opposed to just the bucket level. And you'll also be able to publish metrics to CloudWatch and to set alarms.

So I mentioned that Storage Lens provides this ability to drill down and view metrics that are aggregated at various dimensions. We just announced the availability of Storage Lens Groups on November 15th. Groups is part of the Storage Lens Advanced Metrics. It allows you to customize your aggregation level.

So for example, you could choose to aggregate metrics based on object tags or object size or even file extension. This feature is being covered in a number of presentations taking place this week.

Here's an example of something really useful that you can do with Storage Lens. This bubble chart lets you compare buckets across three dimensions. We've chosen object count on the x axis, retrieval rate on the y axis. And storage volume is represented by the size of the blue bubble.

You can see the highlighted bucket on the right of the chart. It has a low retrieval rate, but lots of objects and a high volume of storage. This is an ideal candidate to dig into because this might be an abandoned workload and you can take all that data and lifecycle it to one of our Glacier storage classes.

Our customers using CloudWatch for centralized monitoring get more value when they use the Storage Lens Advanced Metrics.

Customers that want to access the metrics via API either because they're developing their own application or maybe they want one of our analytics partners to have access to them can use the Advanced Metrics and accomplish those goals.

One of the many customers that's deployed S3 Storage Lens and experienced really tremendous benefits is Upstox. They're a leading Indian discount broker. They provide financial education and a digital platform for investment to more than 11 million customers.

So Upstox used S3 Storage Lens, Advanced Metrics in conjunction with some other AWS services to reduce their storage costs by 93%. They saved a million dollars.

Let me summarize what we've talked about relative to S3 and your storage class architecture. It's the single most important decision you're gonna make in terms of cost efficiency.

To make good storage class choices you need to understand your access pattern, your retention and your retrieval performance requirements and object size.

Intelligent Tiering is the right choice if your access patterns are unknown or changing. Use S3 lifecycle policies if your access patterns are known and stable. And use the filters to execute really fine grained life cycle management.

Pay attention to object size. The cost implications can be very significant. And make objects larger if you can. The S3 cost model favors larger objects.

And then use S3 Storage Lens. It's really going to help you make data driven decisions.

OK, we've gone through a ton of information here in a really short amount of time. To learn more about how you can use AWS storage to gain economic and operational advantages, scan this QR code - it will give you access to an ebook about the value AWS storage can bring to your organization. It covers what you should know about storage cost optimization and provides best practices on how to get the most from your cloud storage spend. And to continue learning more about AWS storage, maybe make a learning plan in AWS Skill Builder, or become an expert and get an AWS Storage badge.

The AWS Object Storage badge is going to give you a really broad and really deep understanding of S3 that is the one I recommend. And here's some other sessions that are taking place this week that are related to storage cost optimization. We thought that these might be valuable to you.

And then finally, thank you for your time. Please fill out the survey and remember that beautiful white and gray cat when you do it, it's Mimi. I work hard so my cat can have a better life.
