S3 Best Practices

Performance

Multiple Concurrent PUTs/GETs

  • S3 scales to support very high request rates. If the request rate grows steadily, S3 automatically partitions the buckets as needed to support higher request rates.
  • S3 can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
  • If the typical workload involves only occasional bursts of 100 requests per second and less than 800 requests per second, AWS scales and handles it automatically.
  • If the typical workload involves a request rate for a bucket of more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, it's recommended to open a support case to prepare for the workload and avoid any temporary limits on the request rate.
  • S3 best practice guidelines apply only if you are routinely processing 100 or more requests per second
  • Workloads that include a mix of request types
    • If the request workload is typically a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for the objects ensures better performance by providing low-latency access to the S3 index
    • This behavior is driven by how S3 stores key names.
      • S3 maintains an index of object key names in each AWS Region.
      • Object keys are stored lexicographically (UTF-8 binary ordering) across multiple partitions in the index, i.e. S3 stores key names in alphabetical order.
      • Object keys are stored across multiple partitions in the index, and the key name dictates which partition the key is stored in.
      • Using a sequential prefix, such as a timestamp or an alphabetical sequence, increases the likelihood that S3 will target a specific partition for a large number of keys, overwhelming the I/O capacity of the partition.
    • Introducing some randomness in the key name prefixes distributes the key names, and therefore the I/O load, across multiple index partitions.
    • It also ensures scalability regardless of the number of requests sent per second (a minimal key-naming sketch follows this list).
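A minimal sketch of the key-naming idea above, using Python and boto3; the bucket name, prefix length, and key layout are hypothetical, and a hash-derived prefix is just one way to introduce randomness:

```python
# Prepend a short hash-derived prefix to otherwise sequential key names so keys
# spread across S3 index partitions instead of concentrating in one.
import hashlib

import boto3

s3 = boto3.client("s3")
BUCKET = "example-logs-bucket"  # hypothetical bucket name


def hashed_key(original_key: str, prefix_len: int = 4) -> str:
    """Derive a short, deterministic hash prefix from the key itself."""
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{original_key}"


# A timestamp-only key such as "2023-08-01-21-00/log.txt" tends to hit a single
# partition; the hashed form, e.g. "a5b2/2023-08-01-21-00/log.txt", spreads the load.
s3.put_object(Bucket=BUCKET, Key=hashed_key("2023-08-01-21-00/log.txt"), Body=b"log data")
```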

Transfer Acceleration

  • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket.
  • Transfer Acceleration takes advantage of CloudFront's globally distributed edge locations. As the data arrives at an edge location, it is routed to S3 over an optimized network path.
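A minimal sketch of enabling and using Transfer Acceleration with boto3; the bucket name and file are hypothetical:

```python
# Enable Transfer Acceleration on a bucket, then upload through the accelerate endpoint.
import boto3
from botocore.config import Config

BUCKET = "example-media-bucket"  # hypothetical

# One-time bucket setting.
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket=BUCKET,
    AccelerateConfiguration={"Status": "Enabled"},
)

# Clients then send requests to the <bucket>.s3-accelerate.amazonaws.com endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("video.mp4", BUCKET, "uploads/video.mp4")
```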

GET-intensive Workloads

  • CloudFront can be used for performance optimization and can help by
    • distributing content with low latency and a high data transfer rate.
    • caching the content and thereby reducing the number of direct requests to S3
    • providing multiple endpoints (Edge locations) for data availability
    • being available in two flavors, as a Web distribution or an RTMP distribution
  • For fast data transport over long distances between a client and an S3 bucket, use S3 Transfer Acceleration. Transfer Acceleration uses the globally distributed edge locations in CloudFront to accelerate data transport over geographical distances

PUTs/GETs for Large Objects

  • AWS allows parallelizing PUT/GET requests to improve upload and download performance as well as the ability to recover in case of failures
  • For PUTs, multipart upload can help improve the uploads (see the sketch after this list) by
    • performing multiple uploads at the same time and maximizing network bandwidth utilization
    • quick recovery from failures, as only the part that failed to upload needs to be re-uploaded
    • ability to pause and resume uploads
    • beginning an upload before the object size is known
  • For GETs, the Range HTTP header can help improve the downloads by
    • allowing the object to be retrieved in parts instead of the whole object
    • quick recovery from failures, as only the part that failed to download needs to be retried.
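A minimal sketch of both techniques with boto3; the bucket, key, file name, and sizes are hypothetical:

```python
# Multipart upload via the high-level transfer manager, plus a ranged GET.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
BUCKET = "example-media-bucket"  # hypothetical

# PUT: upload_file switches to multipart upload above the threshold and
# uploads parts in parallel, maximizing bandwidth utilization.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # multipart for objects larger than 8 MB
    multipart_chunksize=8 * 1024 * 1024,  # 8 MB parts
    max_concurrency=10,                   # parallel part uploads
)
s3.upload_file("video.mp4", BUCKET, "uploads/video.mp4", Config=config)

# GET: fetch only the first 8 MB using the Range header; a failed range can be
# retried without re-downloading the whole object.
resp = s3.get_object(Bucket=BUCKET, Key="uploads/video.mp4", Range="bytes=0-8388607")
first_chunk = resp["Body"].read()
```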

Listing Operations

  • Object key names are stored lexicographically in S3 indexes, making it difficult to sort and manipulate the contents of LIST results
  • S3 maintains a single lexicographically sorted list of indexes
  • Build and maintain a secondary index outside of S3, e.g. in DynamoDB or RDS, to store, index, and query object metadata rather than performing LIST operations on S3 (a minimal sketch follows)
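A minimal sketch of a DynamoDB-backed secondary index; the table name, key schema, and attribute names are hypothetical, and the table is assumed to already exist:

```python
# Keep object metadata in a DynamoDB table so lookups hit the index instead of S3 LIST calls.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
# Assumed table: partition key "owner", sort key "key".
index_table = dynamodb.Table("s3-object-index")


def record_object(owner: str, bucket: str, key: str, size: int) -> None:
    """Write one row per stored object, e.g. from an S3 event-triggered worker."""
    index_table.put_item(Item={"owner": owner, "key": key, "bucket": bucket, "size": size})


def list_objects_for_owner(owner: str) -> list:
    """Query the secondary index instead of calling S3 LIST operations."""
    return index_table.query(KeyConditionExpression=Key("owner").eq(owner))["Items"]
```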

Security

  • Use Versioning (see the sketch after this list)
    • can be used to protect from unintended overwrites and deletions
    • allows the ability to retrieve and restore deleted objects or rollback to previous versions
  • Enable additional security by configuring a bucket to require MFA (Multi-Factor Authentication) for deletes
  • Versioning does not prevent bucket deletion; the data must be backed up, as it is lost if the bucket is accidentally or maliciously deleted
  • Use the Same Region Replication or Cross Region Replication feature to back up data to a different bucket or region
  • When using a VPC with S3, use VPC S3 endpoints, as they
    • are horizontally scaled, redundant, and highly available VPC components
    • help establish a private connection between the VPC and S3, and the traffic never leaves the Amazon network
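A minimal sketch of enabling versioning with boto3; the bucket name is hypothetical, and MFA Delete is only noted in a comment because it additionally requires the bucket owner's root credentials and an MFA token:

```python
# Turn on versioning so overwrites and deletions can be recovered or rolled back.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-documents-bucket"  # hypothetical

s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# With root credentials and an MFA device, MFA Delete would be enabled by adding
# "MFADelete": "Enabled" to the configuration and passing
# MFA="<device-serial> <token-code>" to the same call.
```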

Refer blog post @ S3 Security Best Practices

Cost

  • Optimize S3 storage cost by selecting an appropriate storage class for objects
  • Configure appropriate lifecycle management rules to move objects to different storage classes and expire them (a minimal sketch follows)
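A minimal lifecycle-rule sketch with boto3; the bucket name, prefix, and day thresholds are hypothetical:

```python
# Transition objects under "logs/" to Standard-IA after 30 days, Glacier after 90,
# and expire them after 365 days.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",  # hypothetical
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```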

Tracking

  • Use Event Notifications to be notified of any PUT or DELETE request on the S3 objects (see the sketch after this list)
  • Use CloudTrail, which helps capture specific API calls made to S3 from the AWS account and delivers the log files to an S3 bucket
  • Use CloudWatch to monitor the Amazon S3 buckets, tracking metrics such as object counts and bytes stored, and configure appropriate actions
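A minimal event-notification sketch with boto3; the bucket name and SNS topic ARN are hypothetical, and the topic policy is assumed to already allow S3 to publish:

```python
# Publish a notification for every object created or removed in the bucket.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-documents-bucket",  # hypothetical
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-object-events",
                "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
            }
        ]
    },
)
```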

S3 Monitoring and Auditing Best Practices

Refer blog post @ S3 Monitoring and Auditing Best Practices

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes about 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, and then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  2. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multipart upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale.
    4. Use a predictable naming scheme, such as sequential numbers or date-time sequences, in the key names
  3. You have an application running on an Amazon Elastic Compute Cloud instance that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve the performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPS and use an Amazon EBS-optimized instance
  4. Which of the following methods gives you protection against accidental loss of data stored in Amazon S3? (Choose 2)
    1. Set bucket policies to restrict deletes, and also enable versioning
    2. By default, versioning is enabled on a new bucket so you don't have to worry about it (Not enabled by default)
    3. Build a secondary index of your keys to protect the data (improves performance only)
    4. Back up your bucket to a bucket owned by another AWS account for redundancy
  5. A startup company hired you to help them build a mobile application that will ultimately store billions of images and videos in Amazon S3. The company is lean on funding and wants to minimize operational costs; however, they have an ambitious marketing plan and expect to double their current installation base every six months. Due to the nature of their business, they are expecting sudden and large increases in traffic to and from S3, and need to ensure that it can handle the performance needs of their application. What other information must you gather from this customer in order to determine whether S3 is the right option?
    1. You must know how many customers the company has today, because this is critical in understanding what their customer base will be in two years. (Number of customers does not matter)
    2. You must find out the total number of requests per second at peak usage.
    3. You must know the size of the individual objects being written to S3 in order to properly design the key namespace. (Size does not relate to the key namespace design, but the count does)
    4. In order to build the key namespace correctly, you must understand the total amount of storage needed for each S3 bucket. (S3 provides unlimited storage; the key namespace design would depend on the number of objects, not the size)
  6. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application?
    1. The company should use an Amazon Simple Workflow Service activity worker that updates the user's data counter in Amazon DynamoDB. The activity worker will use Simple Email Service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an Amazon Relational Database Service relational database with a stored-objects table that has a row for each stored object along with the size of each object. The upload server will query the aggregate consumption of the user in question (by first determining the files stored by the user, then querying the stored-objects table for the respective file sizes) and send an email via Amazon Simple Email Service if the thresholds are breached. (Good approach to use RDS, but with so many objects it might not be a good choice)
    3. The company should write both the content length and the username of the file's owner as S3 metadata for the object. They should then create a file watcher to iterate over each object, aggregate the size for each user, and send a notification via Amazon Simple Queue Service to an emailing service if the storage threshold is exceeded. (LIST operations on S3 not feasible)
    4. The company should create two separate Amazon Simple Storage Service buckets, one for data storage for free tier users and another for data storage for premium tier users. An Amazon Simple Workflow Service activity worker will query all objects for a given user based on the bucket the data is stored in and aggregate storage. The activity worker will notify the user via Amazon Simple Notification Service when necessary. (LIST operations on S3 not viable, and SNS does not address the email requirement)
  7. Your company hosts a social media website for storing and sharing documents. The web application allows users to upload large files while resuming and pausing the upload as needed. Currently, files are uploaded to your PHP front end backed by Elastic Load Balancing and an Auto Scaling fleet of Amazon Elastic Compute Cloud (EC2) instances that scale upon average bytes received (NetworkIn). After a file has been uploaded, it is copied to Amazon Simple Storage Service (S3). Amazon EC2 instances use an AWS Identity and Access Management (IAM) role that allows Amazon S3 uploads. Over the last six months, your user base and scale have increased significantly, forcing you to increase the Auto Scaling group's Max parameter a few times. Your CFO is concerned about the rising costs and has asked you to adjust the architecture where needed to better optimize costs. Which architecture change could you introduce to reduce cost and still keep your web application secure and scalable?
    1. Replace the Auto Scaling launch configuration to include c3.8xlarge instances; those instances can potentially yield a network throughput of 10 Gbps. (No info on current size, and might increase cost)
    2. Re-architect your ingest pattern; have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Security Token Service (GetFederationToken). Securely pass the credentials and S3 endpoint/prefix to your app. Implement client-side logic to directly upload the file to Amazon S3 using the given credentials and S3 prefix. (Will not provide the ability to handle pause and resume)
    3. Re-architect your ingest pattern, and move your web application instances into a VPC public subnet. Attach a public IP address to each EC2 instance (using the Auto Scaling launch configuration settings). Use an Amazon Route 53 round robin record set and an HTTP health check to DNS load balance the app requests; this approach will significantly reduce the cost by bypassing Elastic Load Balancing. (ELB is not the bottleneck)
    4. Re-architect your ingest pattern; have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Security Token Service (GetFederationToken). Securely pass the credentials and S3 endpoint/prefix to your app. Implement client-side logic that uses the S3 multipart upload API to directly upload the file to Amazon S3 using the given credentials and S3 prefix. (Multipart allows one to start uploading directly to S3 before the actual size is known or the complete data is available)
  8. If an application is storing hourly log files from thousands of instances of a high-traffic website, which naming scheme would give optimal performance on S3?
    1. Sequential
    2. instanceID_log-HH-DD-MM-YYYY
    3. instanceID_log-YYYY-MM-DD-HH
    4. HH-DD-MM-YYYY-log_instanceID (HH will give some randomness to start with, instead of instanceID where the first characters would be i-)
    5. YYYY-MM-DD-HH-log_instanceID

Reference

S3_Optimizing_Performance