
Simplify integration of S3 storage with local resources
Access Portal
At Amazon Web Services (AWS), the hybrid idea is by no means limited to allowing companies to extend the scope of their corporate network securely to the AWS Virtual Private Cloud with the help of various managed virtual private network (VPN) services or by way of AWS Direct Connect. The hybrid approach is hiding around almost every corner. The goal of AWS Storage Gateway is to provide local applications transparent access to the most important AWS storage service by far: Simple Storage Service (S3). AWS Storage Gateway extends existing enterprise environments with native integration into AWS services.
S3 Role
S3 is one of the oldest and most important AWS services; it became available in March 2006, several months before EC2. S3 is important to AWS as an object store, because the entire AWS service portfolio is multitiered, with higher level managed platform services built on the base of AWS infrastructure and foundation services, each of which needs to store data and states in some way.
Although third-party tools such as S3 Browser exist, S3 is primarily designed for direct queries, which means that AWS users can use S3 to perform advanced queries on stored data without extracting, transforming, or loading the data onto a separate analysis platform. Direct querying of data in S3 increases performance and keeps costs low for analysis applications that use S3 as a data pool. S3 has several direct query options, including the new S3 Select, Amazon Athena, and Amazon Redshift Spectrum.
With S3 Select and AWS Lambda, you can even create serverless apps that use direct processing of S3 Select. Amazon Athena, on the other hand, is an interactive query service designed to simplify analysis of the data in S3 with the use of standard SQL queries. Athena is also serverless, so customers do not need to set up and manage an infrastructure. Nevertheless, it is often inconvenient to allow on-premises applications to use AWS storage resources, which is why AWS introduced the Storage Gateway service in 2012.
Basics and Interfaces
AWS Storage Gateway was originally designed as a seamless cloud backup and disaster recovery solution for local data. Locally stored information is automatically saved in S3. In short, the service enables hybrid storage between local environments and the AWS cloud. Organizations can seamlessly integrate local applications and workflows with Amazon block and object cloud storage services. The service primarily targets use cases such as backup, archiving, disaster recovery, cloud bursting, storage tiering, and migration of data to AWS.
Technically, AWS provides the Storage Gateway in the form of a virtual machine (or EC2 instance) that the user starts on a local server or in their own data center. With the AWS Management Console, for example, gateway storage volumes with a capacity of up to 32TB can be created and integrated into existing systems as iSCSI devices. The user connects their local applications to Amazon S3 with the gateway appliance according to standard storage protocols like iSCSI or NFS.
The gateway not only provides space for volumes in AWS, but also for files and virtual tapes. For high-performance integration, the gateway offers an optimized data transfer mechanism, bandwidth management, automated handling of network interruptions, and a local cache that speeds up access to the most frequently used data, while the data is permanently stored in the Amazon cloud in the background.
Each Storage Gateway supports three storage interfaces – file, volume, and tape – but can only serve one interface type at any given time. The Volume Gateway provides applications with block storage via the iSCSI protocol. These volumes are backed up in Amazon S3. The File Gateway lets users store and retrieve objects in Amazon S3 with file protocols such as NFS. In contrast to the Volume Gateway, the objects written by the File Gateway are directly accessible in S3. Finally, the Tape Gateway acts as an S3 entry point for classic backup applications by providing an iSCSI Virtual Tape Library (VTL) interface, which comprises a virtual media changer, virtual tape drives, and virtual tapes. Virtual tape data can be either stored in Amazon S3 or archived in Amazon Glacier.
Understanding the Volume Gateway
The Volume Gateway is basically an iSCSI target that creates volumes and assigns them to local servers (or EC2 instances) as iSCSI LUNs. The Volume Gateway can run in either cached or stored mode. Cached mode writes the user's primary data to S3 but keeps frequently accessed data in a local cache, allowing low-latency access.
In stored mode, the user's primary data is initially stored locally, so that all data remains available for quick access at all times. Backups to S3 occur asynchronously in the background. In both modes, users can also create point-in-time, space-efficient snapshots of their volumes and store them in S3 for reuse at any time. The volume data held in S3 cannot be accessed directly as objects. However, users can create new Elastic Block Store (EBS) volumes from the snapshots at any time and use them in AWS.
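The idea behind cached mode can be illustrated with a toy model. The following is a minimal sketch, not the gateway's actual implementation: a plain dictionary stands in for S3, the class and attribute names are hypothetical, and writes go to the backing store synchronously, whereas the real gateway uploads in the background.

```python
from collections import OrderedDict


class CachedVolume:
    """Toy model of cached mode: the backing store holds every block,
    and a small LRU cache keeps recently used blocks local."""

    def __init__(self, cache_blocks):
        self.cache_blocks = cache_blocks
        self.cache = OrderedDict()   # block number -> data, least recent first
        self.backing = {}            # stands in for S3 (a dict, not a real service)
        self.remote_reads = 0

    def _remember(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)   # evict the least recently used block

    def write(self, block, data):
        self.backing[block] = data   # the primary copy lives in the backing store
        self._remember(block, data)

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)
            return self.cache[block]
        self.remote_reads += 1       # cache miss: fetch from the backing store
        data = self.backing[block]
        self._remember(block, data)
        return data


vol = CachedVolume(cache_blocks=2)
for n in range(3):
    vol.write(n, b"block-%d" % n)    # block 0 is evicted by the third write
vol.read(2)                          # served from the local cache
vol.read(0)                          # miss, fetched from the backing store
print(vol.remote_reads)              # 1
```

In stored mode, the roles would be reversed: every read is local, and the backing store only receives asynchronous copies.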
File Gateway Functional Principle
The File Gateway provides a virtual file server that can store and retrieve Amazon S3 objects by standard protocols such as NFS. File-based devices and applications can thus use cloud storage transparently and without changes, because the gateway simply offers a user's existing S3 buckets as NFS mount points. The applications then read and write files and directories via NFS, with the gateway translating file operations into object requests on the relevant S3 buckets.
The most recently used data is buffered on the gateway for quick access, and the data transfer to AWS is completely managed and optimized by the gateway. Unlike the Volume Gateway, users can immediately access their data in S3 as soon as synchronization with the gateway has taken place, including all S3 features such as lifecycle policies, versioning, or cross-region replication.
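The translation from file operations to object requests can be pictured as a mapping from a path below the mount point to an object key. The sketch below assumes a one-to-one mapping of relative path to key; the function name and the mount point `/mnt/gateway` are hypothetical.

```python
import posixpath


def object_key_for(mount_point, file_path):
    """Map a path below an NFS mount point to an S3-style object key."""
    mount = posixpath.normpath(mount_point)
    path = posixpath.normpath(file_path)
    if not path.startswith(mount + "/"):
        raise ValueError(f"{file_path} is not below {mount_point}")
    return posixpath.relpath(path, mount)


key = object_key_for("/mnt/gateway", "/mnt/gateway/reports/2018/q1.csv")
print(key)  # reports/2018/q1.csv
```

Because the key mirrors the file path, an object written through the share can be found in the bucket under the same relative name.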
Tape Gateway Functional Principle
The Tape Gateway provides backup applications with a VTL interface consisting of a media changer and tape drives. The user creates virtual tapes in the VTL from the AWS Management Console. The backup application has read and write access to virtual tapes, which it loads into virtual tape drives with the virtual media changer. The backup application discovers the virtual tapes with its standard media inventory procedure. Virtual tapes are stored in Amazon S3, whereas archived tapes are kept in Amazon Glacier.
Prerequisites and Setup
If you want to try Storage Gateway, you need an AWS account. The gateway can be installed either on-premises or on an EC2 instance. In this article, I discuss the version to be installed locally, which AWS provides in the form of a virtual machine (VM). First, however, you have to choose one of the three types of gateway addressed in the deployment wizard.
Installing the VM requires at least 16GB of RAM, 80GB of drive space, and a quad-core processor on the host system. To install the Storage Gateway on the basis of an EC2 instance, you need at least one general-purpose instance (m3 or m4) of the size "xlarge." The gateway performs better with instance types i2, d2, c3, c4, or r3, depending on the workload characteristics of the applications accessing it.
In addition to the 80GB of hard drive space for the VM itself, you will need additional drives if, for example, you are operating the Volume Gateway in stored mode. You also need at least 150GB of cache space for the Volume Gateway in cached mode and for the File and Tape gateways. Table 1 provides more information.
Table 1: Gateway Types and Requirements
Type | Cache (min) | Cache (max) | Upload Buffer (min) | Upload Buffer (max) | Additional Local Drives |
---|---|---|---|---|---|
File Gateway | 150GiB | 16TiB | Not relevant | Not relevant | Not relevant |
Cached Volume Gateway | 150GiB | 16TiB | 150GiB | 2TiB | Not relevant |
Stored Volume Gateway | Not relevant | Not relevant | 150GiB | 2TiB | One or more for stored volumes |
Tape Gateway | 150GiB | 16TiB | 150GiB | 2TiB | Not relevant |
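The limits in Table 1 lend themselves to a quick sanity check before you provision disks. The following sketch encodes only the values from the table; the dictionary keys and the function name are hypothetical and not part of any AWS tool.

```python
TIB = 1024  # GiB per TiB

# (min, max) in GiB, taken from Table 1; None means "not relevant"
LIMITS = {
    "file":          {"cache": (150, 16 * TIB), "upload_buffer": None},
    "cached_volume": {"cache": (150, 16 * TIB), "upload_buffer": (150, 2 * TIB)},
    "stored_volume": {"cache": None,            "upload_buffer": (150, 2 * TIB)},
    "tape":          {"cache": (150, 16 * TIB), "upload_buffer": (150, 2 * TIB)},
}


def check_sizing(gateway_type, cache_gib=0, upload_buffer_gib=0):
    """Return a list of problems; an empty list means the plan fits Table 1."""
    sizes = {"cache": cache_gib, "upload_buffer": upload_buffer_gib}
    problems = []
    for name, limits in LIMITS[gateway_type].items():
        if limits is None:
            continue                 # this disk type is not used by the gateway
        low, high = limits
        if not low <= sizes[name] <= high:
            problems.append(f"{name}: {sizes[name]} GiB not in {low}-{high} GiB")
    return problems


print(check_sizing("cached_volume", cache_gib=200, upload_buffer_gib=100))
```

In this call, the cache passes but the 100GiB upload buffer is reported as too small.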
In the next step, you can download the gateway appliance. For local use, AWS provides a VM as an image for ESXi (OVA), Hyper-V 2008 R2, or Hyper-V 2012. The suitable gateway Amazon Machine Image (AMI) for use under EC2 is available from the AWS AMI Marketplace. However, deploying the OVA file under ESXi 6.5 in the web client did not work out of the box because of an invalid manifest file, although it was no problem to deploy directly in the host client. The disks must be "thick" provisioned, and the ESXi host must be in sync with an NTP time server. If these requirements are met, the VM should be ready for use after a short time.
Next, connect to the IP address of the gateway in the wizard (i.e., wait until its provisioning process is complete). You then have to enable the gateway by defining the gateway time zone and the gateway name. The region has already been determined. Finally, click Activate gateway.
From this point on, costs arise. However, the AWS free tier includes the first 100GB of data stored via the gateway. Beyond that, the usual data storage prices for S3 apply, plus the Storage Gateway-specific prices for volume storage, EBS snapshots, and storage or archiving on virtual tapes. Moreover, request and data transfer charges for S3 are always due. As usual with S3, incoming data is free of charge. For outgoing data, AWS differentiates between "data from the AWS Storage Gateway service to your gateway provided by Amazon EC2" and "transfer of outgoing data from the AWS Storage Gateway service to your local gateway," for which the first gigabyte per month is free of charge.
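As a rough aid for reading a bill, the two allowances named above can be subtracted before any prices are applied. This sketch uses only those two figures; it contains no actual prices, the function name is hypothetical, and it assumes the allowances apply per month as described.

```python
FREE_STORAGE_GB = 100   # free allowance for data stored via the gateway, per the text
FREE_EGRESS_GB = 1      # first outgoing gigabyte per month, per the text


def billable_quantities(stored_gb, egress_gb):
    """Return the quantities that remain after the free allowances."""
    return {
        "storage_gb": max(0, stored_gb - FREE_STORAGE_GB),
        "egress_gb": max(0, egress_gb - FREE_EGRESS_GB),
    }


print(billable_quantities(stored_gb=750, egress_gb=20))
# {'storage_gb': 650, 'egress_gb': 19}
```

The remaining quantities would then be multiplied by the current rates from the AWS price list, and request charges added on top.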
Once the Volume Gateway has been activated successfully, the wizard acknowledges with Gateway is now active. However, Configured local disk will still show No local disks found. After clicking Save and continue, you can continue with further configuration in the Management Console.
Fine Tuning in the AWS Management Console
In the Gateways side tab, you will find the configured gateways with their respective statuses (Figure 1). To the right of the Create gateway button for creating additional gateways are the buttons for creating file shares, volumes (iSCSI Targets), and tapes. The list under these buttons shows the configured gateways. The example gateway here has the status Running with a warning sign, because no volumes have been created yet. You can either do this with the buttons mentioned above or with the corresponding side tabs for File shares, Volumes, and Tapes. Of course, shares and tapes can only be used if a File Gateway or Tape Gateway is created. I have created a Volume Gateway in this example, so I will move on to create the targets.

You can only create an iSCSI target if local storage is associated with the gateway, as indicated by the orange warning. The matching Edit local disks button appears directly next to the warning. For this purpose, I assigned two more VM disks (VMDKs) to the VM and then restarted it, a step I could also have completed during deployment. It is now possible to assign local volumes. In cached mode, each local volume can be assigned to the Volume Gateway either as a cache volume or an upload buffer, but not as a stored volume, because the administrator must choose the appropriate gateway type from the outset. Once the volumes are assigned, the Volume Gateway no longer displays any warnings in the list.
Setting Targets
What is still missing on the AWS side are the iSCSI volumes backed by S3. You can create these in the Management Console by clicking Create volume. The associated dialog is self-explanatory. Creating the iSCSI target continues with the Challenge Handshake Authentication Protocol (CHAP) configuration, and the dialog points out that, without further configuration, the volume accepts connections from any iSCSI initiator.
Typical of iSCSI, CHAP configuration is optional. Once the volume has been created successfully, it appears in the volume list with the status Available. Under Actions, EBS snapshots can then be created or scheduled at any time, and existing volumes can be deleted. You then connect the target to the local server via the displayed Host IP. All functions for controlling the gateway itself can be found under the Actions menu of the gateway list in the Management Console, where you can then set bandwidth limits or maintenance windows and, of course, stop or delete gateways.
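For readers unfamiliar with CHAP, the handshake boils down to both sides hashing a shared secret together with a random challenge. The sketch below follows the response calculation defined in RFC 1994 (MD5 over identifier, secret, and challenge); the secret is a made-up value, and the code illustrates the protocol rather than anything specific to the gateway.

```python
import hashlib
import hmac
import os


def chap_response(identifier, secret, challenge):
    """CHAP response per RFC 1994: MD5 over identifier, secret, and challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


secret = b"example-secret"      # hypothetical shared secret known to both sides
challenge = os.urandom(16)      # random value sent by the target
identifier = 1                  # one-byte sequence number chosen by the target

answer = chap_response(identifier, secret, challenge)      # computed by the initiator
expected = chap_response(identifier, secret, challenge)    # computed by the target
print(hmac.compare_digest(answer, expected))               # True
```

The secret itself never crosses the wire, which is why configuring CHAP is worthwhile whenever the iSCSI network is shared with other hosts.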
Security Through Encryption
All data transferred between a gateway appliance and AWS storage are encrypted by SSL. By default, all data stored by the AWS Storage Gateway in S3 are encrypted on the server side with Amazon S3-Managed Encryption Keys (SSE-S3). You can also optionally configure each file share in the File Gateway so that your objects are encrypted with AWS KMS-managed keys by SSE-KMS.
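At the level of the S3 API, the two server-side encryption variants differ only in the request headers that select them. The following sketch shows those headers; the gateway sets them on your behalf, the function name is hypothetical, and the key identifier in the example is a placeholder.

```python
def sse_headers(kms_key_id=None):
    """Return the S3 request headers that select server-side encryption."""
    if kms_key_id is None:
        # SSE-S3: Amazon S3 manages the keys
        return {"x-amz-server-side-encryption": "AES256"}
    # SSE-KMS: the object is encrypted under the named AWS KMS key
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id,
    }


print(sse_headers())
print(sse_headers("example-key-id"))   # placeholder, not a real key
```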
By the way, creating and connecting the File Gateway and the Tape Gateway is no less intuitive than the process shown for the Volume Gateway. Also, I have only focused on the purely storage-related aspects so far. Of course, the Storage Gateway is integrated with encryption/KMS, identity and access management (IAM), and CloudWatch (monitoring), as is usual with AWS, and thus benefits from the security, manageability, durability, and scalability of AWS in general.
Performance
Only qualified statements about performance are possible in this example, because too many factors are involved, especially with a cached gateway, whose sophisticated technologies ensure that synchronization in the background does not affect iSCSI performance. The performance is, of course, only as good or bad as the connection to the Internet (as far as the AWS side is concerned) and the underlying iSCSI connection to the host, implemented in the setup here with a VMkernel adapter at 1Gbps without multipathing.
Some test runs with classic Iometer workloads [1] on a VM filesystem (VMFS) datastore "cached" with AWS storage produced results that are not noticeably different from those with locally connected storage. With the underlying architecture of 1x1Gbps to the iSCSI adapter and asymmetric DSL (89Mbps downstream, 32Mbps upstream, copper/vectoring), the Exchange_64 workload (60 percent Read, 0 percent Random) achieved the following results:
* 743.27 Read IOPS
* 384.84 Write IOPS
* 46.45Mbps Read
* 24.95Mbps Write
These results are far from enterprise values but are in line with what could be expected from the underlying infrastructure.
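To put the upstream link into perspective, a back-of-the-envelope calculation shows how long the gateway would need to push a full 150GiB upload buffer to AWS over the 32Mbps upstream mentioned above. The sketch ignores protocol overhead and any compression, so it gives only a rough estimate.

```python
def transfer_hours(size_gib, link_mbps):
    """Hours needed to move size_gib over a link of link_mbps (decimal megabits per second)."""
    bits = size_gib * 1024**3 * 8
    return bits / (link_mbps * 1_000_000) / 3600


print(round(transfer_hours(150, 32), 1))   # 11.2
```

A change rate that regularly fills the buffer faster than this would therefore call for a faster uplink or a larger buffer.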
Conclusions
AWS Storage Gateway allows effective use of S3 storage with local applications or workflows via the standard iSCSI and NFS protocols, so that applications no longer need to be adapted. Local cache and other technologies, such as intelligent buffering, upload management, handling of network fluctuations, and bandwidth management, ensure sufficient performance.
Because the gateway is stateless, you can easily create new instances of your gateways as needed, and the gateway can also be natively integrated with AWS management services such as Amazon CloudWatch, AWS CloudTrail, AWS KMS, and IAM.
For companies that already use AWS resources on a large scale, the gateway is a very convenient way to simplify integration with local resources. However, anyone who considers the service an entry ticket to cloud computing, because it opens up S3 with all its features to existing applications and workflows, must be cautious in view of the complex pricing, as is usual with AWS. All this fun is not cheap, and if you do not have a long-term AWS strategy, you will probably find cheaper ways to use cloud storage on the open market.