Please provide a brief description of your company.
Our company is WorkXpress. We make a cloud-based platform that allows nonprogrammers to build web-based software applications that range from simple front-end websites to sophisticated back-end business-processing sites. It's a noncode type environment – basically plug and play and drag and drop.
What is your role at WorkXpress?
I am the chief information officer.
What was the business challenge you were trying to address with Amazon S3?
Mostly it was just a cloud resources issue. The majority of what we use Amazon S3 [Simple Storage Service] for is mass storage of things such as backups. We have a dozen or more cloud servers running, with hundreds of gigabytes on each one, and those apps are backed up nightly. We needed a centralized location to store those backups that could expand to an infinite size. We also wanted a high degree of redundancy, because we didn't want to have to upload the backups to multiple places. We wanted to be able to rely on S3 as the exclusive place to put our backups. That's why we started using S3 several years ago.
Of course, we also needed great APIs [application programming interfaces] to be able to upload files and download them easily while, at the same time, having the process be transparent to the end user. We did not want the user having to wait for a bunch of crazy files to download. S3 not only made the process totally elastic, but its APIs also let us communicate with the service and grab directory tree structures and size information. All of that was important to us.
How did your company implement Amazon S3?
Everything is behind the scenes. In the case of backups, for instance, you trigger your backup, it locks the systems so we can get a consistent database snapshot, and then we package the whole thing. Essentially, we end up with a tar.gz file that has all the content of the backup, and then we just use the Amazon S3 PHP SDK [software development kit] to connect to S3 and upload the file. We also use MD5 checksums because we had some trouble with corrupted files, either uploaded or downloaded.
Consequently, we had to add in some of the capabilities available through S3 where you can grab an MD5 of a file and upload it. Once the upload is done, we can ask Amazon to tell us the MD5 of that file. As long as it matches, then we're satisfied that the file is complete and not corrupted. We do that both on the upload side and on the download side to make sure the files are complete. Once we upload them, we delete them off our local systems, so it's important that the complete file be up there. It's all part of an automated programmatic system.
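The upload check described above can be sketched roughly as follows. This is an illustration in Python rather than the PHP SDK the team actually uses, and the `upload` and `remote_md5` callables are hypothetical stand-ins for whatever client calls send the file and ask the storage service for its checksum.

```python
import hashlib
import os
from typing import Callable


def md5_of_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Hash the file in blocks so a large backup never sits fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def upload_and_verify(
    path: str,
    upload: Callable[[str], None],      # hypothetical: sends the file
    remote_md5: Callable[[str], str],   # hypothetical: asks the service for its MD5
) -> bool:
    """Upload, compare checksums, and delete the local copy only on a match."""
    local = md5_of_file(path)
    upload(path)
    verified = remote_md5(path) == local
    if verified:
        os.remove(path)  # safe only because the remote copy was confirmed
    return verified
```

Keeping the local file whenever the checksums differ means a failed upload can simply be tried again.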
Was your company considering other platforms? Why did you choose Amazon S3?
That's a great question. To be honest, if someone asked me for a recommendation, I would probably tell him or her to review all of the platforms for themselves and try to think about his or her required feature set. We started using S3 because we originally used Amazon EC2 [Elastic Compute Cloud] for all of our clouds, until we found out they were not only the most expensive, but also the worst performing.
When we found a provider that used SSDs [solid-state drives] and was a third of the cost of Amazon EC2, with just as good redundancy and just as good a reputation in the technical community, we ended up migrating all of our cloud from EC2 to the competition. To be honest, the competition did not have a file attachment type service, and we didn't really feel the need to move from S3. We still work with S3, even though we rarely work with Amazon's EC2. We may, however, do so in the future because we're going to look at re-implementing EC2 now that Amazon has SSD-based systems that would simplify some of their offerings.
EC2 is difficult to work with, but they were all that was available when we started. Now, there's a lot of competition out there that does just as well, and EC2 has improved itself. From an EC2 standpoint, we may begin working with Amazon's EC2 again at some point, but we didn't really feel any reason to move the files straight over to a competitor.
That's not really a shining review of S3, but I haven't even looked at the competition in terms of mass file storage because S3 does everything we need, so we just go with it.
Could you provide a sense of the size of your involvement with Amazon S3 in financial terms?
I think we pay somewhere between $60 and $90 a month. Compared to the rest of the bills to run the business, it's nothing.
RESULTS & FEEDBACK
Can you share any statistics, metrics, or other feedback from your implementation of this platform?
There's nothing that I'm particularly unhappy about. We recently started to use Amazon Glacier, S3's long-term storage. In a couple of hours, my systems team was able to set up automatic migration from S3 to Glacier based on file age, and then we were able to update the restore code to detect when a file had moved to Glacier and bring it back down to S3 at the start of a restore. That all went smoothly. We didn't have a lot of trouble doing that.
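As a rough illustration of the setup described above, the sketch below builds an age-based lifecycle rule and classifies an object before a restore. The dictionary layout is modeled on the lifecycle configuration and object metadata that AWS SDKs such as boto3 exchange with S3; the function names, the rule ID, and the 30-day threshold in the usage note are assumptions, not details from the interview.

```python
from typing import Any, Dict


def glacier_lifecycle_rule(days: int, prefix: str = "") -> Dict[str, Any]:
    """Build a rule that moves objects to Glacier once they reach `days` old."""
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def restore_state(head: Dict[str, Any]) -> str:
    """Classify an object from head-object style metadata.

    Assumes `StorageClass` is absent for standard objects and that `Restore`
    is a string containing ongoing-request="true" or ongoing-request="false".
    """
    if head.get("StorageClass") != "GLACIER":
        return "available"
    restore = head.get("Restore", "")
    if 'ongoing-request="false"' in restore:
        return "available"      # a temporary restored copy exists
    if 'ongoing-request="true"' in restore:
        return "restoring"      # a restore was requested and is still running
    return "needs_restore"      # archived and no restore requested yet
```

With boto3, for example, a rule like `glacier_lifecycle_rule(30)` would be passed to `put_bucket_lifecycle_configuration`, and a restore routine would call `restore_object` for any key that reports `needs_restore` before trying to download it.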
I don't know if maybe earlier versions of Glacier would have been harder to work with, but this version was quite easy to work with, with the exception that we weren't told upfront that there was an early deletion penalty. Apparently, if you delete something off Glacier that's been up there for less than 90 days, you pay them money for that early deletion. Instead of saving money, our bill actually went up by 75 percent because we were paying less to store the files, but significantly more to delete them early. We had to make some changes there, and that wasn't really well explained, so we were disappointed in that.
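To see how an early deletion penalty can outweigh the storage savings, here is a back-of-the-envelope sketch. It assumes the fee is the storage charge for the days remaining in the 90-day minimum, pro-rated over a 30-day month; the price in the usage note is a made-up figure, not a quoted AWS rate.

```python
def early_deletion_fee(
    size_gb: float,
    days_stored: int,
    price_per_gb_month: float,  # hypothetical price, not a quoted AWS rate
    min_days: int = 90,
) -> float:
    """Estimate the charge for deleting an archive before the minimum period."""
    remaining_days = max(0, min_days - days_stored)
    return size_gb * price_per_gb_month * remaining_days / 30
```

Under these assumptions, deleting a 100 GB backup after 30 days at a hypothetical $0.01 per GB-month leaves 60 unpaid days, so `early_deletion_fee(100, 30, 0.01)` comes to about $2.00, twice the $1.00 it cost to store the file for the month it was actually kept.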
The actual implementation was quite smooth. We don't do anything unusual with S3 – it's just put the files up there and bring them down. Now, we have the MD5 function all squared away, and we haven't had any problems with corruption. It's hard to mess up a file storage system if all three of those things work right.
Did Amazon S3 have any features or tools that really impressed you?
The whole suite of AWS tools is good to work with. They have their website with all their different buttons across the top for the different services; that is quite good. We haven't had a lot of trouble navigating through that. We have hundreds of thousands of files on S3, and we can generally find what we're looking for.
We use their CLI [command-line interface] tool a bit, and we use the PHP SDK, as I said earlier. I don't know if that's an official Amazon thing or something we found online when we were just figuring out how to put together a package. Pretty much anything we needed, we were able to find. I'm happy with Amazon's overall support level. I can't say I would be saying the same thing if I had chosen any of the other competing products.
Looking back, are there any aspects of the software that you feel could be added or improved upon?
Honestly, I've had only two major problems with S3. One was handling very large files. We were having problems either with corruption or with lost data connections. If we split our files into smaller pieces, then they uploaded without any trouble. I don't know if this is a historic thing because we've been using S3 for the better part of eight to 10 years now. We were a fairly early adopter of their technology.
We've solved that problem already, and whenever we need to upload a file larger than 200 megabytes, we split it into however many pieces we need, and we track all the different pieces of that file. I would rather not have had that problem. I'd like to be able to send them a 50-gigabyte file and not have any problems. I don't know if they have fixed those issues, but today it wouldn't be an issue for us. When we solve a problem like that, we just keep the solution in place and let it go.
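The split-and-track approach described above might look something like the sketch below. The `.partNNNN` naming scheme and the list-of-tuples manifest are assumptions made for illustration; the interview does not say how the pieces are actually named or tracked.

```python
import hashlib
from typing import List, Tuple

PIECE_LIMIT = 200 * 1024 * 1024  # the 200-megabyte threshold mentioned above


def split_file(path: str, piece_size: int = PIECE_LIMIT) -> List[Tuple[str, str]]:
    """Write the file out as numbered pieces; return (piece path, MD5) pairs."""
    manifest = []
    with open(path, "rb") as source:
        index = 0
        while True:
            data = source.read(piece_size)  # one piece is held in memory at a time
            if not data:
                break
            piece_path = f"{path}.part{index:04d}"  # hypothetical naming scheme
            with open(piece_path, "wb") as piece:
                piece.write(data)
            manifest.append((piece_path, hashlib.md5(data).hexdigest()))
            index += 1
    return manifest


def join_pieces(manifest: List[Tuple[str, str]], target: str) -> None:
    """Rebuild the original file, checking each piece against its recorded MD5."""
    with open(target, "wb") as out:
        for piece_path, expected in manifest:
            with open(piece_path, "rb") as piece:
                data = piece.read()
            if hashlib.md5(data).hexdigest() != expected:
                raise ValueError(f"checksum mismatch in {piece_path}")
            out.write(data)
```

Recording a checksum per piece means a corrupted transfer only requires re-sending that one piece instead of the whole backup.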
The other thing that was a little surprising to me was that when I uploaded or downloaded a file through their APIs, they didn't wrap it in MD5 matching. I don't understand why I have to MD5 the file, then have to upload it, then ask S3 for the MD5 and make sure they match. If they don't, I have to upload it again. Shouldn't that be part of the uploading and/or downloading package, something you check before you start? When you're done, make sure it's good, and if not, tell me to try again.
I don't see why something as simple as confirming if the whole file got there or not couldn't be built into the API tools.
Having said that, I was able to get around that myself, but it's just painful to lose potentially critical data because the upload or download process doesn't have a built-in transfer guarantee. I wouldn't mind if the package threw an error and we had to catch that and retry ourselves, but we had to build all of that. I haven't had any business losses because of this problem – thankfully – but I could have. Obviously, we live and learn, and we made some changes to make the backups more hardened, but I would have preferred not to have needed to do that.
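The kind of wrapper described here, one that checks the transfer and raises an error the caller can catch, could be sketched as below. The `send` callable is a hypothetical stand-in for a transfer that returns the checksum reported by the remote side, and `TransferError` is an illustrative name rather than part of any SDK.

```python
import hashlib
from typing import Callable


class TransferError(Exception):
    """Raised when no attempt produced a matching checksum."""


def transfer_with_retry(
    data: bytes,
    send: Callable[[bytes], str],  # hypothetical: transfers data, returns remote MD5
    attempts: int = 3,
) -> int:
    """Send until the remote MD5 matches; return the attempt that succeeded."""
    expected = hashlib.md5(data).hexdigest()
    for attempt in range(1, attempts + 1):
        if send(data) == expected:
            return attempt
    raise TransferError(f"checksum mismatch after {attempts} attempts")
```

A caller would wrap this in a try/except and keep the local copy of the backup whenever `TransferError` is raised.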
We have five additional questions and, for each question, we ask you to rate the software on a scale of one to five, with five being the best. What would you give the software for the functionality of its available features?
What would you give the software for ease of use or ease of implementation into your business?
What is your overall satisfaction with the platform?
Four and a half.
How likely are you to recommend the software to a colleague or similar business?
Five. I definitely would.