Upload to Google Cloud Storage from a Bash Script

This is the buy one, get one free bonus prize from the AWS S3 upload script. Google Cloud has a (mostly) AWS-compatible mode as well as the OAuth 2.0 mode that is the native API. Connecting to OAuth is pretty involved and I’ve not seen it done directly from a shell script yet. Google do provide some Python tools for command line access, but they need Python 2.7 and are both dog-slow and clunky.

You can’t get away from the command line tools totally with Google though, because they haven’t really finished the interface to their cloud services and there are quite a few things that can’t be done at all with the web interface, e.g. setting lifecycles on storage buckets.

There is, however, a useful consequence of Google Cloud being like AWS’s idiot step-child: the permissions set-up in AWS-compatible mode is MUCH easier than setting up permissions on AWS S3. This is all you have to do:

[Screenshot: Google Cloud Storage interoperability settings]

Just create a storage bucket, turn on interoperability mode for the project, copy down the key and secret and voila! The default permissions are those of the project owner, so read+write to the bucket just works.

The picture shows the view after interoperability mode is enabled. The key+secret can be deleted at any time, and/or further key+secret pairs created. Very easy.
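
Wherever the key and secret end up, it’s worth keeping them out of scripts that get copied around. A minimal sketch, assuming a root-only credentials file (the path and file name here are just placeholders of my own choosing, not anything Google prescribes):

#keep the interoperability key and secret in a file only readable by the backup user...
cat > /etc/backup/gs3creds <<'EOF'
GS3KEY="my-key-here"
GS3SECRET="my-secret-here"
EOF
chmod 600 /etc/backup/gs3creds

#...and source it from the backup script rather than hard-coding the values
. /etc/backup/gs3creds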

So, here is the script.

#GS3 parameters
GS3KEY="my-key-here"
GS3SECRET="my-secret-here"
GS3BUCKET="bucket-name"
GS3STORAGETYPE="STANDARD" #leave as "STANDARD"; defaults to however the bucket is set up

function putGoogleS3
{
  local path=$1      #local directory containing the file
  local file=$2      #name of the file to upload
  local aws_path=$3  #destination path within the bucket, e.g. "/" for the bucket root
  local bucket="${GS3BUCKET}"
  local date=$(date +"%a, %d %b %Y %T %z")
  local acl="x-amz-acl:private"
  local content_type="application/octet-stream"
  local storage_type="x-amz-storage-class:${GS3STORAGETYPE}"
  #string to sign and its HMAC-SHA1 signature, in the legacy AWS style that the
  #interoperability API accepts
  local string="PUT\n\n$content_type\n$date\n$acl\n$storage_type\n/$bucket$aws_path$file"
  local signature=$(echo -en "${string}" | openssl sha1 -hmac "${GS3SECRET}" -binary | base64)

  curl --fail -s -X PUT -T "$path/$file" \
    -H "Host: $bucket.storage.googleapis.com" \
    -H "Date: $date" \
    -H "Content-Type: $content_type" \
    -H "$storage_type" \
    -H "$acl" \
    -H "Authorization: AWS ${GS3KEY}:$signature" \
    "http://$bucket.storage.googleapis.com$aws_path$file"
}
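
To call the function, pass the local directory, the file name and the destination path within the bucket. A hypothetical example (the paths and file name are placeholders); because the curl uses --fail, the exit status can be checked directly:

#upload /var/backups/nightly-db.tar.gz to /db/nightly-db.tar.gz in the bucket
if ! putGoogleS3 "/var/backups" "nightly-db.tar.gz" "/db/"; then
  echo "upload to Google Cloud Storage failed" >&2
fi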

It is very similar to my earlier S3 post and usage is exactly the same. The upload speed seems very similar to S3, which is not that surprising as I’d expect their network infrastructure to be of similar scale and capability.

As to which of the two services is better, I haven’t got a clue. I think for a large-scale enterprise user AWS would win every time on the superiority of the tools, the stability of the platform and the fact that they offer proper service guarantees. Google is much more of a beta product: the docs are full of warnings that the interface could change at any time and there don’t appear to be any warranties.

For a non-pro user, Google Cloud Storage is easier to use, in AWS-compatibility mode at least, so I think it’s a good choice for backup storage in less mission-critical applications. I’m using it for one of my applications and I haven’t had any issues yet.


Postscript:

I was finished, but I thought I’d just add a quick note on setting a bucket lifecycle. First a JSON config file has to be created with the lifecycle description, e.g. this is a file I call lifecycle_del_60d.json:

{
  "rule":
  [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 60}
    }
  ]
}

Then the gsutil command needs to be run to set the lifecycle on the bucket:

gsutil lifecycle set lifecycle_del_60d.json gs://my-bucket-name

…and that is that, job done. Files older than 60 days in the bucket will be automatically deleted.
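
If you want to check that the rule took effect, the configuration can be read back with the matching get command (assuming the same bucket name):

gsutil lifecycle get gs://my-bucket-name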


Upload to AWS S3 from a Bash Script

The Big River’s cloud storage is very good and cheap too, so it is an ideal place to store backups of various sorts. I wanted to do this upload from a Bash script in as simple a way as possible.

I have previous with the API, having used it from PHP, both using the AWS SDK and also rolling my own simplified upload function. That wasn’t exactly easy to do, so I didn’t want to go there again.

A bit of Googling came up with this GitHub gist by Chris Parsons, which is almost exactly what I needed. I just had to parameterise it a bit more and add the facility to specify the storage class and AWS region.

Here is the finished result:

#S3 parameters
S3KEY="my-key"
S3SECRET="my-secret"
S3BUCKET="my-bucket"
S3STORAGETYPE="STANDARD" #REDUCED_REDUNDANCY or STANDARD etc.
AWSREGION="s3-xxxxxx" #region-specific endpoint prefix, e.g. "s3-eu-west-1"

function putS3
{
  path=$1
  file=$2
  aws_path=$3
  bucket="${S3BUCKET}"
  date=$(date +"%a, %d %b %Y %T %z")
  acl="x-amz-acl:private"
  content_type="application/octet-stream"
  storage_type="x-amz-storage-class:${S3STORAGETYPE}"
  string="PUT\n\n$content_type\n$date\n$acl\n$storage_type\n/$bucket$aws_path$file"
  signature=$(echo -en "${string}" | openssl sha1 -hmac "${S3SECRET}" -binary | base64)
  curl -s -X PUT -T "$path/$file" \
    -H "Host: $bucket.${AWSREGION}.amazonaws.com" \
    -H "Date: $date" \
    -H "Content-Type: $content_type" \
    -H "$storage_type" \
    -H "$acl" \
    -H "Authorization: AWS ${S3KEY}:$signature" \
    "https://$bucket.${AWSREGION}.amazonaws.com$aws_path$file"
}
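
Called with the local directory, the file name and the destination path within the bucket (the values here are hypothetical):

#upload /var/backups/nightly-db.tar.gz to /db/nightly-db.tar.gz in the bucket
putS3 "/var/backups" "nightly-db.tar.gz" "/db/"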

This works a treat on the small-to-medium sized compressed files that I use it for.

The only real snag with this function is that it doesn’t set a useful exit code: curl returns 0 even when AWS responds with an error document, so it’s not easy to make a script bail out if the upload fails. I got round it by grepping the curl output for the text “error”. This catches all the cases I tried apart from an incorrect region.

The last line is changed to the following:

"https://$bucket.${AWSREGION}.amazonaws.com$aws_path$file" --stderr - | grep -i "error"
 #the --stderr bit onward is so that a return code is set if curl returns an 
 #error message from AWS
 #note that the value of the code is inverted from usual error interpretation: 
 #0=found "error", <>0=not found "error"

I wasn’t entirely happy with this rather crude workaround, but it did achieve the desired result.

Postscript:

I later found and tested the --fail option for curl, which sets the exit code properly. That means the grep hack isn’t needed for error detection and normal testing of the $? Bash variable will work.
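
With --fail added to the curl command, a failed upload can then be handled like any other command. A sketch, assuming putS3 has been updated with the --fail option (the values are placeholders):

putS3 "/var/backups" "nightly-db.tar.gz" "/db/"
if [ $? -ne 0 ]; then
  echo "S3 upload failed" >&2
  exit 1
fi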

(I also tidied up and made the function variables proper locals; see the Google Cloud version above for an example.)