In the first part of this two-part tutorial series, we had an overview of how buckets are used on Google Cloud Storage to organize files. We saw how to manage buckets on Google Cloud Storage from Google Cloud Console. This was followed by a Python script in which these operations were performed programmatically.
In this part, I will demonstrate how to manage objects, i.e. files and folders inside GCS buckets. The structure of this tutorial will be similar to that of the previous one. First I will demonstrate how to perform basic operations related to file management using Google Cloud Console. This will be followed by a Python script to do the same operations programmatically.
Just as bucket naming in GCS has some guidelines and constraints, object naming follows a set of guidelines as well. Object names must contain only valid Unicode characters and must not contain Carriage Return or Line Feed characters. It is also recommended to avoid characters like "#", "[", "]", "*", "?" and illegal XML control characters, because they can be interpreted wrongly and lead to ambiguity.
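As a rough illustration of these rules, here is a minimal sketch of a name check. The function name check_object_name and the constant DISCOURAGED_CHARS are hypothetical helpers written for this tutorial, not part of any Google library. The sketch treats Carriage Return and Line Feed as hard errors, treats the discouraged characters as warnings only, and does not attempt to detect illegal XML control characters.

```python
# Characters the guidelines recommend avoiding (discouraged, not forbidden).
DISCOURAGED_CHARS = set('#[]*?')


def check_object_name(name):
    """Return (is_valid, warnings) for a candidate object name.

    Hypothetical helper: it covers only the rules mentioned in this
    tutorial and is not an exhaustive validator.
    """
    # Carriage Return and Line Feed are not allowed at all.
    if '\r' in name or '\n' in name:
        return False, ['contains a Carriage Return or Line Feed character']

    warnings = []
    found = sorted(ch for ch in DISCOURAGED_CHARS if ch in name)
    if found:
        warnings.append('contains discouraged characters: ' + ' '.join(found))
    return True, warnings


print(check_object_name('tutsplus/gcs.pdf'))
print(check_object_name('report[1].pdf'))
print(check_object_name('bad\nname'))
```

A name with only warnings is still accepted here; whether to reject such names outright is a design choice left to the caller.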
Also, object names in GCS follow a flat namespace. This means that physically there are no directories or subdirectories on GCS. For example, if you create a file with the name /tutsplus/tutorials/gcs.pdf, it will appear as though gcs.pdf resides in a directory named tutorials, which in turn is a subdirectory of tutsplus. But as far as GCS is concerned, the bucket simply holds a single object whose full name is /tutsplus/tutorials/gcs.pdf.
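To make the flat namespace more concrete, here is a small sketch that derives a folder-style view from nothing but a list of object names. It runs entirely locally and does not call GCS. The function list_directory and the sample names are assumptions made for illustration; the names omit the leading slash, in line with the object names that appear in the script output later in this tutorial. The objects.list API accepts prefix and delimiter parameters that produce a comparable grouping on the server side, though this sketch does not claim to reproduce its exact behaviour.

```python
def list_directory(object_names, prefix='', delimiter='/'):
    """Emulate a folder view over a flat list of object names.

    Returns (files, folders): names that sit directly under the prefix,
    and the distinct "subfolder" prefixes found one level below it.
    """
    files, folders = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a subfolder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            files.append(name)
    return sorted(files), sorted(folders)


# Hypothetical object names stored in a single bucket.
names = [
    'tutsplus/',
    'tutsplus/tutorials/gcs.pdf',
    'tutsplus/notes.txt',
    'readme.txt',
]

print(list_directory(names))
print(list_directory(names, prefix='tutsplus/'))
```

Note that the "folders" are never stored anywhere in this sketch: they are computed purely from the slashes in the object names, which is the sense in which the namespace is flat.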
Let's look at how to manage objects using Google Cloud Console and then jump onto the Python script to do the same thing programmatically.
Using Google Cloud Console
I will continue from where we left off in the last tutorial. Let's start by creating a folder.
To create a new folder, click on the Create Folder button and fill in the desired name. The name should follow the object naming conventions.
Now let's upload a file to the newly created folder.
After the creation, the GCS browser will list the newly created objects. Objects can be deleted by selecting them from the list and clicking on the delete button.
Clicking on the refresh button will populate the UI with any changes to the list of objects without refreshing the whole page.
Managing Objects Programmatically
In the first part, we saw how to create a Compute Engine instance. I will use the same here and build upon the Python script from the last part.
Writing the Python Script
No additional installation steps are needed for this tutorial. Refer to the first part for details about installation and the development environment.
gcs_objects.py
import sys
from pprint import pprint

from googleapiclient import discovery
from googleapiclient import http
from oauth2client.client import GoogleCredentials


def create_service():
    # Build a client for the Cloud Storage JSON API (v1) using the
    # application default credentials of the environment.
    credentials = GoogleCredentials.get_application_default()
    return discovery.build('storage', 'v1', credentials=credentials)


def list_objects(bucket):
    service = create_service()
    # Create a request to objects.list to retrieve a list of objects.
    fields_to_return = \
        'nextPageToken,items(name,size,contentType,metadata(my-key))'
    req = service.objects().list(bucket=bucket, fields=fields_to_return)

    all_objects = []
    # If you have too many items to list in one request, list_next() will
    # automatically handle paging with the pageToken.
    while req:
        resp = req.execute()
        all_objects.extend(resp.get('items', []))
        req = service.objects().list_next(req, resp)
    pprint(all_objects)


def create_object(bucket, filename):
    service = create_service()
    # This is the request body as specified:
    # http://g.co/cloud/storage/docs/json_api/v1/objects/insert#request
    body = {
        'name': filename,
    }
    with open(filename, 'rb') as f:
        req = service.objects().insert(
            bucket=bucket, body=body,
            # You can also just set media_body=filename, but for the sake of
            # demonstration, pass in the more generic file handle, which could
            # very well be a StringIO or similar.
            media_body=http.MediaIoBaseUpload(f, 'application/octet-stream'))
        resp = req.execute()
    pprint(resp)


def delete_object(bucket, filename):
    service = create_service()
    res = service.objects().delete(bucket=bucket, object=filename).execute()
    pprint(res)


def print_help():
    print("""Usage: python gcs_objects.py <command>
Command can be:
    help: Prints this help
    list: Lists all the objects in the specified bucket
    create: Upload the provided file in specified bucket
    delete: Delete the provided filename from bucket
""")


if __name__ == "__main__":
    if len(sys.argv) < 2 or sys.argv[1] == "help" or \
            sys.argv[1] not in ['list', 'create', 'delete']:
        print_help()
        sys.exit()
    if sys.argv[1] == 'list':
        if len(sys.argv) == 3:
            list_objects(sys.argv[2])
            sys.exit()
        else:
            print_help()
            sys.exit()
    if sys.argv[1] == 'create':
        if len(sys.argv) == 4:
            create_object(sys.argv[2], sys.argv[3])
            sys.exit()
        else:
            print_help()
            sys.exit()
    if sys.argv[1] == 'delete':
        if len(sys.argv) == 4:
            delete_object(sys.argv[2], sys.argv[3])
            sys.exit()
        else:
            print_help()
            sys.exit()
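The list_objects() function in the script collects every page of results into one list before printing. For large buckets, a generator that yields objects one page at a time may be more convenient. The following is a minimal sketch of that idea: iter_objects is a hypothetical helper, and it assumes the service argument behaves like the object returned by create_service(), that is, it exposes objects().list() and objects().list_next() as used in the script.

```python
def iter_objects(service, bucket, fields='nextPageToken,items(name,size)'):
    """Yield object resources from a bucket one at a time, page by page.

    Hypothetical helper; `service` is assumed to be a Cloud Storage
    client like the one built by create_service() in the script.
    """
    req = service.objects().list(bucket=bucket, fields=fields)
    while req is not None:
        resp = req.execute()
        for item in resp.get('items', []):
            yield item
        # list_next() returns None once there are no more pages to fetch.
        req = service.objects().list_next(req, resp)
```

With the script's own helper it could be used as: for obj in iter_objects(create_service(), 'tutsplus-demo-test'), printing obj['name'] for each object. Keeping nextPageToken in the fields string matters, because the paging logic relies on it to find the next page.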
The above Python script demonstrates the major operations that can be performed on objects. These include:
- creation of a new object in a bucket
- listing of all objects in a bucket
- deletion of a specific object
Let's see how each of the above operations looks when the script is run.
$ python gcs_objects.py
Usage: python gcs_objects.py <command>
Command can be:
    help: Prints this help
    list: Lists all the objects in the specified bucket
    create: Upload the provided file in specified bucket
    delete: Delete the provided filename from bucket

$ python gcs_objects.py list tutsplus-demo-test
[{u'contentType': u'application/x-www-form-urlencoded;charset=UTF-8',
  u'name': u'tutsplus/',
  u'size': u'0'},
 {u'contentType': u'image/png',
  u'name': u'tutsplus/Screen Shot 2016-10-17 at 1.03.16 PM.png',
  u'size': u'36680'}]

$ python gcs_objects.py create tutsplus-demo-test gcs_buckets.py
{u'bucket': u'tutsplus-demo-test',
 u'contentType': u'application/octet-stream',
 u'crc32c': u'XIEyEw==',
 u'etag': u'CJCckonZ4c8CEAE=',
 u'generation': u'1476702385770000',
 u'id': u'tutsplus-demo-test/gcs_buckets.py/1476702385770000',
 u'kind': u'storage#object',
 u'md5Hash': u'+bd6Ula+mG4bRXReSnvFew==',
 u'mediaLink': u'https://www.googleapis.com/download/storage/v1/b/tutsplus-demo-test/o/gcs_buckets.py?generation=1476702385770000&alt=media',
 u'metageneration': u'1',
 u'name': u'gcs_buckets.py',
 u'selfLink': u'https://www.googleapis.com/storage/v1/b/tutsplus-demo-test/o/gcs_buckets.py',
 u'size': u'2226',
 u'storageClass': u'STANDARD',
 u'timeCreated': u'2016-10-17T11:06:25.753Z',
 u'updated': u'2016-10-17T11:06:25.753Z'}

$ python gcs_objects.py list tutsplus-demo-test
[{u'contentType': u'application/octet-stream',
  u'name': u'gcs_buckets.py',
  u'size': u'2226'},
 {u'contentType': u'application/x-www-form-urlencoded;charset=UTF-8',
  u'name': u'tutsplus/',
  u'size': u'0'},
 {u'contentType': u'image/png',
  u'name': u'tutsplus/Screen Shot 2016-10-17 at 1.03.16 PM.png',
  u'size': u'36680'}]

$ python gcs_objects.py delete tutsplus-demo-test gcs_buckets.py
''

$ python gcs_objects.py list tutsplus-demo-test
[{u'contentType': u'application/x-www-form-urlencoded;charset=UTF-8',
  u'name': u'tutsplus/',
  u'size': u'0'},
 {u'contentType': u'image/png',
  u'name': u'tutsplus/Screen Shot 2016-10-17 at 1.03.16 PM.png',
  u'size': u'36680'}]
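One detail worth noticing in the listings above is that size comes back as a string rather than a number, and that the folder created in the console shows up as a zero-byte object whose name ends in a slash. The sketch below works on a hard-coded copy of the final listing, so it needs no network access. The helper names total_size and is_folder_placeholder are hypothetical, and treating "name ends with / and size is 0" as a folder marker is an assumption based only on the output shown here.

```python
def total_size(objects):
    """Sum object sizes in bytes; the listing reports sizes as strings."""
    return sum(int(obj.get('size', 0)) for obj in objects)


def is_folder_placeholder(obj):
    """Guess whether an entry is a console-created folder marker.

    Assumption based on the output above: such entries have a name
    ending in '/' and a size of zero.
    """
    return obj['name'].endswith('/') and int(obj.get('size', 0)) == 0


# Copy of the final listing shown above.
listing = [
    {'contentType': 'application/x-www-form-urlencoded;charset=UTF-8',
     'name': 'tutsplus/',
     'size': '0'},
    {'contentType': 'image/png',
     'name': 'tutsplus/Screen Shot 2016-10-17 at 1.03.16 PM.png',
     'size': '36680'},
]

print(total_size(listing))
print([obj['name'] for obj in listing if not is_folder_placeholder(obj)])
```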
Conclusion
In this tutorial series, we saw how Google Cloud Storage works from a bird's eye view, which was followed by in-depth analysis of buckets and objects. We then saw how to perform major bucket and object related operations via Google Cloud Console.
Then we performed the same using Python scripts. There is more that can be done with Google Cloud Storage, but that is left for you to explore.