Tiered Access To S3 Data With Presigned URLs

Managing access to your Amazon S3 data is crucial for security and efficiency in your cloud architecture. You typically have three options for granting client applications access to S3 data: serving it through an Amazon CloudFront distribution, issuing S3 presigned URLs, or proxying all traffic through your backend APIs.

In this blog post, we will take a closer look at S3 presigned URLs as an effective method for implementing tiered access to your S3 data. We'll discuss their benefits and limitations and provide a straightforward example to illustrate their use.

What Are S3 Presigned URLs?

S3 presigned URLs are a feature of the S3 service that lets you grant temporary access to your S3 data without modifying bucket policies. When access to an S3 object is requested, your backend generates a URL carrying an authorization signature. Anyone holding this URL can upload data to, or retrieve data from, the specified S3 location until the URL expires.
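Under the hood, a presigned URL is an ordinary object URL with SigV4 authorization material appended as query parameters. A quick way to see its anatomy (the URL below is illustrative; the bucket name and signature are made up):

```python
from urllib.parse import urlparse, parse_qs

# Illustrative presigned URL; bucket name and signature are fake
url = (
    "https://example-bucket.s3.amazonaws.com/tier%201/cute-cat.jpg"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=AKIAEXAMPLE%2F20240101%2Fus-east-1%2Fs3%2Faws4_request"
    "&X-Amz-Date=20240101T000000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=deadbeef"
)

parts = urlparse(url)
params = parse_qs(parts.query)

print(parts.netloc)                # example-bucket.s3.amazonaws.com
print(parts.path)                  # /tier%201/cute-cat.jpg (object key, percent-encoded)
print(params["X-Amz-Expires"][0])  # 3600 -> validity window in seconds
```

The `X-Amz-Signature` parameter is what actually authorizes the request: it is derived from the signer's credentials, the object key, the HTTP method, and the expiry, so tampering with any of them invalidates the URL.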

Benefits and Limitations

Benefits:

  • Programmatic Access: Access is granted programmatically, with no infrastructure updates required.
  • Flexibility: Authorization checks are handled by your backend API, which issues the presigned URL.
  • Offloading Backend Tasks: AWS serves the downloads and uploads, reducing the load on your backend.
  • Cost Efficiency: No need to set up and maintain a CloudFront distribution.
  • Reusability: A URL can be reused multiple times within its validity period.
  • Expiration Control: URLs are issued with expiration times that limit how long access lasts.
  • Data Integrity: Support for checksums ensures data integrity during transfers.

Limitations:

  • File-Specific Access Only: A URL grants access to a single object key; prefix ("folder") access is not supported.
  • Operation Constraints: Each URL authorizes a single operation, such as PUT or GET.
  • URL Changes: URLs are generated dynamically, so your application must be able to handle URLs that change between requests.
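Because URLs expire and change between requests (the last two points above), a client that caches a URL may want to check whether it is still usable before retrying a download. A small client-side sketch, assuming the `X-Amz-Date` and `X-Amz-Expires` query parameters have already been parsed out of the URL:

```python
from datetime import datetime, timedelta, timezone

def presigned_url_expired(amz_date: str, expires_in: int, now: datetime) -> bool:
    """True if a presigned URL's validity window has passed.

    amz_date is the X-Amz-Date parameter (UTC, e.g. "20240101T000000Z");
    expires_in is the X-Amz-Expires parameter in seconds.
    """
    signed_at = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(
        tzinfo=timezone.utc
    )
    return now > signed_at + timedelta(seconds=expires_in)

now = datetime(2024, 1, 1, 0, 30, tzinfo=timezone.utc)
print(presigned_url_expired("20240101T000000Z", 3600, now))  # False: 30 min into a 1 h window
print(presigned_url_expired("20240101T000000Z", 600, now))   # True: the 10 min window passed
```

If the check says the URL has expired, the client simply asks the issuing API for a fresh one.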

Architecture

Fig 1. Presigned URLs Architecture

To implement presigned URLs, you need an API that issues these URLs. This API is responsible for authenticating users and determining if they are authorized to access the requested object.

The flow is as follows:

  1. The user requests PUT or GET access to an S3 object.
  2. The API authenticates the user and verifies their permissions for the requested S3 object.
  3. If the user is authorized, the backend generates a presigned URL with PUT or GET permissions for the object and returns it to the client. If not, the backend returns a 403 Forbidden status code.
  4. The user then uses the presigned URL to interact with the S3 bucket directly via the GET or PUT HTTP method.

The service that issues presigned URLs, such as an AWS Lambda function, must itself have the corresponding permissions on the S3 bucket.
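The flow can also be sketched from the client's perspective. `request_presigned_url` and `download` below are hypothetical stand-ins for real HTTP calls (e.g. made with the `requests` library); they are injected as callables so the two-step handshake stays explicit:

```python
from typing import Callable

def fetch_object(
    key: str,
    request_presigned_url: Callable[[str], dict],  # step 1: call the backend API
    download: Callable[[str], bytes],              # step 2: GET the presigned URL
) -> bytes:
    """Two-step retrieval: authorize via the API, then fetch directly from S3."""
    response = request_presigned_url(key)
    if response["statusCode"] == 403:
        raise PermissionError(f"not authorized for {key}")
    return download(response["url"])

# Stub transports standing in for the API and S3
api = lambda key: (
    {"statusCode": 200, "url": f"https://s3.example/{key}?X-Amz-Signature=abc"}
    if key.startswith("tier 1/")
    else {"statusCode": 403}
)
s3 = lambda url: b"jpeg-bytes"

print(fetch_object("tier 1/cute-cat.jpg", api, s3))  # b'jpeg-bytes'
```

The important property is that the object bytes never pass through your backend; only the small authorization exchange does.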

Example: Tiered S3 Access With Presigned URLs

In this example, we will deploy the sample architecture shown in Fig 1 and simulate tiered access to the S3 bucket.

You will need programmatic access to your AWS account and the AWS SAM CLI installed.

API With Cognito Authorizer

First, let’s define the API Gateway with a Cognito authorizer. To simplify the process, we will add two users directly into the SAM template. However, note that this approach is not recommended for a production environment.

  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref StageName
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'*'"
        AllowOrigin: "'*'"
      Auth:
        Authorizers:
          CognitoAuthorizer:
            UserPoolArn: !GetAtt UserPool.Arn

  UserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      Policies:
        PasswordPolicy:
          MinimumLength: 8
          RequireLowercase: true
          RequireNumbers: true
          RequireSymbols: true
          RequireUppercase: true
      UsernameAttributes:
        - email
      Schema:
        - AttributeDataType: String
          Name: email
          Required: false

  UserPoolClient:
    Type: AWS::Cognito::UserPoolClient
    Properties:
      UserPoolId: !Ref UserPool
      GenerateSecret: false
      ExplicitAuthFlows:
        - ALLOW_USER_PASSWORD_AUTH
        - ALLOW_REFRESH_TOKEN_AUTH

  UserPoolUserOne:
    Type: AWS::Cognito::UserPoolUser
    Properties:
      DesiredDeliveryMediums:
        - EMAIL
      Username: !Ref CognitoUserOneEmail
      UserPoolId: !Ref UserPool

  UserPoolUserTwo:
    Type: AWS::Cognito::UserPoolUser
    Properties:
      DesiredDeliveryMediums:
        - EMAIL
      Username: !Ref CognitoUserTwoEmail
      UserPoolId: !Ref UserPool

Lambda Issuer

Different access levels will be granted to the two users. The second user will have access to both tier one and tier two content, while the first user will have access only to tier one content (the issuer compares the caller's email claim against the TIER_TWO_USERNAME environment variable).

import json
import os
import re

import boto3
from botocore.exceptions import ClientError


def handler(event: dict, context: object) -> dict:
    key = event["queryStringParameters"]["key"]
    username = event["requestContext"]["authorizer"]["claims"]["email"]

    # access only to tier 1 and tier 2 folders permitted
    if not re.match(r"^(tier 1|tier 2)/.*\.jpg$", key):
        return {
            "statusCode": 403,
            "body": json.dumps(
                {"message": "Not authorized to access non tier folders."}
            ),
        }
    # tier 2 content is restricted to the designated tier-two user
    if key.startswith("tier 2") and username != os.environ["TIER_TWO_USERNAME"]:
        return {
            "statusCode": 403,
            "body": json.dumps({"message": "Not authorized to access tier 2 data"}),
        }

    try:
        # generate pre signed url
        s3_client = boto3.client("s3")
        presigned_url = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": os.environ["BUCKET_NAME"], "Key": key},
            ExpiresIn=3600,
        )

        return {
            "statusCode": 200,
            "body": json.dumps({"url": presigned_url}),
        }
    except ClientError as e:
        print(f"Error: {str(e)}")
        return {"statusCode": 500, "message": "Internal server error"}

To simplify the process, we will define the Lambda function directly in the SAM template.

  MediaBucket:
    Type: AWS::S3::Bucket
    Properties: {}

  IssuerFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.10
      Handler: index.handler
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref MediaBucket
      Events:
        ApiEvent:
          Type: Api
          Properties:
            RestApiId: !Ref MyApi
            Path: /presignedurl
            Method: GET
            RequestParameters:
              - method.request.querystring.key:
                  Required: true
                  Caching: false
            Auth:
              Authorizer: CognitoAuthorizer
      Environment:
          Variables:
            BUCKET_NAME: !Ref MediaBucket
            TIER_ONE_USERNAME: !Ref UserPoolUserOne
            TIER_TWO_USERNAME: !Ref UserPoolUserTwo
      InlineCode: |
        import json
        import os
        import re

        import boto3
        from botocore.exceptions import ClientError


        def handler(event: dict, context: object) -> dict:
            key = event["queryStringParameters"]["key"]
            username = event["requestContext"]["authorizer"]["claims"]["email"]

            # access only to tier 1 and tier 2 folders permitted
            if not re.match(r"^(tier 1|tier 2)/.*\.jpg$", key):
                return {
                    "statusCode": 403,
                    "body": json.dumps(
                        {"message": "Not authorized to access non tier folders."}
                    ),
                }
            # tier 2 content is restricted to the designated tier-two user
            if key.startswith("tier 2") and username != os.environ["TIER_TWO_USERNAME"]:
                return {
                    "statusCode": 403,
                    "body": json.dumps({"message": "Not authorized to access tier 2 data"}),
                }

            try:
                # generate pre signed url
                s3_client = boto3.client("s3")
                presigned_url = s3_client.generate_presigned_url(
                    "get_object",
                    Params={"Bucket": os.environ["BUCKET_NAME"], "Key": key},
                    ExpiresIn=3600,
                )

                return {
                    "statusCode": 200,
                    "body": json.dumps({"url": presigned_url}),
                }
            except ClientError as e:
                print(f"Error: {str(e)}")
                return {"statusCode": 500, "message": "Internal server error"}

Deployment and Configuration

The resulting SAM template, template.sam.yaml, is shown below.

Transform: AWS::Serverless-2016-10-31

Parameters:
  CognitoUserOneEmail:
    Description: Email address of the first created user
    Type: String

  CognitoUserTwoEmail:
    Description: Email address of the second created user
    Type: String

  StageName:
    Description: Api stage name
    Type: String
    Default: "dev"

Resources:

  MediaBucket:
    Type: AWS::S3::Bucket
    Properties: {}

  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref StageName
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'*'"
        AllowOrigin: "'*'"
      Auth:
        Authorizers:
          CognitoAuthorizer:
            UserPoolArn: !GetAtt UserPool.Arn

  UserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      Policies:
        PasswordPolicy:
          MinimumLength: 8
          RequireLowercase: true
          RequireNumbers: true
          RequireSymbols: true
          RequireUppercase: true
      UsernameAttributes:
        - email
      Schema:
        - AttributeDataType: String
          Name: email
          Required: false

  UserPoolClient:
    Type: AWS::Cognito::UserPoolClient
    Properties:
      UserPoolId: !Ref UserPool
      GenerateSecret: false
      ExplicitAuthFlows:
        - ALLOW_USER_PASSWORD_AUTH
        - ALLOW_REFRESH_TOKEN_AUTH

  UserPoolUserOne:
    Type: AWS::Cognito::UserPoolUser
    Properties:
      DesiredDeliveryMediums:
        - EMAIL
      Username: !Ref CognitoUserOneEmail
      UserPoolId: !Ref UserPool

  UserPoolUserTwo:
    Type: AWS::Cognito::UserPoolUser
    Properties:
      DesiredDeliveryMediums:
        - EMAIL
      Username: !Ref CognitoUserTwoEmail
      UserPoolId: !Ref UserPool


  IssuerFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.10
      Handler: index.handler
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref MediaBucket
      Events:
        ApiEvent:
          Type: Api
          Properties:
            RestApiId: !Ref MyApi
            Path: /presignedurl
            Method: GET
            RequestParameters:
              - method.request.querystring.key:
                  Required: true
                  Caching: false
            Auth:
              Authorizer: CognitoAuthorizer
      Environment:
          Variables:
            BUCKET_NAME: !Ref MediaBucket
            TIER_ONE_USERNAME: !Ref UserPoolUserOne
            TIER_TWO_USERNAME: !Ref UserPoolUserTwo
      InlineCode: |
        import json
        import os
        import re

        import boto3
        from botocore.exceptions import ClientError


        def handler(event: dict, context: object) -> dict:
            key = event["queryStringParameters"]["key"]
            username = event["requestContext"]["authorizer"]["claims"]["email"]

            # access only to tier 1 and tier 2 folders permitted
            if not re.match(r"^(tier 1|tier 2)/.*\.jpg$", key):
                return {
                    "statusCode": 403,
                    "body": json.dumps(
                        {"message": "Not authorized to access non tier folders."}
                    ),
                }
            # tier 2 content is restricted to the designated tier-two user
            if key.startswith("tier 2") and username != os.environ["TIER_TWO_USERNAME"]:
                return {
                    "statusCode": 403,
                    "body": json.dumps({"message": "Not authorized to access tier 2 data"}),
                }

            try:
                # generate pre signed url
                s3_client = boto3.client("s3")
                presigned_url = s3_client.generate_presigned_url(
                    "get_object",
                    Params={"Bucket": os.environ["BUCKET_NAME"], "Key": key},
                    ExpiresIn=3600,
                )

                return {
                    "statusCode": 200,
                    "body": json.dumps({"url": presigned_url}),
                }
            except ClientError as e:
                print(f"Error: {str(e)}")
                return {"statusCode": 500, "message": "Internal server error"}

Outputs:
  MyApiUrl:
    Description: Url of api gateway
    Value: !Sub "https://${MyApi}.execute-api.${AWS::Region}.amazonaws.com/${StageName}"

  CognitoUserPoolClientId:
    Description: Cognito user pool client id
    Value: !Ref UserPoolClient

  UserPoolId:
    Description: User pool id
    Value: !Ref UserPool

  BucketName:
    Description: Bucket name
    Value: !Ref MediaBucket

Assign valid email addresses to both users. These addresses must be functional since temporary passwords will be sent to them. Proceed to deploy the infrastructure.

# set the stack name and user emails here
STACK_NAME="s3-pre-signed-urls"
COGNITO_USER_ONE_EMAIL="me+user1@example.com"
COGNITO_USER_TWO_EMAIL="me+user2@example.com"

# deploy the stack
sam deploy \
--parameter-overrides CognitoUserOneEmail=$COGNITO_USER_ONE_EMAIL  CognitoUserTwoEmail=$COGNITO_USER_TWO_EMAIL \
--capabilities CAPABILITY_IAM \
--stack-name $STACK_NAME \
-t template.sam.yaml

For the next steps we will need information from the stack outputs, including the bucket name.

# get infrastructure information from the stack
STACK_OUTPUTS=$(sam list stack-outputs --stack-name $STACK_NAME --output json)
USER_POOL_CLIENT_ID=$(echo $STACK_OUTPUTS | jq -r 'map(select(.OutputKey == "CognitoUserPoolClientId")) | .[0].OutputValue')
USER_POOL_ID=$(echo $STACK_OUTPUTS | jq -r 'map(select(.OutputKey == "UserPoolId")) | .[0].OutputValue')
API_URL=$(echo $STACK_OUTPUTS | jq -r 'map(select(.OutputKey == "MyApiUrl")) | .[0].OutputValue')
BUCKET_NAME=$(echo $STACK_OUTPUTS | jq -r 'map(select(.OutputKey == "BucketName")) | .[0].OutputValue')

With the bucket name in hand, let's copy the cute cat image to the tier one folder and the super cute cat image to the tier two folder of the S3 bucket.

# copy data
aws s3 cp assets/cute-cat.jpg "s3://$BUCKET_NAME/tier 1/"
aws s3 cp assets/super-cute-cat.jpg "s3://$BUCKET_NAME/tier 2/"
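The repeated jq lookups can also be collapsed into a single dictionary in Python; `sam list stack-outputs --output json` prints a list of objects with OutputKey and OutputValue fields (the values below are illustrative):

```python
import json

# Sample payload in the shape `sam list stack-outputs --output json` returns
stack_outputs_json = """[
  {"OutputKey": "MyApiUrl", "OutputValue": "https://abc123.execute-api.eu-west-1.amazonaws.com/dev"},
  {"OutputKey": "BucketName", "OutputValue": "s3-pre-signed-urls-mediabucket-xyz"}
]"""

# one dict lookup replaces a jq query per output
outputs = {o["OutputKey"]: o["OutputValue"] for o in json.loads(stack_outputs_json)}
print(outputs["BucketName"])  # s3-pre-signed-urls-mediabucket-xyz
```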

Testing User Access

Let’s test access for the first user. Check your email and paste the temporary password for user one below.

First, we retrieve session information from Cognito. Then, we set a new password and obtain the authorization token.

# set your temporary password here
EMAIL=$COGNITO_USER_ONE_EMAIL
TMP_PASSWORD=""
NEW_PASSWORD="Supersecretpassword12+"

# retrieve session information
SESSION=$(aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH \
--auth-parameters "USERNAME=$EMAIL,PASSWORD=$TMP_PASSWORD" \
--client-id $USER_POOL_CLIENT_ID \
--query "Session" --output text)

# get the ID token
TOKEN_ID=$(aws cognito-idp admin-respond-to-auth-challenge \
--user-pool-id $USER_POOL_ID \
--client-id $USER_POOL_CLIENT_ID \
--challenge-responses "USERNAME=$EMAIL,NEW_PASSWORD=$NEW_PASSWORD" \
--challenge-name NEW_PASSWORD_REQUIRED \
--session $SESSION \
--query 'AuthenticationResult.IdToken' --output text)

Set the authorization token and test access for both tiers. If you have access, test the presigned URLs.

# send token as part of the Authorization header when requesting resources.
curl -G -H "Authorization: Bearer $TOKEN_ID" --data-urlencode "key=tier 1/cute-cat.jpg" "$API_URL/presignedurl"

TIER1_GET_URL=$(curl -G -H "Authorization: Bearer $TOKEN_ID" --data-urlencode "key=tier 1/cute-cat.jpg" "$API_URL/presignedurl" | jq -r '.url')
curl -L --output cute-cat.jpg "$TIER1_GET_URL"

curl -G -H "Authorization: Bearer $TOKEN_ID" --data-urlencode "key=tier 2/super-cute-cat.jpg" "$API_URL/presignedurl"

# if authorized download the image
TIER2_GET_URL=$(curl -G -H "Authorization: Bearer $TOKEN_ID" --data-urlencode "key=tier 2/super-cute-cat.jpg" "$API_URL/presignedurl" | jq -r '.url')
curl -L --output super-cute-cat.jpg "$TIER2_GET_URL"

Repeat the same for user two.

Cleanup

When you are done testing, clean up the resources.

# delete files in s3 bucket
aws s3 rm s3://$BUCKET_NAME --recursive

# delete the stack
sam delete --stack-name $STACK_NAME

Conclusion

Leveraging S3 presigned URLs can streamline access management and offload operational tasks to AWS, allowing you to focus on building robust and scalable applications.

Happy engineering!