
Building Websites at Scale With AWS CloudFront and Hugo

Building websites has become easier than ever. Numerous platforms and third-party providers now offer tools to create and host websites within minutes, complete with custom domains, analytics, and sleek graphical interfaces.

For straightforward websites or smaller-scale projects, these platforms can be a convenient choice. However, they often fall short when it comes to flexibility, automation, and cost-effectiveness at scale. Many lack robust CLI (Command Line Interface) support for streamlining tasks, can become expensive as your needs grow, or demand significant management effort and a steep learning curve.

In this blog post, we’ll explore how to overcome these challenges by leveraging Hugo, AWS CloudFront, and GitHub Workflows to build, host, and manage websites efficiently and at scale.

Project resources can be found here.


What is Hugo?

Hugo is a powerful open-source static site generator that transforms content written in Markdown into static files such as HTML, CSS, and JavaScript. These static files are pre-rendered and served directly to users, unlike dynamic websites that generate pages in real time for each user request.

Static sites offer significant advantages—they are simpler to maintain, faster to load, and more secure. However, they aren’t the ideal solution for every use case, especially those requiring extensive interactivity or real-time updates.

Hugo powers some prominent websites, including The Kubernetes Project and Let’s Encrypt, thanks to a robust feature set:

  1. Markdown-Based Content: Writing and managing content in Markdown simplifies the process of creating and organizing websites, eliminating the need to work with complex HTML or other code.
  2. Open Source Framework: Hugo is free to use and backed by an active community that regularly contributes improvements and offers support.
  3. Theme Support: Hugo provides a wide array of free and paid themes, enabling users to quickly create professional-looking websites with minimal effort.
  4. Lightweight and Portable: The static files generated by Hugo can be hosted on virtually any platform, such as AWS, GitHub Pages, or a VPS, making it highly flexible and versatile.
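
To get a feel for the workflow, here is a minimal local session (a sketch only; the theme, site, and file names are placeholders, and the commands assume a recent Hugo release):

# Create a new site skeleton and add a theme (names are placeholders).
hugo new site mysite
cd mysite
git init
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke.git themes/ananke
echo "theme = 'ananke'" >> hugo.toml

# Add a first piece of content in Markdown.
hugo new content posts/hello-world.md

# Preview locally with live reload, then build the static files into public/.
hugo server -D
hugo --minify

The contents of public/ are the static files that will later be synced to S3.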

What is AWS CloudFront?

In simple terms, CloudFront is a content delivery network (CDN) managed by AWS. A CDN is a network of servers deployed close to end users, serving as a caching layer to improve content delivery speed and reliability.

When a user requests content that is not present on the edge server, the edge server pulls the content from your backend and caches it. If the content is already present on the edge server, it is returned directly to the user without contacting your backend.

This caching mechanism improves response times, reduces the load on the backend, and helps keep the site available during DoS (denial-of-service) attacks.
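
You can observe this behaviour from the command line: CloudFront reports cache hits and misses in the x-cache response header. A quick check might look like this (the domain is a placeholder):

# First request: typically a miss, so the edge fetches the object from the origin.
curl -sI https://www.example.com/ | grep -i x-cache
# x-cache: Miss from cloudfront

# Repeat shortly afterwards: the edge location should now serve its cached copy.
curl -sI https://www.example.com/ | grep -i x-cache
# x-cache: Hit from cloudfront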


Architecture

Fig 1. Single Site Architecture (architecture-site.png)

Domain Name Resolution

To enhance manageability and security, the infrastructure for domain name resolution is distributed across two AWS accounts: the Network Account and the Website Hosting Account, each with distinct responsibilities.

The Network Account serves as the primary hub for managing top-level domains (TLDs). This account is responsible for purchasing, renewing, and maintaining domains. When a user requests DNS resolution for a website, the request initially reaches the name servers hosted in this account. For requests targeting the www subdomain, the resolution is then routed to the name servers managed within the Website Hosting Account.

The Website Hosting Account is dedicated to hosting the infrastructure for the websites. It includes a hosted zone for the www subdomain, where DNS records point to the appropriate CloudFront distribution.

In this setup, a typical DNS resolution flow begins with the user’s request being directed to the Network Account’s name servers. If the request involves the www subdomain, it is forwarded to the Website Hosting Account, where it resolves to the target CloudFront distribution.
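
Once everything is in place, the delegation is easy to verify with dig; using a placeholder domain, the NS lookup for the www subdomain should return the name servers of the hosted zone in the Website Hosting Account, while the apex remains with the Network Account:

# Name servers for the apex domain live in the Network Account's hosted zone.
dig +short NS example.com

# The www subdomain is delegated to the hosted zone in the Website Hosting Account.
dig +short NS www.example.com

# The final answer resolves to the CloudFront distribution.
dig +short www.example.com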

Content Delivery Network

Once the website is built using Hugo, it is deployed to an S3 bucket and distributed globally using CloudFront, AWS’s content delivery network. This setup ensures fast and reliable content delivery by caching the website’s static assets closer to end users through CloudFront’s network of edge locations.

To provide a secure connection to the website, CloudFront integrates seamlessly with AWS Certificate Manager. Certificate Manager handles the issuance of SSL/TLS certificates and takes care of automatic renewal, ensuring uninterrupted secure connections without manual intervention.

Some Hugo themes utilize pretty URLs, which CloudFront may not handle correctly out of the box. While disabling this feature in Hugo can solve the issue, doing so may inadvertently break certain themes. To address this, a CloudFront function is deployed alongside the distribution. This function remaps URLs into a format recognized by CloudFront, preserving functionality and ensuring compatibility with Hugo themes.

GitOps

Fig 2. Deployment At Scale (architecture-infrastructure.png)

One of the primary objectives of this project is to enable easy and scalable management of websites, including both infrastructure and website content, through a command-line-driven workflow.

Hugo already provides the capability to build websites locally via the CLI. However, to streamline the process further, we aim to automate deployment. All website content is stored in a GitHub repository, and deployments are triggered automatically using GitHub Workflows. These workflows, guided by tags, ensure that updates to the repository are seamlessly reflected in the corresponding S3 buckets.

The approach to structuring repositories can vary depending on your preferences. A single repository might reduce context switching, while multiple repositories can enhance modularity and maintain clear domain boundaries. Both approaches are valid, and the choice depends on your specific requirements.

In this project, we assume a structure where infrastructure resides in one repository and websites in another. This separation is particularly beneficial for large websites, which can quickly clutter a single repository.

Both repositories act as the source of truth. Any changes pushed upstream—whether additions, deletions, or updates—are automatically reflected in the infrastructure or website deployments, ensuring consistency and minimizing manual intervention.


Requirements

Before getting started, ensure you have set up GitHub-AWS integration with the necessary deployment roles in place. While this post doesn’t delve into the details, you can refer to this guide for general instructions on configuring a multi-account cloud deployment using Terraform and GitHub Actions.

We also assume that root hosted zones have already been created in the Network Account. These hosted zones are typically generated automatically when a domain is purchased through AWS. Therefore, this guide starts from the point where the hosted zones already exist, and the necessary infrastructure for the site is ready to be added.
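
If you later need the IDs of those root hosted zones (for example for the primary_zone_id fields used in the configuration below), the AWS CLI can list them directly; a small sketch, assuming credentials for the Network Account:

# List the hosted zones in the Network Account with their names and IDs.
aws route53 list-hosted-zones \
  --query 'HostedZones[].{Name:Name,Id:Id}' \
  --output table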


Directory Structure

The project leverages Local Terraform Modules and uses Terragrunt to orchestrate the deployment. The directory structure of the project is as follows:

└── infrastructure
    ├── _envcommon
    │   ├── accounts.hcl
    │   └── websites.hcl
    ├── live
    │   └── prod-shared
    │       ├── account.hcl
    │       └── us-east-1
    │           ├── environment.hcl
    │           ├── region.hcl
    │           └── website-hosting
    │               └── terragrunt.hcl
    ├── modules
    │   └── web-hosting
    │       ├── 0-versions.tf
    │       ├── 1-variables.tf
    │       ├── 2-outputs.tf
    │       ├── 3-sites.tf
    │       ├── README.md
    │       └── single-site
    │           ├── 0-versions.tf
    │           ├── 1-variables.tf
    │           ├── 2-outputs.tf
    │           ├── 3-locals.tf
    │           ├── 4-domain-route53-zone.tf
    │           ├── 5-primary-zone-ns-record.tf
    │           ├── 6-route53-record.tf
    │           ├── 7-acm.tf
    │           ├── 8-cdn.tf
    │           ├── 9-cdn-function.tf
    │           └── code
    │               └── fix_urls.js
    └── terragrunt.hcl

The live directory contains the configuration files for Terragrunt. Although Terragrunt’s primary advantage is its ability to keep state files small and manage multiple stacks independently, in this project, all resources are managed within a single stack. This decision simplifies management, as updates are only required when adding or removing websites.
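
Before wiring the stack into a pipeline, it can be exercised locally from the website-hosting directory; a minimal sketch, assuming Terragrunt and Terraform are installed and AWS credentials are available:

cd infrastructure/live/prod-shared/us-east-1/website-hosting

# Validate the configuration, preview the changes, then apply them.
terragrunt validate
terragrunt plan
terragrunt apply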

The _envcommon directory houses configuration files used to manage website infrastructure. Adding a new website is as simple as creating an entry in the configuration file, while deleting a website requires removing the corresponding entry.

The accounts.hcl file contains account details and deployment roles.

# _envcommon/accounts.hcl
locals {

  network_account = {
    account_name = "Network"
    account_id   = ""
    assume_role  = ""
  }

  prod_shared_account = {
    account_name = "Prod Shared"
    account_id   = ""
    assume_role  = ""
  }
}

The websites.hcl file defines the configuration for websites. Adding a new website requires specifying its details in this file.

# _envcommon/websites.hcl
locals {
  sites = [
    {
      site_name                = "firstsite"
      domain_name              = "www.firstsite.com"
      primary_zone_id          = ""
      primary_zone_record_name = "www"
    },
    {
      site_name                = "secondsite"
      domain_name              = "www.secondsite.com"
      primary_zone_id          = ""
      primary_zone_record_name = "www"
    }
  ]
}

Terraform Modules

The modules directory contains the Terraform code, structured to manage both individual websites and collections of websites. The single-site sub-module implements the architecture shown in Figure 1, while the web-hosting module acts as a wrapper that orchestrates the deployment of multiple websites based on the configuration files in the _envcommon directory.

To keep this post concise, not all files are shown here. You can view the complete code here.

Single Website

The single-site module uses tags to link websites with infrastructure.

locals {
  tags = merge(
    var.tags,
    {
      SiteName   = var.site_name,
      DomainName = var.domain_name,
    }
  )
}

A dedicated Route 53 hosted zone is created in the Website Hosting Account to manage the www subdomain. It includes an A record pointing to the CloudFront distribution hosting the website:

resource "aws_route53_zone" "this" {
  name = var.domain_name
  tags = local.tags
}

resource "aws_route53_record" "cdn" {
  zone_id = aws_route53_zone.this.zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = module.cdn.cloudfront_distribution_domain_name
    zone_id                = module.cdn.cloudfront_distribution_hosted_zone_id
    evaluate_target_health = true
  }
}

In the Network Account, the root hosted zone is updated with the name servers from the newly created hosted zone:

resource "aws_route53_record" "subdomain_nameservers" {
  provider = aws.network

  zone_id = var.primary_zone_id
  name    = var.primary_zone_record_name
  records = aws_route53_zone.this.name_servers

  type = var.primary_zone_ns_record_type
  ttl  = var.primary_zone_record_ttl
}

To enable HTTPS traffic, a certificate is provisioned using AWS Certificate Manager with DNS validation:

module "acm" {
  source  = "terraform-aws-modules/acm/aws"
  version = "5.1.1"

  domain_name               = var.domain_name
  zone_id                   = aws_route53_zone.this.zone_id
  subject_alternative_names = [var.domain_name]

  validation_method   = "DNS"
  wait_for_validation = true

  tags = local.tags

  depends_on = [aws_route53_record.subdomain_nameservers]
}

The website is hosted on CloudFront with an S3 backend. The S3 bucket is configured as private with encryption, and its policy allows access only from the CloudFront distribution. CloudFront is set up with HTTPS redirection and a custom function for URL adaptation:

module "cdn" {
  source  = "terraform-aws-modules/cloudfront/aws"
  version = "3.4.1"

  comment             = "Web Hosting for domain ${var.domain_name}"
  enabled             = true
  is_ipv6_enabled     = true
  retain_on_delete    = false
  wait_for_deployment = true

  aliases = [var.domain_name]

  create_origin_access_control = true
  origin_access_control = {
    "s3_oac_${var.site_name}" = {
      description      = "CloudFront access to S3"
      origin_type      = "s3"
      signing_behavior = "always"
      signing_protocol = "sigv4"
    }
  }

  origin = {
    media_bucket = {
      domain_name           = module.media_bucket.s3_bucket_bucket_domain_name
      origin_access_control = "s3_oac_${var.site_name}"
    }
  }

  default_root_object = "index.html"
  default_cache_behavior = {
    target_origin_id       = "media_bucket"
    viewer_protocol_policy = "redirect-to-https"

    allowed_methods = ["GET", "HEAD", "OPTIONS"]
    cached_methods  = ["GET", "HEAD"]
    compress        = true
    query_string    = true

    function_association = {
      viewer-request = {
        function_arn = aws_cloudfront_function.this.arn
      }
    }
  }


  viewer_certificate = {
    acm_certificate_arn = module.acm.acm_certificate_arn
    ssl_support_method  = "sni-only"
  }

  tags = local.tags

}

module "media_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "4.2.1"

  # Allow deletion of non-empty bucket
  force_destroy = true

  restrict_public_buckets = true // #trivy:ignore
  ignore_public_acls      = true // #trivy:ignore
  block_public_policy     = true // #trivy:ignore
  block_public_acls       = true // #trivy:ignore

  server_side_encryption_configuration = {
    rule = {
      bucket_key_enabled = true

      apply_server_side_encryption_by_default = {
        sse_algorithm = "AES256"
      }
    }
  }

  tags = local.tags
}

data "aws_iam_policy_document" "s3_policy" {
  # Origin Access Controls
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${module.media_bucket.s3_bucket_arn}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:SourceArn"
      values   = [module.cdn.cloudfront_distribution_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "bucket_policy" {
  bucket = module.media_bucket.s3_bucket_id
  policy = data.aws_iam_policy_document.s3_policy.json
}

The CloudFront function adapts URLs to ensure compatibility with themes using pretty URLs:

resource "aws_cloudfront_function" "this" {
  name    = "fix-url-${var.site_name}"
  runtime = "cloudfront-js-2.0"
  code    = file("${path.module}/code/fix_urls.js")
}

The function’s code is shown below.

function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // Check whether the URI is missing a file name.
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  }
  // Check whether the URI is missing a file extension.
  else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }

  return request;
}
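
Once deployed, the function can be exercised without touching live traffic by running it against a sample viewer-request event; a sketch using the AWS CLI, where the function name follows the fix-url-<site_name> pattern above and the ETag is read from the DEVELOPMENT stage:

# A sample viewer-request event: a "pretty URL" without a file name.
cat > event.json <<'EOF'
{
  "version": "1.0",
  "context": { "eventType": "viewer-request" },
  "viewer": { "ip": "203.0.113.10" },
  "request": { "method": "GET", "uri": "/posts/", "headers": {}, "cookies": {}, "querystring": {} }
}
EOF

# Read the current ETag, then run the function in the DEVELOPMENT stage.
ETAG=$(aws cloudfront describe-function --name fix-url-firstsite \
  --stage DEVELOPMENT --query 'ETag' --output text)
aws cloudfront test-function --name fix-url-firstsite --if-match "$ETAG" \
  --stage DEVELOPMENT --event-object fileb://event.json
# The FunctionOutput in the response should show the URI rewritten to /posts/index.html.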

Scaling Websites

To deploy multiple websites, the single-site module is wrapped in the web-hosting module, which iterates over a list of site configurations. Each website is uniquely keyed by its domain name, ensuring that the per-site sub-stacks are correctly managed by Terraform.

module "site" {
  for_each = { for index, site in var.sites : site.domain_name => site }
  source   = "./single-site/"

  providers = {
    aws.network = aws.network
  }

  domain_name = each.value.domain_name
  site_name   = each.value.site_name

  primary_zone_id          = each.value.primary_zone_id
  primary_zone_record_name = each.value.primary_zone_record_name

  primary_zone_record_ttl     = var.primary_zone_record_ttl
  primary_zone_ns_record_type = var.primary_zone_ns_record_type

  tags = var.tags
}

GitOps

The infrastructure deployment process is triggered automatically when changes are detected in the infrastructure directory. The workflow first assumes the pipeline role, which is then used to assume the deployment roles in the target account. For a detailed guide on configuring multi-account deployment with Terraform and GitHub Actions, refer to this post.

name: "Websites Infrastructure"

on:
  workflow_dispatch: {}
  push:
    paths:
      - infrastructure/**
    branches:
      - main

env:
  AWS_REGION: "REGION"
  AWS_PIPELINES_ROLE_ARN: "PIPELINE ROLE ARN"
  TF_VERSION: '1.9.8'
  TG_VERSION: '0.67.3'
  WORKING_DIR: 'infrastructure/live/prod-shared/us-east-1/website-hosting'


jobs:

  deploy-infrastructure:
    name: Terragrunt Infrastructure Deployment

    runs-on: ubuntu-latest

    permissions:
      id-token: write
      contents: read

    steps:

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure aws credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_PIPELINES_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Ensure configuration is valid
        uses: gruntwork-io/terragrunt-action@v2
        id: terragrunt-validate
        with:
          tg_dir: ${{ env.WORKING_DIR }}
          tf_version: ${{ env.TF_VERSION }}
          tg_version: ${{ env.TG_VERSION }}
          tg_command: 'validate'

      - name: Ensure infrastructure is live
        uses: gruntwork-io/terragrunt-action@v2
        id: terragrunt-apply
        with:
          tf_version: ${{ env.TF_VERSION }}
          tg_version: ${{ env.TG_VERSION }}
          tg_dir: ${{ env.WORKING_DIR }}
          tg_add_approve: 1
          tg_command: 'apply --terragrunt-log-level debug --terragrunt-debug'
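
Because the workflow also declares workflow_dispatch, a deployment can be started manually when needed, either from the GitHub UI or with the GitHub CLI (assuming gh is authenticated against the infrastructure repository):

# Trigger the infrastructure workflow on the main branch and follow the run.
gh workflow run "Websites Infrastructure" --ref main
gh run watch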

Website Updates

If your project uses separate repositories for website content, you will need to add the infrastructure repository as a submodule within the infra-repo directory of your website repository. Ensure that you have configured a GitHub PAT (Personal Access Token) in the repository secrets, allowing the workflow to access and pull the submodule during the checkout step.
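
Adding the submodule and the secret is a one-off setup step; a sketch, where the organisation and repository names are placeholders and GH_PAT is the secret name referenced by the workflow below:

# Inside the website repository: add the infrastructure repository as a submodule.
git submodule add https://github.com/your-org/infrastructure-repo.git infra-repo
git commit -m "Add infrastructure repository as a submodule"

# Store the personal access token as a repository secret using the GitHub CLI.
gh secret set GH_PAT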

When new content is pushed upstream, the workflow triggers automatically. It pulls the website repository along with the infrastructure submodule. Hugo is then installed, and the website is built. The workflow retrieves information about the target S3 buckets from the Terragrunt state file and syncs the updated version of the website to the corresponding S3 bucket.

name: "Website content: Site 1"

on:
  workflow_dispatch: {}
  push:
    paths:
      - site1/**
    branches:
      - main

env:
  AWS_REGION: "eu-west-1"
  AWS_PIPELINES_ROLE_ARN: "PIPELINE ROLE ARN"
  AWS_TARGET_ACCOUNT_ROLE_ARN: "TARGET ACCOUNT DEPLOYMENT ROLE"
  TF_VERSION: "1.9.8"
  TG_VERSION: "0.67.3"
  WEBSITE_HOSTING_DIR: "infra-repo/infrastructure/live/prod-shared/us-east-1/website-hosting"
  WEBSITE_NAME: "site1"
  WEBSITE_DIR: "site1"

jobs:
  deploy-website-content:
    name: Deploy website content

    runs-on: ubuntu-latest

    permissions:
      id-token: write
      contents: read

    defaults:
      run:
        working-directory: "${{env.WEBSITE_DIR}}"

    steps:
      - name: Checkout this repository
        uses: actions/checkout@v4
        with:
          token: ${{secrets.GH_PAT}}
          submodules: true

      - name: Configure aws pipeline credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_PIPELINES_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
          role-session-name: "${{env.WEBSITE_NAME}}-session"

      - name: Retrieve target content bucket
        uses: gruntwork-io/terragrunt-action@v2
        id: retrieve-s3-bucket
        with:
          tg_dir: ${{ env.WEBSITE_HOSTING_DIR }}
          tf_version: ${{ env.TF_VERSION }}
          tg_version: ${{ env.TG_VERSION }}
          tg_command: "output --terragrunt-log-level error -json content_buckets"

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: latest
          extended: true

      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version: latest

      - name: Build site
        run: |
          npm install
          npm run build-upstream

      - name: Configure target account aws credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ env.AWS_TARGET_ACCOUNT_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
          role-session-name: "${{env.WEBSITE_NAME}}-session"
          role-chaining: true

      - name: Publish site to target bucket
        run: aws s3 sync --delete public/ s3://$BUCKET_NAME
        env:
          BUCKET_NAME: "${{ fromJSON(steps.retrieve-s3-bucket.outputs.tg_action_output)[env.WEBSITE_NAME] }}"
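
One optional refinement that is not part of the workflow above: because CloudFront caches objects at the edge, a freshly synced page may continue to be served from cache until its TTL expires. If updates need to be visible immediately, an invalidation can be issued after the sync; a sketch, where the distribution ID is a placeholder that would have to be exposed as an additional Terraform output or looked up manually:

# Invalidate all cached paths for the site's distribution (placeholder ID).
aws cloudfront create-invalidation \
  --distribution-id E1234567890ABC \
  --paths "/*"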

Conclusion

In this blog post, we explored an effective approach to building, hosting, and managing websites at scale.

We have seen how to easily build websites using Hugo. We covered the secure hosting of static websites on AWS using the fully managed content delivery network CloudFront. Additionally, we looked at how to organize and scale hosting infrastructure with Terraform and Terragrunt. Finally, we discussed automating infrastructure management and website updates using GitHub Workflows, streamlining operations and minimizing manual effort.

Happy engineering!