You probably know what SEO is. This article has a narrower scope: getting rich previews when you share a link to your Single Page Application (SPA) on social media. The problem exists because, as the name suggests, an SPA has only one HTML page, and the entire application is loaded into it. Link-preview bots generally read the meta tags from that HTML without running the application's JavaScript, so every route produces the same preview. This leaves two options: either we solve the problem of meta tags for each application route somehow, or we follow the workaround described below and serve content to bots separately. This article uses AWS services to achieve rich previews.
Assumptions
I assume that the SPA is hosted in an S3 bucket that is connected to CloudFront. Read more about CloudFront here.
The concept
Maintain a separate S3 bucket that uses the route names as folders. Each folder holds an index.html file corresponding to that route. At minimum, the index.html for a route contains an HTML document whose head section has all the required meta information.
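One way to produce these per-route files is to generate them from a list of routes. The sketch below is only illustrative, and it makes several assumptions:

- A route such as /blog/post-1 is stored under the key blog/post-1/index.html.
- Open Graph tags plus a twitter:card tag are the meta information you need.
- The helpers s3KeyForRoute and buildMetaPage and the sample page data are hypothetical names.

Uploading the files is left to whatever tool you already use.

```javascript
"use strict";

// Escapes characters that are unsafe inside HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Maps an SPA route such as "/blog/post-1" to the S3 object key
// "blog/post-1/index.html"; the root route "/" maps to "index.html".
function s3KeyForRoute(route) {
  const folder = route.replace(/^\/+|\/+$/g, "");
  return folder === "" ? "index.html" : `${folder}/index.html`;
}

// Builds a minimal HTML document whose <head> carries the preview tags.
// Assumption: Open Graph (og:*) plus twitter:card covers the platforms you
// target; add any other tags they read.
function buildMetaPage({ title, description, imageUrl, pageUrl }) {
  const t = escapeHtml(title);
  const d = escapeHtml(description);
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    `<title>${t}</title>`,
    `<meta name="description" content="${d}">`,
    `<meta property="og:title" content="${t}">`,
    `<meta property="og:description" content="${d}">`,
    `<meta property="og:image" content="${escapeHtml(imageUrl)}">`,
    `<meta property="og:url" content="${escapeHtml(pageUrl)}">`,
    '<meta name="twitter:card" content="summary_large_image">',
    "</head>",
    "<body></body>",
    "</html>",
  ].join("\n");
}

// Hypothetical route data; in practice it might come from your router config.
const page = {
  route: "/blog/post-1",
  title: "My first post",
  description: "A short summary shown in link previews.",
  imageUrl: "https://example.com/images/post-1.png",
  pageUrl: "https://example.com/blog/post-1",
};

console.log(s3KeyForRoute(page.route)); // blog/post-1/index.html
console.log(buildMetaPage(page));
```

Each generated document goes into the meta bucket under its key, so the folder layout mirrors the SPA's routes.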
When a client (a user's computer or mobile device) requests a webpage, the request goes to CloudFront. CloudFront intercepts it and checks whether it comes from a bot, such as a social media application or a messenger. If it does, CloudFront sends the request to the S3 bucket that stores a replica of the same page with the meta information, instead of to the actual SPA. If the request is not from a bot, CloudFront forwards it to the S3 bucket that hosts the actual SPA.
Implementation Details
CloudFront Events and Configurations
At minimum, we need two Lambda@Edge functions. CloudFront exposes four events in the lifecycle of each request it receives:
- Viewer Request
Runs when CloudFront receives a request from a viewer, before it checks whether the requested object is in the edge cache.
- Origin Request
Runs when CloudFront forwards a request to the origin. If the requested object is in the edge cache, no origin request is made and this function does not run.
- Origin Response
Runs after CloudFront receives a response from the origin and before it caches the object in the response. It does not run if the object is served from the edge cache.
- Viewer Response
Runs before CloudFront returns the requested object to the viewer, regardless of whether the object was already in the edge cache.
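Of these four events, we use Viewer Request and Origin Request. Viewer Request runs on every request, so it can inspect the User-Agent even when the page is already cached. Origin Request runs only on a cache miss, and it is where the origin can be changed. Because Origin Request is skipped on cache hits, the bot flag must be part of the cache key (for example, through the cache behavior's cache policy). Otherwise, a page cached for a human visitor could be served to a bot, or vice versa.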
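To make the choice of events concrete, here is a sketch of how the two functions might be attached to a cache behavior. It assumes you manage the distribution through the CloudFront API or an SDK that accepts a DistributionConfig. The function names detect-bot and route-bot, the account ID, and the version numbers are hypothetical. Lambda@Edge functions are created in us-east-1 and must be referenced by a numbered version, not $LATEST.

```javascript
"use strict";

// Builds the LambdaFunctionAssociations fragment of a CloudFront cache
// behavior, in the shape used by the CloudFront API's DistributionConfig.
// Lambda@Edge needs versioned ARNs (ending in ":<version>", not "$LATEST").
function buildLambdaAssociations(viewerRequestArn, originRequestArn) {
  const items = [
    { EventType: "viewer-request", LambdaFunctionARN: viewerRequestArn, IncludeBody: false },
    { EventType: "origin-request", LambdaFunctionARN: originRequestArn, IncludeBody: false },
  ];
  return { Quantity: items.length, Items: items };
}

// Hypothetical ARNs: account ID, function names and versions are placeholders.
const associations = buildLambdaAssociations(
  "arn:aws:lambda:us-east-1:123456789012:function:detect-bot:1",
  "arn:aws:lambda:us-east-1:123456789012:function:route-bot:1"
);

console.log(JSON.stringify(associations, null, 2));
```

The Viewer Response and Origin Response events are simply left out, since this technique does not need them.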
Since we want CloudFront to distribute our content across the world, we need to create a CloudFront distribution. Read more about CloudFront distributions here, and follow the official AWS CloudFront documentation to create one.
Lambda Function Association for Events
Lambda 1: Viewer Request
This Lambda function is responsible for checking the User-Agent header of the request. It attaches an additional header, Is-Request-From-Bot, to the request. The value of this new header depends on the User-Agent value: if the User-Agent is identified as a bot, Is-Request-From-Bot is set to true, and to false otherwise. Here are some of the bot user agents.
Pseudocode:
- Capture the request from the event object:
const request = event.Records[0].cf.request;
const headers = request.headers;
- Read the User-Agent header.
- Do a regex match to check whether the User-Agent contains a bot string.
- Create a new header (Is-Request-From-Bot) and attach it to the request.
- If bot: set Is-Request-From-Bot: true
- If not bot: set Is-Request-From-Bot: false
- Call the Lambda callback with the request.
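The steps above can be sketched as a Node.js Lambda@Edge handler. This is a minimal version under stated assumptions, not a definitive implementation. BOT_PATTERN is a hypothetical, non-exhaustive regex, and isBotUserAgent is an illustrative helper. The header shape (lower-cased keys, each mapping to an array of { key, value } objects) is how Lambda@Edge represents request headers.

```javascript
"use strict";

// Hypothetical, non-exhaustive pattern of crawler / link-preview User-Agent
// substrings. Extend it for the platforms you care about.
const BOT_PATTERN =
  /bot|crawl|spider|facebookexternalhit|whatsapp|embedly|pinterest|skypeuripreview/i;

// Returns true when the User-Agent looks like a crawler or link-preview bot.
function isBotUserAgent(userAgent) {
  return typeof userAgent === "string" && BOT_PATTERN.test(userAgent);
}

// Viewer Request handler: tags every request with Is-Request-From-Bot.
function handler(event, context, callback) {
  // Capture the request from the event object.
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  // Header names are lower-cased keys; values are arrays of { key, value }.
  const userAgent =
    headers["user-agent"] && headers["user-agent"][0]
      ? headers["user-agent"][0].value
      : "";

  // Attach the new header with "true" or "false" as its value.
  headers["is-request-from-bot"] = [
    { key: "Is-Request-From-Bot", value: isBotUserAgent(userAgent) ? "true" : "false" },
  ];

  // Hand the modified request back to CloudFront.
  callback(null, request);
}

// Expose the handler the way Lambda's Node.js (CommonJS) runtime expects.
if (typeof module !== "undefined") {
  module.exports = { handler, isBotUserAgent };
}
```

Matching the broad substring "bot" keeps the pattern short but can misfire on unusual user agents. An explicit list of crawler names is the stricter alternative.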
Lambda 2: Origin Request
This Lambda function is responsible for routing the request to the expected origin based on the Is-Request-From-Bot header.
The pseudocode for this function is as follows:
- Store the actual SPA origin link in a variable.
- Read the Is-Request-From-Bot header:
let isBot = false; if (headers["is-request-from-bot"] && headers["is-request-from-bot"][0]["value"] === "true") { isBot = true; }
- Based on the isBot variable, change the origin in the request. Refer to this example to see how to change the request origin.
- Call the Lambda callback with the request: callback(null, request);
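A possible Origin Request handler is sketched below. Its origin-swap shape is modeled on the origin-change pattern in the AWS Lambda@Edge examples. The sketch makes several assumptions:

- The SPA bucket is the cache behavior's configured origin, so only bot requests need rewriting. This takes the place of the "store the SPA origin in a variable" step.
- META_BUCKET_DOMAIN is a hypothetical, publicly readable bucket reached through its REST endpoint, hence authMethod "none".
- The URI is rewritten to route/index.html, because the S3 REST endpoint does not resolve folder index documents by itself. toMetaPageUri is an illustrative helper.
- Is-Request-From-Bot is part of the cache key, so bot and human responses are cached separately.

```javascript
"use strict";

// Hypothetical domain of the bucket holding the per-route meta pages.
const META_BUCKET_DOMAIN = "my-spa-meta-pages.s3.amazonaws.com";

// Maps "/blog/post-1" (or "/blog/post-1/") to "/blog/post-1/index.html",
// matching the one-folder-per-route layout; "/" becomes "/index.html".
function toMetaPageUri(uri) {
  return `${uri.replace(/\/+$/, "")}/index.html`;
}

// Origin Request handler: sends bot requests to the meta-page bucket.
function handler(event, context, callback) {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  // Read the header added by the Viewer Request function.
  const isBot =
    Boolean(headers["is-request-from-bot"]) &&
    headers["is-request-from-bot"][0].value === "true";

  if (isBot) {
    // Replace the origin with the meta-page bucket (unsigned requests).
    request.origin = {
      s3: {
        domainName: META_BUCKET_DOMAIN,
        region: "",
        authMethod: "none",
        path: "",
        customHeaders: {},
      },
    };
    // The Host header must match the new origin's domain.
    headers["host"] = [{ key: "host", value: META_BUCKET_DOMAIN }];
    request.uri = toMetaPageUri(request.uri);
  }

  // Non-bot requests continue to the behavior's configured (SPA) origin.
  callback(null, request);
}

// Expose the handler the way Lambda's Node.js (CommonJS) runtime expects.
if (typeof module !== "undefined") {
  module.exports = { handler, toMetaPageUri };
}
```

Keeping the SPA bucket as the default origin means human traffic never pays for anything beyond the header check.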
Achievement
With this setup, social media applications show rich previews for SPA URLs, built from the meta information stored in S3 for each page route.
