Rich link previews for Single Page Applications shared on social media, using SEO meta tags

You probably know what SEO is. This article addresses a narrower problem: getting rich previews when you share a link to your Single Page Application (SPA) on social media. The problem exists because, as the name suggests, an SPA has only one HTML page into which the entire application is loaded, so every route serves the same meta tags. We therefore either solve the problem of per-route meta tags somehow, or follow the workaround described below and serve content to bots separately. This article uses AWS services to achieve rich previews.

Assumptions

I assume that the SPA is hosted in an S3 bucket and served through CloudFront. Read more about CloudFront here.

The concept

Maintain an S3 bucket with route names as the folders inside it. Each folder holds an index.html file corresponding to that route. At minimum, the index.html for a route is an HTML document whose head section contains all the required meta information.
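As a rough illustration of that layout, the sketch below maps a route to its object key in the bucket and builds a meta-only HTML document for it. The function names (previewKeyForRoute, buildPreviewHtml), the example values, and the choice of Open Graph tags are assumptions made for this example; the article does not prescribe which meta tags to include.

```javascript
// Hypothetical helpers: names and the chosen meta tags are illustrative assumptions.

// Escape characters that would otherwise break HTML text or attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Map a route such as "/products/42/" to the S3 object key
// "products/42/index.html" (one folder per route).
function previewKeyForRoute(route) {
  const trimmed = route.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "index.html" : `${trimmed}/index.html`;
}

// Build the minimal document stored under that key: a head section
// carrying the meta information and an empty body.
function buildPreviewHtml({ title, description, image, url }) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    `<title>${escapeHtml(title)}</title>`,
    `<meta property="og:title" content="${escapeHtml(title)}">`,
    `<meta property="og:description" content="${escapeHtml(description)}">`,
    `<meta property="og:image" content="${escapeHtml(image)}">`,
    `<meta property="og:url" content="${escapeHtml(url)}">`,
    "</head>",
    "<body></body>",
    "</html>",
  ].join("\n");
}
```

A build step could call buildPreviewHtml once per route and upload the result under previewKeyForRoute(route); how the files reach the bucket is left open here.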

When a client (a user's computer or mobile device) requests a page, the request goes to CloudFront. CloudFront intercepts it and checks whether it comes from a bot, such as the crawlers used by social media applications and messengers. If it does, CloudFront sends the request not to the actual SPA but to an S3 bucket that stores a replica of the same page containing only the meta information. If CloudFront determines that the request is not from a bot, it forwards the request to the S3 bucket that hosts the actual SPA.

Implementation Details

CloudFront Events and Configurations

At minimum, we need two Lambda functions. CloudFront exposes four events over the life cycle of each request it receives. These are as follows:

Viewer Request
When CloudFront receives a request from a viewer. This is even before it checks to see whether the requested object is in the edge cache.

Origin Request
When CloudFront forwards a request to the origin. When the requested object is in the edge cache, this request is not made.

Origin Response
After CloudFront receives a response from the origin and before it caches the object in the response. This is not executed if the file is in edge cache.

Viewer Response
Before returning the requested file to the viewer. Note that the function executes regardless of whether the file is already in the edge cache.

Out of the above 4 events, we are going to use Viewer Request and Origin Request.

Since we want CloudFront to distribute our content across the world, we need to create a CloudFront Distribution. Read more about CloudFront Distributions here. Follow AWS CloudFront official documentation to create a distribution.

Lambda Function Association for Events

Lambda 1: Viewer Request

This Lambda function checks the User-Agent header of the request and attaches an additional header, Is-Request-From-Bot, to it. The value of the new header depends on the User-Agent value: if the User-Agent is identified as a bot, Is-Request-From-Bot is set to true, and to false otherwise. Here are some of the bot user agents.

Pseudocode:

Capture the request from events object
const request = event.Records[0].cf.request; const headers = request.headers;
Read the User-Agent header
Do a regex Match to check if the User-Agent Contains Bot String.
Create a new header (Is-Request-From-Bot) and attach it to the request
If bot: Set Is-Request-From-Bot: true
If not bot: Set Is-Request-From-Bot: false
Call the lambda callback with the request
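The steps above could be sketched as the following Node.js handler. This is a minimal sketch, not a definitive implementation: the BOT_PATTERN list is an illustrative, non-exhaustive assumption (the article does not fix a list), and the helper name isBotUserAgent is made up for this example.

```javascript
// Assumption: an illustrative, non-exhaustive set of crawler User-Agent substrings.
const BOT_PATTERN =
  /facebookexternalhit|twitterbot|linkedinbot|slackbot|whatsapp|telegrambot|pinterest|googlebot|bingbot/i;

// Return true when the User-Agent string looks like one of the known bots.
function isBotUserAgent(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

function handler(event, context, callback) {
  // Capture the request from the event object.
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  // CloudFront keys headers by lowercase name; each entry is a list of { key, value }.
  const userAgent = headers["user-agent"] ? headers["user-agent"][0].value : "";

  // Attach the marker header with the string "true" or "false".
  headers["is-request-from-bot"] = [
    { key: "Is-Request-From-Bot", value: isBotUserAgent(userAgent) ? "true" : "false" },
  ];

  // Hand the modified request back to CloudFront.
  callback(null, request);
}

exports.handler = handler;
```

For the header to be visible at the Origin Request stage, the cache behavior most likely needs to be configured to forward Is-Request-From-Bot; treat that as something to verify against your distribution's settings.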

Lambda 2: Origin Request

This Lambda function routes the request to the expected origin based on the Is-Request-From-Bot header.

The pseudocode for this Lambda function looks as follows:

Store the actual SPA origin link in a variable
let isBot = false; if (headers["is-request-from-bot"] && headers["is-request-from-bot"][0].value === "true") { isBot = true; }

Now, based on the isBot variable, change the origin in the request. Refer to this example for how to change the request origin.

Call the lambda callback with the request as: callback(null, request);

For non-bot requests, the origin does not need to be changed.
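Put together, the origin-request function might look like the sketch below. Several things here are assumptions rather than details from the article: the bucket domain PREVIEW_BUCKET_DOMAIN is a placeholder, the bucket is assumed to be readable without an origin access identity (authMethod "none"), and the URI is rewritten to the folder's index.html on the assumption that bots only request application routes.

```javascript
// Assumption: placeholder domain of the bucket that holds the meta-only pages.
const PREVIEW_BUCKET_DOMAIN = "my-spa-previews.s3.amazonaws.com";

// Read the marker header set by the viewer-request function.
function isBotRequest(headers) {
  const flag = headers["is-request-from-bot"];
  return Boolean(flag && flag[0] && flag[0].value === "true");
}

function handler(event, context, callback) {
  const request = event.Records[0].cf.request;

  if (isBotRequest(request.headers)) {
    // Point the request at the preview bucket instead of the SPA bucket.
    request.origin = {
      s3: {
        domainName: PREVIEW_BUCKET_DOMAIN,
        region: "",
        authMethod: "none", // assumption: objects are readable without an OAI
        path: "",
        customHeaders: {},
      },
    };

    // The Host header has to match the new origin's domain.
    request.headers["host"] = [{ key: "Host", value: PREVIEW_BUCKET_DOMAIN }];

    // Routes are stored as folders, so ask for the folder's index.html.
    if (!request.uri.endsWith("/index.html")) {
      request.uri = request.uri.replace(/\/+$/, "") + "/index.html";
    }
  }

  // Non-bot requests pass through untouched and reach the SPA origin.
  callback(null, request);
}

exports.handler = handler;
```

Because the response now depends on Is-Request-From-Bot, it is worth checking that the header is part of the cache key, so that a cached preview page is not served to regular users.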

Achievement

With this, we get rich previews for URLs shared in social media applications, based on the meta information stored in S3 for the different page routes of the SPA.

Sana Pathan, 18 November 2019 at 10:20
Amazing work. Can't wait to implement this. Thank you so much for this wonderful piece of content.
LimaOfTheDoomed, 25 November 2019 at 23:49
So why not host static pages in different folders of your S3 buckets? I think you can leverage CloudFront Distribution Behaviours to route to the bucket/folder you desire.

Cloud Experiences
