Designing a URL Shortener

Problem

Design a URL shortening service like tinyurl.

Clarifying Questions

  • Can you give an example of how a URL shortener work?
  • What is the traffic volume?
  • How long is the shortened URL?
  • What characters are allowed in the shortened URL?
  • Can shortened URLs be deleted or updated?

Answers:

  • Assume URL https://www.systeminterview.com/q=chatsystem&c=loggedin&v=v3&l=long is the original URL. Your service creates an alias with shorter length: https://tinyurl.com/y7keocwj. If you click the alias, it redirects you to the original URL.
  • 100 million URLs are generated per day
  • As short as possible
  • Shortened URL can be a combination of numbers (0-9) and characters (a-z, A-Z).
  • For simplicity, let us assume shortened URLs cannot be deleted or updated.

Basic Use Cases:

  1. URL shortening: given a long URL => return a much shorter URL
  2. URL redirecting: given a shorter URL => redirect to the original URL
  3. High availability, scalability, and fault tolerance considerations

Back of the envelope estimation

Write operation= 100 URLs are generated per dayWrite\ operation=\ 100 \ URLs \ are \ generated \ per \ day
Write operation per second= 100 million/24/3600=1160 Write\ operation\ per\ second=\ 100 \ million / 24 / 3600 = 1160

Assuming ratio of read operation to write operation is 10:1

Read operation per second= 1160  10= 11,600Read\ operation\ per\ second =\ 1160\ *\ 10 =\ 11,600

Assuming the URL shortener service will run for 10 years

100 million  365  10 = 365 billion records100\ million\ *\ 365\ *\ 10\ =\ 365\ billion\ records

Assume average URL length is 100. Storage requirement over 10 years

365 billion  100 bytes = 36.5 TB 365\ billion\ *\ 100\ bytes\ =\ 36.5\ TB

High-Level Design

API Endpoints

  1. POST api/v1/data/shortenURL shortening
    • To create a new short URL, a client sends a POST request, which contains one parameter: the original long URL
    • request parameter: {longURL: longURLString}
    • return shortURL
  2. GET api/v1/shortURLURL Redirecting
    • To redirect a short URL to the corresponding long URL, a client sends a GET request
    • return longURL for the HTTP redirection

The most intuitive way to implement URL redirecting is to use hash tables.

Hash table stores <shortURL, longURL> pairs, URL redirecting can be implemented by the following:

  • Get longURL: longURL = hashTable.get(shortURL)
  • Once you get the longURL, perform the URL redirect.

Redirects

301

A 301 redirect shows that the requested URL is “permanently” moved to the long URL. Since it is permanently redirected, the browser caches the response, and subsequent requests for the same URL will not be sent to the URL shortening service. Instead, requests are redirected to the long URL server directly.

Use 301 if the priority is to reduce server load

302

A 302 redirect means that the URL is “temporarily” moved to the long URL, meaning that subsequent requests for the same URL will be sent to the URL shortening service first. Then, they are redirected to the long URL server.

Use 302 if analytics is important

URL Shortening

The hash function must satisfy the following requirements:

  • Each longURL must be hashed to one hashValue.
  • Each hashValue can be mapped back to the longURL.

Design Deep Dive

Data Model

Storing in a hash-table was a good starting point, but it is not feasible for real-world systems.

A better approach is to store <shortURL, longURL> mapping in a relational database:

Hash Function

Hash function is used to hash a long URL to a short URL, also known as hashValue.

The hashValue consists of characters from [0-9, a-z, A-Z], containing 10 + 26 + 26 = 62 possible characters. 

The system must support up to 365 billion URLs

62^7 = 3,521,614,606,208 = ~3.5 trillion

When n = 7, 62 ^ n = ~3.5 trillion, 3.5 trillion is more than enough to hold 365 billion URLs, so the length of hashValue is 7.

Hash + Collision Resolution

To shorten a long URL, we should implement a hash function that hashes a long URL to a 7-character string. 

We can use the following algorithms:

Hash functionHash value (Hexadecimal)
CRC325cb54054
MD55a62509a84df9ee03fe1230b9df8b84e
SHA-10eeae7916c06853901d9ccbefbfcaf4de57ed85b

All of them give more than 7 characters. To only get 7 characters per hash value, we could remove the last character of CRC32, but this will cause collisions. We can use the following method to avoid collisions.

Base 62 conversion

Base conversion is another approach commonly used for URL shorteners.

Base conversion helps to convert the same number between its different number representation systems.

Our short URL is https://tinyurl.com/2TX

Hash + collision resolutionBase 62 conversion
Fixed short URL length.Short URL length is not fixed. It goes up with the ID.
Does not need a unique ID generator.This option depends on a unique ID generator.
Collision is possible and needs to be resolved.Collision is not possible because ID is unique.
It’s not possible to figure out the next available short URL because it doesn’t depend on ID.It is easy to figure out what is the next available short URL if ID increments by 1 for a new entry. This can be a security concern.

URL Shortening

  • Assuming the input longURL is: https://en.wikipedia.org/wiki/Systems_design
  • Unique ID generator returns ID: 2009215674938.
  • Convert the ID to shortURL using the base 62 conversion. ID (2009215674938) is converted to “zn9edcu”.
  • Save ID, shortURL, and longURL to the database as shown in Table 4.
idshortURLlongURL
2009215674938zn9edcuhttps://en.wikipedia.org/wiki/Systems_design

The distributed unique ID generator is worth mentioning. Its primary function is to generate globally unique IDs, which are used for creating shortURLs. You can refer to https://daniela-fernandez.com/designing-unique-id-generator-in-distributed-systems/ for understanding how to design a unique ID generator

URL Re-directing

  1. A user clicks a short URL link: https://tinyurl.com/zn9edcu
  2. The load balancer forwards the request to web servers.
  3. If a shortURL is already in the cache, return the longURL directly.
  4. If a shortURL is not in the cache, fetch the longURL from the database. If it is not in the database, it is likely a user entered an invalid shortURL.
  5. The longURL is returned to the user.

Wrap up

Rate limiter: To avoid that malicious users send an overwhelmingly large number of URL shortening requests.

Web server scaling: Since the web tier is stateless, it is easy to scale the web tier by adding or removing web servers.

Database scaling: Database replication and sharding are common techniques.

Analytics: Data is increasingly important for business success. Integrating an analytics solution to the URL shortener could help to answer important questions like how many people click on a link? When do they click the link? etc.

Availability, consistency, and reliability: These concepts are at the core of any large system’s success.


API REST

  • GET: Used to retrieve data from a server. 
  • POST: Used to send data to the server to create a new resource. 
  • PUT: Used to update or replace a resource that is already part of resource collection.
  • DELETE: Used to remove a resource from the server.