A Short URL refers to a web service that transforms a long URL into a much shorter version.This helps to simplify the complex URLs that may be difficult to share or fit into a character limited platforms like sms/tweets.In addition, it is easy and less error prone to type a short url when compared to its longer version.
- Easy to share and remember. Beautify a URL
- Helpful to track the clicks on URLs for Analytics purposes.
- Helps to Beautify any Long URLs.
- Generate a short url with the given original url.
- The short url (or link) should redirect users to original url (link).
- Provide option to create custom short url as given by end user.
- How long a tiny url would be ? Will it ever expire?
- The services to Create/Retreive/Update/Delete short url must be exposed as REST service.
- The long URL should also be uniquely identifiable from the short URL.
- Provide option to create a tiny url by user's choice.
- If user is allowed to create customer shortened links, it should be of max 7 characters.
- Once a shortened link is generated it should stay in system for lifetime.
- The Service should collect metrics like most clicked links.
Commomly the most URL shortening application or services provide the short URL where:
Example: `http://localhost:9999/1L9zO9P`
-
Fist Part: is the domain name of the service. example:
localhost:9999
-
Second Part: is a string formed by a set of random characters. example
1L9zO9P
This random string should be unique for each short URL created for a long URL. -
Length of Short URL or Random String:
The unique random characters can be generated using either base62
or md5
.
Both base62
and MD5
algorithm outputs consist of 62 character combinations (a-z A-Z 0-9)
.
Base62
takes integer type as input. So we need to first convert the long URL to a random number then pass this random number to the base62 algorithm.
MD5
takes string as the input and generates a random fixed length string output, we can directly pass the long URL as input.
- Hashing Algorithm: Use a hashing algorithm such as MD5, SHA-256, or CRC32 to generate a hash value from the long URL. The hash value can then be encoded using base62 or base64 encoding to create a short URL.
- Base Conversion Algorithm: Convert the unique ID of the long URL into a shorter representation using base conversion techniques. For example, you can convert the decimal representation of the ID into a base58 or base62 encoding, excluding easily confused characters like ‘0’, ‘O’, ‘1’, ‘I’, etc.
- Bijective Function Algorithm: Utilize a bijective function that maps a unique identifier to a short URL and vice versa. For example, you can use a function like the Bijective Conversion Function (BCF) algorithm, which converts a decimal ID into a sequence of characters using a predefined set of characters.
- Randomized Approach: Generate a random sequence of characters of a fixed length (e.g., 6 or 8 characters) to create the short URL. Although this approach may not guarantee uniqueness, you can perform a lookup in the database to ensure uniqueness and regenerate if there’s a collision.
- Counter Based Approach: When the server gets a request to convert a long URL to short URL, it first talks to the counter to get a count value.This value is then passed to the `base62` algorithm to generate random string. Making use of a counter to get the integer value ensures that the number we get will always be unique by default because after every request the counter increments its value.
The Base62 encoding scheme is a binary-to-text encoding scheme that represents binary data in an ASCII string format. It uses character that can be one of the following:
A lower case alphabet [‘a’ to ‘z’] -> total 26 characters
An upper case alphabet [‘A’ to ‘Z’] -> total 26 characters
A digit [‘0′ to ‘9’] -> total 10 characters
Hence, there are total 26 + 26 + 10 = 62 possible characters. [base62].
The typical order is 0-9 for numbers (values 0 to 9), A-Z for uppercase letters (values 10-35), a-z for lowercase letters (values 36-61)
In this application, it is decided to go with random character string of length 7, so we can have 62^7 ~ 3.5 trillion combinations to work with.This is more than enough for a sample application.
- Divide the number by 62.
- Record the remainder as the rightmost digit.
- Use the quotient as the new number to be divided.
- Repeat the process until the number is zero.
- The sequence of remainders (read from bottom to top) forms the Base62 encoded string.
- Short Url: 7 character long unique short URL.
- Original Url: The original long URL.
- Counter: Unique integer to create Short URL Id. Initial counter value is 100000000000
- Mapping Short URL Id and Long URL: Store the details of Short URL Id and Long URL.
- Mapping Short URL Id and Click Count: Store the total clicks on Short URL Id to maintain analytics.
- Why initial counter value in redis is set to 100000000000 ?
- Explain the Application flow for Creating Short URL ?
To Ensure Minimum Length: By starting with a large initial value, we ensure that the encoded ID (short URL) has a minimum length of 7-characters from the beginning.
Avoid Simple Sequences: Starting with a large number makes the generated short URLs less predictable.
Collision Avoidance: If there are parallel or multiple ID generation requests then starting at a higher value could help in avoiding collisions or overlap.
The service recieves a POST
request to the /create-short-url
endpoint.
The service then gets a unique counter value from redis database. This value is then used to generate unique Short URL Id using base62
algorithm.
After storing the mapping of generated Short URL id and long URL and with all the details the service returns response to Client with all the corresponding details.