Encoder for TinyURL
Understand the inner details of an encoder that are critical for URL shortening.
Introduction
We've discussed the overall design of a short URL generator (SUG) in detail, but two aspects need more clarification:
 How does encoding improve the readability of the short URL?
 How are the sequencer and the base58 encoder in the short URL generation related?
Why to use encoding
Our sequencer generates a 64bit ID in base10, which can be converted to a base64 short URL. Base64 is the most common encoding for alphanumeric strings' generation. However, there are some inherent issues with sticking to the base64 for this design problem: the generated short URL might have readability issues because of lookalike characters. Characters like O
(capital o) and 0
(zero), I
(capital I), and l
(lower case L) can be confused while characters like +
and /
should be avoided because of other systemdependent encodings.
So, we slash out the six characters and use base58 instead of base64 (includes AZ, az, 09, +
and /
) for enhanced readability purposes. Let's look at our base58 definition.
Base58
Value  Character  Value  Character  Value  Character  Value  Character 
0  1  15  G  30  X  45  n 
1  2  16  H  31  Y  46  o 
2  3  17  J  32  Z  47  p 
3  4  18  K  33  a  48  q 
4  5  19  L  34  b  49  r 
5  6  20  M  35  c  50  s 
6  7  21  N  36  d  51  t 
7  8  22  P  37  e  52  u 
8  9  23  Q  38  f  53  v 
9  A  24  R  39  g  54  w 
10  B  25  S  40  h  55  x 
11  C  26  T  41  i  56  y 
12  D  27  U  42  j  57  z 
13  E  28  V  43  k  
14  F  29  W  44  m 
The highlighted cells contain the succeeding characters of the omitted ones:
0
,O
,I
, andl
.
Converting base10 to base58
Since we're converting base10 numeric IDs to base58 alphanumeric IDs, explaining the conversion process will be helpful in grasping the underlying mechanism as well as the overall scope of the SUG. To achieve the above functionality, we use the modulus function.
Process: We keep diving the base10 number by 58, making note of the remainder at each step. We stop where there is no remainder left. Then we assign the character indexes to the remainders, starting from assigning the recentmost remainder to the leftmost place and the oldest remainder to the rightmost place.
Example: Let's assume that the selected unique ID is 2468135791013
. The following steps show us the remainder calculations:
Base10 = 2468135791013

$2468135791013 \ \% \ 58 = 17$ 
$42554065362 \ \% \ 58 = 6$ 
$733690782 \ \% \ 58 = 4$ 
$12649841 \ \% \ 58 = 41$ 
$218100 \ \% \ 58 = 20$ 
$3760 \ \% \ 58 = 48$ 
$64 \ \% \ 58 = 6$ 
$1 \ \% \ 58 = 1$
Now, we need to write the remainders in order of the most recent to the oldest order.
Base58 = [1] [6] [48] [20] [41] [4] [6] [17]
Using the table above, we can write the associated characters with the remainders written above.
Base58 = 27qMi57J
Note: Both the base10 numeric IDs and base64 alphanumeric IDs have 64bits, as per our sequencer design.
Let's see the example above with the help of the following illustration.
Converting base58 to base10
The decoding process holds equal importance as the encoding process, as we used a decoder in case of custom short URLs generation, as explained in the design lesson.
Process: The process of converting a base58 number into a base10 number is also straightforward. We just need to multiply each character index (value column from the table above) by the number of 58s that position holds, and add all the individual multiplication results.
Example: Let's reverse engineer the example above to see how decoding works.
Base58: 27qMi57J
Base10
Base10
This is the same unique ID of base10 from the previous example.
Let's see the example above with the help of the following illustration.
The scope of the short URL generator
The short URL generator is the backbone of our URL shortening service. The output of this short URL generator depends on the designimposed limitations, as given below:
 The generated short URL should contain alphanumeric characters.
 None of the characters should look alike.
 The minimum default length of the generated short URL should be six characters.
These limitations define the scope of our short URL generator. We can define the scope, as shown below:
 Starting range: Our sequencer can generate a 64bit binary number that ranges from
$1 \to (2^{64}1)$ . To meet the requirement for the minimum length of a short URL, we can select the sequencer IDs to start from at least 10 digits, i.e., 1 Billion.  Ending point: The maximum number of digits in sequencer IDs that map into the short URL generator's output depends on the maximum utilization of 64 bits, that is, the largest base10 number in 64bits. We can estimate the total number of digits in any base by calculating these two points:
 The numbers of bits to represent one digit in a basen. This is given by
$log_2{n}$ . 
$Number\ of\ digits \ = \frac{Total\ bits\ available}{Number\ of\ bits\ to\ represent\ one\ digit}$
Let's see the calculations above for both the base10 and base58 mathematically:
 Base10:
 The number of bits needed to represent one decimal digit =
$log_2{10}$ =$3.13$  The total number of decimal digits in 64bit numeric ID =
$\frac{64}{3.13}$ $=$ $20$  Base58:
 The number of bits needed to represent one decimal digit =
$log_2{5}8$ $=$ $5.85$  The total number of base58 digits in a 64bit numeric ID =
$\frac{64}{5.85}$ $=$ $11$
Maximum digits: The calculations above show that the maximum digits in the sequencer generated ID will be 20 and consequently, the maximum number of characters in the encoded short URL will be 11.
Quiz
Question 1
Since we’re using the 10 digits and beyond sequencer IDs, is there a way we can use the sequencer IDs shorter than 10 digits?
1 of 2
The sequencer's lifetime
The number of years that our sequencer can provide us with unique IDs depends on two factors:
 Total numbers available in the sequencer =
$2^{64}  10^{9}$ (starting from 1 Billion as discussed above)  Number of requests per year =
$200 \ Million \ per \ month \times 12 = 2.4 \ Billion$ (as assumed in Requirements)
So, taking the above two factors into consideration, we can calculate the expected life of our sequencer.
The lifetime of the sequencer =
Life expectancy for sequencer
Number of requests per month  200  Million 
Number of requests per year  f2.4  Billion 
Lifetime of sequencer  f7686143363.63  years 
Therefore, our service can run for a long time before the range depletes.