URL short link realization method

URL short link realization method

In the recent project development, it is necessary to realize the need of URL long link to short link, so I found some information on the Internet and sorted it out by the way. Welcome children with ideas to leave a message, and we will discuss it together.

1. The benefits of short links

  1. Content requirements (such as text messages, restrictions on the number of link words in Weibo) 

  2. Easy to manage (convenient to track clicks in the background, easy to count)

  3. User-friendly (looks very Cool, enhance user experience)

The general idea is to define a URL mapping algorithm, map a long URL to a short URL, use a database or redis cache to store the mapping relationship, and implement the mapping algorithm. The key part is the mapping algorithm, and then we will talk about the mapping algorithm in detail.

2. Mapping algorithm

1. Base conversion

Most of the schemes use different bases for mutual conversion, such as decimal to hexadecimal, and decimal to sixty binary. Even if we record 100 million pieces of data, the 64 base of 100 million is also suitable for short links Parameter, which converts the self-increasing ID into a string of short links. Long links and short links are stored in the database or cache in the key, value mapping relationship for more convenient access.

Disadvantages: There is no way to guarantee the length of the converted short link string. In the case of high concurrency, how to ensure that it can be distributed quickly is a problem.

2. Fixed algorithm

We use 6 characters to represent the short link, using the ASCII characters'a'-'z','0'-'5', a total of 32 characters as a set. Each character has 32 states, six characters can represent 32^6 (1073741824), then how to get these six characters, Md5 the incoming long URL to get a 32-bit string, this string changes a lot , Is 16 to the 32 th power, which basically guarantees uniqueness. Divide the 32 bits into four parts, each with 8 characters. At this time, the probability becomes 16 to the 8th power, which is 4294967296. The probability of collision of this number is also relatively small. The key is the subsequent processing. We think of this 8-bit character as a hexadecimal integer, that is, 1*('0x'.$val), and then take 0-30 bits, each group of 5, calculate its integer value, and then map it to us Among the prepared 32 characters, you can finally get a 6-bit short link address.

code show as below:

function shorten( $long_url) {$base32 = "abcdefghijklmnopqrstuvwxyz012345"; $hex = md5( $long_url ); $hexLen = strlen( $hex ); $subHexLen = $hexLen/8; $output = array(); for( $ i = 0; $i <$subHexLen; $i++) {$subHex = substr( $hex, $i * 8, 8 ); $subHex = 0x3FFFFFFF & (1 * ('0x'. $subHex) );

    $out =''; 

          for( $j = 0; $j <6; $j++) {$val = 0x0000001F & $int; $out .= $base32[$val]; $int = $int >> 5;} $output[] = $out;} return $output;}

Two other views:

In fact, they will not achieve this, and efficiency must be considered. The normal practice should be hash check + id counter + balance tree search. If the hash algorithm is designed cleverly, the id counter can be omitted.

The hash value generated by sha1 on the long URL is stored in hashtable or redis, and the hash value is compared before shortening. If the same, the short code generated before can be queried.

Reference: https://cloud.tencent.com/developer/article/1044039 URL Short Link Implementation Method-Cloud + Community-Tencent Cloud