The Power of Algorithms: Kai-Fu Lee on the Importance of Algorithms

Algorithms are one of the most important cornerstones of computer science, yet they are neglected by some programmers in China. Many students fall into a misunderstanding when they see the variety of programming languages that companies list in their job postings: they conclude that studying computer science means learning various programming languages, or that mastering the newest languages, technologies, and standards is the best way to prepare for a career. In fact, they are being misled. Programming languages are worth learning, but studying computer algorithms and theory matters more, because languages and development platforms change with each passing day, while the underlying foundations endure: data structures, algorithms, compiler principles, computer architecture, relational database principles, and so on. On the "Kaifu Student Network," a student vividly compared these basic courses to "internal strength" and new languages, technologies, and standards to "external moves." Those who chase trends all day learn only the moves; without the internal strength, they can never become masters.

Algorithm and me

When I transferred to the Department of Computer Science in 1980, not many people specialized in computer science. People in other departments teased us: "Do you know why your department has to add the word 'science' to its name, while there is no 'Department of Physical Science' or 'Department of Chemical Science'? Because those fields are real science and need no embellishment, while you have a guilty conscience, afraid of not being 'scientific' enough, so you try to cover it up." In fact, they were completely wrong. People who truly understand computing (not merely "programmers") have a solid grounding in mathematics. They can verify ideas with the rigorous thinking of a scientist and solve problems with the pragmatic methods of an engineer, and the best embodiment of this combination of thinking and method is the "algorithm."

I remember the Othello game program I wrote while I was still a student, which won the world championship. The runner-up thought I had beaten him by luck and asked, unconvinced, how many moves my program could search per second. When he found that my software searched more than 60 times faster than his, he was completely convinced. Why could I do 60 times more work on the same machine? Because I used a new algorithm that converts an exponential function into four approximate lookup tables, so an approximate answer can be obtained in constant time. In this example, choosing the right algorithm was the key to winning the world championship.
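The article does not spell out how those four tables were built, so here is only a minimal sketch of the general idea of table-driven exponential approximation: precompute lookup tables once so that each later evaluation costs two constant-time lookups. The two-way split of the argument and the table sizes below are my own illustrative choices, not the actual scheme the program used.

```python
import math

# Precompute lookup tables once: split x = i + f into an integer part and a
# fractional part, and store e^i and e^f for a grid of values.
INT_TABLE = {i: math.exp(i) for i in range(-20, 21)}                 # e^i for integer i
FRAC_STEPS = 1000
FRAC_TABLE = [math.exp(k / FRAC_STEPS) for k in range(FRAC_STEPS)]   # e^f for f in [0, 1)

def fast_exp(x: float) -> float:
    """Approximate e^x with two constant-time table lookups."""
    i = math.floor(x)
    f = x - i
    if i < -20 or i > 20:
        return math.exp(x)  # fall back outside the precomputed range
    return INT_TABLE[i] * FRAC_TABLE[int(f * FRAC_STEPS)]

# Quick check of the approximation against the exact value
print(fast_exp(2.3456), math.exp(2.3456))
```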

I also remember that in 1988 a vice president of Bell Labs visited my school to find out why their speech recognition system was dozens of times slower than the one I had developed, and, after scaling up to a large-vocabulary system, hundreds of times slower. They had bought several supercomputers and barely managed to keep the system running, but their product division balked at such expensive computing resources, because "expensive" technology has no prospect of being deployed. While discussing the problem with them, I was surprised to discover that they had implemented an O(n*m) dynamic program as O(n*n*m). Even more surprising, they had published many papers about it, given the algorithm a rather special name, and nominated it for an award at a scientific conference. The researchers at Bell Labs were of course extremely smart, but they had all trained in mathematics, physics, or electrical engineering and had never studied computer science or algorithms, which is how they came to make such a basic mistake. I imagine those people will never again laugh at those of us who study "computer science"!
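The article does not name which dynamic program was involved, but a classic O(n*m) example from speech recognition is dynamic time warping, sketched below as an illustration of what the intended complexity looks like: each of the n*m table cells is filled exactly once from neighbours that are already computed. Recomputing a cell's dependencies from scratch instead of reading them from the table is precisely the kind of slip that inflates the cost by an extra factor.

```python
def dtw_cost(a, b):
    """Dynamic time warping distance in O(n*m): every cell of the table is
    filled once from three already-computed neighbouring cells."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

print(dtw_cost([1, 2, 3, 4], [1, 1, 2, 3, 4, 4]))  # 0.0: the sequences align perfectly
```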

Algorithms in the Internet Age

Someone might ask: "Computers are so fast now, do algorithms still matter?" In fact, there will never be a computer that is too fast, because we will always invent new applications. Thanks to Moore's Law, the computing power of computers increases rapidly every year while prices keep falling. But let us not forget that the amount of information to be processed is growing exponentially. Each of us now produces a large amount of data every day (photos, video, audio, text, and more), and ever more advanced recording and storage technology makes that volume explode. The traffic and log volume of the Internet are also growing rapidly. In scientific research, as experimental methods advance, the amount of data has reached unprecedented scales. Three-dimensional graphics, massive data processing, machine learning, speech recognition: all of these demand enormous amounts of computation. In the Internet age, more and more challenges can only be met with excellent algorithms.

Consider another example from the Internet age. In Web and mobile search, if you look for a nearby coffee shop, how should the search engine handle the request? The simplest way is to find every cafe in the city, compute the distance from each one to your location, sort by distance, and return the closest results. But how is the distance computed? Graph theory offers many algorithms for that.
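As a baseline, the brute-force approach described above is easy to write down. The straight-line distance used here is a simplifying assumption (a real service would use road networks or great-circle distance), and the data is made up purely for illustration.

```python
import math

# Hypothetical data: (name, x, y) coordinates of every cafe in the city.
cafes = [("A", 2.0, 3.0), ("B", 5.0, 1.0), ("C", 0.5, 0.5), ("D", 9.0, 9.0)]

def nearest_cafes(user_x, user_y, k=3):
    """Brute force: compute the distance to every cafe, sort, return the k closest.
    O(n log n) per query -- fine for a handful of cafes, not for millions of
    cafes multiplied by millions of queries."""
    def dist(c):
        return math.hypot(c[1] - user_x, c[2] - user_y)
    return sorted(cafes, key=dist)[:k]

print(nearest_cafes(1.0, 1.0))
```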

This may be the most intuitive approach, but it is certainly not the fastest. If there are only a few cafes in the city, it works fine; the amount of computation is small. But if the city has many cafes and many users issue similar searches, the load on the servers becomes much heavier. In that case, how can we optimize the algorithm?

First, we can preprocess all the cafes in the city. For example, divide the city into a number of grid cells, assign each user to a cell according to his or her location, and sort by distance only the cafes in that cell.
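One way to implement that preprocessing, sketched with a made-up cell size: hash every cafe into a fixed-size grid cell once, and at query time rank only the cafes in the user's cell and its immediate neighbours.

```python
import math
from collections import defaultdict

CELL = 1.0  # grid cell size; a real system would tune this

def build_grid(cafes):
    """Preprocessing: bucket every cafe by its grid cell."""
    grid = defaultdict(list)
    for name, x, y in cafes:
        grid[(int(x // CELL), int(y // CELL))].append((name, x, y))
    return grid

def nearby(grid, user_x, user_y):
    """Query: sort only the cafes in the user's cell and its 8 neighbours."""
    cx, cy = int(user_x // CELL), int(user_y // CELL)
    candidates = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            candidates.extend(grid.get((cx + dx, cy + dy), []))
    return sorted(candidates, key=lambda c: math.hypot(c[1] - user_x, c[2] - user_y))

grid = build_grid([("A", 2.0, 3.0), ("B", 5.0, 1.0), ("C", 0.5, 0.5)])
print(nearby(grid, 1.0, 1.0))
```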

But a new problem arises. If all cells are the same size, most of the results may fall into one cell in the city center while a suburban cell contains almost none. In that case, the city center should be divided into finer cells. Going further, the grid should be organized as a tree: the top level is one large cell covering the whole city, and at each level down the cells get smaller and smaller. This supports precise search, and if the bottom-level cell contains too few results, the user can expand the search range level by level, as in the sketch below.
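A toy version of that hierarchy, assuming a square city and power-of-two cell sizes (both made-up parameters): index each cafe at several levels, start the query in the smallest cell containing the user, and widen one level at a time when that cell has too few results.

```python
from collections import defaultdict

LEVELS = 4  # level 0 = the whole city, level 3 = the finest cells (illustrative depth)

def cell_key(x, y, level, city_size=16.0):
    """Cell id of point (x, y) at a given level of the hierarchy."""
    size = city_size / (2 ** level)          # cells shrink as the level grows
    return (level, int(x // size), int(y // size))

def build_index(cafes):
    """Index every cafe at every level, from the whole city down to small cells."""
    index = defaultdict(list)
    for name, x, y in cafes:
        for level in range(LEVELS):
            index[cell_key(x, y, level)].append((name, x, y))
    return index

def search(index, x, y, want=2):
    """Start in the smallest cell and widen level by level until enough hits."""
    for level in range(LEVELS - 1, -1, -1):
        hits = index.get(cell_key(x, y, level), [])
        if len(hits) >= want:
            return hits
    return hits

index = build_index([("A", 2.0, 3.0), ("B", 5.0, 1.0), ("C", 0.5, 0.5), ("D", 9.0, 9.0)])
print(search(index, 1.0, 1.0))
```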

The algorithm above works well for the cafe example, but is it universal? The answer is no. A cafe can be abstracted as a "point", but what if you need to search for a "region"? For example, a user wants to go to a reservoir, and the reservoir has several entrances; which one is closest to the user? In that case, the tree structure above should be replaced with an R-tree, because each internal node of an R-tree represents a range, a bounding box (reference: http://www.cs.umd.edu/~hjs/rtrees/index.html).
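A full R-tree is too long to sketch here, but when the target is a region rather than a point, the key primitive an R-tree search relies on is the minimum distance from the user to a bounding rectangle. A minimal version:

```python
def min_dist_to_rect(px, py, xmin, ymin, xmax, ymax):
    """Smallest straight-line distance from point (px, py) to an axis-aligned
    rectangle; zero if the point lies inside it. R-tree search uses this to
    decide which bounding boxes are worth descending into."""
    dx = max(xmin - px, 0.0, px - xmax)
    dy = max(ymin - py, 0.0, py - ymax)
    return (dx * dx + dy * dy) ** 0.5

# The reservoir's bounding box: its nearest entrance cannot be closer than this.
print(min_dist_to_rect(0.0, 0.0, 2.0, 1.0, 5.0, 4.0))  # ~2.236
```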

This small example shows that application requirements vary endlessly. In many cases we need to decompose a complex problem into a number of simpler subproblems and then choose appropriate algorithms and data structures for each.

Parallel Algorithm: Google’s Core Advantage

The example above is a small case at Google! Every day Google handles more than a billion searches, Gmail stores 2 GB mailboxes for tens of millions of users, and Google Earth lets hundreds of thousands of users "travel" around the planet at the same time, delivering the right imagery to each of them over the Internet. Without good algorithms, none of these applications would be possible.

In these applications, even the most basic problems pose great challenges to conventional computing. For example, more than a billion users visit Google's sites and use Google's services every day, which produces a vast amount of logs. Because the logs grow every second, we must have clever ways of processing them. In interviews I have asked candidates how they would analyze and process these logs; many answers were logically correct but practically infeasible. With their algorithms, even tens of thousands of machines could not process the data as fast as it is generated.

So how does Google solve these problems?

First, in the Internet age, even the best algorithm must be able to run in a parallel computing environment. Google's data centers use very large clusters of machines. But when traditional parallel algorithms run on more and more machines, their efficiency drops quickly: ten machines may give a fivefold speedup, yet a thousand machines may give only a speedup of a few dozen times. No company can afford such diminishing returns. Moreover, with many parallel algorithms, a failure at a single node invalidates the entire computation.

So how did Google develop parallel computing that is both efficient and fault-tolerant?

Jeff Dean, one of Google's most senior computer scientists, recognized that most of the data processing Google needs can be reduced to a simple parallel pattern: MapReduce. It achieves very high efficiency across many kinds of computation and scales well (even if a thousand machines cannot deliver a thousandfold speedup, they can at least deliver several hundred times). Another major feature of MapReduce is that it can combine large numbers of cheap machines into a powerful server farm. Finally, its fault tolerance is exceptionally good: even if half of a server farm goes down, the whole system keeps running. It was this insight that led to the MapReduce algorithm. With it, Google can scale its computation almost without limit and keep pace with ever-changing Internet applications.
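The real MapReduce runs across thousands of machines with fault tolerance built in; the single-machine sketch below only shows the shape of the programming model (counting queries in a log), with made-up log lines and a hand-written "shuffle" step standing in for what the framework does in parallel.

```python
from collections import defaultdict

def map_phase(log_line):
    """Map: emit (key, 1) for the query on this log line."""
    query = log_line.split("\t")[1]          # assume tab-separated log records
    yield (query, 1)

def reduce_phase(key, values):
    """Reduce: combine all counts for one key."""
    return (key, sum(values))

logs = ["u1\tcoffee", "u2\tweather", "u3\tcoffee"]

# Shuffle: group intermediate pairs by key (the framework does this across machines).
groups = defaultdict(list)
for line in logs:
    for key, value in map_phase(line):
        groups[key].append(value)

print([reduce_phase(k, v) for k, v in groups.items()])  # [('coffee', 2), ('weather', 1)]
```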

Algorithms are not limited to computers and networks

Take an example from outside the computer field: in high-energy physics research, many experiments produce several terabytes of data per second, but because processing power and storage are insufficient, scientists have to discard most of the data unprocessed. Yet evidence of new elements may well be hidden in the data we have no time to process. Similarly, in other fields, algorithms can change human life. Research on the human genome may lead to new medical treatments because of algorithms. In national security, effective algorithms may help prevent the next 9/11. In meteorology, algorithms can better predict natural disasters and save lives.

So, if you set the development of computing against the backdrop of rapidly growing applications and data, you will find that the importance of algorithms is not declining but increasing.

Source: "The power of algorithms: Kai-Fu Lee talks about the importance of algorithms," Tencent Cloud Developer Community, https://cloud.tencent.com/developer/article/1055096