Troubleshooting high CPU usage caused by a regular expression

This article records how a regular expression that caused high CPU usage was tracked down. Because the production code cannot be used directly for testing, I put together a minimal reproduction myself. The code is as follows:

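import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppMain {
	public static void main(String[] args) throws InterruptedException {
		final String regex = "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";
		// The trailing '#' makes the match fail, which forces the engine to backtrack.
		final String email = "blog.laofu.online.fuweilao@vip.qq.com#";
		// The original code called a RegexUtils.matcher(regex, email) helper;
		// compiling the pattern directly is assumed to be equivalent.
		final Pattern pattern = Pattern.compile(regex);
		for (int i = 0; i < 1000000; i++) {
			Matcher matcher = pattern.matcher(email);
			matcher.find();
			Thread.sleep(10);
			// matcher.group();
		}
	}
}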

While the program is running, the java process uses 82.1% of the CPU. Because my server has only 1 core and 2 GB of memory, the load average is also very high.

Use top -H -p 4214 to view the CPU usage of each thread in the process.

Use printf '%x\n' 4217 to convert the ID of the busy thread, 4217, to hexadecimal, which gives 1079. Then run jstack 4214 | grep 1079 -A 100 to obtain the stack information of that thread:

"main" #1 prio=5 os_prio=0 tid=0x00007f943004c800 nid=0x1079 runnable [0x00007f9439fe0000]
   java.lang.Thread.State: RUNNABLE
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4264)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4195)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4195)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4195)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4293)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4196)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
        at java.util.regex.Pattern$Curly.match(Pattern.java:4248)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672)
        at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4815)
        at java.util.regex.Pattern$Prolog.match(Pattern.java:4755)
        at java.util.regex.Pattern$Begin.match(Pattern.java:3539)
        at java.util.regex.Matcher.search(Matcher.java:1248)
        at java.util.regex.Matcher.find(Matcher.java:637)
        at org.rz.search.spider.AppMain.main(AppMain.java:13)

"VM Thread" os_prio=0 tid=0x00007f94300cc800 nid=0x1079 runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f9430121000 nid=0x1079 waiting on condition 

JNI global references: 5
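
The decimal-to-hexadecimal step used to find the thread in the jstack output can also be checked in Java. This is only a small sketch; the class and method names are made up for illustration.

```java
// Convert the decimal thread ID reported by top into the hexadecimal
// form that jstack prints in the nid field.
class ThreadIdHex {
    // Returns the jstack-style nid string for a decimal thread ID.
    static String toNid(int decimalThreadId) {
        return "0x" + Integer.toHexString(decimalThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toNid(4217)); // prints 0x1079
    }
}
```

The result matches the nid=0x1079 field of the "main" thread in the output above.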

The stack information above shows the regular expression engine calling itself recursively, which produces a very deep stack. The last stack entry, at org.rz.search.spider.AppMain.main(AppMain.java:13), is the call to matcher.find(), so we can conclude that the regular expression match is the cause of the problem.

Why does a regular expression cause the CPU to soar?

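We know that regular expression quantifiers are greedy by default. If the rest of the pattern fails to match, the engine backtracks: it gives characters back and tries another way of matching them. With nested quantifiers such as ([a-z0-9A-Z]+[-|\\.]?)+, the same characters can be divided between the inner and the outer loop in many ways, so the number of attempts can grow exponentially with the length of the string when the match fails.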
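
To see where the exponential growth comes from, consider only a run of n letters matched by a nested quantifier such as ([a-z0-9A-Z]+)+. The inner + and the outer + can divide the run into pieces in many different ways, and after a failure a backtracking engine may have to try all of them. The following sketch does not use the regex engine at all; it simply counts those divisions. The class and method names are made up for illustration.

```java
// Counts the ways a run of n alphanumeric characters can be divided into
// consecutive non-empty pieces, which is how a nested quantifier such as
// ([a-z0-9A-Z]+)+ can consume the run.
class SplitCount {
    static long countSplits(int n) {
        if (n == 0) {
            return 1; // nothing left to divide: exactly one way
        }
        long ways = 0;
        // The first piece takes k characters; the rest is divided recursively.
        for (int k = 1; k <= n; k++) {
            ways += countSplits(n - k);
        }
        return ways;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 20; n++) {
            System.out.println(n + " chars -> " + countSplits(n) + " ways");
        }
    }
}
```

Each extra character doubles the count (1, 2, 4, 8, ...), so a run of 20 characters can already be divided in 524288 ways.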

The performance of a regular expression can be tested on regex101.

With a simple test string, the engine already performs 168,997 comparisons.

After I added just two more characters, the number of comparisons rose to 404,895.
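
The same growth can be observed locally with a small timing loop. This is a rough sketch, not a benchmark: the input strings, the lengths, and the class name are assumptions, the absolute timings depend on the machine and the JDK, and newer JDK versions may avoid part of the backtracking for some patterns.

```java
import java.util.regex.Pattern;

class BacktrackTiming {
    // The e-mail pattern from the article.
    static final Pattern EMAIL = Pattern.compile(
        "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$");

    // Builds an input that cannot match: n letters followed by '#'.
    static String failingInput(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('#').toString();
    }

    // Returns the elapsed nanoseconds for one failed find() on an n-letter input.
    static long timeFailedMatch(int n) {
        String input = failingInput(n);
        long start = System.nanoTime();
        boolean found = EMAIL.matcher(input).find();
        long elapsed = System.nanoTime() - start;
        if (found) {
            throw new IllegalStateException("unexpected match for " + input);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Print the time of a failed match for increasing input lengths.
        for (int n = 10; n <= 20; n += 2) {
            System.out.printf("%d letters: %d us%n", n, timeFailedMatch(n) / 1000);
        }
    }
}
```

On an engine that backtracks through every possibility, the time should grow much faster than the input length.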

Of course, if the match succeeds, the number of comparisons is very small.

So regular expressions should be used with care. How to solve the problem depends on the specific business scenario. I personally recommend applying a simple filtering rule first, so that obviously invalid strings are rejected before the regular expression runs.

In this case, we can first check directly whether the string contains illegal characters such as #, $, -, ?, and only then run the regular expression match.
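
A minimal sketch of this two-step check is shown below. The class name, the method names, and the set of rejected characters (#, $, ?) are illustrative assumptions; the real set depends on the business rules.

```java
import java.util.regex.Pattern;

class EmailPreFilter {
    // Characters rejected before the regex runs (assumed set, adjust as needed).
    static final String ILLEGAL_CHARS = "#$?";

    // The e-mail pattern from the article.
    static final Pattern EMAIL = Pattern.compile(
        "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$");

    // Linear scan: true if any character of s is in ILLEGAL_CHARS.
    static boolean containsIllegalChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (ILLEGAL_CHARS.indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Cheap check first; the regex only runs on strings that pass it.
    static boolean isValidEmail(String s) {
        if (containsIllegalChar(s)) {
            return false;
        }
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("blog.laofu.online.fuweilao@vip.qq.com#")); // prints false
        System.out.println(isValidEmail("blog.laofu.online.fuweilao@vip.qq.com"));  // prints true
    }
}
```

This filter only removes inputs that fail because of an obviously illegal character. A long input that fails for another reason, for example one without an @, can still trigger heavy backtracking, so the filter reduces the risk rather than eliminating it.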

Reference: A regular expression leads to high CPU troubleshooting process, Tencent Cloud Developer Community, https://cloud.tencent.com/developer/article/1780881