Regular Expression in Java - Java Regex Example

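Welcome to Regular Expression in Java, also known as Regex. When I started programming, Java regular expressions were a nightmare for me. This tutorial aims to help you master regular expressions in Java. I also come back here myself to refresh my Java Regex knowledge.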

Regular Expression in Java

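Java regular expression support lives in the java.util.regex package, whose main classes are:

Pattern: A Pattern object is the compiled version of a regular expression. The Pattern class has no public constructor; we create a Pattern object by passing the regular expression to its public static method compile.
Matcher: A Matcher is the Java regex engine object that matches an input String against the compiled Pattern object. The Matcher class has no public constructor; we get a Matcher object from the Pattern object's matcher method, which takes the input String as its argument. We then call the matches method, which returns a boolean result depending on whether the whole input String matches the regex pattern.
PatternSyntaxException: A PatternSyntaxException is thrown if the regular expression syntax is not correct.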

Let's have a look at a Java Regex example program.

package com.journaldev.util;

import java.util.regex.*;

public class PatternExample {

	public static void main(String[] args) {
		Pattern pattern = Pattern.compile(".xx.");
		Matcher matcher = pattern.matcher("MxxY");
		System.out.println("Input String matches regex - "+matcher.matches());
		// bad regular expression
		pattern = Pattern.compile("*xx*");

	}

}

When we run this Java regex example program, we get the output below.

Input String matches regex - true
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*xx*
^
	at java.util.regex.Pattern.error(Pattern.java:1924)
	at java.util.regex.Pattern.sequence(Pattern.java:2090)
	at java.util.regex.Pattern.expr(Pattern.java:1964)
	at java.util.regex.Pattern.compile(Pattern.java:1665)
	at java.util.regex.Pattern.<init>(Pattern.java:1337)
	at java.util.regex.Pattern.compile(Pattern.java:1022)
	at com.journaldev.util.PatternExample.main(PatternExample.java:13)
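In the program above the exception is left uncaught, so it ends the run. When the regex comes from user input or configuration, it may be preferable to catch it instead. The following is a minimal sketch under that assumption; the class SafeCompileExample and the helper compileOrNull are hypothetical names introduced for illustration, while getPattern(), getDescription() and getIndex() are standard PatternSyntaxException accessors.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompileExample {

    // Hypothetical helper: returns null instead of throwing when the regex is invalid.
    static Pattern compileOrNull(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // These accessors expose the same details that the stack trace prints.
            System.out.println("Invalid regex \"" + e.getPattern() + "\": "
                    + e.getDescription() + " near index " + e.getIndex());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(compileOrNull(".xx.") != null); // true
        System.out.println(compileOrNull("*xx*") != null); // false
    }
}
```

Returning null is only one possible design; rethrowing a domain-specific exception or returning an Optional would work just as well.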

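Since Java regular expressions revolve around String, the String class was extended in Java 1.4 with a matches method that performs regex pattern matching. Internally it uses the Pattern and Matcher classes to do the work, but it clearly reduces the lines of code. The Pattern class also has a static matches method that takes a regex and an input String as arguments and returns a boolean result after matching them. So the code below works fine for matching an input String against a regular expression in Java.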

String str = "bbb";
System.out.println("Using String matches method: "+str.matches(".bb"));
System.out.println("Using Pattern matches method: "+Pattern.matches(".bb", str));

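So if your requirement is just to check whether the input String matches the pattern, you can save time and lines of code by using the simple String matches method. Use the Pattern and Matcher classes only when you need to manipulate the input String or reuse the pattern, since String.matches compiles the regex again on every call. Note that the pattern defined by a regex is applied to the String from left to right, and once a source character has been used in a match, it cannot be reused. For example, the regex "121" matches "31212142121" only twice, as "_121____121".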
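The left-to-right, non-overlapping behaviour described above can be checked with Matcher.find(). This is a small sketch, not part of the original program; the class MatchCountExample and the helper countMatches are hypothetical names. It also shows one compiled Pattern being reused for several inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCountExample {

    // Counts non-overlapping matches by calling find() until it returns false.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("121"); // compiled once, reused below
        System.out.println(countMatches(pattern, "31212142121")); // 2
        System.out.println(countMatches(pattern, "12121"));       // 1, the middle "1" is not reused
    }
}
```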

Regular Expression in Java - Common Matching Symbols

Regular Expression	Description	Example
.	Matches any single character	("..", "a%") – true ("..", ".a") – true ("..", "a") – false
^aaa	Matches aaa regex at the beginning of the line	("^a.c.", "abcd") – true ("^a", "ac") – false
aaa$	Matches regex aaa at the end of the line	("..cd$", "abcd") – true ("a$", "a") – true ("a$", "aca") – false
[abc]	Can match any of the letters a, b or c. [] are known as character classes.	("^[abc]d.", "ad9") – true ("[ab].d$", "bad") – true ("[ab]x", "cx") – false
[abc][12]	Can match a, b or c followed by 1 or 2	("[ab][12].", "a2#") – true ("[ab]..[12]", "acd2") – true ("[ab][12]", "c2") – false
[^abc]	When ^ is the first character in [], it negates the pattern and matches anything except a, b or c	("[^ab][^12].", "c3#") – true ("[^ab]..[^12]", "xcd3") – true ("[^ab][^12]", "c2") – false
[a-e1-8]	Matches the ranges a to e or 1 to 8	("[a-e1-3].", "d#") – true ("[a-e1-3]", "2") – true ("[a-e1-3]", "f2") – false
xx|yy	Matches regex xx or yy	
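The (regex, input) pairs in the Example column read as arguments to Pattern.matches, which requires the whole input to match. A few rows can be verified with the sketch below; the class name and the check helper are hypothetical, and the xx|yy inputs are made-up illustrations rather than rows taken from the table.

```java
import java.util.regex.Pattern;

class MatchingSymbolsExample {

    // Thin wrapper so each table row reads as check(regex, input).
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(check("^a.c.", "abcd"));      // true
        System.out.println(check("^a", "ac"));           // false: "c" is left unmatched
        System.out.println(check("a$", "aca"));          // false
        System.out.println(check("[ab].d$", "bad"));     // true
        System.out.println(check("[^ab][^12].", "c3#")); // true
        System.out.println(check("[a-e1-3]", "f2"));     // false
        System.out.println(check("xx|yy", "yy"));        // true
        System.out.println(check("xx|yy", "xy"));        // false
    }
}
```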

Java Regex Metacharacters

Java regular expressions have some metacharacters that work as short codes for common matching patterns.

Regular Expression	Description
\d	Any digit, short for [0-9]
\D	Any non-digit, short for [^0-9]
\s	Any whitespace character, short for [ \t\n\x0B\f\r]
\S	Any non-whitespace character, short for [^\s]
\w	Any word character, short for [a-zA-Z_0-9]
\W	Any non-word character, short for [^\w]
\b	A word boundary
\B	A non word boundary
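The short sketch below exercises several of these metacharacters. Inside a Java string literal each backslash has to be doubled, so \d is written "\\d". The class name and both helpers are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class MetacharacterExample {

    // Whole-input match, same semantics as Pattern.matches.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Uses \b so that "cat" is only found as a standalone word.
    static boolean containsCatAsWord(String text) {
        return Pattern.compile("\\bcat\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(check("\\d\\d", "42"));                // true
        System.out.println(check("\\D+", "abc"));                 // true
        System.out.println(check("\\w+\\s\\w+", "hello world"));  // true
        System.out.println(check("\\S+", "hello world"));         // false: contains a space
        System.out.println(containsCatAsWord("a cat sat"));       // true
        System.out.println(containsCatAsWord("concatenate"));     // false
    }
}
```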

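There are two ways to use a metacharacter as an ordinary character in a regular expression.

Precede the metacharacter with a backslash (\).
Keep the metacharacter within \Q (which starts the quote) and \E (which ends it).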
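Both escaping styles can be sketched as follows. Pattern.quote is a standard JDK method that the text above does not mention; it is included here as a convenience for quoting a whole literal. The class name and the check helper are hypothetical.

```java
import java.util.regex.Pattern;

class QuotingExample {

    // Whole-input match, same semantics as Pattern.matches.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, "." is a metacharacter and matches any character.
        System.out.println(check("a.c", "abc"));                    // true
        // Escaped with a backslash, it only matches a literal dot.
        System.out.println(check("a\\.c", "abc"));                  // false
        System.out.println(check("a\\.c", "a.c"));                  // true
        // Without quoting, "+" is a quantifier, so the literal text does not match.
        System.out.println(check("1+1=2", "1+1=2"));                // false
        // Everything between \Q and \E is taken literally.
        System.out.println(check("\\Q1+1=2\\E", "1+1=2"));          // true
        // Pattern.quote produces a literal pattern for the given string.
        System.out.println(check(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```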

Regular Expression in Java - Quantifiers

Java Regex quantifiers specify the number of occurrences of a character to match.

Regular Expression	Description
X?	X occurs once or not at all
X*	X occurs zero or more times
X+	X occurs one or more times
X{n}	X occurs exactly n times
X{n,}	X occurs n or more times
X{n,m}	X occurs at least n times but not more than m times

Java Regex quantifiers can also be used with character classes and capturing groups. For example, [abc]+ means a, b or c, one or more times, and (abc)+ means the group "abc" one or more times. We will discuss capturing groups next.
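Each quantifier from the table, and its use with a character class and a group, can be tried out with the sketch below. The inputs are made-up illustrations, and the class name and check helper are hypothetical.

```java
import java.util.regex.Pattern;

class QuantifierExample {

    // Whole-input match, same semantics as Pattern.matches.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(check("colou?r", "color"));   // true:  u occurs zero times
        System.out.println(check("colou?r", "colour"));  // true:  u occurs once
        System.out.println(check("ab*", "a"));           // true:  b occurs zero times
        System.out.println(check("ab+", "a"));           // false: b must occur at least once
        System.out.println(check("\\d{3}", "123"));      // true
        System.out.println(check("\\d{3}", "1234"));     // false: exactly 3 digits required
        System.out.println(check("\\d{2,}", "12345"));   // true
        System.out.println(check("\\d{2,4}", "12345"));  // false: at most 4 digits allowed
        System.out.println(check("[abc]+", "cab"));      // true
        System.out.println(check("(abc)+", "abcabc"));   // true
        System.out.println(check("(abc)+", "abcab"));    // false: last group is incomplete
    }
}
```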

Regular Expression in Java - Capturing Groups

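Capturing groups in Java regular expressions are used to treat multiple characters as a single unit. You can create a group using (). The portion of the input String that matches a capturing group is saved in memory and can be recalled using a backreference. You can use the matcher.groupCount method to find the number of capturing groups in a Java regex pattern. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a) and (bc). You can use a backreference in a regular expression with a backslash (\) followed by the number of the group to be recalled; inside a Java string literal this is written as "\\1". Capturing groups and backreferences can be confusing, so let's understand them with an example.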

System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false

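In the first example, at runtime the first capturing group is (\w\d), which evaluates to "a2" when matched against the input String "a2a2" and is saved in memory. So \1 refers to "a2" and the call returns true. For the same reason, the second statement prints false. Try to work through this scenario for statements 3 and 4 yourself. :)

Now we will look at some important methods of the Pattern and Matcher classes.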

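We can create a Pattern object with flags. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.
The Pattern class also provides a split(CharSequence) method that is similar to the String class split() method.
The Pattern class toString() method returns the regular expression String from which the pattern was compiled.
The Matcher class has start() and end() index methods that show precisely where the match was found in the input String.
The Matcher class also provides the String manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).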

Let's look at these Java regex methods in a simple example program.

package com.journaldev.util;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamples {

	public static void main(String[] args) {
		// using pattern with flags
		Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
		Matcher matcher = pattern.matcher("ABcabdAb");
		// using Matcher find(), group(), start() and end() methods
		while (matcher.find()) {
			System.out.println("Found the text \"" + matcher.group()
					+ "\" starting at " + matcher.start()
					+ " index and ending at index " + matcher.end());
		}

		// using Pattern split() method
		pattern = Pattern.compile("\\W");
		String[] words = pattern.split("one@two#three:four$five");
		for (String s : words) {
			System.out.println("Split using Pattern.split(): " + s);
		}

		// using Matcher.replaceFirst() and replaceAll() methods
		pattern = Pattern.compile("1*2");
		matcher = pattern.matcher("11234512678");
		System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
		System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
	}

}

The output of the above Java regex example program is:

Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678
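The program above does not touch the groupCount method mentioned in the capturing groups section. As a supplement, the sketch below reads the groups of ((a)(bc)) back from a Matcher with group(int). The class GroupExample and the helper groupsOf are hypothetical names; group 0, which always holds the whole match, is skipped on purpose.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExample {

    // Returns the text captured by groups 1..groupCount(), or null if the input does not match.
    static String[] groupsOf(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups[i - 1] = matcher.group(i); // groups are numbered by their opening parenthesis
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groupsOf("((a)(bc))", "abc"))); // [abc, a, bc]
    }
}
```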

That's all for regular expressions in Java. Java Regex seems hard at first, but after working with it for a while, it becomes easy to learn and use.

You can check out the complete code and more regular expression examples in the GitHub repository.