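How do you calculate the BLEU score in Python?

The BLEU score is a metric for measuring the quality of machine translation models. It was originally designed only for translation models, but it is now used in other natural language processing applications as well.

The BLEU score compares a candidate sentence against one or more reference sentences and tells you how well the candidate matches the list of references. It produces a score between 0 and 1.

A BLEU score of 1 means that the candidate sentence perfectly matches one of the reference sentences.

This score is also a common evaluation metric for image captioning models.

In this tutorial, we will use the sentence_bleu() function from the nltk library. Let's get started.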

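Calculating the BLEU score in Python

To calculate the BLEU score, you need to provide the reference sentences and the candidate sentence as lists of tokens.

In this section, we will learn how to do that and how to calculate the score. Let's start by importing the required module.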

from nltk.translate.bleu_score import sentence_bleu

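Now we can enter the reference sentences as a list. We also need to turn each sentence into tokens before passing it to the sentence_bleu() function.

1. Entering and splitting the sentences

The sentences in our reference list are:

    'this is a dog'
    'it is dog'
    'dog it is'
    'a dog, it is'

We can split them into tokens using the split() function.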

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
print(reference)

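Output:

[['this', 'is', 'a', 'dog'], ['it', 'is', 'dog'], ['dog', 'it', 'is'], ['a', 'dog,', 'it', 'is']]

This is what the sentences look like in token form. Note that split() only splits on whitespace, so the comma stays attached to the token 'dog,'. Now we can call the sentence_bleu() function to calculate the score.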
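Because split() leaves punctuation attached to words, a token such as 'dog,' will never match 'dog'. The following is a minimal sketch of one way to avoid that, using only the standard library. The helper name simple_tokenize is hypothetical and is not part of nltk; it assumes that lowercasing and dropping punctuation is acceptable for your data.

```python
import re

def simple_tokenize(sentence):
    """Lowercase a sentence and return its word tokens, dropping punctuation."""
    # \w+ matches runs of letters, digits, and underscores,
    # so "dog," becomes "dog" and the comma is discarded.
    return re.findall(r"\w+", sentence.lower())

print('a dog, it is'.split())           # ['a', 'dog,', 'it', 'is']
print(simple_tokenize('a dog, it is'))  # ['a', 'dog', 'it', 'is']
```

This pattern also splits contractions such as "don't" into two tokens, so it is only a starting point. The rest of this tutorial keeps using split() so that the outputs match the original examples.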

2. Calculating the BLEU score in Python

Use the following lines of code to calculate the score.

candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

Output:

BLEU score -> 1.0

Because the candidate sentence is identical to one of the reference sentences, we get a perfect score of 1. Let's try another one.

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

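Output:

BLEU score -> 0.8408964152537145

Every word of the candidate appears in the references, but no single reference matches it exactly, so some of its longer n-grams are not found. That is why the score drops to about 0.84.

3. Complete code for implementing the BLEU score in Python

Here is the complete code from this section.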

from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

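4. Calculating n-gram scores

When matching sentences, you can choose how many words are matched at a time. For example, you can match words one at a time (1-gram). Alternatively, you can match words in pairs (2-gram) or in triplets (3-gram).

In this section, we will learn how to calculate these n-gram scores.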
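To make the idea of an n-gram concrete, here is a small sketch that lists the 1-grams, 2-grams, and 3-grams of the candidate sentence used in this tutorial. The helper name ngrams is our own for illustration and is written with plain Python, not taken from nltk.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order, as tuples."""
    # A sentence with k tokens has k - n + 1 n-grams (none if n > k).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

candidate = 'it is a dog'.split()

for n in (1, 2, 3):
    print(n, ngrams(candidate, n))
# 1 [('it',), ('is',), ('a',), ('dog',)]
# 2 [('it', 'is'), ('is', 'a'), ('a', 'dog')]
# 3 [('it', 'is', 'a'), ('is', 'a', 'dog')]
```

An n-gram score then asks how many of these tuples also occur in at least one reference sentence.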

The sentence_bleu() function accepts a weights argument: a tuple with one weight for each n-gram order.

For example, to calculate each n-gram score individually, you can use the following weights:

Individual 1-gram: (1, 0, 0, 0)
Individual 2-gram: (0, 1, 0, 0)
Individual 3-gram: (0, 0, 1, 0)
Individual 4-gram: (0, 0, 0, 1)

The Python code for this is as follows.

from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
candidate = 'it is a dog'.split()

print('Individual 1-gram: %f' % sentence_bleu(reference, candidate, weights=(1, 0, 0, 0)))
print('Individual 2-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 1, 0, 0)))
print('Individual 3-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 1, 0)))
print('Individual 4-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 0, 1)))

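Output:

Individual 1-gram: 1.000000
Individual 2-gram: 1.000000
Individual 3-gram: 0.500000
Individual 4-gram: 1.000000

Note that the candidate contains no 4-gram that appears in any reference, so its actual 4-gram precision is 0. The 1.000000 shown here comes from the NLTK version used for this tutorial; newer NLTK releases typically emit a warning and print a value close to 0 for this case.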
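To see where the individual scores come from, the following sketch computes the n-gram precision by hand: the share of candidate n-grams that also occur in some reference, with each n-gram counted at most as often as it appears in any single reference. The helper names ngram_counts and clipped_precision are hypothetical and written with the standard library only; this is an illustration of the idea, not NLTK's own code, and it ignores the length penalty that BLEU also applies.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(references, candidate, n):
    """Fraction of candidate n-grams found in the references (with clipping)."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0  # the candidate is too short to contain any n-gram
    # For each n-gram, the most times it occurs in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())

reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split(),
]
candidate = 'it is a dog'.split()

for n in (1, 2, 3, 4):
    print('%d-gram precision: %.2f' % (n, clipped_precision(reference, candidate, n)))
# 1-gram precision: 1.00
# 2-gram precision: 1.00
# 3-gram precision: 0.50
# 4-gram precision: 0.00
```

The first three values agree with the individual scores above. The 4-gram value is 0 because 'it is a dog' does not occur as a whole in any reference.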

By default, the sentence_bleu() function calculates the cumulative 4-gram BLEU score, also known as BLEU-4. The weights for BLEU-4 are:

(0.25, 0.25, 0.25, 0.25)

Let's look at the code for BLEU-4.

score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(score)

Output:

0.8408964152537145

This is exactly the score we got earlier without passing any n-gram weights.
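A cumulative score combines the individual n-gram precisions as a weighted geometric mean. The sketch below shows that arithmetic in plain Python. The function name weighted_geometric_mean is our own, and the precision values are the hand-computed ones for 'it is a dog' (1.0, 1.0, 0.5, and 0 for the 4-gram). It assumes the length penalty is 1, which holds here because one reference has the same length as the candidate.

```python
import math

def weighted_geometric_mean(precisions, weights):
    """Combine per-order precisions as exp(sum(w * log(p)))."""
    pairs = [(p, w) for p, w in zip(precisions, weights) if w > 0]
    # A zero precision with a positive weight forces the product to 0.
    if any(p == 0 for p, _ in pairs):
        return 0.0
    return math.exp(sum(w * math.log(p) for p, w in pairs))

# Strict BLEU-4 arithmetic: the zero 4-gram precision gives 0.
print(weighted_geometric_mean([1.0, 1.0, 0.5, 0.0], [0.25] * 4))  # 0.0

# Leaving the zero 4-gram term out gives 0.5 ** 0.25, about 0.8409.
print(weighted_geometric_mean([1.0, 1.0, 0.5], [0.25] * 3))

# Cumulative 3-gram score with equal weights: 0.5 ** (1/3), about 0.794.
print(weighted_geometric_mean([1.0, 1.0, 0.5], [1 / 3] * 3))
```

The second value matches the 0.8408964152537145 printed above, which suggests the NLTK version used for this tutorial left the zero 4-gram term out of the product. That is an observation about the number, not a statement about NLTK's internals. If your NLTK version reports a value near 0 instead, consider using a lower n-gram order or a smoothing function.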

Conclusion

This tutorial covered how to calculate the BLEU score in Python. We learned what the score is and how to calculate individual and cumulative n-gram BLEU scores. We hope you enjoyed learning with us!