R Squared in R - How to Calculate R2 in R?

Hello, readers! In this article, we will look at R squared (R2) in R programming, an important concept in machine learning.

So, let's get started!

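Importance of the R squared error metric

Let us first understand why error metrics matter in data science and machine learning.

Error metrics let us evaluate the performance of a machine learning model on a particular dataset.

Different error metrics are available depending on the class of algorithm.

For classification algorithms we have the confusion matrix, while R squared is an important error metric for evaluating the predictions made by a regression algorithm.

R squared (R2) is a regression error metric that quantifies how well a model performs. It represents the proportion of the variance in the response/target variable that the independent variables are able to explain.

In other words, R squared describes how well the target variable is explained by the independent variables taken together.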

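The R squared value typically lies between 0 and 1 (it can fall below 0 on unseen data when a model predicts worse than the mean of the target), and it is given by the formula below.

R2 = 1 - SSres/SStot

Here,

SSres: the sum of squares of the residual errors (actual minus predicted values)
SStot: the total sum of squares, i.e. the sum of squared deviations of the actual values from their mean

In general, the higher the R squared value, the better the model fits the data!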
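To make the formula concrete, here is a minimal sketch that computes R squared by hand in R. The vectors y_actual and y_predict are made-up toy values chosen purely for illustration; they do not come from any dataset used in this article.

```r
# Hypothetical toy data: actual values and a model's predictions
y_actual  <- c(3, 5, 7, 9, 11)
y_predict <- c(2.8, 5.3, 6.9, 9.4, 10.6)

# SSres: sum of squared residuals (actual - predicted)
ss_res <- sum((y_actual - y_predict)^2)

# SStot: sum of squared deviations of the actual values from their mean
ss_tot <- sum((y_actual - mean(y_actual))^2)

# R squared as defined by the formula above
r2 <- 1 - ss_res / ss_tot
print(r2)  # 0.9885
```

Here SSres is 0.46 and SStot is 40, so the predictions account for roughly 98.9% of the variation in the actual values.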

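I. R squared in R with linear regression

In this example, we apply the R squared error metric to a linear regression model.

First, we load the dataset using the read.csv() function.
Next, we split the data into training and test datasets using the createDataPartition() function from the caret package.
Before modelling, we define custom functions for the error metrics, as shown in the example below.
Finally, we fit a linear regression model using the lm() function and then call the user-defined R squared function to evaluate the model's performance.

Example: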

#Removing all the existing objects
rm(list = ls())
#Setting the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

#Load the dataset
bike_data = read.csv("day.csv",header=TRUE)

### SAMPLING OF DATA -- Splitting of Data columns into Training and Test dataset ###
#Creating dummy (one-hot) variables for the categorical columns
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')
library(dummies)
bike = bike_data
bike = dummy.data.frame(bike,categorical_col_updated)
dim(bike)

#Splitting the data into training (80%) and test (20%) datasets.
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE) 
train_data = bike[split_val,]
test_data = bike[-split_val,]

### MODELLING OF DATA USING MACHINE LEARNING ALGORITHMS ###
#Defining error metrics to check the error rate and accuracy of the Regression ML algorithms

#1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
MAPE = function(y_actual,y_predict){
  mean(abs((y_actual-y_predict)/y_actual))*100
}

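#2. R SQUARED error metric -- Coefficient of Determination
#Computed here as the squared Pearson correlation between actual and predicted values.
#This equals 1 - SSres/SStot for a least-squares fit (with intercept) evaluated on its own training data,
#but the two values can differ on held-out test data.
RSQUARE = function(y_actual,y_predict){
  cor(y_actual,y_predict)^2
}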

##MODEL 1: LINEAR REGRESSION
linear_model = lm(cnt~., train_data) #Building the Linear Regression Model on our dataset
summary(linear_model)
#Selecting the target column cnt by name rather than by a hard-coded column index
linear_predict=predict(linear_model,test_data[, names(test_data) != "cnt"]) #Predictions on Testing data

LR_MAPE = MAPE(test_data$cnt,linear_predict) # Using MAPE error metrics to check for the error rate and accuracy level
LR_R = RSQUARE(test_data$cnt,linear_predict) # Using R-SQUARE error metrics to check for the error rate and accuracy level
Accuracy_Linear = 100 - LR_MAPE

print("MAPE: ")
print(LR_MAPE)
print("R-Square: ")
print(LR_R)
print('Accuracy of Linear Regression: ')
print(Accuracy_Linear)

Output:

As seen below, the R squared value is about 0.83 (0.8278258), which suggests that the model has worked reasonably well on our data.

> print("MAPE: ")
[1] "MAPE: "
> print(LR_MAPE)
[1] 17.61674
> print("R-Square: ")
[1] "R-Square: "
> print(LR_R)
[1] 0.8278258
> print('Accuracy of Linear Regression: ')
[1] "Accuracy of Linear Regression: "
> print(Accuracy_Linear)
[1] 82.38326
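One point worth checking: the RSQUARE() helper above uses the squared correlation, which is not always the same number as 1 - SSres/SStot once we move to test data. The sketch below illustrates this with R's built-in mtcars dataset and an arbitrary mpg ~ wt + hp model. Both the dataset and the model are assumptions picked for illustration (they are not part of the bike rental example), and base R's sample() stands in for createDataPartition() so that no extra package is needed.

```r
# Illustrative split of the built-in mtcars data (not the bike rental data)
set.seed(101)
idx   <- sample(seq_len(nrow(mtcars)), size = 24)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Fit on the training rows, predict on the held-out rows
fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)

# R squared from the formula 1 - SSres/SStot on the test data
r2_formula <- 1 - sum((test$mpg - pred)^2) / sum((test$mpg - mean(test$mpg))^2)

# R squared as squared correlation, as in the RSQUARE() helper
r2_cor <- cor(test$mpg, pred)^2

print(c(formula = r2_formula, squared_cor = r2_cor))

# On the training data itself the two definitions agree
r2_train_formula <- 1 - sum(residuals(fit)^2) / sum((train$mpg - mean(train$mpg))^2)
r2_train_cor     <- cor(train$mpg, fitted(fit))^2
print(all.equal(r2_train_formula, r2_train_cor))
```

On held-out data the squared correlation is never smaller than the formula-based value, because it ignores any systematic bias or scaling error in the predictions. If you want a test-set score that penalises such errors, the formula-based version is the safer choice.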

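II. R squared value using the summary() function

We can also use the summary() function in R to extract the R squared value after modelling.

In the example below, we fit a linear regression model to a data frame and then use summary()$r.squared to get the R squared value.

Example: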

rm(list = ls())
 
A <- c(1,2,3,4,2,3,4,1) 
B <- c(1,2,3,4,2,3,4,1) 
a <- c(10,20,30,40,50,60,70,80) 
b <- c(100,200,300,400,500,600,700,800) 
data <- data.frame(A,B,a,b) 

print("Original data frame:\n") 
print(data) 

ml = lm(A~a, data = data) 

# Extracting R-squared parameter from summary 
summary(ml)$r.squared

Output:

[1] "Original data frame:\n"
  A B  a   b
1 1 1 10 100
2 2 2 20 200
3 3 3 30 300
4 4 4 40 400
5 2 2 50 500
6 3 3 60 600
7 4 4 70 700
8 1 1 80 800

[1] 0.03809524
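As a quick sanity check, the value reported by summary() can be reproduced from the formula R2 = 1 - SSres/SStot. The sketch below reuses the vectors A and a from the example above; the variable names r2_summary, r2_manual and r2_cor are introduced here only for illustration.

```r
# Same data as in the example above
A <- c(1, 2, 3, 4, 2, 3, 4, 1)
a <- c(10, 20, 30, 40, 50, 60, 70, 80)

ml <- lm(A ~ a)

# R squared as reported by summary()
r2_summary <- summary(ml)$r.squared

# R squared computed by hand from the residuals of the fitted model
ss_res    <- sum(residuals(ml)^2)
ss_tot    <- sum((A - mean(A))^2)
r2_manual <- 1 - ss_res / ss_tot

# For simple linear regression this also equals the squared correlation of A and a
r2_cor <- cor(A, a)^2

print(c(r2_summary, r2_manual, r2_cor))
```

All three values come out at about 0.0381, which tells us that a explains under 4% of the variation in A, so this particular model is a poor fit.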

Conclusion

With this, we have come to the end of this topic. Feel free to leave a comment below if you have any questions.

Till then, happy learning!! :)