Comparison of classification models performance with financial data

Please use this identifier to cite or link to this item: http://nuir.lib.nu.ac.th/dspace/handle/123456789/6005

Title:	Comparison of classification models performance with financial data / การเปรียบเทียบประสิทธิภาพตัวแบบการจำแนกกับข้อมูลด้านการเงิน
Authors:	Praifa Kosasirisin (ปรายฟ้า โกษศิริศิลป์); Anamai Na-udom (อนามัย นาอุดม), anamain@nu.ac.th; Naresuan University
Keywords:	Binary Logistic Regression; Classification and Regression Tree; Naïve Bayes; Financial data; Imbalance dataset
Issue Date:	2566 BE (2023 CE)
Publisher:	Naresuan University
Abstract:	This research studies the construction of three classification models and compares their performance: binary logistic regression, classification and regression tree, and Naïve Bayes. Three financial datasets with different mixes of qualitative and quantitative independent variables were used: the German credit dataset (more qualitative than quantitative independent variables), the Default of credit card clients dataset (fewer qualitative than quantitative independent variables), and the Bank marketing dataset (equal numbers of qualitative and quantitative independent variables). Each model was evaluated on the original dataset and on datasets whose class imbalance had been adjusted by oversampling, undersampling, and hybrid sampling. Performance was validated with 5-fold cross-validation and compared on accuracy, recall, precision, and overall performance. Binary logistic regression performed best on the German credit dataset and on the Bank marketing dataset, with accuracies of 76.00% and 83.93%, respectively. The classification and regression tree performed best on the Default of credit card clients dataset, with an accuracy of 81.99%.
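
The evaluation steps named in the abstract (random oversampling and undersampling, 5-fold splitting, and the accuracy, recall, and precision criteria) can be sketched in plain Python. This is a minimal illustration only, not the code used in the thesis: the function names, the seed handling, and the choice to balance the classes to exactly equal sizes are all assumptions, and the hybrid method is left out because the abstract does not say how it was defined.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, t in zip(X, y) if t == label]
        extra = [rng.choice(rows) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, t in zip(X, y) if t == label]
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, and precision for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }
```

In a typical use, each `(train, test)` pair from `k_fold_indices` would have its training rows resampled, a classifier fitted on them, and `binary_metrics` computed on the untouched test rows, with the results averaged over the five folds. Resampling only the training fold is a common precaution against leakage; the abstract does not state whether the study resampled before or after splitting.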
URI:	http://nuir.lib.nu.ac.th/dspace/handle/123456789/6005
Appears in Collections:	Faculty of Science

Files in This Item:

File	Description	Size	Format
PraifaKosasirisin.pdf		1.98 MB	Adobe PDF
