Comparison of machine learning techniques for classification model construction with modifying imbalanced data

Please use this identifier to cite or link to this item: http://nuir.lib.nu.ac.th/dspace/handle/123456789/5332

Full metadata record

DC Field	Value	Language
dc.contributor	WARITPON SAENGTHONGRATTANACHOT	en
dc.contributor	วริทธิ์พล แสงทองรัตนโชติ	th
dc.contributor.advisor	Anamai Na-udom	en
dc.contributor.advisor	อนามัย นาอุดม	th
dc.contributor.other	Naresuan University	en
dc.date.accessioned	2023-04-18T02:56:15Z	-
dc.date.available	2023-04-18T02:56:15Z	-
dc.date.created	2565 (B.E.; 2022 CE)	en_US
dc.date.issued	2565 (B.E.; 2022 CE)	en_US
dc.identifier.uri	http://nuir.lib.nu.ac.th/dspace/handle/123456789/5332	-
dc.description.abstract	The purpose of this research was to study the performance of classification techniques on three datasets: a Bank dataset with equal numbers of qualitative and quantitative independent variables; a Data Scientist dataset with more qualitative than quantitative independent variables; and a Rice Species dataset with more quantitative than qualitative independent variables. Because these datasets are imbalanced, two undersampling techniques, simple random sampling and k-means clustering, were applied to balance them. Training and test sets were constructed using 5-fold cross-validation. Each dataset was used to build classification models with five techniques: discriminant analysis (Fisher's linear discriminant analysis), Naive Bayes, decision trees with the C4.5 algorithm, random forest, and artificial neural networks. The results indicated that random forest performed best on the dataset with equal numbers of qualitative and quantitative independent variables, discriminant analysis performed well on the dataset with more quantitative variables, and the artificial neural network performed well on the dataset with more qualitative variables. Moreover, balancing the datasets with simple random sampling yielded more effective classification models than k-means clustering. Finally, the study confirmed that evaluating classification models built from imbalanced data using accuracy alone may be insufficient; precision, recall, and F-measure should therefore also be considered when selecting the most appropriate classification model for an application.	en
dc.description.abstract	This research aimed to study the performance of classification techniques on three datasets that differ in their numbers of qualitative and quantitative independent variables: a financial institution (Bank) dataset with equal numbers of qualitative and quantitative independent variables, a rice species dataset with only quantitative independent variables, and a data scientist dataset with more qualitative than quantitative independent variables. The imbalanced datasets were balanced with two undersampling techniques: simple random sampling and k-means clustering. Training and test sets were formed using 5-fold cross-validation. Each dataset was used to build classification models with five techniques: Fisher's linear discriminant analysis, Naive Bayes, decision trees with the C4.5 algorithm, random forest, and artificial neural networks. The results showed that random forest performed well on the dataset with equal numbers of qualitative and quantitative independent variables, Fisher's linear discriminant analysis performed well on the dataset in which all independent variables were quantitative, and the artificial neural network performed well on the dataset with more qualitative than quantitative independent variables. Balancing the datasets with simple random sampling produced more effective classification models than k-means clustering. The study also found that accuracy alone may not be sufficient for evaluating classification models built from imbalanced data; precision, recall, and F-measure should therefore also be taken into account in decision-making.	en
dc.language.iso	th	en_US
dc.publisher	Naresuan University	en_US
dc.rights	Naresuan University	en_US
dc.subject	การวิเคราะห์จำแนกกลุ่มเชิงเส้นโดยวิธีของฟิชเชอร์	th
dc.subject	ต้นไม้ตัดสินใจด้วยอัลกอริทึม C4.5	th
dc.subject	เทคนิคป่าสุ่ม	th
dc.subject	โครงข่ายประสาทเทียม	th
dc.subject	การสุ่มตัวอย่างแบบง่าย	th
dc.subject	การแบ่งกลุ่มข้อมูลแบบเคมีน	th
dc.subject	เทคนิคนาอีฟเบย์	th
dc.subject	Fisher's linear discriminant analysis	en
dc.subject	Naive Bayes	en
dc.subject	Decision trees with C4.5 algorithm	en
dc.subject	Random Forest	en
dc.subject	k-means clustering	en
dc.subject	Simple random sampling technique	en
dc.subject	Artificial Neural Network	en
dc.subject.classification	Mathematics	en
dc.subject.classification	Education	en
dc.subject.classification	Statistics	en
dc.title	การเปรียบเทียบเทคนิคการเรียนรู้ของเครื่องเพื่อสร้างตัวแบบการจำแนก ด้วยการปรับปรุงชุดข้อมูลอสมดุล	th
dc.title	Comparison of machine learning techniques for classification model construction with modifying imbalanced data	en
dc.type	Thesis	en
dc.type	วิทยานิพนธ์	th
dc.contributor.coadvisor	Anamai Na-udom	en
dc.contributor.coadvisor	อนามัย นาอุดม	th
dc.contributor.emailadvisor	anamain@nu.ac.th	en_US
dc.contributor.emailcoadvisor	anamain@nu.ac.th	en_US
dc.description.degreename	Master of Science (M.S.)	en
dc.description.degreename	วิทยาศาสตรมหาบัณฑิต (วท.ม.)	th
dc.description.degreelevel	Master's Degree	en
dc.description.degreelevel	ปริญญาโท	th
dc.description.degreediscipline	Department of Mathematics	en
dc.description.degreediscipline	ภาควิชาคณิตศาสตร์	th
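The abstract names two undersampling techniques, simple random sampling and k-means clustering, without detailing them. The following is a minimal stdlib sketch of both, not the thesis's own implementation. It assumes numeric features (qualitative variables would need encoding first) and that every larger class is reduced to the size of the smallest class. For the k-means variant it also assumes that the real row nearest each of `n_min` centroids is kept; the thesis may instead have kept the centroids themselves or chosen a different number of clusters. All function names are hypothetical.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two numeric rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _split_by_class(X, y):
    """Group rows by label and return the size of the smallest class."""
    classes = sorted(set(y))
    by_class = {c: [x for x, label in zip(X, y) if label == c] for c in classes}
    n_min = min(len(rows) for rows in by_class.values())
    return classes, by_class, n_min


def random_undersample(X, y, seed=0):
    """Simple random sampling without replacement: keep n_min rows per class."""
    rng = random.Random(seed)
    classes, by_class, n_min = _split_by_class(X, y)
    X_out, y_out = [], []
    for c in classes:
        X_out.extend(rng.sample(by_class[c], n_min))
        y_out.extend([c] * n_min)
    return X_out, y_out


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def kmeans_undersample(X, y, seed=0):
    """Cluster each larger class into n_min groups and keep the distinct real
    row nearest each centroid (assumed variant)."""
    classes, by_class, n_min = _split_by_class(X, y)
    X_out, y_out = [], []
    for c in classes:
        rows = by_class[c]
        if len(rows) == n_min:
            kept = list(rows)
        else:
            remaining = list(rows)
            kept = []
            for centre in kmeans(rows, n_min, seed=seed):
                best = min(remaining, key=lambda r: _dist2(r, centre))
                remaining.remove(best)  # never pick the same row twice
                kept.append(best)
        X_out.extend(kept)
        y_out.extend([c] * n_min)
    return X_out, y_out


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0],
         [5.1, 4.9], [4.9, 5.2], [9.0, 0.0], [0.0, 9.0]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    print(random_undersample(X, y, seed=1))
    print(kmeans_undersample(X, y, seed=1))
```

The design difference is visible in the toy data: random sampling may keep two majority rows from the same clump, whereas the k-means variant tends to keep one representative per cluster, preserving more of the majority class's spread.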
Appears in Collections:	Faculty of Science
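The abstract's evaluation setup (5-fold cross-validation) and its warning that accuracy alone can mislead on imbalanced data can be illustrated with a small stdlib sketch. It assumes a binary problem with the minority class labelled `1` as the positive class, and a plain shuffled (unstratified) fold split; the thesis may have used stratified folds or a tool's built-in routine. The function names are hypothetical.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# 95 majority (0) and 5 minority (1) cases; a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f_measure': 0.0}

for train, test in kfold_indices(10, k=5, seed=0):
    print(len(train), len(test))  # 8 2 for every fold
```

The trivial classifier scores 95% accuracy while never detecting a single minority case, which is why precision, recall, and F-measure are reported alongside accuracy.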

Files in This Item:

File	Description	Size	Format
WaritponSaengthongrattanachot.pdf		2.72 MB	Adobe PDF
