Subject-based classification of official correspondence using natural language processing and machine learning

Bozdoğan, Zeynep

Files

15285.pdf (2.33 MB)

Date

2024

Authors

Bozdoğan, Zeynep

Publisher

Düzce Üniversitesi

Access Rights

info:eu-repo/semantics/openAccess

Abstract

With digital advancements, official correspondence in public institutions is handled through Electronic Document Management Systems (EDMS). Assigning the correct Standard File Plan (SFP) code to each document is important for proper archiving and archival destruction. The SFP code entered by the person who creates a document can be wrong for various reasons. To prevent such errors, it would be useful to develop applications that automatically determine the correct SFP code of a document. For this purpose, two data sets were created in this study. Each was first preprocessed, and various classification algorithms were then applied to the preprocessed data to predict the documents' SFP codes. The classification results were compared and analyzed. On the first data set, the best result was obtained with the Logistic Regression (LR) algorithm, which correctly predicted the SFP code of 978 out of 1000 official correspondence documents. On the second data set, the best result was obtained with the Non-Negative Matrix Factorization (NNMF) algorithm, which classified 1851 of 2100 documents into the correct subject (SFP code), a success rate of 88.14%.
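The success rates quoted in the abstract follow directly from the reported document counts. The short Python sketch below recomputes them; the helper name `accuracy` and the dictionary labels are illustrative choices, not taken from the thesis, and only the four counts come from the abstract.

```python
def accuracy(correct: int, total: int) -> float:
    """Return the share of correctly classified documents as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts as reported in the abstract: (correctly classified, total documents).
# The labels are hypothetical names used only for this illustration.
reported = {
    "first data set (LR)": (978, 1000),
    "second data set (NNMF)": (1851, 2100),
}

for name, (correct, total) in reported.items():
    rate = accuracy(correct, total)
    misclassified = total - correct
    print(f"{name}: {rate:.2f}% correct, {misclassified} misclassified")
```

For the second data set this reproduces the 88.14% stated in the abstract. The abstract gives only the counts for the first data set, so its percentage here is a derived figure rather than one quoted from the thesis.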

Keywords

Information and Records Management, Computer Engineering and Computer Science and Control

Link

https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=KMB79M3N7zK1UR2WYeRgQsoeOiyjoiYx4Ywy-U5YgcjanDjflsi_GviMfkoLlAo1
https://hdl.handle.net/20.500.12684/15285

Collection

Institute of Science Thesis Collection
