Quinlan J.R. — C4.5: Programs for Machine Learning
Title: C4.5: Programs for Machine Learning

Author: Quinlan J.R.

Abstract:

Despite its age, this classic is invaluable to any serious user of See5 (Windows) or C5.0 (UNIX). C4.5, like its successors See5/C5.0, is a decision tree classifier system that is often used for machine learning, or as a data mining tool for discovering patterns in databases. The classifiers can take the form of either decision trees or rule sets. Like ID3, it employs a "divide and conquer" strategy and uses entropy (information content) to compute the gain ratio, which is its split criterion.
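
As an illustration of the gain ratio mentioned above, here is a minimal sketch for a single nominal attribute. It is not taken from the book's C source; the function names `entropy` and `gain_ratio` are hypothetical, and the sketch leaves out what C4.5 does for continuous attributes and unknown values.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the nominal attribute `values`.

    Illustrative only: assumes no missing values and a nominal attribute.
    """
    total = len(labels)
    # Group the class labels by attribute value ("divide and conquer").
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus the
    # size-weighted entropy of the subsets after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes attributes that shatter the data into many small subsets: an attribute with a distinct value for every case gets the maximum information gain but a reduced gain ratio.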

C5.0 and See5 are built on C4.5, which is open source and free. However, since C5.0 and See5 are commercial products, the code and the internals of the See5/C5 algorithms are not public. This is why this book is still so valuable. The first half of the book explains how C4.5 works and describes its features, such as partitioning, pruning, and windowing, in detail. The book also discusses how C4.5 should be used, and potential problems with overfitting and non-representative data. The second half of the book gives a complete listing of the source code: 8,800 lines of C code.

C5.0 is faster and more accurate than C4.5 and has features that C4.5 lacks, such as cross-validation, variable misclassification costs, and boosting. However, since even minor misuse of See5 could have cost our company tens of millions of dollars, it was important that we knew as much as possible about what we were doing, which is why this book was so valuable.

The reasons we did not use, for example, neural networks were:
(1) We had a lot of nominal data (in addition to numeric data)
(2) We had unknown attributes
(3) Our data sets were typically not very large and still we had a lot of attributes
(4) Unlike neural networks, decision trees and rule sets are human-readable and comprehensible, and can be modified manually if necessary. Since we had problems with non-representative data but understood both these problems and our system quite well, it was sometimes advantageous for us to modify the decision trees.

If you are in a similar situation, I recommend See5/C5 as well as this book.


Language: en

Category: Computer science/

Subject index status: Unknown

Year of publication: 1992

Number of pages: 312

Added to catalog: 30.01.2014

© Electronic Library of the Board of Trustees of the MSU Faculty of Mechanics and Mathematics, 2004-2024