Link To Content
1.1 INTRODUCTION
The aim of this book is to provide an up-to-date review of different approaches to classification, compare their performance on a wide range of challenging data-sets, and draw conclusions on their applicability to realistic industrial problems.
Before describing the contents, we first need to define what we mean by classification, give some background to the different perspectives on the task, and introduce the European Community StatLog project whose results form the basis for this book.
1.2 CLASSIFICATION
The task of classification occurs in a wide range of human activity. At its broadest, the term could cover any context in which some decision or forecast is made on the basis of currently available information, and a classification procedure is then some formal method for repeatedly making such judgments in new situations. In this book we shall consider a more restricted interpretation. We shall assume that the problem concerns the construction of a procedure that will be applied to a continuing sequence of cases, in which each new case must be assigned to one of a set of pre-defined classes on the basis of observed attributes or features. The construction of a classification procedure from a set of data for which the true classes are known has also been variously termed pattern recognition, discrimination, or supervised learning (in order to distinguish it from unsupervised learning or clustering in which the classes are inferred from the data).
Contexts in which a classification task is fundamental include, for example, mechanical procedures for sorting letters on the basis of machine-read postcodes, assigning individuals to credit status on the basis of financial and other personal information, and the preliminary diagnosis of a patient’s disease in order to select immediate treatment while awaiting definitive test results. In fact, some of the most urgent problems arising in science, industry and commerce can be regarded as classification or decision problems using complex and often very extensive data.
1.3 PERSPECTIVES ON CLASSIFICATION
As the book’s title suggests, a wide variety of approaches has been taken towards this task. Three main historical strands of research can be identified: statistical, machine learning and neural network. These have largely involved different professional and academic groups, and emphasized different issues. All groups have, however, had some objectives in common. They have all attempted to derive procedures that would be able:
* to equal, if not exceed, a human decision-maker’s behaviour, but have the advantage of consistency and, to a variable extent, explicitness,
* to handle a wide variety of problems and, given enough data, to be extremely general,
* to be used in practical settings with proven success.
1.3.1 Statistical approaches
Two main phases of work on classification can be identified within the statistical community. The first, “classical” phase concentrated on derivatives of Fisher’s early work on linear discrimination. The second, “modern” phase exploits more flexible classes of models, many of which attempt to provide an estimate of the joint distribution of the features within each class, which can in turn provide a classification rule.
Statistical approaches are generally characterized by having an explicit underlying probability model, which provides a probability of being in each class rather than simply a classification. In addition, it is usually assumed that the techniques will be used by statisticians, and hence some human intervention is assumed with regard to variable selection and transformation, and overall structuring of the problem.
1.3.2 Machine learning
Machine Learning is generally taken to encompass automatic computing procedures based on logical or binary operations, that learn a task from a series of examples. Here we are just concerned with classification, and it is arguable what should come under the Machine Learning umbrella. Attention has focussed on decision-tree approaches, in which classification results from a sequence of logical steps. These are capable of representing the most complex problem given sufficient data (but this may mean an enormous amount!). Other techniques, such as genetic algorithms and inductive logic procedures (ILP), are currently under active development and in principle would allow us to deal with more general types of data, including cases where the number and type of attributes may vary, and where additional layers of learning are superimposed, with hierarchical structure of attributes and classes and so on.
Machine Learning aims to generate classifying expressions simple enough to be understood easily by the human. They must mimic human reasoning sufficiently to provide insight into the decision process. Like statistical approaches, background knowledge may be exploited in development, but operation is assumed without human intervention.
1.3.3 Neural networks
The field of Neural Networks has arisen from diverse sources, ranging from the fascination of mankind with understanding and emulating the human brain, to broader issues of copying human abilities such as speech and the use of language, to the practical commercial, scientific, and engineering disciplines of pattern recognition, modelling, and prediction. The pursuit of technology is a strong driving force for researchers, both in academia and industry, in many fields of science and engineering. In neural networks, as in Machine Learning, the excitement of technological progress is supplemented by the challenge of reproducing intelligence itself.
Abroad class of techniques can come under this heading,but, generally, neural networks consist of layers of interconnected nodes, each node producing a non-linear function of its input. The input to a node may come from other nodes or directly from the input data. Also, some nodes are identified with the output of the network. The complete network therefore represents a very complex set of interdependencies which may incorporate any degree of nonlinearity, allowing very general functions to be modeled.
In the simplest networks, the output from one node is fed into another node in such a way as to propagate “messages” through layers of interconnecting nodes. More complex behaviour may be modelled by networks in which the final output nodes are connected with earlier nodes, and then the system has the characteristics of a highly nonlinear system with feedback. It has been argued that neural networks mirror to a certain extent the behaviour of networks of neurons in the brain.
Neural network approaches combine the complexity of some of the statistical techniques with the machine learning objective of imitating human intelligence: however, this is done at a more “unconscious” level and hence there is no accompanying ability to make learned concepts transparent to the user.
1.3.4 Conclusions
The three broad approaches outlined above form the basis of the grouping of procedures used in this book. The correspondence between type of technique and professional background is inexact: for example, techniques that use decision trees have been developed in parallel both within the machine learning community, motivated by psychological research or knowledge acquisition for expert systems, and within the statistical profession as a response to the perceived limitations of classical discrimination techniques based on linear functions. Similarly strong parallels may be drawn between advanced regression techniques developed in statistics, and neural network models with a background in psychology, computer science and artificial intelligence.
It is the aim of this book to put all methods to the test of experiment, and to give an objective assessment of their strengths and weaknesses. Techniques have been grouped according to the above categories. It is not always straightforward to select a group: for example some procedures can be considered as a development from linear regression, but have strong affinity to neural networks. When deciding on a group for a specific technique, we have attempted to ignore its professional pedigree and classify according to its essential nature.
1.1 INTRODUCTION
The aim of this book is to provide an up-to-date review of different approaches to classification, compare their performance on a wide range of challenging data-sets, and draw conclusions on their applicability to realistic industrial problems.
Before describing the contents, we first need to define what we mean by classification, give some background to the different perspectives on the task, and introduce the European Community StatLog project whose results form the basis for this book.
1.2 CLASSIFICATION
The task of classification occurs in a wide range of human activity. At its broadest, the term could cover any context in which some decision or forecast is made on the basis of currently available information, and a classification procedure is then some formal method for repeatedly making such judgments in new situations. In this book we shall consider a more restricted interpretation. We shall assume that the problem concerns the construction of a procedure that will be applied to a continuing sequence of cases, in which each new case must be assigned to one of a set of pre-defined classes on the basis of observed attributes or features. The construction of a classification procedure from a set of data for which the true classes are known has also been variously termed pattern recognition, discrimination, or supervised learning (in order to distinguish it from unsupervised learning or clustering in which the classes are inferred from the data).
Contexts in which a classification task is fundamental include, for example, mechanical procedures for sorting letters on the basis of machine-read postcodes, assigning individuals to credit status on the basis of financial and other personal information, and the preliminary diagnosis of a patient’s disease in order to select immediate treatment while awaiting definitive test results. In fact, some of the most urgent problems arising in science, industry and commerce can be regarded as classification or decision problems using complex and often very extensive data.
1.3 PERSPECTIVES ON CLASSIFICATION
As the book’s title suggests, a wide variety of approaches has been taken towards this task. Three main historical strands of research can be identified: statistical, machine learning and neural network. These have largely involved different professional and academic groups, and emphasized different issues. All groups have, however, had some objectives in common. They have all attempted to derive procedures that would be able:
* to equal, if not exceed, a human decision-maker’s behaviour, but have the advantage of consistency and, to a variable extent, explicitness,
* to handle a wide variety of problems and, given enough data, to be extremely general,
* to be used in practical settings with proven success.
1.3.1 Statistical approaches
Two main phases of work on classification can be identified within the statistical community. The first, “classical” phase concentrated on derivatives of Fisher’s early work on linear discrimination. The second, “modern” phase exploits more flexible classes of models, many of which attempt to provide an estimate of the joint distribution of the features within each class, which can in turn provide a classification rule.
Statistical approaches are generally characterized by having an explicit underlying probability model, which provides a probability of being in each class rather than simply a classification. In addition, it is usually assumed that the techniques will be used by statisticians, and hence some human intervention is assumed with regard to variable selection and transformation, and overall structuring of the problem.
1.3.2 Machine learning
Machine Learning is generally taken to encompass automatic computing procedures based on logical or binary operations, that learn a task from a series of examples. Here we are just concerned with classification, and it is arguable what should come under the Machine Learning umbrella. Attention has focussed on decision-tree approaches, in which classification results from a sequence of logical steps. These are capable of representing the most complex problem given sufficient data (but this may mean an enormous amount!). Other techniques, such as genetic algorithms and inductive logic procedures (ILP), are currently under active development and in principle would allow us to deal with more general types of data, including cases where the number and type of attributes may vary, and where additional layers of learning are superimposed, with hierarchical structure of attributes and classes and so on.
Machine Learning aims to generate classifying expressions simple enough to be understood easily by the human. They must mimic human reasoning sufficiently to provide insight into the decision process. Like statistical approaches, background knowledge may be exploited in development, but operation is assumed without human intervention.
1.3.3 Neural networks
The field of Neural Networks has arisen from diverse sources, ranging from the fascination of mankind with understanding and emulating the human brain, to broader issues of copying human abilities such as speech and the use of language, to the practical commercial, scientific, and engineering disciplines of pattern recognition, modelling, and prediction. The pursuit of technology is a strong driving force for researchers, both in academia and industry, in many fields of science and engineering. In neural networks, as in Machine Learning, the excitement of technological progress is supplemented by the challenge of reproducing intelligence itself.
Abroad class of techniques can come under this heading,but, generally, neural networks consist of layers of interconnected nodes, each node producing a non-linear function of its input. The input to a node may come from other nodes or directly from the input data. Also, some nodes are identified with the output of the network. The complete network therefore represents a very complex set of interdependencies which may incorporate any degree of nonlinearity, allowing very general functions to be modeled.
In the simplest networks, the output from one node is fed into another node in such a way as to propagate “messages” through layers of interconnecting nodes. More complex behaviour may be modelled by networks in which the final output nodes are connected with earlier nodes, and then the system has the characteristics of a highly nonlinear system with feedback. It has been argued that neural networks mirror to a certain extent the behaviour of networks of neurons in the brain.
Neural network approaches combine the complexity of some of the statistical techniques with the machine learning objective of imitating human intelligence: however, this is done at a more “unconscious” level and hence there is no accompanying ability to make learned concepts transparent to the user.
1.3.4 Conclusions
The three broad approaches outlined above form the basis of the grouping of procedures used in this book. The correspondence between type of technique and professional background is inexact: for example, techniques that use decision trees have been developed in parallel both within the machine learning community, motivated by psychological research or knowledge acquisition for expert systems, and within the statistical profession as a response to the perceived limitations of classical discrimination techniques based on linear functions. Similarly strong parallels may be drawn between advanced regression techniques developed in statistics, and neural network models with a background in psychology, computer science and artificial intelligence.
It is the aim of this book to put all methods to the test of experiment, and to give an objective assessment of their strengths and weaknesses. Techniques have been grouped according to the above categories. It is not always straightforward to select a group: for example some procedures can be considered as a development from linear regression, but have strong affinity to neural networks. When deciding on a group for a specific technique, we have attempted to ignore its professional pedigree and classify according to its essential nature.