A critical analysis of the conflict between Classical
and Connectionist theories of Cognition.
971261500
Professor John Vervaeke
Date: November 26th, 1998
When a
self-proclaimed radical eliminativist like Paul Churchland admits (however
grudgingly) that there exist many “historical cases of successful
inter-theoretic reduction” (Churchland, 1988, p. 43), certainly we of more
moderate nature should remain open to the proposal, in any context. Yet to an alarming degree, the key figures
of the Connectionism/Classicism debate have operated notwithstanding such
considerations, with each side claiming full title to the one true answer.[1] The Classicists, led by Fodor and Pylyshyn,
make sweeping claims about how “Connectionist cognitive architectures cannot …
support productive cognitive capacities” (Fodor & Pylyshyn, 1988,
p. 35). Nor is the fault wholly theirs, with
Connectionists like Ramsey, Stich and Garon using Connectionism as a
justification for the elimination of ‘folk psychology’ (see Ramsey et al,
1990). This antagonism, which I refer
to as mutual eliminativism, has created a deep trench in the surrounding fields
of cognitive science, artificial intelligence and philosophy of mind, one that
impairs our ability to progress in a meaningful way towards an adequate theory
(and perhaps mechanisation) of thought.
The purpose of this
paper is to examine the arguments made by each group, in light of a new hypothesis
or perspective on the issue; specifically, with an eye to the possibility that
the two theories are not mutually exclusive, but rather, complementary. This argument will be taken largely from the
notion of levels of analysis that has
so frequently emerged in the literature on this topic. First, however, each half of the mutual
eliminativism argument must be elucidated, and refuted. It is important to understand that the goal
of this treatment is not to determine the validity
of Classicism, nor of Connectionism, but rather, to demonstrate that they are
not mutually exclusive theories.
The Classicist argument, as
previously observed, has been led for the past several years by the work of
Jerry Fodor and Zenon Pylyshyn. They
propose a three-part criticism[2]
of Connectionist networks, based largely on their (perceived) inability to
conform with the Language of Thought
Hypothesis put forward by Fodor over a decade earlier (Fodor, 1976). The argument stems from the observation that
thought, like natural language, seems to possess certain semantic properties,
namely productivity and systematicity, and further that thought, at least, seems
also to possess inferential coherence (Fodor and Pylyshyn, 1988, p. 33).
The first of these, productivity, describes
the observation that just as English sentences, for example, can express an
unbounded number of concepts, (for a formal argument to this effect, see
Chomsky, 1968) so too can thoughts represent an unbounded number of
propositions.[3] The argument
that a Connectionist network cannot support the productivity attribute of
thought is as follows:
(1) Sentences (or Thoughts) can be arbitrarily long, by being built up of recursively defined sub-units.
(2) (1) implies that there are arbitrarily many non-atomic expressions.
(3) For (2) to be valid, the mind must be composed of a symbol system, with semantic relationships between concepts, or symbols.
(4) Connectionist networks do not incorporate semantic relations between nodes representing different concepts, only causal relations.
Thus they do not satisfy the criteria of (3) and therefore cannot be
productive in nature. (Adapted from Fodor and Pylyshyn, 1988, pp. 33-36.)
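Premises (1) and (2) can be illustrated with a minimal Python sketch. It is not drawn from Fodor and Pylyshyn; the sentence frame and the function name `embed` are assumptions made purely for illustration. It shows how a single recursive rule yields a distinct, well-formed expression at every depth of embedding, so that a finite rule generates an unbounded set of sentences.

```python
def embed(sentence, depth):
    """Wrap `sentence` in `depth` layers of a recursive clause.

    The frame "Mary believes that ..." is a hypothetical example of a
    recursively defined sub-unit; any clause-embedding rule would do.
    """
    if depth == 0:
        return sentence
    return "Mary believes that " + embed(sentence, depth - 1)


# Each extra level of embedding yields a new, longer expression.
examples = [embed("John sleeps", d) for d in range(4)]
for e in examples:
    print(e)
```

Nothing in the rule limits `depth`, which is the sense in which the set of non-atomic expressions is arbitrarily large.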
The key to the argument’s undoing lies in premise (4), which is stated but
unproven. That premise, that Connectionist networks incorporate only causal relations between nodes, is actually only applicable to a subset
of Connectionist networks, known as localist
networks (for elaboration of the localist/distributed distinction, see Ramsey,
Stich and Garon, 1990, p. 510). Fodor
and Pylyshyn acknowledge this, in fact, in a footnote:
To simplify the exposition, we assume a ‘localist’ approach, in which each semantically interpreted node corresponds to a single Connectionist unit; but nothing relevant to this discussion is changed if these nodes actually consist of patterns over a cluster of units. (Fodor and Pylyshyn, 1988, p. 15)
Therein lies the flaw of the argument.
The argument is substantially
different if we assume a truly distributed net instead of a localist
one. In a localist network, where each
node is labelled to correspond to a single semantic atom, it is entirely true
that the only relation it can have to other nodes is a numerical (i.e. causal,
non-semantic) one. In a distributed net,
however, varying activation levels of the
same nodes (not discrete clusters, as Fodor and Pylyshyn seem to think)
can represent different semantic concepts, and furthermore, these states can be
conceptually combined, subtracted from one another, or be modified by a third
state, expressing a semantic relation between the two of them (for more on the
combination of distributed states, see
McClelland, Rumelhart, and Hinton, 1986, especially pp. 37-39). While their argument does indeed hold for a
network where all semantic content is localised to individual nodes, it fails
to generalise to Connectionist networks as a whole, and consequently fails as
an argument against them.
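The mechanics of a distributed representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-unit patterns, the names `PATTERNS`, `cosine` and `nearest`, and the choice of cosine similarity are all invented here and come from no published network. The sketch shows different concepts carried by different activation levels over the same units, and shows those states being combined and subtracted.

```python
import math

# Hypothetical activation levels over the SAME four units; the numbers
# are invented for illustration and come from no published network.
PATTERNS = {
    "cat":  [0.9, 0.1, 0.8, 0.2],
    "dog":  [0.8, 0.2, 0.1, 0.9],
    "fish": [0.1, 0.9, 0.2, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(state):
    """Name of the stored pattern most similar to `state`."""
    return max(PATTERNS, key=lambda name: cosine(state, PATTERNS[name]))


# Combining two states: the unit-by-unit average of "cat" and "dog".
blend = [(c + d) / 2 for c, d in zip(PATTERNS["cat"], PATTERNS["dog"])]

# Subtracting one state from another: what separates "cat" from "dog".
difference = [c - d for c, d in zip(PATTERNS["cat"], PATTERNS["dog"])]
```

No single unit here stands for "cat"; the concept is the whole pattern. The blended state remains far closer to "cat" and "dog" than to "fish", so relations among states are carried by the patterns themselves rather than by a single weighted link between two labelled nodes.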
The second argument, from systematicity and compositionality, has a very similar flavour. As Fodor and Pylyshyn put it,
What we mean when we say that linguistic capacities are systematic is that the ability to produce/understand some sentences is intrinsically connected to the ability to produce/understand certain others. (Fodor and Pylyshyn, 1988, p. 37)
More prosaically, this refers to the
observation reiterated by many of the Classicist faction, that the ability to understand
the phrase “John loves Mary” intrinsically implies the ability to understand
the phrase “Mary loves John.” What examinations
of this type demonstrate, argue Fodor and Pylyshyn, is that thought is composed
of semantically discrete concepts and relations between concepts. Further, having a representation of a
given semantic relation R implies that it can be used not only in the form aRb, but also in bRa, or cRd, as long as
a, b, c and d are all well-represented concepts in the mind. The contention against Connectionism is that,
since relations between nodes are only
represented through a numerical connection strength between two atomic concepts,
it is possible to create a Connectionist network that could understand aRb
without the ability to understand bRa.
This, they argue, contradicts what we know of how thought works;
consequently, Connectionism cannot explain cognition (at least, in this
respect).
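The Classical picture of systematicity described above can be made concrete with a small Python sketch. It is an illustration only: the type `Proposition`, the sets `CONCEPTS` and `RELATIONS`, and the helper names are hypothetical and are not taken from Fodor and Pylyshyn. The point it shows is that, when a thought is a structure built from separable constituents, the capacity to represent aRb brings with it the capacity to represent bRa.

```python
from typing import NamedTuple


class Proposition(NamedTuple):
    """A structured representation of the form aRb."""
    subject: str
    relation: str
    obj: str


# Hypothetical stock of concepts and relations a system has mastered.
CONCEPTS = {"John", "Mary"}
RELATIONS = {"loves"}


def can_represent(p):
    """True if every constituent of the proposition is available."""
    return (p.subject in CONCEPTS and p.obj in CONCEPTS
            and p.relation in RELATIONS)


def converse(p):
    """Swap the arguments of the relation: aRb becomes bRa."""
    return Proposition(p.obj, p.relation, p.subject)


john_loves_mary = Proposition("John", "loves", "Mary")
mary_loves_john = converse(john_loves_mary)
```

Because `can_represent` depends only on the constituents and not on their order, any system of this kind that can represent "John loves Mary" can also represent "Mary loves John".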
The defence, however
dull, is a simple one. Fodor and
Pylyshyn make the same localist assumption here that they made in trying to
argue from productivity. Their argument
that such a contradictory Connectionist net could be created is based on their
assumption that semantic relations can only be expressed through a weighted
connection between two localised nodes
of semantic content. As has already
been demonstrated, however, even the most airtight argument from that premise
fails as an attack on Connectionism as a whole, since it does not scale
up to all Connectionist networks.
The final argument is
perhaps the simplest to disprove, since there is now empirical data to
contradict it. Fodor and Pylyshyn’s
third claim is that human thought possesses inferential
coherence and neural networks do not.
In a way, this is a sort of subset of the systematicity argument, which
says that our inferential mechanisms (as a sort of relation of the kind
described in the previous paragraphs) must remain consistent. To use Fodor and Pylyshyn’s example, our
ability to infer P from P&Q&R should necessarily imply our ability to
infer P&Q, or Q, or R from the same premises; that is, our inferential mechanisms
should be consistent, or coherent (Fodor and Pylyshyn, 1988, pp. 46-48). The argument is that while you could create a neural net that possessed
inferential coherence, you could equally well create one that lacked it.
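What a coherent inferential mechanism looks like, in the sense of the P&Q&R example, can be sketched briefly in Python. The function name `conjunction_elimination` and the string encoding of propositions are assumptions made for illustration; the sketch simply applies one rule uniformly, so that every sub-conjunction of a premise is derivable if any of them is.

```python
from itertools import combinations


def conjunction_elimination(conjuncts):
    """All non-empty sub-conjunctions derivable from one conjunction.

    `conjuncts` is a tuple such as ("P", "Q", "R") standing for P&Q&R.
    """
    derived = []
    for size in range(1, len(conjuncts) + 1):
        for subset in combinations(conjuncts, size):
            derived.append("&".join(subset))
    return derived


conclusions = conjunction_elimination(("P", "Q", "R"))
print(conclusions)
```

A mechanism of this kind cannot derive P from P&Q&R while failing to derive Q, because the same rule produces both.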
Unfortunately, their argument on this point is
crippled two-fold. First, from a
theoretical standpoint, their argument is once again subject to the assumption
that individual inferential units (i.e. P, P&Q, P&Q&R) are embodied
in individual nodes. It should be noted
now, after repeatedly seeing this pattern, that Fodor and Pylyshyn have indeed presented
a very strong case against localist Connectionism,
but again, their argument here fails to scale up. The second challenge to their argument comes from the network described in
Ramsey, Stich and Garon, which, when given certain physical information about
cats and dogs and fish, correctly inferred that cats have legs, but not scales,
without having been exposed to that statement ahead of time (Ramsey, Stich and
Garon, 1990, p. 516). In other words, the
network possesses inferential coherence.
In short, the
Classical position presents some persuasive descriptions of mental processes,
ones which I personally believe to be valid.
What they fail to do, however, is successfully refute the Connectionist
hypothesis. They have provided a strong
argument for the case that Classical theory is a necessary component of an eventual theory of cognition; what they
have not demonstrated is that it is sufficient,
on its own, for such a theory.
The mutual
eliminativism debate is actually somewhat slanted. While there are some Connectionists
that see no place for Classical theory, a large body of Connectionist literature
is somewhat more defensive. Rather than
outwardly attacking the Classical view, they are preoccupied with the (often
monumental) task of defending their views from the onslaughts brought on by
philosophers like Fodor. Still, any movement has its fanatics, and in
considering this half of the issue, I will look principally at the work of
Ramsey et al, who propose the conditional that if Connectionism turns out to be valid, then we ought to take an eliminativist stance towards ‘folk’
theories of human psychology and thought.[4] The treatment will be considerably shorter than
that of Fodor and Pylyshyn’s work partly because Ramsey et al’s claim is
weaker,[5]
and partly because my reply is simpler.
What Ramsey et al
argue, essentially, is that if they can demonstrate that Connectionist nets operate
without employing Classical concepts about discrete representation and
propositional attitudes, and if it ends up being the case that Connectionist
theory is correct, then we can do away with the Classical notions of thought
insofar as they not only contribute nothing, but are actually wrong. In their words:
[M]erely showing that a theory in which a class of entities plays a role is inferior to a successor theory plainly is not sufficient to show that the entities do not exist. Often a more appropriate conclusion is that the rejected theory was wrong, perhaps seriously wrong, about some of the properties of the entities in its domain, or about the laws governing those entities… (Ramsey, Stich, and Garon, 1990, p. 501).
The argument begins by
outlining what they feel is an adequate description of folk psychological
theory. Specifically, they identify three
properties of propositional (i.e. mental) states that they feel are essential
to folk psychology, namely, that they are functionally
discrete, semantically interpreted
states with causal relations to other
propositional states (Ramsey, Stich, and Garon, 1990, p. 504).[6] Functional discreteness describes the
relatively comfortable hypothesis that we can lose or forget one propositional
attitude (e.g. “My keys are in the kitchen”) without disturbing the rest of our
attitudes; they are separate from one another. Semantic interpretability is exactly what it sounds like: the
statement that thoughts possess meaning, that the propositional attitudes refer
to actual concepts or objects.[7] Causal relation describes the fact that our
beliefs and desires can interact, create, or alter other beliefs and desires –
this is apparent to anyone who, upon observing (i.e. forming the attitude) that
“the cat has walked out of the room,” suddenly finds their belief about “the
cat is in the room” significantly altered.
The argument for the
elimination of Classical architecture comes from their claim that they can
design a neural network that performs the cognitive tasks associated with
humans (in a drastically restricted problem space) while being incompatible
with the three tenets of folk psychology previously identified. The crux of this argument actually comes
down to the second property, semantic interpretability. Because their network is designed as a distributed,
rather than localist, net, the claim is that no semantic interpretation of it is “comfortable”
(Ramsey, Stich, and Garon, 1990, p. 508).
Their subsequent demonstrations are everything
they claim to be: the network does perform learning and reasoning tasks, and, if
you accept their premise, it does so without appeal to semantic
interpretability. So where, then, lies
the counter-argument?
It may already have become apparent that the counter
to this proposal is very similar, almost identical, in fact, to Dennett’s instrumentalism
(Dennett, 1987). That is, the Classical
theory of cognition still works perfectly
well and there is consequently no reason for rejecting it. Referring back to Ramsey et al’s comment, a
theory cannot be rejected only because its successor is more accurate; it must
be materially, demonstrably wrong. While Ramsey et al contend that they can
perform the same cognitive tasks without a semantically interpretable system,[8]
this demonstration does not adequately contest either the predictive or the
explanatory power of the Classical theory.
What these Connectionist arguments from example
demonstrate is that Connectionism is, prima
facie, a viable mechanism for cognition.
There is not much argument, even from Classicists like Fodor, that
neural nets of the kind described can and
do perform many cognitive tasks.
While there is argument as to whether they can be adequately scaled up
to handle human-level cognition, the fact is that Connectionists have proposed
a feasible mechanism for cognition.
What they have yet to demonstrate is that they have a monopoly on cognitive
processes.
Having examined the failings of both philosophical extremes, it seems only
natural to attempt some resolution through combination. It is important to reiterate that what has been refuted thus far is not the validity of either school of thought, but the validity of the argument that, in each case, the other school must necessarily be wrong. Given this, what we would ideally like to describe is a metatheory of cognition that had, as different components of its explanatory power, the separate, but compatible notions of Classical and Connectionist cognition.
To develop this theory, I will make use of a highly over-used and often misconstrued concept known as levels of analysis. To avoid ambiguity and misinterpretation, let me say that I am using this term precisely as it was laid out by Marr, his interpretation being both unambiguous and best suited to our purposes here (Marr, 1982). Marr defines three levels of analysis at which a machine can be understood (particularly a machine carrying out information processing tasks):
1. Computational Theory: What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?
2. Representation and Algorithm: How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation?
3. Hardware Implementation: How can the representation and algorithm be realised physically?
What we must do now, to develop the metatheory, is outline a system under which both theories of cognition can co-exist.
Let us take Classical theory first, as its position is most obvious. No Classical theorist will dispute, I think, the statement that the mechanisms proposed for Classical models are not at an implementation level: there is no hypothesis as to how such things as semantic relations or thought-sentences are represented in the human brain, and while higher-order computer languages allow for such representations, few theorists would claim that our mechanisms of thought work on the same principles as LISP or PROLOG. Nor, though it may be a slightly less obvious transition, does Classical theory fit at the algorithmic level. While it’s true that Classical theory deals quite explicitly with representation, it does not consider implementation of its theories in a mechanical context, and cannot do so, in fact, until it can mechanise meaning.[9] No, the true place for a Classical theory is at the level of Computational Theory. What Classical arguments describe are principles of cognition: productivity, systematicity, inference. They posit answers to the questions of “why is the logic of Classical strategy justified?” and “what is the appropriate viewpoint on the structure of thought?” Classical theory provides a fairly high-level description of cognition, with little consideration for the more technical or implementation aspects.
Connectionism, on the other hand, provides a wonderful counterpart to Classicism, in that the mechanisms it investigates and the theories it supports are based on the specifics of implementation. Whereas Classical theorists have a large bulk of literature and debate, Connectionists have actual, working models of certain basic cognitive tasks. The question as to which level, 2 or 3, Connectionism falls under has a great deal to do with the type of Connectionism being discussed. Smolensky’s PTC nets are obviously at an algorithmic level, as he takes pains to distance himself from actual, neural-level representation (Smolensky, 1988, p. 8). Others, like Ramsey et al, examine the issue from a more physical, implementation level, but in both cases the results are compatible with higher-level, Classicist descriptions. In fact, Fodor and Pylyshyn don’t deny this; they support the possibility of Connectionism as an implementation of Classical ideas (Fodor and Pylyshyn, 1988, pp. 64-66).
Just as physicists don’t describe a planet’s orbital velocity in terms of quantum-level fluctuations, and just as medical doctors can effectively treat a bullet wound without appealing to cell theory, description of cognition can happen independently, and correctly, on multiple levels. There does not have to be an absolute truth of cognition. As this paper has argued, attempts to attack the validity of one level of analysis from the standpoint of the other are futile: they do not demonstrate that one theory is right and the other wrong; they only serve to demonstrate that the two operate at different conceptual levels. In fact, it is arguable, as it has been argued here, that both theories are saying the same thing, merely in different language.
Works Referenced
Chomsky, N. Language and Mind. New York: Harcourt, Brace and World, 1968.
Churchland, Paul M. Matter and Consciousness, Revised Edition. Cambridge, MA: A Bradford Book, The MIT Press, 1988.
Dennett, Daniel C. ‘True Believers’, in The Intentional Stance. Cambridge, MA: A Bradford Book, The MIT Press, 1987. 13-35.
Fodor, J. The Language of Thought. Sussex: Harvester Press, 1976.
Fodor, J. and Z. W. Pylyshyn. ‘Connectionism and Cognitive Architecture: A Critical Analysis’, Cognition 28 (1988): 3-71.
Marr, D. Vision. San Francisco: W. H. Freeman, 1982.
McClelland, J. L., D. E. Rumelhart and G. E. Hinton. ‘The Appeal of Parallel Distributed Processing’, in Rumelhart and McClelland (1986a), pp. 3-44.
Ramsey, W., S. Stich, and J. Garon. ‘Connectionism, Eliminativism, and the Future of Folk Psychology’, in J. Tomberlin (ed.), Philosophical Perspectives, Vol. 4. Atascadero, California: Ridgeview, 1990. 499-533.
Rumelhart, D. E., and J. L. McClelland, eds. Parallel Distributed Processing, 2 vols. Cambridge, MA: MIT Press, 1986a.
Smolensky, P. ‘On the Proper Treatment of Connectionism’, Behavioral and Brain Sciences 11 (1988): 1-74.
[1] Actually, this is an exaggeration. There are some modern theorists, mostly Connectionist, who support a compatibility hypothesis; see, for example, Rumelhart et al, 1986.
[2] In Fodor and Pylyshyn, 1988, the argument is actually presented in four parts: productivity, systematicity, compositionality and inferential coherence. However, Fodor and Pylyshyn themselves acknowledge that compositionality is really subsumed by systematicity, making the argument (effectively) three part.
[3] As Fodor, 1976, points out, this is necessary, since the expression of each sentence must correlate with the speaker having a corresponding thought.
[4] Including, of course, such ‘folk’ notions as Fodor’s Language of Thought.
[5] Being conditional, not absolute.
[6] The arguments offered in support of this thesis are, in fact must be, relatively obvious, since the theory they are presenting for analysis is a “common sense” theory of psychology. Moreover, they are outside the scope of this paper.
[7] Though, of course, this is not to say that we can only form propositional attitudes about ‘real’ or ‘true’ concepts and relations. Elves riding unicorns are perfectly permissible.
[8] This is a dubious claim. While they consider several objections, they are unduly dismissive, talking about how it seems “highly improbable” that a discrete semantic representation could be found. A full treatment of their objections, however, is beyond the scope of this paper.
[9] And thereby solve the problem of mechanical reasoning, not an easy task.