Development of Language, Thought, and EgoCentric Speech
Development of Language, Thought, and EgoCentric Speech (Updated From: Neuropsychiatry, Neuropsychology, Clinical Neuroscience, Academic Press, New York, 2000)
by Rhawn Joseph, Ph.D.



Emotion, social perception, melodic speech, visual-spatial reasoning, and visual-emotional dream activity are associated with activity within the right hemisphere and inferior temporal lobe, as well as within the brainstem (chapter 17). Conversely, verbal thinking is associated with the left hemisphere (Joseph 2017). Although individuals may utilize visual, emotional, olfactory, musical, or tactile "imagery" when they think, thinking may also take the form of "words" which might be "heard," or rather experienced, within one's own head (or mind). When one engages in verbal thought, the language axis of the left hemisphere typically becomes activated, as indicated by functional imaging (Buchel et al., 2008; Demonet, et al., 2014; Paulesu, et al., 1993; Peterson et al., 2015).

Verbal thinking is clearly a form of communication and generally consists of an organized hierarchy of associations, symbols, and labels which appear before an observer, or which are heard by the thinker within the mind's "ear" and "eye," or rather, within Wernicke's area and the frontal lobe.

Thought (i.e. verbal thinking) can be a means of deduction, clarification, plan and goal formation, and reality manipulation (Craik, 1943; Freud, 1900; Miller et al. 1960; Piaget, 1962). However, it is also a progression, an associative advance which leads from an inner or outer perception to linguistic-motor expression (Freud, 1900); and an elaboration which some have argued appears with an initial or leading idea that is followed by a series of related verbal ideations, or as originating developmentally from the non-accessible regions of the mind (Freud, 1900; James, 1961; Jung, 1954; Piaget, 1962). In the process of thinking in words, one often acts to organize information which is "not thought out" and not clearly understood, so that it may become thought out and thus comprehended (Ach, 1951; James, 1961; Joseph, 1982; Schilder, 1951).

On the other hand, sometimes the verbal "train of thought" emerges spontaneously and reflexively, as if ideas, albeit related, simply become strung together with no specific goal or purpose in mind. Moreover, in these instances, the thoughts sometimes rapidly alternate in content and fluctuate between seemingly unrelated ideas, as if triggered by a verbal domino effect where associated ideas become sequentially aroused, each idea triggering the next. Verbal thoughts are also triggered by agents external to the left half of the brain (such as the right hemisphere or limbic system). Sometimes, however, the production of thought reflects random neural activity. Indeed, sometimes it is exactly that, random and reflexive.


Whether directed, reflexive, or spontaneous, verbal thoughts always unfold before an observer and are heard within the mind's ear. Thinking is a series of pseudo-auditory transactions which are experienced as well as produced, sometimes as a purposeful means of explanation. Thinking is sometimes experienced as a form of self-explanation through which ideas, impulses, desires, or things-in-the-world may be understood, comprehended, and possibly acted upon. Paradoxically, it is often a process by which one explains things to oneself. Indeed, as a means of deduction or explanation, and as a form of internal language, it is almost as if one is talking to oneself inside one's head.

Nevertheless, the fact that one acts as both audience and orator raises a curious question: "who is explaining what to whom?" A functional duality, and in fact a functional multiplicity, is thus implied in the production and reception of thought.

Assuming that the subject of thought originates in me, the thinker, and given that the organization of this often linear verbal arrangement is also a product of Self-generated activity, then it should be expected in some instances that "I" should know what "I" am about to think prior to thinking it. "I" should also know the conclusion before it is communicated. In fact, often we do know (albeit non-verbally, tacitly) before we think (and while we think). Sometimes we do not think (at least in words), simply because the question-answer-implications are simultaneously understood without the aid of verbal thought (Joseph, 1982). There is thus some redundancy built into the thinking process as well as an almost inescapable sense of duality in its production and reception.

In that thinking is often a form of communication, it seems that one aspect of the Self, or rather, the brain, has access to the information which is to be verbally thought about, before it is thought about in a verbal form. That is, the source of what will become thought is often within the Self. However, that source, or pre-thought, exists in a pre-verbal form, and must then be translated and organized in a verbal linear sequence in order to be thought about and thus understood--at least verbally. In this regard, thinking is sometimes a form of communication through which one part of the brain gains access to, and an understanding of, information or knowledge possessed by yet other brain regions, albeit in non-linguistic form.

Indeed, thinking often serves in part as a means of organizing, interpreting, and explaining impulses which arise in the non-linguistic portions of the nervous system so that the language dependent regions may achieve understanding (Joseph, 1982, 1986a). In fact, although thought may take various non-linguistic forms, e.g. musical thought, visual-imagery, etc., one need only listen to one's own thoughts in order to realize that thinking often consists of an internal linguistic monologue, a series of words heard within one's own head. And, because these particular forms of thought are structured and perceived as words heard within one's head, then they must rely on the same neural pathways subserving the production and perception of language and speech sounds produced outside the head; i.e. the Language Axis--as also demonstrated through functional imaging (Buchel et al., 2008; Demonet, et al., 2014; Paulesu, et al., 1993; Peterson et al., 1988; Price, 2009).


The left hemisphere is genetically predisposed to become dominant for the denotative, syntactic, lexical, grammatical, and motor-expressive aspects of speech--a consequence, in part, of the earlier maturation of the left corticospinal tract which provides the left frontal motor areas a competitive advantage over the right in motor expression. This genetic predisposition is also evident prenatally. For example, and as is well known, Wernicke's area and the left superior temporal lobe are dominant for receptive language, and the planum temporale is generally larger in the left hemisphere. As originally determined by Geschwind and Levitsky (1968), the planum temporale (the region within the posterior sylvian fissure that forms the core of Wernicke's area) is larger in the left hemisphere in 65% of the brains examined, larger in the right hemisphere in 11%, and shows no difference in 24%. This asymmetry is also present in the planum temporale of the fetus (Wada et al., 1975), as well as neonates (Witelson & Palli, 2013), which indicates that the structures involved in the comprehension of language are created prenatally and are determined genetically and by genetic patterns of neural cell migration.

Initially, however, the right and left hemispheres may be somewhat equipotent in regard to language acquisition. Hence, with early left hemisphere injury, language may be acquired by the right hemisphere (Joseph, 1986b; Joseph & Novelly, 1983). Indeed, right hemisphere language acquisition has been demonstrated through dichotic listening, tachistoscopic presentation, and left hemisphere anesthesia in over 20 adults with histories of early left hemisphere injury (Joseph, 1986b; Joseph & Novelly, 1983). The right hemisphere also becomes activated during language tasks, as measured through functional imaging (Bookheimer, et al., 1995; Bottini et al., 2014; Peterson, et al., 1988; Price et al., 1996; Shaywitz, et al., 1995), and this half of the brain is dominant for emotional language production in infancy, producing, hierarchically, what has been termed "limbic language" (Joseph, 1982). The right hemisphere can therefore also acquire language following massive early left hemisphere injury (Joseph, 1986b; Novelly & Joseph, 1983).

Over the ensuing years, the left hemisphere increasingly establishes dominance, though the right hemisphere remains dominant in regard to emotional melodic language production and comprehension and, due to the visual-spatial nature of the task, becomes activated while reading, as demonstrated by functional imaging studies (Bottini et al., 2014; Cuenod, et al., 1995; Price et al., 2016), and when engaged in interpreting the figurative aspects of language (Bottini et al., 2014).

As the neocortex of the left hemisphere matures, it begins to stamp temporal sequences onto the melodic emotional patterns of right hemisphere/limbic speech, thus producing left hemisphere speech. However, this is a prolonged process, such that for most of the first year, language is limbic (see chapter 15). It is only near the end of the first year that the neocortex begins to hierarchically gain control, as evidenced by the development of jargon babbling and then the production of the first words. In fact, the pattern of neurological activity during the performance of language tasks does not begin to resemble the adult pattern until the onset of puberty (Hollcomb et al., 1992).


Broadly considered, there are three maturational stages of verbal development that correspond to the acquisition and development of language and thought (Joseph 1982, 2017). Initially, linguistic expression in the infant is reflexive and/or indicative of generalized and diffuse feeling states. Vocalizations are largely emotional-prosodic in quality, and mediated by limbic and brainstem nuclei (Chapter 15).

At approximately 3-4 months of age the infant's utterances begin to assume meaningful as well as semi-imitative qualities, become indicative of specific feeling states, and begin to be influenced by both the right and left cerebral hemispheres. It is during this time period that a second babbling stage develops and the child's prosodic utterances begin to assume temporal-sequential characteristics. That is, the left hemisphere begins to provide rhythm and specification to the melody and associated feeling states expressed by the right hemisphere and limbic system. From this point on true language begins to develop.

However, it is not until a third stage of linguistic functioning makes its appearance that the child begins not only to speak in words, but to think them out loud. This final stage coincides with the expression and development of egocentric speech.


Initially, and for the first few days after birth, most behavior is mediated by limbic, brainstem, and spinal nuclei (Chugani, et al. 2007; Gibson, 2011; Joseph, 1982; Milner, 1967). For example, PET scan studies of glucose utilization in the newborn indicate high levels of brainstem but very low levels of neocortical activity (Chugani, 2014; Chugani, et al. 2007). It is not until about one year of age that infant neocortical glucose activity begins to significantly increase (Chugani, et al. 2007) and not until ages 4-10 that the sensory and association cortical layers begin to become increasingly myelinated (Gibson, 2011; Lecours, 1975).

Therefore, because of neocortical immaturity the psychic functioning of the newborn is probably no more than a vague, somewhat undifferentiated awareness; consisting of a multitude of excitatory and inhibitory neuronal interactions and a series of transient feeling states and emotional upheavals which correspond to the activation of specific and related subcortical and limbic structures (chapters 23-25).

The neonate is essentially internally oriented, its psychic attentional functions almost entirely directed to stimuli impinging on the body-surface and sensations transmitted by the mouth (Milner, 1967). That is, although the newborn can cry and scream, turn his or her head to sounds, and within a few weeks can imitate facial expressions, reach for objects, and show defensive reactions (Gibson, 2011; Meltzoff 2010), these behaviors are under the control of the brainstem, limbic system and basal ganglia.

Cognition, therefore, consists largely of generalized and diffuse feeling states which are aimed at the alleviation of displeasure or painful affect and at the reactivation of experiences associated with pleasurable sensations (reviewed in chapter 13). In fact, from birth to 1 month, the infant displays only two attitudes, accepting and rejecting, and a very limited range of vocalization: crying and cooing (McGraw, 1969; Milner, 1967; Spitz & Wolf, 1946). These feeling states and vocalizations are largely mediated and expressed by the hypothalamus of the limbic system (Joseph, 1992a).

Indeed, as noted, the original impetus to speak springs forth from roots buried within the depths of the ancient limbic lobe and is bound with and tied to mood, impulse, feeling, desire, pleasure, pain, and fear. The infant cries, coos, and produces various prosodic inflectional variations which are without temporal-sequential organization and which serve only to communicate diffuse feelings. It is only over the course of the first few months that these prosodic-melodic utterances become associated with specific moods and emotions (Joseph 1982; Piaget, 1952). It is at this time that babbling makes its appearance (Brain & Walton, 1969).


By 2-3 months of age amygdala-brainstem pyramidal fibers as well as corticospinal axons have already begun to myelinate. These maturational events coincide with an initial shift in the emotional utterances of the infant, which become progressively complex and prosodic and increasingly subject to sequencing and segmentation. The infant begins to "coo" and "goo" in a repetitive fashion that has been referred to as "early babbling." This early babbling stage generally involves the repetition of pleasant friction and voicing sounds which tend to be produced while making face-to-face and eye-to-eye contact and while engaged in social interaction (Kent & Miolo, 1995), which in turn implicates the amygdala (see chapter 15). Moreover, whereas the expression of pleasant sounds is in the ascendant, crying tends to become less frequent but more variable in tone, and can be differentiated into requests, calls, and sounds of discomfort (D'Odorico, 1984; Wolff, 1969). This indicates that the infant's behavior is less reflexive, and is increasingly under the control of the rapidly maturing limbic system, the amygdala in particular (see chapter 15).

As the amygdala and other limbic forebrain structures mature, and the larynx begins to assume an adult pattern of orientation, the infant not only babbles but vocalizes a variety of sounds which increasingly convey probable meanings and which may signify to the listener a variety of diffuse feelings and needs (D'Odorico, 1984; Wolff, 1969). Infants will in fact produce different noncry vocalizations depending on context and in reaction to people vs objects (Fernald, 1992; Hauser, 2009). For example, if the 4-month-old infant coos and babbles "mama," the mother may (depending on context, facial expression, and prosody/fundamental frequency) interpret this to mean: "mama come here," "mama I hurt," "mama I thirst," etc. (e.g., D'Odorico, 1984; Fernald, 1992; Joseph 1982; Piaget, 1952; Vygotsky, 1962; Wolff, 1969). Hence, although the infant's utterances are not referential and may at times represent little more than the random universal babbling produced by all infants, they can also convey meaning and serve as a means of communicating with the primary caretaker (see Fernald, 1992; Hauser, 2009, for related discussion).

Early babbling is, in part, associated with the maturation of the amygdala, a structure which can trigger lip smacking, rhythmic jaw movement, and (fear-induced) babbling and mandibular-teeth "chattering" (see chapter 15). Likewise, early babbling may be produced by reflexive jaw movement (e.g., MacNeilage & Davis, 2010; Moore & Ruark, 1996; Weiss, 1951) and lip smacking. Hence, early babbling may reflect immature amygdala (as well as amygdala-striatal and motor neocortical) influences on the brainstem and periaqueductal gray, which reflexively trigger the oral musculature, thereby inducing rhythmic movement of the jaw.

"Early" babbling is soon replaced by "late" babbling, which has its onset around 4 months of age (de Boysson-Bardies, Bacri, Sagart, & Poizat, 2013; Oller, 1980; Oller & Lynch, 1992). Late babbling is sometimes described as "repetitive babbling" (Mitchell & Kent, 2010), and at later stages of development may include the repetitive production of CV syllables in which the same consonant is repeated, such as "dadada." The progressive development of "late babbling" in turn is associated with the progressive maturation of the anterior cingulate (see chapter 15). In fact, electrical stimulation in the cingulate and surrounding medial frontal tissues can trigger the repetitive production of certain words and sounds, such as "dadadada" (Dimmer & Luders, 1995; Penfield & Welch, 1951).


By the time the infant has reached 4 months of age a second babbling stage develops, i.e. "late babbling". Late babbling heralds the first real shift from emotional-prosodic-melodic speech to what will become, after around one year of age, temporal-sequential language (Joseph 1982; Leopold, 1947); i.e. left hemisphere speech.

The development of late babbling occurs in conjunction with the infant's increased ability to produce sophisticated social-emotional nuances, and appears to be associated with increasing cingulate (as well as amygdala) maturational influences. For example, around 4-months, the infant's intonational-melodic vocal repertoire becomes more elaborate and tied to a variety of specific feeling states (Piaget, 1952); which may reflect increasing amygdala maturational dominance.

However, over the ensuing months vocalizations also begin to assume an imitative quality (Nakazima, 1980); they are often context specific but do not necessarily reflect the infant's internal state. Some vocalizations are produced in mimicry and in play (Piaget, 1952). The late babbling stage has also been repeatedly described as a form of "sound play," an activity which increasingly contributes to phonetic development (de Boysson-Bardies et al. 2013; Ferguson & Macken, 2013). As the cingulate is associated with mimicry and the onset of play behavior (chapter 15), and since the production of these sounds does not necessarily reflect the infant's true emotional state, the cingulate is implicated in all aspects of the late babbling stage.

Repetitive, late babbling increases in frequency until around the seventh to tenth month of postnatal development (de Boysson-Bardies et al. 2013; Ferguson & Macken, 2013; Nakazima, 1980; Oller, 1980; Oller & Lynch, 1992), at which point the tendency to produce phonetically varied multisyllables becomes dominant. Thus the late babbling stage comes to be largely replaced by what has been termed "variegated" or "canonical" babbling (Oller, 1980; Oller & Lynch, 1992), which in turn is followed by "jargon" babbling (around 12 months).

The development of jargon babbling appears to correspond to maturational events taking place in the motor areas of the neocortex and may represent increasing pyramidal influences on the brainstem and oral-laryngeal musculature. In fact, jargon babbling appears to be a function of the immature somatomotor areas slowly gaining control over the limbic system, midbrain inferior-colliculus, and periaqueductal gray (see also Herschkowitz et al. 2009).

For example, pyramidal fibers from the somatomotor neocortex to the brainstem become increasingly well myelinated from 4 to 12 months of age (Debakan, 1970; Yakovlev & Lecours, 1967). Likewise, the somatomotor areas of the neocortex begin to rapidly mature around the first postnatal year (Brody et al. 2007; Chi, Dooling, & Gilles, 1977; Gilles et al. 1983; Scheibel, 2011, 1993). Hence, the neocortex likely increasingly contributes to babbling behavior, especially around one year of age.

Moreover, just as the pyramidal/corticospinal tracts as well as the somatomotor areas continue to mature and myelinate over the first and second years (Conel, 1937, 1941; Debakan, 1970; Yakovlev & Lecours, 1967), babbling continues throughout the first and second years. It is during these same time periods that the child gradually acquires and develops the phonetic structure which underlies speech production (de Boysson-Bardies et al. 2013; Oller, 1980; Oller & Lynch, 1992). This implies considerable forebrain as well as right and left neocortical influences over vocal behavior (see below).

With increasing neocortical control, what appears to be a "new and unique motor skill" slowly emerges (Moore & Ruark, 1996) which directly contributes to the development of speech. That is, around 1 year of age, as these limbic-neocortical pathways myelinate and the frontal-temporal lobes begin to mature, the brainstem vocalization centers and limbic receptive and expressive language functions become increasingly subject to neocortical influences and articulatory control. Once the neocortical speech areas begin to establish hierarchical control, and begin to program the oral-laryngeal motor areas, a new form of (neural-muscular) vocalization emerges which appears somewhat distinct from its precursors (e.g. Moore & Ruark, 1996). Infants begin to jargon babble, and they also begin to speak their first words (Capute, Palmer, Shapiro, Wachtel, Schmidt, & Ross, 1986; Nelson, 1973, 2013; Oller, 1980; Oller & Lynch, 1992).


As the neocortex of the left cerebral hemisphere begins to mature, it begins to stamp and impose temporal sequences onto the stress, pitch, and melodic intonational contours which up until that time have characterized infant speech output (chapter 15). This is part of what the late and especially the jargon babbling stage signifies: the ability to sequence.

That is, syllabication is imposed on the intonational contours of the child's speech by the still immature neocortex of the left hemisphere, such that the melodic features of generalized vocal expression come to be punctuated, sequenced, and segmented, and vowel and consonantal elements begin to be produced (Joseph, 1982, 1993; see also Berry, 1969; De Boysson-Bardies, et al. 1980). Left hemisphere speech comes to be superimposed over limbic (and right hemisphere) melodic language output. However, due to the immaturity of the neocortex, the speech produced is "jargon."

Jargon babbling coincides with the production of the first words, which are spoken around 11-12 months on average (Capute et al., 1986; Nelson, 1973, 2013; Oller, 1980; Oller & Lynch, 1992). Indeed, jargon babbling resembles actual speech, and at a distance it may sound as if the infant is actually conversing and speaking real words, though in fact the infant is babbling prosodically sophisticated neologistic jargon. In this respect, jargon ("conversational") babbling is similar to Wernicke's ("jargon") aphasia, which is associated with severe injuries to the temporal-parietal junction (Christman, 2014; Goodglass & Kaplan, 2000; Kertesz, 1983; Marcie & Hecaen, 1979). However, rather than being due to brain damage, jargon babbling reflects the extreme immaturity of the neocortical speech areas. Hence, the emergence of the jargon babbling stage signifies an obvious shift in sound production from the limbic system to the still immature neocortex.

In general, "jargon" babbling consists of normal stress and intonation, and is associated with the production of stops, nasals, and CV syllables as well as labial and dental/alveolar consonants (p, t, k, b, d, g, m, n, w, j, h, s), all of which are uttered in a temporal sequential fashion (Locke, 1995; Oller, 1980; Oller & Lynch, 1992). In part, it is the temporal sequential and varying prosodic nature of these utterances which gives them their speech-like quality.

Jargon babbling not only resembles normal fluent speech but is often produced as the infant is gazing at or making eye-to-eye contact with the listener. The infant may appear to be engaging in an actual conversation, as if explaining some action, or a desire to direct the other's attention to some object or activity. Thus, jargon babbling often occurs in a social context and could be described as "conversational babbling."

Just as frequently, however, the infant may appear oblivious to any potential listener and may jargon babble while alone and at play, or while gazing at or exploring some object. As these vocalizations appear to be self-directed and are meant for the child's ears alone, they could be described as "egocentric babbling."

As noted, the emergence of jargon babbling is soon followed by the utterance of the infant's first words. However, although from age 1 through 2 vocabulary will expand from one word to over 300, the child will continue to produce "egocentric" and "conversational" jargon babble. In fact, jargon babbling does not completely disappear until well after age 2, and some children may continue to occasionally jargon babble as late as age 3 (Kent & Miolo, 1995; Locke, 1995).

However, at this latter age, although jargon babbling eventually disappears, the egocentric versus social-conversational nature of these vocalizations is retained. That is, as children acquire language, they will produce conversational speech that is directed toward others as well as speech which continues to be meant for their ears alone. It is this latter form of language which Piaget and Vygotsky identified as "egocentric speech," a form of overt thinking; that is, thinking out loud. According to Piaget (1952, 1962, 1974) and Vygotsky (1962), egocentric speech is slowly internalized between the ages of 3 and 5, and eventually becomes completely covert; at which point, the child has not only learned to speak in words, but to think in words as well.


As the left hemisphere matures and wrests control of the peripheral and cranial musculature from subcortical and limbic influences, a distinct form of language emerges, i.e. left hemisphere speech--a type of language which is unique to humans and which initially consists of jargon but later becomes true speech. In contrast to limbic language, left hemisphere speech is grammatical, temporal-sequential, denotative, consists of word units, and is closely bound with the eventual expression and development of verbal thought. Verbal thinking, however, does not appear until much later in development; an unfolding event which corresponds to the appearance of yet another form of language which is unique to humans, a self-directed form of language that appears to be a form of thinking out loud: egocentric speech.

Egocentric speech is self-directed speech that consists of an explanatory monologue in which children comment on or explain their play and other actions, usually after the action has occurred. That is, the child essentially talks to themselves but in an explanatory fashion.

Egocentric speech is essentially speech for oneself (Joseph, 1982; Piaget, 1962; Vygotsky, 1962). It is a self-directed form of communication which heralds the first attempts at self-explanation via thinking-out-loud. According to Vygotsky (1962), egocentric speech makes its first appearance at approximately 3 years of age. According to Piaget (1952, 1962, 1974), at its peak, egocentric speech comprises almost 40-50% of the preoperational child's language; the remainder consisting of social speech (denotative, interactional, and emotional).

Social speech is produced so as to communicate with others. Egocentric speech is produced so as to communicate with no one other than the child who produces it.

Prior to the development of egocentric speech, communication is directed strictly toward outside sources. There is no attempt to verbally communicate with the Self, for there is no internal dialogue. Verbal thought has not yet developed and children do not talk to themselves about their ongoing behavior or feelings.

At around age 3 egocentric speech--the peculiar linguistic structure from which thought will arise--appears in the context of social-denotative vocalizations (Vygotsky, 1962). That is, part of the time the child engages in social speech, whereas the remainder of speech activities are egocentric, directed and produced for the sole benefit of the child, who listens to his or her own speech and external commentary while playing.

While the child is engaging in egocentric speech, he/she does not appear concerned with the listening needs of his/her audience simply because to all appearances his/her words are meant for his/her ears alone (Piaget, 1952, 1962, 1974; Vygotsky, 1962). The child is essentially thinking out loud in an explanatory fashion, commenting on and describing his or her actions (Joseph, 1982; Vygotsky, 1962).

When engaged in an egocentric monologue, there is no interest in influencing or explaining to others what in fact is being explained. In fact, the child will keep up a running verbal accompaniment to his actions, commenting on his behavior in an explanatory fashion even while alone. Moreover, while engaged in this self-directed external monologue the child appears oblivious to the responses of others to his statements (Piaget, 1952, 1962, 1974; Vygotsky, 1956). It is as if the child has no awareness that others hear him. In fact, many a child has been shocked when he later hears his mother (or a friend) repeat or comment upon something he assumed no one else could hear.

Egocentric speech is not simply talking out loud, but rather tends to be self-explanatory, serving as a form of commentary that is initially produced only after an action has been completed. The child observes what he or she has done and then comments on and/or explains what has taken place.

Egocentric speech presents us with a curious anomaly, for we must accept that the child knows what she has done without commenting on or explaining her actions; moreover, she must know why she has performed certain behaviors without the need to explain them to herself. Nevertheless, the fact that she explains and comments upon her behaviors after they occur argues otherwise. Paradoxically, the child acts as both actor and witness, explainer and explained to. Clearly, the child explains her actions to herself (Vygotsky, 1962).


Initially egocentric speech is completely external and after the fact. Presumably it is external and after the fact because the child is incapable of internally generating linguistic thoughts (Piaget, 1952, 1962, 1974; Vygotsky, 1956). Presumably this is due to the slow pace of myelination in the Language Axis and, in particular, the corpus callosum (Joseph 1982). Because of these internal limitations, the child therefore thinks about his or her behavior out loud.

It is important to emphasize, however, that egocentric speech is not random or pointless, but is largely explanatory. Children are explaining their behavior to themselves. Since they are utilizing words, it is thus apparent that they are explaining their actions to their left hemisphere. The fact that the explanation initially occurs only after the behavior has been completed suggests that the left hemisphere did not have access to the behavioral plan, or the motivation behind it, until after the act was completed, at which point the act is explained in the form of a verbal commentary.

This suggests that the behavior being explained was planned, initiated, or mediated, presumably, by the right hemisphere or limbic system (Joseph 1982). Because of the slow pace of corpus callosum myelination, the left hemisphere of the child's brain cannot gain access to right cerebral impulses, memories, or plans until after they are expressed and can be observed and thus commented on. However, as the callosum matures, the child's left hemisphere begins to receive earlier, advance access and thus produces the egocentric commentary earlier and earlier as well.

For example, first a child will paint a picture and then explain it. As she ages she will paint a picture and explain it while she is painting. Finally, she will announce what she is going to paint, and then paint it (Vygotsky, 1962).

Hence, as the child grows older, her comments and explanations occur earlier in the sequence of expression, until finally the child begins to explain her actions before they are performed instead of after they have occurred. Essentially, as the child ages she appears to receive advance warning of her intentions and actions, until finally this information is available before rather than after she acts (Joseph, 1982). At this later stage, however, egocentric speech has been greatly internalized as verbal thought.

According to Vygotsky (1962), after its initial appearance and elaboration, egocentric speech also begins to occur internally and in fact becomes progressively more covert as the child grows older.

At its overt maximum, when it appears to be fully developed (comprising by age 4 almost 50% of the child's speech), its traits and structures are simultaneously being internalized and strengthened and comprise a greater portion of the child's cognitive activities than may be witnessed (like an iceberg). That is, egocentric speech never disappears but becomes completely internalized in the form of inner speech, i.e. thought. The child has now learned to think words as well as to speak them; and to think them in a temporal and organized sequence which retains its original and primary function--self communication.


The essential feature of the external components of egocentric speech is that it is based on stimuli and actions which occur outside the child's immediate sphere of understanding and experience; at least insofar as the language dependent left hemisphere is concerned. In this regard, children, when they "misbehave", are probably sometimes telling the truth when they say they "don't know why" they did such and such. That is, the left hemisphere does not know why.

Although egocentric speech is a self-directed monologue, it is nevertheless a product of the left cerebral hemisphere. That egocentric speech appears initially only after an action has occurred indicates that the left hemisphere of the young child is responding to impulses and actions initiated outside its immediate realm of experiences and comprehension. It seems that the left hemisphere in the production of egocentric speech is not only "thinking out loud", but is attempting to interpret what it observes and experiences externally, thus creating a meaningful explanatory sequence which it then linguistically communicates to itself.

As noted above, the appearance and eventual internalization of egocentric speech occur in response to several maturational changes in the central nervous system, and parallel the myelination of the corpus callosum and the increased ability of the cerebral hemispheres to communicate (Joseph, 1982). Hence, as has been demonstrated by a number of independent researchers, communication between the right and left half of the brain is somewhat poor prior to age 3 and remains limited until after approximately age 5 (Deruelle & Schonen, 2011; Finlayson 1975; Gallagher & Joseph 1982; Galin et al. 1977, 1979; Joseph & Gallagher, 1985; Joseph et al. 1984; Kraft et al. 1980; Molfese et al. 1975; O'Leary, 1980; Ramaekers & Njiokiktjien, 2011; Salamy, 1978). Presumably this is a function of the immaturity of the callosal fiber connections between the hemispheres (Yakovlev & Lecours, 1967).

Essentially, egocentric speech appears to be a function of the left hemisphere's attempt to organize, interpret, and make sense of behavior initiated by the right hemisphere and limbic system (Joseph, 1982). Presumably, because interhemispheric communication is at best grossly incomplete, the left hemisphere utilizes language to explain to itself the behavior in which it observes itself to be engaged.

As the commissures mature and information flow within and between the hemispheres increases, the left hemisphere gains increased access to these impulses as they are formulated in the non-linguistic portions of the brain, and begins to organize linguistically what it experiences internally (rather than externally). Essentially, increased commissural transmission allows the left hemisphere access to right hemisphere impulses-to-action before the action occurs, rather than forcing it to make sense of the behavior after its completion (which is typical of split-brain patients, see chapter 10).

As noted in Chapter 10, even in the "normal" intact adult, commissural transmission is often incomplete. As such, the adult left hemisphere sometimes finds itself witnessing and participating in behaviors which it did not initiate, and which it does not understand.


The transition from canonical to jargon babbling, the acquisition of vocabulary-rich grammatical language, and the emergence and eventual internalization of egocentric speech, are directly correlated with maturational events occurring within the frontal-temporal speech areas and their intra- and inter-hemispheric pathways, the arcuate fasciculus and corpus callosum (Joseph, 1982; Lecours, 1975).

For example, at age one, the superior temporal lobe and Wernicke's area are exceedingly immature, and have grown in total surface volume to only 55% of the adult temporal lobe (Conel, 1937, 1941; Blinkov & Glezer, 1968). Moreover, in size and growth, the middle temporal lobe, which in adults appears to serve as a word storage area (see above), has only grown to within 19.3% of the adult middle temporal lobe (Blinkov & Glezer, 1968). Likewise, the IPL, which is directly linked with Wernicke's area and acts to provide verbal labels for visual, somesthetic, and auditory events, is relatively unmyelinated (Flechsig, 1901; Yakovlev & Lecours, 1967) and in surface area is less than 40% of the adult IPL (Blinkov & Glezer, 1968).

Wernicke's area (in conjunction with the IPL) organizes and transmits internally generated linguistic impulses, via the arcuate fasciculus, to Broca's speech area for expression (Geschwind, 1965; Joseph, 1988a, 2013). Thus, as demonstrated through functional imaging, the left frontal lobe becomes activated when a person engages in speech activities (Passingham, 2009; Peterson et al., 1988, 1989; Price, 2009). However, if injured, Wernicke's area instead transmits abnormal streams of neologistic jargon (Goodglass & Kaplan, 1982; Kertesz, 1983). Indeed, damage to Wernicke's area is typically associated with a condition referred to as "jargon aphasia," such that at a distance it sounds as if the patient may be speaking in a normal fashion, when in fact he or she is uttering prosodically sophisticated neologistic nonsense. Hence, when the child jargon babbles, this can be attributed to the immaturity of Wernicke's area.

By 13-15 months most infants have acquired ten words (and can understand about 110) and some will begin to combine words without pauses thereby producing compressed sentences (Fenson, Dale, Reznick, Thal, Bates, Hartung, Pethick, & Reilly, 1993; Nelson, 1973). These compressed sentences are spoken rapidly, which is similar to the "fluent" speech of a recovering Wernicke's aphasic; a condition also referred to as "fluent aphasia" (Goodglass & Kaplan, 1982; Kertesz, 1983).

By 18 months the infant's vocabulary has grown to between 20 and 40 words, at which point the ability to acquire new words mushrooms and the infant will rapidly learn up to 8 or more new words each week (Mervis & Betrand, 1995; Nelson, 1973; Thal et al., 2009). Hence, by 20 months the vocabulary consists of approximately 50 words (and the infant can understand about 180). Moreover, once the vocabulary expands to 50 items, most children will typically combine these words to form sentences that variably come to represent all the major grammatical classes of adult language (Nelson, 1973). Likewise, by 20 months the surface area of the superior temporal lobe has greatly expanded and is about 65% of that of the adult (Blinkov & Glezer, 1968; Conel, 1937, 1941).

By two years most children have learned 186 to 310 words on average, whereas by 30 months they may have a vocabulary of about 500 items (Fenson et al., 1993; Nelson, 1973). Likewise, at age 2 the superior temporal lobe and Wernicke's area, although still exceedingly immature (Conel, 1941), have expanded and grown to about 80% of the adult temporal lobe, and by 30 months they have reached about 85% (Blinkov & Glezer, 1968).

As comprehension, word knowledge, and fluent speech are dependent on Wernicke's area (as well as the frontal and inferior parietal speech areas), the tremendous growth in this region from age 1 (55%) to 30 months (85%) coincides with the rapid advance in word acquisition during this time period.

However, the middle temporal lobe likely serves as a memory store for words (chapter 21), and by ages 2 to 3 this tissue has grown to only 35% to 37% of the adult (Blinkov & Glezer, 1968), whereas the IPL is still poorly myelinated and thus exceedingly immature (Flechsig, 1901; Yakovlev & Lecours, 1967). This may explain why at age 2-3 the child's vocabulary is limited to about 500 words, whereas the lexicon of the adult ranges from 50,000 to 250,000 words (Aitchison, 2007).


Broca's expressive speech area is located along the outer surface of the left frontal lobe, adjacent to the secondary and primary motor area that represents the lips, mouth, jaw, and tongue. Upon receiving linguistic impulses from Wernicke's area (and the IPL), Broca's area acts to program the primary motor areas and the oral-laryngeal pathways in order to produce fluent and grammatical speech. However, as noted, if Wernicke's area were damaged, Broca's area (although uninjured) would instead spout nonsense words: "fluent aphasia" (Goodglass & Kaplan, 2000; Kertesz, 1983).

By contrast, damage to Broca's area results in expressive aphasia, and the patient is able to speak only a few well learned and emotional words (Bastiannse, 1995; Goodglass & Kaplan, 2000; Haarman & Kolk, 2014; Sarno, 2008). Nevertheless, although patients may only be capable of speaking a word or two, emotional-prosodic speech remains relatively intact, and they may be able to swear and sing the words they can no longer say (Joseph, 1988a, 1996b; Yamadori, Osumi, Mashuara, & Okuto, 1977); singing, swearing, and the production of emotional words being a function of the right frontal-temporal emotional-melodic speech areas (Gorelick & Ross, 2007; Heilman, Bowers, Speedie, & Coslett, 1975; Joseph, 1988a; Lalande, Braun, Charlebois, & Whitaker, 1992; Shapiro & Danly 1985; Tucker, Watson, & Heilman, 1977; Ross, 1993). Thus, in mild cases of Broca's aphasia, emotional, melodic, and prosodic production may remain somewhat normal; that is, unless the lesion is deep and encroaches on the cingulate, in which case prosody becomes decidedly abnormal. With deep frontal lesions, or if the cingulate is negatively impacted, patients may sound as if they are speaking with a foreign accent (Graff-Radford, Coper, & Colsher, 1986).

At one year of age, although infants are capable of uttering only a word or two, they are nevertheless quite vocal, and produce and sing prosodically sophisticated nonsense. Correspondingly, at one year the frontal lobes are exceedingly immature and in surface area have grown to only 40% of the adult frontal lobe (Blinkov & Glezer, 1968). However, by age 2 the frontal lobes have nearly doubled in size, growing to within 72% of the adult (Blinkov & Glezer, 1968); a pattern of growth which corresponds to the tremendous increase in the child's expressive vocabulary which consists of 186 to 310 words on average (Fenson et al., 1993; Nelson, 1973).

However, the right frontal lobe, and in fact, the sensory and non-motor regions of the right hemisphere appear to initially mature at a faster rate as compared to corresponding regions in the left hemisphere (Joseph, 1982; Gilles, Leviton, & Dooling, 1983; Scheibel, 2011, 1993; Thatcher, 1992a). This differential growth rate is also reflected in the manner in which children first utilize and express language. That is, the child's first words and first sentences tend to be uttered in a highly prosodic, nonsegmental, holistic fashion, such that there are few or no pauses between words or morphemes (Nelson, 1973, 2013; Peters, 1983). These speech patterns are also stereotypically associated with the right hemisphere (Joseph, 1988a; Novelly & Joseph, 1983).

Similar to those with right hemisphere speech (Joseph, 1986b; 1988a; Novelly & Joseph, 1983), the child's first words and sentences usually consist of normal prosody and melodic intonation coupled with reduced phonemic articulation such that speech appears to be slurred or mumbled. Hence, the emphasis is on intonation and prosody and not form (Nelson, 1973, 2013).

However, as the child approaches 2 years of age, Broca's area has generally caught up with and overtaken the emotional-melodic speech area in the right hemisphere (Scheibel, 2011, 1993), and children increasingly employ the left hemisphere during language processing (Mount, Reznick, Kagan, Hiatt, & Szpak, 1989). Hence, vocabulary rapidly increases, articulation improves, sentences become segmental, and speech consists of a greater number of nouns (Nelson, 1973, 1975, 2013; Peters, 1983). By 2 years of age children may have a vocabulary of over 200 to 300 words, though they may still jargon babble.

By age 3, however, jargon babbling disappears and vocabulary has increased to over 500 items, while the frontal and temporal lobes have respectively grown to 75% to 80% of the adult (Blinkov & Glezer, 1968). As noted, it is at this later age that a new form of language emerges, egocentric speech, which, like egocentric babbling, is essentially speech for oneself, a form of thinking out loud. However, between the ages of 5 and 7 egocentric speech has become almost completely internalized, such that external speech, although social, emotional, and containing limbic elements, is largely dominated by the neocortex and the language axis of the left hemisphere.


Copyright: 1996, 2000, 2010, 2018 - Rhawn Joseph, Ph.D.