SPEAK(I)                     8/15/73                     SPEAK(I)







NAME

     speak - word to voice translator



SYNOPSIS

     speak [ -epsv ] [ vocabulary [ output ] ]



DESCRIPTION

     Speak turns a stream of words into  utterances  and  outputs

     them  to a voice synthesizer, or to a specified output file.

     It  has  facilities  for  maintaining  a   vocabulary.    It

     receives, from the standard input



       -   working lines: text of words separated by blanks

       -   phonetic lines:  strings  of  phonemes  for  one  word

          preceded  and separated by commas.  The phonemes may be

          followed by comma-percent then a `replacement  part'  -

          an  ASCII  string with no spaces.  The phonetic code is

          given in vsp(VII).

       -   empty lines

       -   command  lines:  beginning  with  !.   The   following

          command lines are recognized:



          !r file    replace coded vocabulary from file

          !w file    write coded vocabulary on file

          !p         print parsing for working word

          !l         list  vocabulary  on  standard  output  with

                    phonetics

          !c  word    copy  phonetics  from   working   word   to

                    specified word

          !d         print phonetics for working word
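
     The four kinds of input line can be told apart by their first
     character.  The sketch below is only an illustration, not the
     program's own code: the names classify and parse_phonetic are
     hypothetical, treating a blanks-only line as empty is an
     assumption, and the phoneme names in the usage note are
     placeholders rather than real codes from vsp(VII).

```python
def classify(line: str) -> str:
    """Return the kind of input line: empty, command, phonetic or working."""
    text = line.rstrip("\n")
    if text.strip() == "":
        return "empty"          # assumption: blanks-only counts as empty
    if text.startswith("!"):
        return "command"
    if text.startswith(","):
        return "phonetic"       # phonemes are preceded by a comma
    return "working"


def parse_phonetic(line: str):
    """Split a phonetic line into (phoneme list, replacement part or None)."""
    body = line.rstrip("\n")
    replacement = None
    if ",%" in body:
        # comma-percent introduces the optional replacement part
        body, replacement = body.split(",%", 1)
    phonemes = [p for p in body.split(",") if p]
    return phonemes, replacement
```

     With placeholder phonemes, parse_phonetic(",p1,p2,%xy") yields
     the list p1, p2 and the replacement part xy; a line holding
     only a comma yields an empty phoneme list.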



     Each working line replaces its predecessor.  Its first  word

     is  the  `working  word'.   Each  phonetic line replaces the

     phonetics stored for the working  word.   In  particular,  a

     phonetic  line  of  comma  only  deletes  the  entry for the

     working word.  Each working line,  phonetic  line  or  empty

     line  causes  the  working  line to be uttered.  The process

     terminates at the end of input.
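
     The bookkeeping described above can be modelled as a small
     state machine.  This is a minimal sketch under stated
     assumptions: the class Session and its attributes are
     hypothetical, the vocabulary is held as a plain dictionary of
     raw phonetic lines rather than in coded form, uttering is
     modelled by appending to a list, and command lines are ignored.

```python
class Session:
    """Toy model of the working line, working word and vocabulary."""

    def __init__(self, vocabulary=None):
        self.vocabulary = dict(vocabulary or {})  # word -> phonetic line
        self.working_line = ""
        self.uttered = []                         # stands in for voice output

    @property
    def working_word(self):
        words = self.working_line.split()
        return words[0] if words else ""

    def feed(self, line):
        text = line.rstrip("\n")
        if text.startswith("!"):
            return                                # commands are not modelled
        if text.startswith(","):
            if text == ",":
                # a phonetic line of comma only deletes the entry
                self.vocabulary.pop(self.working_word, None)
            else:
                self.vocabulary[self.working_word] = text
        elif text.strip():
            self.working_line = text              # replaces its predecessor
        # working, phonetic and empty lines all cause an utterance
        self.uttered.append(self.working_line)
```

     Feeding a working line, then a phonetic line, then an empty
     line utters the same working line three times, which matches
     the edit-and-listen cycle the text describes.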



     Unknown words are pronounced by rules, and failing that, are

     spelled.   Spelling  is done by taking each character of the

     word, prefixing it with *, and looking it  up.   Unspellable

     words burp.
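
     Spelling amounts to one lookup per character.  The following
     is a sketch only: the function spell is hypothetical, the
     vocabulary is assumed to be a dictionary keyed by the
     *-prefixed character, and a failed lookup is reported by
     returning None instead of producing any sound.

```python
def spell(word, vocabulary):
    """Look up '*' + character for each character of word.

    Returns the list of stored entries, or None when some character
    has no entry (the word is unspellable).
    """
    parts = []
    for ch in word:
        entry = vocabulary.get("*" + ch)
        if entry is None:
            return None
        parts.append(entry)
    return parts
```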



     Speak is initialized with a coded vocabulary stored in  file

     /usr/lib/speak.m.    The  vocabulary  option  substitutes  a

     different file for /usr/lib/speak.m.



     A set of single letter  options  may  appear  in  any  order

     preceded by -.  Their meanings are:



         -e   suppress English steps (4-8) below

         -p   suppress pronunciation by rule

         -s   suppress spelling

         -v   suppress voice output
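
     As an illustration of the calling convention in the SYNOPSIS,
     the sketch below decodes an argument list.  It is not the
     program's own parser: parse_args and the flag names are
     hypothetical, and rejecting an unknown option letter is an
     assumption, since this page does not say what speak does in
     that case.

```python
DEFAULT_VOCABULARY = "/usr/lib/speak.m"

KNOWN = {
    "e": "suppress_english",
    "p": "suppress_rules",
    "s": "suppress_spelling",
    "v": "suppress_voice",
}


def parse_args(argv):
    """Return (flags, vocabulary file, output file or None)."""
    flags = {name: False for name in KNOWN.values()}
    rest = list(argv)
    if rest and rest[0].startswith("-"):
        # option letters may appear in any order after the single '-'
        for letter in rest.pop(0)[1:]:
            if letter not in KNOWN:
                raise ValueError("unknown option: " + letter)
            flags[KNOWN[letter]] = True
    vocabulary = rest[0] if len(rest) > 0 else DEFAULT_VOCABULARY
    output = rest[1] if len(rest) > 1 else None
    return flags, vocabulary, output
```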



     The steps of pronunciation by rule are:



     (1)   If there were no lower case  letters  in  the  working

          line, fold all upper case letters to lower.

     (2)   Fold an initial cap to lower case, and try again.

     (3)   If word has only one letter,  or  has  no  lower  case

          vowels, quit.

     (4)   If there is a final s, strip it.

     (5)   Replace final -ie by -y.

     (6)   If any changes have been made, try whole word again.

     (7)   Locate probable long vowels and capitalize them.  Mark

          probable silent e's.

     (8)   Put back the s stripped in (4), if any.

     (9)   Place # before and after word.

     (10)  Prefix word with %, and look up longest initial  match

          in the stored table of words; if none, quit.

     (11)  Use  phonemes  from  the  stored  phonetic  string  as

          pronunciation,  and  replace  the  matched stuff by the

          replacement part of the phonetic string.

     (12)  If anything remains, go to (10).
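
     Steps (9) through (12) form a loop that repeatedly consumes
     the longest matching prefix.  The sketch below shows one way
     to read them; it is not the original implementation.  The
     function name, the dictionary layout (a %-prefixed pattern
     mapped to a phoneme list and a replacement part), the phoneme
     names in the test table and the max_rounds guard against
     endless rewriting are all assumptions made for illustration.

```python
def pronounce_by_table(word, table, max_rounds=100):
    """Return a phoneme list for word, or None if the rules give up."""
    rest = "#" + word + "#"                  # step (9)
    phonemes = []
    for _ in range(max_rounds):
        if not rest:                         # nothing remains: done
            return phonemes
        key = "%" + rest                     # step (10)
        match = None
        for n in range(len(key), 1, -1):     # try the longest prefix first
            if key[:n] in table:
                match = key[:n]
                break
        if match is None:
            return None                      # step (10): no match, quit
        entry_phonemes, replacement = table[match]
        phonemes.extend(entry_phonemes)      # step (11)
        # drop the matched text (minus the '%') and prepend the replacement
        rest = replacement + rest[len(match) - 1:]
    return None                              # guard tripped
```

     A replacement part lets one entry rewrite its match into text
     that other entries then handle, so an entry for ph with the
     replacement part f defers to whatever entry covers f.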



     Long vowels are located this way in step (7):



     (1)   A u appearing in context [^aeiou]u[^aeiouwxy][aieouy].

          (The notation is just a regular expression à la ed(I).)

          (pustUlous)

     (2)   One   of    [aeo]    appearing    in    the    context

          [aeo][^aehiouwxy][ie][aou]    or    in    the   context

          [aeo][^aehiouwxy]ien is assumed long.   The  digram  th

          behaves  as  a  single  letter  in this test.  (rAdium,

          facEtious, quOtient, carpAthian)

     (3)   If the first vowel in the word is i followed by one of

          aou, it is assumed long.  (Iodine, dIameter, trIumph)

     (4)   If the only vowel in the word is final e, the vowel is

          assumed long.  (bE, shE)

     (5)   If the only vowels in the word appear in  the  pattern

          [aeiouy][^aeiouwxy]S, where S is one of the suffixes

                  -al     -le     -re     -y

          then the first vowel is assumed long.  (glObal,  tAble,

          lUcre, lAdy)

     (6)   If no suffix was  found  in  (5),  as  many  of  these

          suffixes  as  possible are isolated from right to left.

          Stripping stops once e has been stripped, and e is  not

          stripped before a suffix beginning with e.  Each suffix

          is marked by inserting   just before the first  letter,

          or just after e in those suffixes that begin with e.

                  -able   -ably   -e      -ed     -en

                  -er     -ery    -est    -ful    -ly

                  -ing    -less   -ment   -ness   -or

          (care ful ly, maj or, fine ry, state , caree r)

     (7)   If the word, exclusive of suffixes, ends in  i  or  y,

          and  contains  no earlier vowel, then i or y is assumed

          long.  (pY (from pie), crY ing, lIe d)

     (8)   If the first suffix begins with one  of  [aeio],  then

          the  vowel [aeiouy] in an immediately preceding pattern

          [^aeo][aeiouy][^aeiouwxy] is assumed long.  The  digram

          th   behaves   as   a   single  letter  in  this  test.

          (cAre ful ly, bAthe d, mAj or, pOt able, port able)

     (9)   In these exceptional cases no long letter  is  assumed

          in the preceding step:

          (i)   before  g,  if  there  are  any  earlier   vowels

               (postage , stAge , college )

          (ii)  e is not long before l (travele d)

     (10)  If the first suffix begins with one of [aeio], and the

          word  exclusive  of  suffixes ends in [aeiouyAEIOUY]th,

          then digram th is capitalized.  (breaTH ing, blITHe ly)

     (11)  An attempt is made to recognize silent e in the middle

          of  compound words.  Such an e is marked by a following

           , and preceding vowels, other than e, are assumed long

          as  in  step  (8).   Silent  e is marked in the context

          [bdgmnprst][bdgpt]le[^aeioruy ]S, where S is any string

          that  contains  [aeiouy]  but does not contain   or the

          end of the word.   Silent  e  is  also  marked  in  the

          context          [^aeiu][aiou][^aeiouwxy]e[^aeinoruy]S.

          (simple ton, fAce guard, cAve man, cavernous)
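
     Three of the simpler long-vowel rules, (1), (3) and (4), can
     be written directly as regular expressions.  This is a sketch
     for illustration only: the function names are hypothetical,
     words are assumed to arrive in lower case, and y is treated
     as a consonant when looking for the first or only vowel,
     which the rules above do not spell out.

```python
import re


def long_u(word):
    """Rule (1): u in the context [^aeiou]u[^aeiouwxy][aieouy]."""
    return re.sub(r"(?<=[^aeiou])u(?=[^aeiouwxy][aieouy])", "U", word)


def long_initial_i(word):
    """Rule (3): the first vowel is i and is followed by one of aou."""
    return re.sub(r"^([^aeiou]*)i(?=[aou])",
                  lambda m: m.group(1) + "I", word)


def long_final_e(word):
    """Rule (4): the only vowel in the word is a final e."""
    return re.sub(r"^([^aeiou]*)e$",
                  lambda m: m.group(1) + "E", word)
```

     Applied to the examples given with each rule, these functions
     capitalize the same letters: pustUlous, Iodine, dIameter,
     trIumph, bE and shE.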



FILES

          /usr/lib/speak.m



SEE ALSO

          vs(VII), vs(IV)



DIAGNOSTICS

          `?' for unknown command with !, or  for  unreadable  or

          unwritable vocabulary file



BUGS

          Vocabulary overflow  is  unchecked.   Excessively  long

          words cause dumps.  Space is not reclaimed from deleted

          entries.