The Mental Illness in These Two Peoples Are Caused By Psychotropic Drugs.
Monday, December 29, 2014
Understanding the Second-Order Entropies of Voynich Text
by Dennis J. Stallings
May 11, 1998
Abstract
The anomalous second-order entropies of Voynich text are among its most puzzling features. h1-h2, the difference between the conditional first- and second-order entropies, equals H1-h2, the difference between the first-order absolute entropy and the second-order conditional entropy. h1-h2 (or H1-h2) is a theoretically significant number; it denotes the average information carried by the first character in a digraph about the second one. Therefore it was chosen as a simple measure of what is being sought, although the whole entropy profile of each text sample was considered.
Tests show that Voynich text does not have its low h2 measures solely because of a repetitious underlying text, that is, one that often repeats the same words and phrases. Tests also show that the low h2 measures are probably not due to an underlying low-entropy natural language. A verbose cipher, one which substitutes several ciphertext characters for one plaintext character, can produce the entropy profile of Voynich text.
Table of Contents
Introduction
Measures of Relative Second-Order Entropy
Entropies of Voynich Texts
Verbose Ciphers
Repetitive Texts
Schizophrenic Language
Low-Entropy Natural Languages
Japanese
Hawaiian
Discussion of Phonemic versus Syllabic Notation
The Size of the Character Set
The Effect of Word Divisions
Redundancy
The Effect of Syllable Divisions
Final Thoughts on Low-Entropy Natural Languages
Suggestions for Further Work
Acknowledgments
References for Electronic Texts
Printed References
Introduction
William Ralph Bennett first applied the entropy concept to the study of the Voynich Manuscript in his Scientific and Engineering Problem Solving with the Computer (Englewood Cliffs: Prentice-Hall, 1976). His book has introduced many people to the VMs.
The repetitive nature of VMs text is obvious to casual examination. Entropy is one possible numerical measure of a text's repetitiousness. The higher the text's repetitiousness, the lower the second-order entropy (information carried in letter pairs). Bennett noted that only some Polynesian languages have second-order entropies as low as VMs text. Typical ciphers do not have a low second-order entropy either.
This paper examines other possible reasons for the low second-order entropy of Voynich texts: a verbose cipher or a repetitious underlying text. It also examines the low-entropy natural languages Hawaiian and Japanese for further insight into the hypothesis that the underlying text is a low-entropy natural language.
Measures of Relative Second-Order Entropy
Jacques Guy's MONKEY program was used to calculate second-order entropies. (Note: the bug-free, "sensible" MONKEY on the EVMT Project Home Page was used; the author believes that the version of MONKEY on Garbo as of this writing has bugs.) Note that MONKEY in its present form only takes the first 32,000 characters in a file. Some long texts were divided up into portions so that MONKEY could analyze them separately.
The conditional entropies were used, as is customary on the Voynich E-mail list. Say that H1 is the absolute first-order entropy and H2 is the absolute second-order entropy. Then h1 and h2 are the first- and second-order conditional entropies. h2 = H2-H1, the entropy of a character given the character that precedes it. h1 = H1, since it depends only on single characters; thus h1 is not really conditional.
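For reference, the quantities just defined can be written out explicitly. This standard formulation is an editorial addition, not the author's notation. Here p(i) is the relative frequency of character i and p(i,j) is that of the digraph ij. The second and third lines assume that both positions of a digraph follow the single-character distribution p, which holds closely for long texts.

$$H_1 = -\sum_{i} p(i)\,\log_2 p(i), \qquad H_2 = -\sum_{i,j} p(i,j)\,\log_2 p(i,j)$$

$$h_2 = H_2 - H_1 = -\sum_{i,j} p(i,j)\,\log_2 p(j \mid i)$$

$$h_1 - h_2 = H_1 - h_2 = 2H_1 - H_2 = \sum_{i,j} p(i,j)\,\log_2 \frac{p(i,j)}{p(i)\,p(j)}$$

The last sum is the mutual information between adjacent characters. This is the precise sense in which h1-h2 measures the information the first character of a digraph carries about the second.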
The following measures were considered:
h0: zero-order entropy (log2 of the number of different characters)
h1: first-order conditional or absolute entropy
h2: second-order conditional entropy
h1-h2: difference between the conditional first- and second-order entropies, which equals:
H1-h2: the difference between the first-order absolute entropy and the second-order conditional entropy.
As will be seen, there is a need here to compare systems with very different numbers of characters, to scale the statistics somehow to the size of the character set. h1-h2 or H1-h2 is a theoretically significant number; it denotes the average information carried by the first character in a digraph about the second one. It is perhaps the best single, simple measure of what is being sought.
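A minimal Python sketch of how these measures can be computed from a text follows. It is not MONKEY, and MONKEY's exact estimation method may differ. It assumes simple relative frequencies of single characters and of overlapping digraphs, and it treats the space as an ordinary character; the 27-character count for English in the tables suggests 26 letters plus space. The helper names `shannon_entropy` and `entropy_profile` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropy_profile(text: str) -> dict:
    """Return h0, h1, h2 and h1-h2 for a text (hypothetical helper, not MONKEY)."""
    if len(text) < 2:
        raise ValueError("need at least two characters")
    singles = Counter(text)                                          # character counts
    digraphs = Counter(text[i:i + 2] for i in range(len(text) - 1))  # overlapping pairs
    h0 = math.log2(len(singles))    # log2 of the number of distinct characters
    H1 = shannon_entropy(singles)   # first-order entropy; h1 = H1
    H2 = shannon_entropy(digraphs)  # absolute second-order (digraph) entropy
    h2 = H2 - H1                    # entropy of a character given its predecessor
    return {"h0": h0, "h1": H1, "h2": h2, "h1-h2": H1 - h2}
```

To mimic MONKEY's input limit, one could pass only the first 32,000 characters of a file, e.g. `entropy_profile(text[:32000])`.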
The percentage of the maximum possible second-order absolute entropy might have been used instead. One could calculate H2 as a percentage of the largest H2 that each alphabet could deliver. Using digraphs with an alphabet of m characters, H2(max) is:

$$H_2(\max) = \log_2(m^2) = 2\log_2 m$$

and the %H2(max) is:

$$\%H_2(\max) = 100 \times \frac{H_2}{\log_2(m^2)}$$
However, the H2(max) depends tremendously on m, the size of the character set chosen. For Voynich text, Currier has 36 characters and Basic Frogguy has 23 characters. Characters that are hardly ever used have little effect on h1 and h2, but could make a tremendous difference in H2(max). Therefore, this measure was not used.
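A small sketch shows how m alone moves this percentage by holding H2 fixed and varying m. The H2 value is rebuilt from Table 4 (Herbal-A, Currier) as H2 = h1 + h2, which follows from h2 = H2 - H1 and h1 = H1. The two values of m are illustrative: 33 is the number of characters found in the sample, and 36 is the size of the full Currier alphabet. `pct_h2_max` is a hypothetical name.

```python
import math


def pct_h2_max(H2: float, m: int) -> float:
    """H2 as a percentage of the maximum digraph entropy log2(m**2)."""
    return 100.0 * H2 / math.log2(m ** 2)


# Absolute H2 for Herbal-A in Currier, rebuilt from Table 4 as H2 = h1 + h2.
H2_herbal_a_currier = 3.792 + 2.313

# 33 = characters actually seen in the sample; 36 = size of the full Currier alphabet.
for m in (33, 36):
    print(m, round(pct_h2_max(H2_herbal_a_currier, m), 1))
```

The same H2 comes to about 60.5% when m = 33 but about 59.0% when m = 36, even though the three extra characters contribute nothing to H2 itself.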
To start the discussion, here are some data from the English King James Bible:
Table 1:
English King James Bible - 1 Kings
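| Passage Beginning at | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| 1:1 | 27 | 32000 | 4.755 | 4.022 | 3.068 | 0.953 |
| 8:19 | 27 | 32000 | 4.755 | 4.028 | 3.090 | 0.939 |
| 15:27 | 27 | 32000 | 4.755 | 3.998 | 3.092 | 0.906 |
| Average of three | 27 | 96000 | 4.755 | 4.016 | 3.083 | 0.933 |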
The h1-h2 range for different portions of the same text is 0.906-0.953.
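The "Average of three" rows in Tables 1 and 2 appear to be arithmetic means of the three portion statistics, not entropies recomputed over the pooled 96,000 characters. For example, (4.022 + 4.028 + 3.998) / 3 = 4.016 for h1 in Table 1. A hypothetical sketch of that procedure follows. The paper's portions begin at chapter-and-verse boundaries, whereas this sketch simply cuts the text at fixed 32,000-character offsets; that cut is an assumption. The names `portions` and `average_over_portions` are illustrative.

```python
import math
from collections import Counter

PORTION = 32_000  # MONKEY only reads the first 32,000 characters of a file


def h1_minus_h2(text: str) -> float:
    """H1 - h2 for one portion, i.e. 2*H1 - H2 with overlapping digraphs."""
    def entropy(counts: Counter) -> float:
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    H1 = entropy(Counter(text))
    H2 = entropy(Counter(text[i:i + 2] for i in range(len(text) - 1)))
    return 2 * H1 - H2


def portions(text: str, size: int = PORTION) -> list:
    """Cut a long text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def average_over_portions(text: str, size: int = PORTION) -> float:
    """Arithmetic mean of h1-h2 over the portions (very short tails are unreliable)."""
    parts = portions(text, size)
    if not parts:
        raise ValueError("empty text")
    return sum(h1_minus_h2(p) for p in parts) / len(parts)
```

Table 5 below suggests that a short final portion can shift h1-h2 noticeably. In practice, one might drop or merge a small remainder rather than average it in.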
And here are data on the corresponding portions of the Latin Vulgate Bible:
Table 2:
Latin Vulgate Bible - 1 Kings
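| Passage Beginning at | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| 1:1 | 24 | 32000 | 4.585 | 4.002 | 3.309 | 0.692 |
| 8:19 | 24 | 32000 | 4.585 | 3.994 | 3.287 | 0.707 |
| 15:27 | 24 | 32000 | 4.585 | 4.005 | 3.304 | 0.700 |
| Average of three | 24 | 96000 | 4.585 | 4.000 | 3.300 | 0.700 |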
The average h1-h2 is 0.700, compared to 0.933 for the English text. This is undoubtedly due to the fact that English uses more combinations of two or more letters to represent single phonemes than Latin does. The range of h1-h2 for the Latin text is 0.692-0.707, narrower than for the English text.
The next table shows the h1-h2 statistic for assorted files in various languages and notations. It shows how the h1-h2 statistic sometimes reveals unexpected information. For instance, Hawaiian and Japanese in phonemic notation have low h2 values, approaching those of Voynich text. However, their h1-h2 values are far lower than those of Voynich text.
Table 3:
h1-h2 Statistics for Selected Texts
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| Latin - Vulgate Bible, 1 Kings, first 32K | 24 | 32000 | 4.585 | 4.002 | 3.309 | 0.692 |
| Hawaiian (Bennett, limited phonemic) | 13 | 15000 | 3.700 | 3.200 | 2.454 | 0.746 |
| Hawaiian newspaper (full phonemic) | 19 | 13473 | 4.248 | 3.575 | 2.650 | 0.925 |
| English - King James Bible - Genesis, first 32K | 27 | 32000 | 4.755 | 3.969 | 3.020 | 0.949 |
| Japanese Tale of Genji - Section 1 (romaji) | 22 | 32000 | 4.459 | 3.763 | 2.677 | 1.086 |
| Japanese Tale of Genji - Section 1 (kana) | 71 | 20622 | 6.150 | 4.764 | 3.393 | 1.370 |
| Voynich Herbal-B (Currier) | 34 | 13858 | 5.087 | 3.796 | 2.267 | 1.529 |
| Voynich Herbal-B (EVA) | 21 | 16061 | 4.392 | 3.859 | 2.081 | 1.778 |
Entropies of Voynich Texts
Here are entropy results for Voynich texts, a sample of Herbal-A and Herbal-B. The Herbal-A sample's h1-h2 ranges 1.479-1.945, depending on which transcription alphabet is used. The Herbal-B sample's h1-h2 ranges 1.529-1.897. All these are far greater than the 0.93 for English and 0.70 for Latin.
The choice of transcription alphabet also makes an enormous difference. From Currier to Frogguy the range of h1-h2 is 1.5-1.9. The direction is what one would expect. Currier is the most synthetic, while Frogguy is the most analytical, decomposing single Currier characters into several Frogguy characters. Thus Currier Q = Frogguy cqpt.
Table 4:
Voynich Texts
| Type of Voynich Text | Transcription Alphabet | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|---|
| Herbal-A | Currier | 33 | 9804 | 5.044 | 3.792 | 2.313 | 1.479 |
| Herbal-A | FSG | 24 | 10074 | 4.585 | 3.801 | 2.286 | 1.515 |
| Herbal-A | EVA | 21 | 12218 | 4.392 | 3.802 | 1.990 | 1.812 |
| Herbal-A | Frogguy | 21 | 13479 | 4.392 | 3.826 | 1.882 | 1.945 |
| Herbal-B | Currier | 34 | 13858 | 5.087 | 3.796 | 2.267 | 1.529 |
| Herbal-B | FSG | 24 | 14203 | 4.585 | 3.804 | 2.244 | 1.560 |
| Herbal-B | EVA | 21 | 16061 | 4.392 | 3.859 | 2.081 | 1.778 |
| Herbal-B | Frogguy | 21 | 17909 | 4.392 | 3.846 | 1.949 | 1.897 |
The samples of Voynich text are relatively small. The following statistics for samples of a single known Latin text give some idea of how much difference this might make.
Table 5:
Texts from Latin Vulgate Bible, 1 Kings, For Study of Effect of Sample Size on Entropy Data. Passages All Begin at 1:1
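| Passage Ending at | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| 2:18 | 23 | 8929 | 4.524 | 3.994 | 3.263 | 0.731 |
| 4:21 | 24 | 18623 | 4.585 | 3.995 | 3.298 | 0.697 |
| 7:17 | 24 | 29647 | 4.585 | 4.003 | 3.309 | 0.694 |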
It is doubtful whether h1-h2 or any other single measure can tell us all we want. However, the representation system is probably the heart of the issue. The following discussion of verbose ciphers is a case in point.
Verbose Ciphers
A verbose cipher, one that substitutes several ciphertext characters for one plaintext character, can produce the entropy profile seen with Voynich text. One such system is Cat Latin C, which is applied to Latin plaintext. Vowels and consonants were added roughly in proportion to their occurrence in Latin. This keeps h1 roughly the same as with Latin and Voynich FSG. The repeated digraphs are what reduce h2 to the desired level. If q is followed by u, it is ordinary Latin qu; otherwise it belongs to one of the consonant patterns. So the scheme is unambiguous. This scheme does produce VMs-like entropies!
This table shows the Cat Latin verbose cipher:
Table 6:
Cat Latin C
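| Plaintext | Ciphertext |
|---|---|
| a | a |
| b | bqbababa |
| c | c |
| d | dqdede |
| e | e |
| f | fqfififi |
| g | gqgogogo |
| h | h |
| i | i |
| j | jqjajaja |
| k | k |
| l | l |
| m | mqmememe |
| n | nqninini |
| o | o |
| p | pqpopopo |
| qu | qu |
| r | rqrarara |
| s | sqsesese |
| t | tqtititi |
| u | u |
| v | v |
| w | w |
| x | xqxoxoxo |
| y | y |
| z | zqzazaza |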
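The substitution table can be applied mechanically. Below is a minimal Python sketch of a Cat Latin C encoder, assuming lowercase Latin plaintext. Several choices are this sketch's own assumptions, not necessarily the author's: word spaces are passed through unchanged, and any character without a substitute raises an error. The mapping of l to itself is taken from the sample ciphertext (e.g. "plurimos"). `encode_cat_latin_c` is a hypothetical name.

```python
# Substitutions from Table 6. "l" maps to itself, as the Cat Latin C sample
# text shows (e.g. "plurimos" -> "pqpopopolurqrararaimqmememeosqsesese").
CAT_LATIN_C = {
    "a": "a", "b": "bqbababa", "c": "c", "d": "dqdede", "e": "e",
    "f": "fqfififi", "g": "gqgogogo", "h": "h", "i": "i", "j": "jqjajaja",
    "k": "k", "l": "l", "m": "mqmememe", "n": "nqninini", "o": "o",
    "p": "pqpopopo", "r": "rqrarara", "s": "sqsesese", "t": "tqtititi",
    "u": "u", "v": "v", "w": "w", "x": "xqxoxoxo", "y": "y", "z": "zqzazaza",
}


def encode_cat_latin_c(plaintext: str) -> str:
    """Encipher Latin plaintext with Cat Latin C (sketch)."""
    text = plaintext.lower()
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "q":
            # "qu" is left as in ordinary Latin; a bare q has no substitute.
            if text[i + 1:i + 2] != "u":
                raise ValueError("q must be followed by u")
            out.append("qu")
            i += 2
            continue
        if ch.isspace():
            out.append(ch)              # assumption: word spaces are kept as-is
        elif ch in CAT_LATIN_C:
            out.append(CAT_LATIN_C[ch])
        else:
            raise ValueError(f"no Cat Latin C substitute for {ch!r}")
        i += 1
    return "".join(out)
```

For example, `encode_cat_latin_c("et rex david")` returns `"etqtititi rqrararaexqxoxoxo dqdedeavidqdede"`, which matches the start of the Cat Latin C sample quoted later in this section. Every expansion starts with its consonant followed by q, and otherwise q occurs only in qu. A decoder could therefore strip each expansion deterministically, which is the sense in which the scheme is unambiguous.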
For comparison here are VMs results in FSG, since the size of that character set is closest to Latin.
Table 7:
Verbose Cipher Compared to Voynich Text
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| Voynich Herbal-A (FSG) | 24 | 10074 | 4.585 | 3.801 | 2.286 | 1.515 |
| Voynich Herbal-B (FSG) | 24 | 14203 | 4.585 | 3.804 | 2.244 | 1.560 |
| Latin Vulgate, 1 Kings, 1:1 - 2:11 | 23 | 8232 | 4.524 | 3.996 | 3.262 | 0.734 |
| Above passage, Cat Latin C | 23 | 28754 | 4.524 | 3.873 | 2.278 | 1.595 |
However, it's clear that this is not the same pattern as Voynich text. It might be best to look for patterns subjectively. Here are some text samples.
The start of the Voynich Herbal-A sample file (f29v, lines 1-9), in EVA:
kshol qoocph shor pshocph shepchy qoty dy shory
ykcholy qoty chy dy qokchol chor tchy qokchody cheor o
chor chol chy choiin
tshoiin cheor chor o chty qotol sheol shor daiin qoty
otol chol daiin chkaiin shoiin qotchey qotshey daiiin
daiin chkaiin
pchol oiir chol tsho daiin sho teo chy chtshy dair am
okain chan chain cthor dain yk chy daiin cthol
sot chear chl s choly dar
The beginning of a Hawaiian sample file, from a Hawaiian newspaper, to be discussed later:
kepakemapa mei puke kepakemapa mei mahalo 'ia ka 'Olelo hawai'i e nA mAka' na ho'Olanani kim ma ka lA o malaki ua noa ka pAka 'o kapi'olani no ke anaina na lAkou ke kuleana 'o ka mAlama 'ana ma ka 'Olelo 'ana aku i ka 'Olelo hawai'i ma laila nO i 'Akoakoa ai ka po'e haumAna ka po'e kumu ka po'e mAkua a me ka po'e hoa o kElA 'ano kEia 'ano o ka 'Olelo hawai'i a ma laila nO ho'i i launa ai ka po'e ma o ka 'Olelo hawai'i kapa 'ia kEia lA hoihoi 'o ka lA 'ohana
Finally, the beginning of the Latin Vulgate 1 Kings in Cat Latin C:
etqtititi rqrararaexqxoxoxo dqdedeavidqdede sqseseseenqnininiuerqrararaatqtititi habqbababaebqbababaatqtititique aetqtititiatqtititiisqsesese pqpopopolurqrararaimqmememeosqsesese dqdedeiesqsesese cumqmememeque opqpopopoerqrararairqrararaetqtititiurqrarara vesqsesesetqtititiibqbababausqsesese nqnininionqninini calefqfififiiebqbababaatqtititi dqdedeixqxoxoxoerqrararaunqnininitqtititi erqrararagqgogogoo ei sqseseseerqrararavi ...
Look at these samples and think about the kind of repetition involved in each case! The "Cat Latin C" verbose cipher is clearly not the same thing as Voynichese.
Here are the entropy values for these samples:
Table 8:
Statistics on Text Samples
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| Voynich Herbal-A (EVA) | 21 | 12218 | 4.392 | 3.802 | 1.990 | 1.812 |
| Hawaiian newspaper (full phonemic) | 19 | 13473 | 4.248 | 3.575 | 2.650 | 0.925 |
| Latin Vulgate, 1 Kings, 1:1 - 2:11, Cat Latin C | 23 | 28754 | 4.524 | 3.873 | 2.278 | 1.595 |
The author's personal opinion is that the rigid internal structure of Voynich text accounts for the low h2 measures. The majority of Voynich "words" follow a paradigm. Robert Firth (Work Note #24) and Jorge Stolfi (Voynich Page) have both identified paradigms. Captain Prescott Currier (Currier's Papers) identified several other kinds of internal structure in Voynich text.
Repetitive Texts
From time to time, some have suggested that the Voynich Manuscript is simply a very repetitious text. Here is a repetitious magical spell in Old High German:
eiris sazun idisi sazun her duoder
suma hapt heptidun suma heri lezidun
suma clubodun umbi cuoniouuidi
insprinc haptbandun inuar uigandun
phol ende uuodan uuorun zi holza
du uuart demo balderes uolon sin uuoz birenkit
thu biguol en sinthgunt sunna era suister
thu biguol en friia uolla era suister
thu biguol en uuodan so he uuola conda
sose benrenki sose bluotrenki
sose lidirenki
ben zi bena bluot zi bluoda
lid zi geliden sose gelimida sin
Merseburger Zaubersprüche (Magic Spells from Merseburg) in Old High German. Note: 'uu' = 'w'.
An experiment to test this idea is to take samples of known repetitious texts (food recipes, religious texts, catalogs) and compare their second-order entropies with those of known texts that should be less repetitious (prose fiction, essays).
Note that some long texts were larger than MONKEY's 32,000 character limit; in those cases MONKEY just took the first 32,000 characters. Some long texts were divided up into separate portions that MONKEY could analyze.
Jacobean English. Ever since its publication, many commentators have noted how repetitious the Book of Mormon is. The Bible itself is, of course, somewhat repetitious. A (relatively) non-repetitious text in Jacobean English is the Essays of Sir Francis Bacon.
The Book of Mormon appears to be the most repetitious: h1-h2 for the Book of Mormon excerpts ranges from 0.931 to 0.980. The King James Bible is next, at 0.904-0.983. The non-repetitious Essays of Francis Bacon have 0.827-0.837. Taking averages, the most repetitious text and the least have h1-h2 values of 0.951 versus 0.831, a difference of 0.120.
Table 9:
Jacobean English Texts of Varying Repetition
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| Book of Mormon - 1 Nephi | 27 | 32000 | 4.755 | 4.033 | 3.090 | 0.942 |
| Book of Mormon - Alma | 27 | 32000 | 4.755 | 4.041 | 3.109 | 0.931 |
| Book of Mormon - Ether | 27 | 32000 | 4.755 | 4.009 | 3.029 | 0.980 |
| King James Bible - Genesis | 27 | 32000 | 4.755 | 3.969 | 3.020 | 0.949 |
| King James Bible - Joshua | 27 | 32000 | 4.755 | 4.012 | 3.029 | 0.983 |
| King James Bible - Acts | 27 | 32000 | 4.755 | 4.041 | 3.137 | 0.904 |
| Francis Bacon's Essays, Part 1 | 27 | 32000 | 4.755 | 4.048 | 3.220 | 0.827 |
| Francis Bacon's Essays, Part 2 | 27 | 32000 | 4.755 | 4.042 | 3.214 | 0.828 |
| Francis Bacon's Essays, Part 3 | 27 | 32000 | 4.755 | 4.066 | 3.229 | 0.837 |
Latin (Late Classical). Samples of the Vulgate Bible and Boethius' Consolation of Philosophy (Consolatio Philosophiae) were analyzed. There is little difference in the statistics between the Vulgate Bible and the presumably less repetitious Consolatio Philosophiae.
Table 10:
Latin Texts of Varying Repetition
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| 1 Kings, Vulgate, 1:1 | 24 | 32000 | 4.585 | 4.002 | 3.309 | 0.692 |
| 1 Kings, Vulgate, 8:19 | 24 | 32000 | 4.585 | 3.994 | 3.287 | 0.707 |
| 1 Kings, Vulgate, 15:27 | 24 | 32000 | 4.585 | 4.005 | 3.304 | 0.700 |
| Boethius - Consolatio Philosophiae - Books 3 & 4 | 25 | 32000 | 4.644 | 3.971 | 3.272 | 0.699 |
Modern English. Repetitive texts: food recipes (chicken and Cajun), a catalog of technical standards, and a Roman Catholic litany. For a non-repetitious text: a short story, "The Blue Hotel" by Stephen Crane.
The non-repetitious short story "The Blue Hotel" has an h1-h2 of 0.826, while the repetitious Roman Catholic Litany has an h1-h2 of 0.968. The difference is 0.968 - 0.826 = 0.142. The other texts mostly fall in between, although the presumably repetitious Cajun recipe has an h1-h2 of 0.827, almost identical to the short story.
Table 11:
Modern English Texts of Varying Repetition
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| Modern English - Roman Catholic litany | 26 | 9492 | 4.700 | 4.071 | 3.103 | 0.968 |
| Modern English - ISO 14000 catalog | 27 | 6696 | 4.755 | 4.076 | 3.137 | 0.939 |
| Modern English - The Blue Hotel by Stephen Crane (short story) | 27 | 32000 | 4.755 | 4.073 | 3.247 | 0.826 |
| Modern English - Cajun recipe | 27 | 27363 | 4.755 | 4.124 | 3.297 | 0.827 |
| Modern English - Chicken recipe | 27 | 18461 | 4.755 | 4.131 | 3.193 | 0.938 |
For comparison, here are data for Voynich texts in FSG, which has the character set closest in size to the ordinary Latin alphabet.
Table 12:
Voynich Texts in FSG
| Type of Voynich Text | Transcription Alphabet | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|---|
| Herbal-A | FSG | 24 | 10074 | 4.585 | 3.801 | 2.286 | 1.515 |
| Herbal-B | FSG | 24 | 14203 | 4.585 | 3.804 | 2.244 | 1.560 |
Compare the differences in h1-h2 due to repetition in English texts (0.968 - 0.826 = 0.142 for modern English and 0.951 - 0.831 = 0.120 for Jacobean English) with the h1-h2 values for Voynich text in FSG (1.515 or 1.560). The Voynich values exceed even the most repetitious English sample (0.983) by more than 0.5. It becomes clear that a repetitious underlying format or subject matter could not change a text in a normal European language into a Voynich text! Thus, Voynich text clearly does not owe its low h2 measures solely to a repetitious underlying text, that is, one that often repeats the same words and phrases.
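The comparison rests on simple arithmetic with values already given in Tables 9, 11 and 12. A short sketch makes the sizes explicit; the variable names are illustrative only.

```python
# h1-h2 values quoted in the text, from Tables 9, 11 and 12.
modern_spread = 0.968 - 0.826    # Roman Catholic litany vs. "The Blue Hotel"
jacobean_spread = 0.951 - 0.831  # Book of Mormon average vs. Bacon's Essays average
most_repetitious = 0.983         # highest English value in Table 9 (KJV, Joshua)
voynich_fsg = min(1.515, 1.560)  # Herbal-A and Herbal-B in FSG

gap = voynich_fsg - most_repetitious
ratio = gap / max(modern_spread, jacobean_spread)
print(round(modern_spread, 3), round(jacobean_spread, 3), round(gap, 3), round(ratio, 1))
```

Even the most repetitious English sample falls about 0.53 short of the lower Voynich FSG value. That gap is roughly 3.7 times the largest spread that repetition produced within English.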
Schizophrenic Language
In an important paper that discusses the Voynich Manuscript, Professor Sergio Toresella says that the VMs author had a psychiatric disturbance. In one of the works cited by Toresella in this connection, Creativity by Silvano Arieti, Arieti talks about the distorted language of schizophrenics but not other language phenomena.
At the Kooks Museum, there is a sample of schizophrenic language. In the Schizophrenic Wing, there is a transcript of flyers by Francis E. Dec, containing two Rants:
Kooks Museum
Francis E. Dec, Esquire
Transcripts of flyers
Here is an excerpt from Rant #2:
"Computer God computerized brain thinking sealed robot operating arm surgery cabinet machine removal of most of the frontal command lobe of the brain, gradually, during lifetime and overnight in all insane asylums after Computer God kosher bosher one month probation period creating helpless, hopeless Computer God Frankenstein Earphone Radio parroting puppet brainless slaves, resulting in millions of hopeless helpless homeless derelicts in all Jerusalem, U.S.A. cities and Soviet slave work camps. Not only the hangman rope deadly gangster parroting puppet scum-on-top know this top medical secret, even worse, deadly gangster Jew disease from deaf Ronnie Reagan to U.S.S.R. Gorbachev know this oy vay Computer God Containment Policy top secret. Eventual brain lobotomization of the entire world population for the Worldwide Deadly Gangster Communist Computer God overall plan, an ideal worldwide population of light-skinned, low hopeless and helpless Jew-mulattos, the communist black wave of the future."
The samples and discussion of schizophrenic talk in Arieti resemble Francis Dec's, in repeated but disconnected ideas, alliteration, etc.
MONKEY was run on the two Rants and the results were compared with examples of normal English text:
Table 13:
Schizophrenic Rant Compared to Other English Texts
| File | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|
| Schizophrenic rant | 27 | 12967 | 4.755 | 4.182 | 3.428 | 0.755 |
| King James Bible - Genesis | 27 | 32000 | 4.755 | 3.969 | 3.020 | 0.949 |
| Francis Bacon's Essays, Part 1 | 27 | 32000 | 4.755 | 4.048 | 3.220 | 0.827 |
| Modern English - Roman Catholic litany | 26 | 9492 | 4.700 | 4.071 | 3.103 | 0.968 |
| Modern English - The Blue Hotel by Stephen Crane (short story) | 27 | 32000 | 4.755 | 4.073 | 3.247 | 0.826 |
The second-order entropy of the schizophrenic rants is definitely higher, and h1-h2 lower, than those of any of the ordinary texts. As with the repetitive texts, the nature of the text itself would not by itself explain the puzzling nature of VMs text.
Low-Entropy Natural Languages
One may write Japanese in Latin characters (romaji) or in syllabic scripts (hiragana and katakana, the kana). Written in romaji, Japanese is a low-entropy language because of a relatively small phonemic inventory and severe phonotactic constraints. A Japanese syllable may begin with zero or one consonant (counting ts, ry, and ky as one consonant), has one vowel, and ends with nothing or -n (although the following syllable's consonant may be doubled). (Japanese also distinguishes at least some long and short vowels, which complicates this a little.)
However, these same phonotactic constraints make only a limited number of syllables possible in Japanese and therefore make a syllabic script such as kana feasible. One would expect Japanese in kana to have a higher relative h2 (lower h1-h2) than Japanese in romaji.
Hawaiian has even more severe phonotactic constraints, so one ought to be able to write Hawaiian in a syllabic script. In Hawaiian a syllable may begin with zero or one consonant, has only one vowel, and must end with that vowel! Hawaiian has a much more limited phonemic inventory than Japanese. Hawaiian is especially significant because Bennett compared Voynichese to Hawaiian and noted that they had similar second-order entropies. Bennett said that some Polynesian languages are the only natural languages with second-order entropies as low as Voynichese.
Therefore, to gain insight into these issues, Hawaiian and Japanese are compared in syllabic as well as phonemic notation.
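Under the (C)V rule just described, the number of possible Hawaiian syllable types is small enough for a syllabary. A short sketch enumerates them using the Hawaiian phoneme inventory this paper uses, assuming every (C)V combination counts as a syllable type. `cv_syllables` is a hypothetical name.

```python
from itertools import product

# Hawaiian phoneme inventory used in this paper: capitals are long vowels,
# ' is the glottal stop.
CONSONANTS = ["h", "k", "l", "m", "n", "p", "w", "'"]
VOWELS = ["a", "e", "i", "o", "u", "A", "E", "I", "O", "U"]


def cv_syllables(consonants, vowels):
    """All syllables of the form (C)V: an optional single consonant, then one vowel."""
    onsets = [""] + list(consonants)    # "" stands for a syllable with no consonant
    return [c + v for c, v in product(onsets, vowels)]


full = cv_syllables(CONSONANTS, VOWELS)
# Bennett-style limited notation: no glottal stop, no long vowels.
limited = cv_syllables(CONSONANTS[:-1], VOWELS[:5])
print(len(full), len(limited))  # 90 40
```

These counts are upper bounds. The syllabic character counts in Table 15 (39 and 77) presumably include the space, as the phonemic counts 13 and 19 do. If so, 38 and 76 syllable types actually occurred in the newspaper sample.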
Japanese
The classic Japanese novel Tale of Genji is written almost entirely in kana. Gabriel Landini kindly adapted this both into romaji and into a kana notation that MONKEY could analyze.
Table 14:
Entropies of Japanese in Romaji and Kana
| File | Orthography | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|---|
| Tale of Genji - Section 1 | Romaji | 22 | 32000 | 4.459 | 3.763 | 2.677 | 1.086 |
| Tale of Genji - Section 2 | Romaji | 20 | 31505 | 4.322 | 3.751 | 2.627 | 1.124 |
| Tale of Genji - Section 3 | Romaji | 20 | 29474 | 4.322 | 3.749 | 2.639 | 1.110 |
| Tale of Genji - Section 4 | Romaji | 20 | 32000 | 4.322 | 3.750 | 2.641 | 1.109 |
| Tale of Genji - Section 5 | Romaji | 20 | 27064 | 4.322 | 3.744 | 2.630 | 1.114 |
| Tale of Genji - Overall | Romaji | 22 | 152043 | 4.459 | 3.751 | 2.643 | 1.108 |
| Tale of Genji - Section 1 | Kana | 71 | 20622 | 6.150 | 4.764 | 3.393 | 1.370 |
| Tale of Genji - Section 2 | Kana | 71 | 20622 | 6.150 | 4.764 | 3.393 | 1.370 |
| Tale of Genji - Section 3 | Kana | 70 | 18574 | 6.129 | 4.709 | 3.410 | 1.298 |
| Tale of Genji - Section 4 | Kana | 70 | 20386 | 6.129 | 4.716 | 3.464 | 1.252 |
| Tale of Genji - Section 5 | Kana | 70 | 17096 | 6.129 | 4.698 | 3.362 | 1.337 |
| Tale of Genji - Overall | Kana | 71 | 97300 | 6.150 | 4.730 | 3.404 | 1.326 |
As one would expect, the absolute h0, h1, and h2 numbers for kana are much higher than those for romaji. However, the differences for h1-h2 are consistently higher for kana, which one would not expect.
Hawaiian
Bennett did his Hawaiian study with a limited Hawaiian orthography that did not recognize vowel length or the glottal stop. Therefore, statistics were run both on Hawaiian in limited phonemic and syllabic spellings, with long/short vowels not separated and glottal stop not indicated, and in full phonemic and syllabic notation.
Hawaiian has the following phonemes:
Consonants: h k l m n p w '(glottal stop)
Vowels: a e i o u A E I O U (capitals mean long)
Bennett used a "lossy" Hawaiian orthography that did not distinguish the long vowels and did not write the glottal stop (call this Hawaiian limited phonemic). He also had his own Voynich transcription alphabet. Finally, he compared only the absolute h2 values and not relative measures such as h1-h2. This is as good an illustration as any of the problems here.
Here is a sample of the Hawaiian newspaper text used for the statistics in this paper, written in Bennett's notation:
ma ka la o malaki ua noa ka paka o kapiolani no ke anaina na lakou ke kuleana o ka malama ana ma ka olelo ana aku i ka olelo hawaii ma laila no i akoakoa ai ka poe haumana ka
And here is the same text in full phonemic notation:
ma ka lA o malaki ua noa ka pAka 'o kapi'olani no ke anaina na lAkou ke kuleana 'o ka mAlama 'ana ma ka 'Olelo 'ana aku i ka 'Olelo hawai'i ma laila nO i 'Akoakoa ai ka po'e haumAna ka
Here are the entropy values.
Table 15:
Entropies of Hawaiian Texts in Different Orthographies
| File | Orthography | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|---|
| Hawaiian (Bennett) | limited phonemic | 13 | 15000 | 3.700 | 3.200 | 2.454 | 0.746 |
| Hawaiian newspaper | limited phonemic | 13 | 13097 | 3.700 | 3.224 | 2.437 | 0.787 |
| Hawaiian newspaper | limited syllabic | 39 | 9533 | 5.285 | 3.816 | 2.929 | 0.887 |
| Hawaiian newspaper | full phonemic | 19 | 13473 | 4.248 | 3.575 | 2.650 | 0.925 |
| Hawaiian newspaper | full syllabic | 77 | 9160 | 6.267 | 4.361 | 3.162 | 1.200 |
And here are data for Bennett's and this paper's Voynich texts for comparison:
Table 16:
Voynich Texts for Comparison with Hawaiian
| Type of Voynich Text | Transcription Alphabet | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|---|
| Voynich (Bennett) | Bennett | 21 | 10000 | 4.392 | 3.660 | 2.220 | 1.440 |
| Herbal-A | Currier | 33 | 9804 | 5.044 | 3.792 | 2.313 | 1.479 |
| Herbal-A | FSG | 24 | 10074 | 4.585 | 3.801 | 2.286 | 1.515 |
| Herbal-A | EVA | 21 | 12218 | 4.392 | 3.802 | 1.990 | 1.812 |
| Herbal-A | Frogguy | 21 | 13479 | 4.392 | 3.826 | 1.882 | 1.945 |
| Herbal-B | Currier | 34 | 13858 | 5.087 | 3.796 | 2.267 | 1.529 |
| Herbal-B | FSG | 24 | 14203 | 4.585 | 3.804 | 2.244 | 1.560 |
| Herbal-B | EVA | 21 | 16061 | 4.392 | 3.859 | 2.081 | 1.778 |
| Herbal-B | Frogguy | 21 | 17909 | 4.392 | 3.846 | 1.949 | 1.897 |
Bennett compared his Voynich text in a 21-character transcription to Hawaiian in a 13-character orthography (including the space character). He got h2 values of 2.220 for Voynich text and 2.454 for his Hawaiian text. However, a sample of Hawaiian text in a full phonemic orthography, with 19 characters including spaces, has h2 of 2.650, even higher. A comparison of h1-h2 values shows a dramatic difference between Hawaiian and Japanese on one hand and Voynichese on the other. h1-h2 equals 1.8 for Voynichese in EVA. h1-h2 is 0.746 for Bennett's Hawaiian data, 0.925 for Hawaiian in full phonemic notation, and 1.1 for Japanese romaji. These figures are all very different from Voynichese.
Discussion of Phonemic versus Syllabic Notation
While perhaps not germane to the Voynich Manuscript problem, it is odd that h1-h2 increases from phonemic to syllabic notation, both for Japanese and Hawaiian. In syllabic notation, given the first character, the second character is more predictable than it is in phonemic notation. This is quite puzzling. How can we explain these results for Hawaiian and Japanese?
The Size of the Character Set
In going from phonemic to syllabic notation, the text becomes shorter and more information is packed into fewer characters, but that is accomplished by using a larger character set. The syllabic notations use more than three times as many characters as the phonemic notations. The measure h1-h2 was chosen to minimize the effect of the size of the character set, but it is surely not entirely successful in doing that.
The Effect of Word Divisions
Perhaps syllabic notation gains predictability because space characters make up a larger proportion of the text than in phonemic notation. If that were the case, leaving out the spaces ought to decrease h1-h2 more for syllabic notation than for phonemic notation. MONKEY runs were made without the spaces to test this. However, h1-h2 for syllabic notation decreases less than it does for phonemic notation.
Table 17:
The Effect of Word Divisions on Statistics for Japanese and Hawaiian
| File | Orthography | Spaces Included | # ch. | File Size (chars) | h0 | h1 | h2 | h1-h2 |
|---|---|---|---|---|---|---|---|---|
| Japanese Tale of Genji - Section 1 | Romaji | Yes | 22 | 32000 | 4.459 | 3.763 | 2.677 | 1.086 |
| Japanese Tale of Genji - Section 1 | Romaji | No | 21 | 26106 | 4.392 | 3.803 | 2.935 | 0.868 |
| Japanese Tale of Genji - Section 1 | Kana | Yes | 71 | 20622 | 6.150 | 4.764 | 3.393 | 1.370 |
| Japanese Tale of Genji - Section 1 | Kana | No | 70 | 14051 | 6.129 | 5.666 | 4.330 | 1.337 |
| Hawaiian newspaper | Full Phonemic | Yes | 19 | 13473 | 4.248 | 3.575 | 2.650 | 0.925 |
| Hawaiian newspaper | Full Phonemic | No | 18 | 10433 | 4.170 | 3.622 | 2.935 | 0.687 |
| Hawaiian newspaper | Full Syllabic | Yes | 77 | 9160 | 6.267 | 4.361 | 3.162 | 1.200 |
| Hawaiian newspaper | Full Syllabic | No | 76 | 6120 | 6.248 | 5.156 | 3.982 | 1.174 |
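The drops compared in the word-division paragraph can be read directly off Table 17. A small sketch does the subtraction; the dictionary layout is illustrative only.

```python
# h1-h2 with spaces and without spaces, copied from Table 17.
TABLE_17 = {
    "Japanese romaji": (1.086, 0.868),
    "Japanese kana": (1.370, 1.337),
    "Hawaiian full phonemic": (0.925, 0.687),
    "Hawaiian full syllabic": (1.200, 1.174),
}

drops = {name: round(with_spaces - without_spaces, 3)
         for name, (with_spaces, without_spaces) in TABLE_17.items()}

for name, drop in drops.items():
    print(f"{name:24} {drop:.3f}")
```

Removing spaces lowers h1-h2 by about 0.22-0.24 in the phonemic notations but by only about 0.03 in the syllabic ones. This is the opposite of what the word-division explanation predicts.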
Redundancy
Gabriel Landini, who did graduate studies in Japan, noted that the redundancy of Japanese is only apparent, that it is actually rather ambiguous. In writing this is overcome with ideographs (kanji), while in speech it is overcome with the context of the speech and with rigid structures (phrases and expressions).
However, Jacques Guy (who has a doctorate in Polynesian languages and was once fluent in Tahitian) notes that Tahitian (similar to Hawaiian) is no more ambiguous than English or French! So redundancy is not likely to be the explanation.
The Effect of Syllable Divisions
Could the (relatively) high h1-h2 values for syllabic Hawaiian and Japanese mean that combinations of two syllables (e.g., yama in Japanese, wiki in Hawaiian) are as repetitious and fixed as combinations of phonemes within syllables?
The phonemic vs. syllabic problem here is more complex than this. Take "yamamoto" in romaji and in kana: (ya)(ma)(mo)(to). When analysing the second-order entropy in romaji, one is looking at the distributions of "ya", "am", "ma", "am", "mo", "ot", "to", while for kana it is "(ya)(ma)", "(ma)(mo)", "(mo)(to)". For about half of the romaji digraphs, one deals with combinations of letters ("am", "ot") that straddle a syllable boundary and are never represented by a single kana. So the second-order entropy of one type of text is not strictly comparable with the second-order entropy of the other. The second-order entropy of the romaji text is in principle "near" in meaning to the first-order entropy of the kana, but only about half of the romaji digraphs correspond to kana.
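The "yamamoto" example can be checked mechanically. The sketch below lists the overlapping letter pairs of the romanized word and flags each pair that straddles a syllable boundary. It also lists the syllable pairs that a kana digraph count would see. It assumes the word is already split into its kana syllables; the function names are hypothetical.

```python
def romaji_digraphs(syllables):
    """Overlapping letter pairs of the romanized word, each flagged True when
    the pair straddles a syllable (kana) boundary."""
    text = "".join(syllables)
    starts, pos = set(), 0
    for s in syllables:               # record where each syllable begins
        starts.add(pos)
        pos += len(s)
    return [(text[i:i + 2], (i + 1) in starts) for i in range(len(text) - 1)]


def kana_digraphs(syllables):
    """Pairs of adjacent syllables, i.e. digraphs in a syllabic script."""
    return list(zip(syllables, syllables[1:]))


word = ["ya", "ma", "mo", "to"]       # "yamamoto" as four kana
pairs = romaji_digraphs(word)
print(pairs)
print(kana_digraphs(word))
print(sum(crosses for _, crosses in pairs), "of", len(pairs), "romaji digraphs cross a boundary")
```

Three of the seven romaji digraphs ("am" twice, "ot") cross a syllable boundary. None of the three kana digraphs has a counterpart among the within-syllable romaji pairs.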
While the differences in statistics between syllabic and phonemic notation are interesting, they are not necessarily relevant to the Voynich Manuscript. They are chiefly interesting in raising questions about the use of the entropy concept.
Final Thoughts on Low-Entropy Natural Languages
Consider again the start of the Herbal-A sample file (f29v, lines 1-9), in EVA:
kshol qoocph shor pshocph shepchy qoty dy shory
ykcholy qoty chy dy qokchol chor tchy qokchody cheor o
chor chol chy choiin
tshoiin cheor chor o chty qotol sheol shor daiin qoty
otol chol daiin chkaiin shoiin qotchey qotshey daiiin
daiin chkaiin
pchol oiir chol tsho daiin sho teo chy chtshy dair am
okain chan chain cthor dain yk chy daiin cthol
sot chear chl s choly dar
And then the beginning of the Hawaiian newspaper sample file:
kepakemapa mei puke kepakemapa mei mahalo 'ia ka 'Olelo hawai'i e nA mAka' na ho'Olanani kim ma ka lA o malaki ua noa ka pAka 'o kapi'olani no ke anaina na lAkou ke kuleana 'o ka mAlama 'ana ma ka 'Olelo 'ana aku i ka 'Olelo hawai'i ma laila nO i 'Akoakoa ai ka po'e haumAna ka po'e kumu ka po'e mAkua a me ka po'e hoa o kElA 'ano kEia 'ano o ka 'Olelo hawai'i a ma laila nO ho'i i launa ai ka po'e ma o ka 'Olelo hawai'i kapa 'ia kEia lA hoihoi 'o ka lA 'ohana
One sees that the low h2's of Hawaiian and Japanese are due to their very strict consonant-vowel alternation. The EVA Voynich sample shows that the consonant-vowel alternation of Voynichese (as determined by the Sukhotin vowel-recognition algorithm) is not as strict.
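The Sukhotin algorithm mentioned here can be sketched compactly, following the algorithm as it is commonly described. First, count how often each pair of distinct letters is adjacent and give every letter a row sum. Then repeatedly promote the consonant with the largest positive sum to a vowel, subtracting twice its adjacency count from each remaining consonant's sum. Three details are assumptions of this sketch: counting only within space-separated words, ignoring doubled letters, and breaking ties alphabetically. `sukhotin_vowels` is a hypothetical name.

```python
from collections import defaultdict


def sukhotin_vowels(text: str) -> set:
    """Classify letters as vowels with Sukhotin's algorithm (sketch)."""
    words = text.split()
    letters = sorted(set("".join(words)))
    # adj[a][b]: times a and b are adjacent within a word (order ignored)
    adj = {a: defaultdict(int) for a in letters}
    for word in words:
        for a, b in zip(word, word[1:]):
            if a != b:                     # the diagonal is left at zero
                adj[a][b] += 1
                adj[b][a] += 1
    sums = {a: sum(adj[a].values()) for a in letters}
    vowels = set()
    while True:
        consonants = [c for c in letters if c not in vowels]
        if not consonants:
            break
        best = max(consonants, key=lambda c: sums[c])  # ties: alphabetically first
        if sums[best] <= 0:
            break
        vowels.add(best)
        for c in consonants:
            if c != best:
                sums[c] -= 2 * adj[c][best]
    return vowels
```

Applied to the EVA and Hawaiian samples quoted above, such a function would yield a vowel/consonant split. That split could then be used to measure how strictly the two classes alternate in each text.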
Once again, h1-h2 equals 1.8 for Voynichese in EVA. h1-h2 is 0.746 for Bennett's Hawaiian data, 0.925 for Hawaiian in full phonemic notation, and 1.1 for Japanese romaji. These figures are all very different from Voynichese.
For these reasons, it seems unlikely that an underlying low-entropy natural language explains the low h2 measures of Voynich text.
Suggestions for Further Work
The various h2 measures are only crude, partial measures of all the factors that interest us. However, the entropy measure will continue to be useful. It would be nice to have a program that would calculate the entropies of files larger than 32K and calculate higher-order entropies more accurately.
The author believes that the "paradigms" and other structural restrictions of Voynichese explain the low h2 measures. Further study of these structural constraints will be most useful.
Acknowledgments
Many of these ideas and data were previously discussed on the Voynich E-mail list. A special thanks to Gabriel Landini and Rene Zandbergen for their assistance.
References for Electronic Texts
Voynich Text
Rene Zandbergen kindly provided samples of Herbal-B and Herbal-A from voynich.now.
Herbal-B: 26r, 26v, 31r, 31v, 33r, 33v, 34r, 34v, 39r, 39v, 40r, 40v, 41r, 41v, 43r, 43v, 46r, 46v, 48r, 48v, 50r, 50v, 55r, 55v, 57r
Selected Herbal-A: 28v, 29r, 29v, 30r, 30v, 32r, 32v, 35r, 35v, 36r, 36v, 37r, 37v, 38r, 38v, 42r, 42v, 44r, 44v, 45r, 45v, 47r, 47v, 49r, 49v
Jacobean English
Book of Mormon
Bible, KJV
Sir Francis Bacon, Essays
Late Classical Latin
Vulgate Latin Bible (Estragon or Gopher)
Boethius: Consolatio Philosophiae: Book 3 & Book 4
Modern English
Catholic Litany
ISO Standard Catalog
"The Blue Hotel", by Stephen Crane
Chicken Recipe
Cajun Recipes, Part 1 and Part 2
Japanese Text
Gabriel Landini kindly prepared this. The text is from the Genji monogatari's [Tale of Genji, a classic Japanese novel mostly written in hiragana] first 4 parts: 01 Kiritsubo 02 Hahakigi 03 Utsusemi 04 Yugao.
The "kana" output is not kana, of course, but an arbitrary substitution for kana so that MONKEY could be applied.
Hawaiian
The author prepared the Hawaiian texts. Hawaiian has the following phonemes:
Consonants: h k l m n p w '(glottal stop)
Vowels: a e i o u A E I O U (capitals mean long)
However, the difference between long and short vowels is often not indicated. Also, the glottal stop is often not written. Obviously both of these things need to be written, since even with them Hawaiian has a rather limited phonemic inventory!
The Hawaiian text came from all the articles in this issue of a Hawaiian newspaper:
Na Maka o Kana
Puke 5, Pepa 5
15 Malaki, 1997
The text was changed to the notation above. All numbers, English, Japanese, and other foreign words were removed until the character set (the number of characters MONKEY showed) matched the Hawaiian notation. A syllabic script for Hawaiian using characters that MONKEY recognizes was devised.
Schizophrenic Language
At the Kooks Museum, in the Schizophrenic Wing, there is a transcript of flyers by Francis E. Dec, containing two schizophrenic Rants:
Francis E. Dec, Esquire
Transcripts of flyers
Printed References
Arieti, Silvano. Creativity : the magic synthesis. New York : Basic Books, c1976. Library of Congress call number: BF408.A64
Bennett, William Ralph. Scientific and Engineering Problem Solving with the Computer. Englewood Cliffs: Prentice-Hall, 1976. [Contains a chapter on VMS.]
D'Imperio, M. E. The Voynich Manuscript: An Elegant Enigma. National Security Agency, 1978. Aegean Park Press, 1978?
Toresella, Sergio. "Gli erbari degli alchimisti." [Alchemical herbals.] In Arte farmaceutica e piante medicinali - erbari, vasi, strumenti e testi dalle raccolte liguri [Pharmaceutical art and medicinal plants - herbals, jars, instruments and texts of the Ligurian collections], Liana Saginati, ed. Pisa: Pacini Editore, 1996, pp. 31-70. [Profusely illustrated. Fits the VMS into an "alchemical herbal" tradition.]
Copyright © 1998 by Dennis J. Stallings, all rights reserved.