Download the IPhOD
The IPhOD is freely available to download for research purposes. Several download options are provided
so that you can obtain only the files you need. The first two links contain
compressed archives of the word or pseudoword database as tab-delimited text files. The third gives the release notes,
and the fourth gives my pronunciation key (based on CMUPD). Summary information on the organization of the database,
the meaning of each value, and the column layout for versions 1.4 and 2.0 can be found on the
details page. The files are listed and linked below.
The text archives are divided into Words and Pseudowords; there is a separate archive to download
for each. The archive for Real Words contains a single tab-delimited text file that lists all IPhOD words and their
values, row by row. Since there are so many pseudowords (815,066), these were organized into 16 text files, all included
in the Pseudoword archive. The pseudoword text files are organized by number of phonemes, so file #2 contains
two-phoneme pseudowords, file #3 contains three-phoneme pseudowords, and so on, up to file #17.
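As a minimal sketch of reading one of these tab-delimited files outside a spreadsheet, the Python snippet below loads rows into dictionaries keyed by the header row. The column names (`Word`, `NumPhones`) and the in-memory sample are hypothetical placeholders, not the real IPhOD layout; see the details page for the actual columns.

```python
import csv
import io

def read_tab_delimited(handle):
    """Return every row of a tab-delimited file as a dict keyed by the header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)

# Hypothetical two-row sample standing in for an unpacked IPhOD text file.
# The real files have many more columns and different header labels.
sample = io.StringIO("Word\tNumPhones\nfox\t4\ncat\t3\n")
rows = read_tab_delimited(sample)

for row in rows:
    print(row["Word"], row["NumPhones"])
```

To read a real file, pass an open file handle instead of the sample, for example `open(path, newline="")`, where `path` points to whichever unpacked text file you want to load.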
IPhOD was originally developed in Fall 2003 and underwent several organizational changes until
IPhOD (v 1.3) was released in Winter 2005. Version 1.4 was released on August 14, 2009, correcting a calculation error
that affected positional probability measures in columns 39-44. Finally, Version 2.0 was released on December 1, 2009;
it includes homophones and homographs, and replaces the Kucera-Francis word frequencies with SUBTLEXus word frequencies.
Version 2.0 also contains a greater number of word entries (54k, up from 33k), which means that the calculations were
performed on a larger sample of words than previously. These are significant improvements and expansions of the database.
Contact: IPhOD is free software, copyrighted by Kenny Vaden and distributed under the
GPL. If you use the database in work published in a peer-reviewed journal,
conference proceedings, or thesis, please send me your citation data. This helps justify future tool development and gives
researchers a better idea of how this database is being used. Also, please cite your use of IPhOD in the following way:
Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file].
Available from http://www.iphod.com.
If you used Version 1.4, please cite it in the following way:
Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 1.4. [Data file].
Available from http://www.iphod.com.
Version 1.4 note: The first column of the pseudoword files shows the *word that was changed to produce the
pseudoword*. This can be confusing, since many people read the "word" column entry and don't realize that the CMUPD
transcription is different: the transcription is the actual pseudoword, as it is pronounced. Since each pseudoword was generated by changing one phoneme from
a real word, it helps to see what that word was when you try to pronounce it correctly. For example, "Fox" might show up as
the pseudoword's "word" entry, but the transcription columns read "F AH Z", so it is pronounced "foz".
Using PERL to search IPhOD (version 1.4 only, as of Dec 1, 2009)
In many cases, using PERL scripts may allow you to search more elegantly and powerfully than using a spreadsheet program
like Excel, but at a cost of programming time - decide wisely. If you have some programming background, you can modify these scripts
to create new search functions better suited to your research questions. For example, I modified this script to search for CVC items
only, or CVC words which share the CV-onset. I am interested in seeing your code if you modify mine to improve it. The instructions below
are based on my development OS (Windows), and assume you have downloaded and unpacked the word or pseudoword contents.
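To illustrate the kind of CVC search mentioned above, here is a minimal Python sketch (not the PERL script itself). It assumes transcriptions are space-separated CMUPD phonemes such as "F AH Z", optionally carrying stress digits; the function names are hypothetical.

```python
# The 15 vowel symbols used by the CMU Pronouncing Dictionary (stress digits removed).
CMU_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

def phonemes(transcription):
    """Split a space-separated transcription and drop any stress digits (0, 1, 2)."""
    return [p.rstrip("012") for p in transcription.split()]

def cv_pattern(transcription):
    """Map each phoneme to 'C' or 'V', e.g. 'F AH Z' becomes 'CVC'."""
    return "".join("V" if p in CMU_VOWELS else "C" for p in phonemes(transcription))

def shares_cv_onset(first, second):
    """True when both items are CVC and begin with the same consonant and vowel."""
    a, b = phonemes(first), phonemes(second)
    return cv_pattern(first) == "CVC" and cv_pattern(second) == "CVC" and a[:2] == b[:2]

print(cv_pattern("F AH Z"))                 # CVC
print(shares_cv_onset("F AH Z", "F AH N"))  # True
```

If your transcriptions use a different separator, adjust the `split()` call in `phonemes` accordingly.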
1. If you do not have PERL on your PC yet, then install it: ActiveState PERL (Windows, free).
I wrote these scripts on a Windows machine, but with some slight changes they run beautifully in Linux, which includes PERL by default in most cases.
Mac OS may also include PERL without any installation, but I don't know.
2. Download the PERL search script and search text file (archived): the IPhod Search ZIP file is linked above.
The ZIP archive contains the search script and query file (archive updated Mar 17, 2009).
3. Unzip contents of IPhod_Search.ZIP into the directory containing either word OR pseudoword textfiles.
4. Edit SEARCH_VALS.TXT using a text editor or spreadsheet program. Column #1 shows a value label that corresponds
to the header row of the word or pseudoword files. Column #2 gives the minimum allowed value, and Column #3 is the maximum.
If you do not specify a value (blank field), then that variable is ignored when filtering the results.
5. Execute the IPHOD_SEARCH.PL script. Using the DOS prompt or command window, navigate to the directory containing all
the search files, including iphod_search.pl and the files you are searching, then type "iphod_search.pl". The search output (Output.txt) should
contain only words or pseudowords within the value ranges specified in step 4, above. The layout of search_vals.txt and the command line,
using a real example, are shown in the figure below.
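The filtering described in steps 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of iphod_search.pl: it assumes the query file is tab-delimited (label, minimum, maximum), treats a blank or non-numeric bound as "no limit", and uses hypothetical column names (`NumPhones`, `Density`, `Freq`) and sample data.

```python
import csv
import io

def to_number(cell):
    """Return the cell as a float, or None when it is blank or not numeric."""
    try:
        return float(cell)
    except ValueError:
        return None

def parse_criteria(handle):
    """Read label / minimum / maximum rows; labels with no usable bound are ignored."""
    criteria = {}
    for row in csv.reader(handle, delimiter="\t"):
        cells = [c.strip() for c in row] + ["", "", ""]  # pad short rows
        label, low, high = cells[0], to_number(cells[1]), to_number(cells[2])
        if label and (low is not None or high is not None):
            criteria[label] = (low, high)
    return criteria

def in_range(row, criteria):
    """True when every constrained column of the row lies within its bounds."""
    for label, (low, high) in criteria.items():
        try:
            value = float(row[label])
        except (KeyError, TypeError, ValueError):
            return False  # missing or non-numeric value cannot satisfy a range
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True

def search(data_handle, criteria):
    """Return the rows of a tab-delimited data file that satisfy all criteria."""
    reader = csv.DictReader(data_handle, delimiter="\t")
    return [row for row in reader if in_range(row, criteria)]

# Hypothetical query: exactly 3 phonemes, density of at least 10, frequency unconstrained.
query = io.StringIO("NumPhones\t3\t3\nDensity\t10\t\nFreq\t\t\n")
data = io.StringIO(
    "Word\tNumPhones\tDensity\tFreq\n"
    "cat\t3\t25\t60\n"
    "fox\t4\t12\t20\n"
    "zip\t3\t5\t8\n"
)
criteria = parse_criteria(query)
matches = search(data, criteria)
print([row["Word"] for row in matches])  # ['cat']
```

Treating a blank bound as open-ended on that side is a design choice made for this sketch; the original script may handle partly blank rows differently.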
Using PERL to Calculate New Values
If a word or pseudoword isn't in the IPhOD, there is another PERL script I wrote that calculates new density values and
phonotactic probabilities, the same way they were originally done, for a list of items in CMU transcription format. This is advanced
IPhOD usage only, so contact me with a list or to obtain those additional PERL files. Alternatively,
you may use the online calculator - probably a more timely way to proceed.