Last Webpage Update: December 30, 2010

Notes: The most current version of IPhOD (v. 2.0) was released on December 1, 2009, and is now available for download. The previous version (v. 1.4) was released on August 14, 2009, and contains corrections to the positional probability calculations. Please refer questions to Kenny Vaden.

Download the IPhOD

The IPhOD is freely available to download for research purposes. Several download options are provided so that you can obtain only the files you need. For each version, the first two links are compressed archives of the word and pseudoword databases, stored as tab-delimited textfiles. The third link gives the release notes, and the fourth gives my pronunciation key (based on CMUPD). A summary of the database organization, the meaning of each value, and the column layout for versions 1.4 and 2.0 can be found on the details page. The files are listed and linked below.

The text archives are divided into Words and Pseudowords, with one downloadable archive for each. The archive for Real Words contains a single tab-delimited textfile that lists all IPhOD words and their values, one word per row. Because there are so many pseudowords (815,066), they are split across 16 textfiles, all included in the Pseudoword archive. Each pseudoword textfile holds items of a single length in phonemes: file #2 contains two-phoneme pseudowords, file #3 contains three-phoneme pseudowords, and so on, up to file #17.
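Because both archives unpack to tab-delimited text, the files can be read with any tool that handles tabular text. The sketch below is in Python rather than PERL and is only an illustration. It assumes each file begins with a header row naming the columns and is readable as UTF-8 (older files may need a different encoding). The file name in the usage comment is a placeholder, not an actual name from the archive.

```python
import csv

def read_iphod_table(path, encoding="utf-8"):
    """Read one tab-delimited IPhOD-style textfile into a list of dicts.

    Assumption: the first row is a header row naming each column.
    Quoting is disabled so apostrophes or quote marks in entries are
    kept as literal characters.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)

# Usage (placeholder file name; substitute the file you unpacked):
# rows = read_iphod_table("words.txt")
# print(len(rows), "entries; columns:", list(rows[0].keys()))
```

Every value comes back as a string, so numeric columns need an explicit `float(...)` conversion before any comparison.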

Version History

IPhOD was originally developed in Fall 2003 and underwent several organizational changes before IPhOD (v. 1.3) was released in Winter 2005. Version 1.4 was released on August 14, 2009, correcting a calculation error that affected the positional probability measures in columns 39-44. Finally, Version 2.0 was released on December 1, 2009; it includes homophones and homographs, and replaces the Kucera-Francis word frequencies with SUBTLEXus word frequencies. Version 2.0 also contains a greater number of word entries (54k, up from 33k), which means that the calculations were performed on a larger sample of words than before. These are significant improvements and expansions of the database.

IPhOD Version 2.0 Files:

1. IPhOD Words: IPhODv2.0_REALS.zip

2. IPhOD Pseudowords: IPHODv2.0_PSEUDO.zip

3. Release notes: 2009_Dec01_Release_Readme.txt

4. CMU Pronunciation Key with IPA Glyphs PDF: CMU_pronunciation_key.pdf

5. PERL Scripts: IPhod_Search.ZIP (coming soon)

Contact: IPhOD is free software, copyrighted by Kenny Vaden and distributed under the GPL. If you use the database in work published in a peer-reviewed journal, conference proceedings, or thesis, please send me your citation data. This helps justify future tool development and gives researchers a better idea of how the database is being used. Also, please cite your use of IPhOD in the following way:

Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file]. Available from http://www.iphod.com.

IPhOD Version 1.4 Files:

1. IPhOD Words: IPHODv1.4_REALS.zip

2. IPhOD Pseudowords: IPHODv1.4_PSEUDO.zip

3. Release notes: 2009_Aug14_Release_Readme.txt

4. CMU Pronunciation Key with IPA Glyphs PDF: CMU_pronunciation_key.pdf

5. PERL Scripts: IPhODv1.4_Search.zip (see the instructions below)

Please cite your use of IPhOD in the following way:

Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 1.4. [Data file]. Available from http://www.iphod.com.

Version 1.4 note: The first column of the pseudoword files shows the *word that was changed to produce the pseudoword*. This can be confusing, because many people read the "word" column entry without realizing that the CMUPD transcription in the same row is different: the transcription is the pseudoword itself, as it is pronounced. Since each pseudoword was generated by changing one phoneme of a real word, seeing that source word helps when you try to pronounce the pseudoword correctly. For example, "Fox" might show up as the pseudoword's "word" entry, but the transcription columns read "F AH Z", so it is pronounced "foz".
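To make the relationship between the two columns concrete, the Python sketch below compares a source word's transcription with a pseudoword's transcription and reports which phoneme was substituted. It is an illustration only: the function name and the example transcriptions are made up for this note and are not taken from the database files. It assumes both transcriptions are space-separated CMUPD phonemes of equal length that differ by a single substitution.

```python
def changed_phoneme(source_transcription, pseudo_transcription):
    """Return (position, old_phoneme, new_phoneme) for the one substitution.

    Positions are 1-based. Raises ValueError unless the two transcriptions
    have the same length and differ in exactly one phoneme.
    """
    source = source_transcription.split()
    pseudo = pseudo_transcription.split()
    if len(source) != len(pseudo):
        raise ValueError("transcriptions differ in length")
    diffs = [
        (index + 1, old, new)
        for index, (old, new) in enumerate(zip(source, pseudo))
        if old != new
    ]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substituted phoneme")
    return diffs[0]

# Illustrative transcriptions (hypothetical, not read from the IPhOD files):
print(changed_phoneme("K AE T", "K AE V"))  # (3, 'T', 'V')
```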

Using PERL to search IPhOD (version 1.4 only, as of Dec 1, 2009)

In many cases, using PERL scripts may allow you to search more elegantly and powerfully than using a spreadsheet program like Excel, but at a cost of programming time - decide wisely. If you have some programming background, you can modify these scripts to create new search functions better suited to your research questions. For example, I modified this script to search for CVC items only, or CVC words which share the CV-onset. I am interested in seeing your code if you modify mine to improve it. The instructions below are based on my development OS (Windows), and assume you have downloaded and unpacked the word or pseudoword contents (above).
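As an illustration of the kind of CVC search mentioned above, here is a minimal sketch in Python (it is not the PERL script itself). It assumes transcriptions are space-separated CMUPD phonemes, possibly with a stress digit attached to vowels. The function names are hypothetical.

```python
# The 15 vowel symbols of the CMU Pronouncing Dictionary phoneme set.
CMU_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

def cv_pattern(transcription):
    """Map a transcription such as 'K AE T' to a pattern such as 'CVC'."""
    pattern = []
    for phoneme in transcription.split():
        base = phoneme.rstrip("012")  # drop a stress digit if present
        pattern.append("V" if base in CMU_VOWELS else "C")
    return "".join(pattern)

def is_cvc(transcription):
    """True when the item is exactly consonant-vowel-consonant."""
    return cv_pattern(transcription) == "CVC"

def share_cv_onset(first, second):
    """True when two CVC items begin with the same consonant and vowel."""
    return (
        is_cvc(first)
        and is_cvc(second)
        and first.split()[:2] == second.split()[:2]
    )

print(cv_pattern("F AH Z"))                # CVC
print(is_cvc("F AA K S"))                  # False
print(share_cv_onset("K AE T", "K AE P"))  # True
```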

1. If you do not have PERL on your PC yet, then install it: Active State PERL (Windows, free). I wrote these scripts on a Windows machine, but some slight changes allow them to run beautifully in Linux, which includes PERL by default in most cases. Mac OS may also include PERL without any installation, but I don't know.

2. Download the PERL search script and search textfile (archived) using the IPhod Search ZIP link above. The ZIP archive contains the search script and query file (archive updated Mar 17, 2009).

3. Unzip contents of IPhod_Search.ZIP into the directory containing either word OR pseudoword textfiles.

4. Edit SEARCH_VALS.TXT using a text editor or spreadsheet program. Column #1 shows a value label that corresponds to the header row of the word or pseudoword files. Column #2 gives the minimum allowed value, and Column #3 is the maximum. If you do not specify a value (blank field), then that variable is ignored when filtering the results.

5. Execute the IPHOD_SEARCH.PL script. Using the DOS prompt or command window, navigate to the directory containing all the search files, including iphod_search.pl and the files you are searching, then type "iphod_search.pl". The search output (Output.txt) should contain only words or pseudowords within the value ranges specified in step 4, above. The layout of search_vals.txt and the command line, using a real example, are shown in the figure below.
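The range filter described in steps 4 and 5 can be sketched as follows. This is a Python illustration of the logic as described, not the contents of IPHOD_SEARCH.PL. It assumes the criteria file is tab-delimited with three columns (label, minimum, maximum), that the data file has a header row, that the filtered values are numeric, and that both bounds are inclusive. It also treats a single blank bound as "no limit on that side", which is a guess. The column labels in the demonstration are made up.

```python
import csv
import io

def _to_float(text):
    """Return float(text), or None when the field is blank."""
    text = text.strip()
    return float(text) if text else None

def parse_criteria(lines):
    """Parse 'label<TAB>min<TAB>max' lines into {label: (min, max)}.

    Labels with both bounds blank are skipped, mirroring the rule that an
    unspecified variable is ignored when filtering.
    """
    criteria = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        fields += [""] * (3 - len(fields))  # tolerate missing trailing columns
        label = fields[0].strip()
        try:
            low, high = _to_float(fields[1]), _to_float(fields[2])
        except ValueError:
            continue  # non-numeric bounds, e.g. a header line (assumption)
        if label and (low is not None or high is not None):
            criteria[label] = (low, high)
    return criteria

def row_matches(row, criteria):
    """True when every constrained column of the row lies within its range."""
    for label, (low, high) in criteria.items():
        try:
            value = float(row[label])
        except (KeyError, TypeError, ValueError):
            return False  # column missing or not numeric: treat as no match
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True

def search(data_lines, criteria):
    """Return the rows of a tab-delimited table that satisfy all criteria."""
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row_matches(row, criteria)]

# Demonstration with made-up column labels and values:
criteria = parse_criteria(["NPhon\t3\t3\n", "Density\t\t\n"])
data = io.StringIO("Word\tNPhon\tDensity\ncat\t3\t30\nfox\t4\t12\n")
print([row["Word"] for row in search(data, criteria)])  # ['cat']
```

To run this against real files, pass open file handles in place of the in-memory strings and write the matching rows to an output file.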



Using PERL to Calculate New Values

If a word or pseudoword isn't in the IPhOD, I wrote another PERL script that calculates new density values and phonotactic probabilities, the same way they were originally calculated, for a list of items in CMU transcription format. This is advanced IPhOD usage only, so contact me with a list, or to obtain those additional PERL files. Alternatively, you may use the online calculator, which is probably a more timely way to proceed.
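For readers who want a feel for what a density value measures, the Python sketch below counts neighbors under one commonly used definition: entries that differ from the item by a single phoneme substitution, deletion, or addition. This is an assumption for illustration, not a description of the PERL scripts; IPhOD's own calculation may differ, so consult the details page and release notes before comparing numbers. The function names and the tiny lexicon are made up.

```python
def is_neighbor(first, second):
    """True if two phoneme lists differ by one substitution, deletion, or addition."""
    if first == second:
        return False
    if len(first) == len(second):
        # Same length: exactly one position may differ (substitution).
        return sum(a != b for a, b in zip(first, second)) == 1
    if abs(len(first) - len(second)) != 1:
        return False
    shorter, longer = sorted((first, second), key=len)
    # Lengths differ by one: removing one phoneme from the longer item
    # must yield the shorter item (deletion or addition).
    return any(
        longer[:i] + longer[i + 1:] == shorter for i in range(len(longer))
    )

def neighborhood_density(item, lexicon):
    """Count lexicon entries that are one-phoneme neighbors of the item."""
    target = item.split()
    return sum(is_neighbor(target, entry.split()) for entry in lexicon)

# Made-up miniature lexicon of space-separated CMU-style transcriptions:
lexicon = ["K AE T", "B AE T", "K AE P", "AE T", "K AE T S", "D AO G"]
print(neighborhood_density("K AE T", lexicon))  # 4
```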