Software
Program  Description  Version 

species.bat  Batch file that runs metadata program  2.1 
metadata.txt  Maple program that sets input/output file definitions and global program options, and runs desired model-fitting program  2.1 
poisson.txt  Fits Poisson/equal class sizes model  2.1 
negbin.txt  Fits gamma-mixed Poisson/negative binomial model  2.1 
invgauss.txt  Fits inverse Gaussian-mixed Poisson model  2.1 
lognormal.txt  Fits lognormal-mixed Poisson model  2.1 
pareto.txt  Fits Pareto-mixed Poisson model  2.1 
mixed_expl.txt  Fits mixture-of-2-exponentials-mixed Poisson model  2.1 
Operational overview
Our computer programs are written in Maple. The basic structure of our algorithm is the same across different parametric models (although mathematical differences between the models require somewhat different specific computational strategies in each case). This allows precise comparison of different models fitted to the same dataset.
We have found that, while the graphical user interface (GUI) of Maple is convenient, the numerically intensive nature of these computations requires maximum speed. We have therefore elected to bypass the GUI in favor of a batch-processing system. Under this system, to run an analysis you simply submit the batch (.bat) file to a "command prompt window" or to the "run" window under Windows. Upon completion the output is written to the location specified in the "metadata" file (see below for details).
Getting started
The following resources are required.
 A reasonably fast computer with a substantial amount of RAM. These programs perform a sequence of numerical searches, which can be time-consuming (high number of iterations), and they loop through (what can be) a large number of subsets of the data. We recommend at least a 1 GHz processor with 512 MB RAM (we are currently using a 3.4 GHz processor with 4 GB RAM).
 The current version of Maple. Many universities use the program for a variety of purposes, especially for teaching calculus in the mathematics department, so check to see if your institution has a site license.
 All of the programs in the table above (under "Downloadable code"). For convenience, copy all of these files to a single directory (folder).
 Your dataset in text (.txt) file format, structured exactly as described below.
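As a minimal sketch of preparing such a dataset file (the format itself is described under "The input and output files" below), the following Python helpers write and re-read a two-column, tab-delimited file. This is illustrative only and is not part of the Maple programs: the function names and the counts in the usage note are made up, and the code assumes that "no extra return after the last line" means the final row carries no line terminator at all and that lines are separated by CR/LF.

```python
from pathlib import Path


def write_frequency_file(path, freq_counts):
    """Write {frequency: number_of_classes} as two tab-delimited columns.

    Assumption: rows are separated by CR/LF and nothing follows the last row.
    """
    rows = [f"{j}\t{freq_counts[j]}" for j in sorted(freq_counts)]
    # newline="" stops Python from translating the line endings we write.
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write("\r\n".join(rows))


def read_frequency_file(path):
    """Read the file back into a {frequency: count} dictionary."""
    text = Path(path).read_text(encoding="ascii")
    counts = {}
    for line in text.splitlines():
        frequency, count = line.split("\t")
        counts[int(frequency)] = int(count)
    return counts
```

For instance, `write_frequency_file("mydata.txt", {1: 10, 2: 4, 3: 2, 5: 1})` would record that 10 classes were seen once, 4 twice, 2 three times, and 1 five times (made-up values).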
To run an analysis, simply run the species.bat file (after specifying input and output files, and other program options, as described below). You can do this either in a "run" window, or at a command prompt in a DOS window.
The main program files
 The species.bat file. This is a batch program that runs under DOS. It is very simple; in fact here it is in its entirety:
REM batch program for local machine with one processor
REM set echo on
echo on
REM set path for command-line Maple (cmaple) home directory
path C:\Program Files\Maple 8\bin.win
REM run metadata program
cmaple8 "C:\Documents and Settings\John Smith\My Documents\species\metadata.txt"
In this version of species.bat it is assumed that the command-line Maple program, cmaple, is in the directory C:\Program Files\Maple 8\bin.win. To find this directory on your computer, search for the filename "cmaple," and edit the pathname in the species.bat file accordingly if necessary, using a text editor such as Notepad. It is also assumed that the downloaded program files (from this website, above) are in a folder named "species" under the "my documents" folder of user "John Smith." Again, you must edit this line in species.bat to reflect your installation. 
 The metadata file. This is a Maple program that sets the filenames for the input data file (the dataset to be analyzed), the desired model-fitting program, and the two output files. The structure of this file must be maintained exactly as it is given here, since it is a Maple program. It is in large part self-explanatory. You must edit the following options in metadata.txt, using a text editor such as Notepad:
 program_file: the complete filename for the program that fits the desired model.
 data_file: the complete filename for your dataset (see below for details on format).
 output_fits_file: the complete filename to contain the fitted values at each right truncation point.
 output_analysis_file: the complete filename to contain the full analysis, including estimated number of species, standard error, goodness-of-fit statistics, etc., at each right truncation point.
 fmin: the lowest right truncation point; that is, the smallest subset of the data to be analyzed will contain frequencies from 1 to fmin. Default: 5.
 fmax: the largest right truncation point; that is, the largest subset of the data to be analyzed will contain frequencies from 1 to fmax. Default: the maximum frequency occurring in the data.
 The following options may be left at their defaults, or changed by the user:
 significant digits: the number of significant digits used by Maple in its computations. Default: 16.
 subsequent program options are more technical.

 Model-fitting programs. Each program fits a specific parametric model to the observed frequency data, via the method of maximum likelihood. The basic goals of the program are to
 Compute maximum likelihood estimates (MLE's) of the distribution parameters;
 Compute the "conditional maximum likelihood estimate" of the unobserved and of the total number of classes;
 Compute the standard error of these estimates;
 Compute the p-value of the classical chi-squared goodness-of-fit (GOF) statistic; and
 Output text files with (i) fitted values, and (ii) all relevant statistics and program error diagnostics.
These computations are done to a level of precision specified by the user (16 significant digits by default). The complete analysis is run on each of a sequence of subsets of the data: each subset consists of the frequency data from 1 up to a given right truncation point t, where t ranges from some minimum frequency specified by the user (5 by default), up to a maximum set by the user (by default, the maximum frequency encountered in the data). Each row or line in the output file contains the complete analysis at a given right truncation point t. Thus the user can compare analyses at different right truncation points; typically the fit will vary with t.
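The sequence of right-truncated subsets described above can be sketched in a few lines of Python. This is an illustration, not the authors' Maple code: the function name is hypothetical, the parameter names `fmin` and `fmax` simply mirror the metadata options, and the sketch assumes that every integer t from `fmin` to `fmax` is visited.

```python
def truncation_subsets(freq_counts, fmin=5, fmax=None):
    """Yield (t, subset) for t = fmin, ..., fmax.

    freq_counts maps each observed frequency to its frequency-of-frequency
    count; each subset keeps only the frequencies from 1 to t.
    """
    if fmax is None:
        fmax = max(freq_counts)  # default: largest frequency in the data
    for t in range(fmin, fmax + 1):
        yield t, {j: n for j, n in freq_counts.items() if 1 <= j <= t}
```

With the defaults, a dataset whose largest frequency is 8 would be analyzed at t = 5, 6, 7, and 8, and the last subset is the full dataset.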
The general architecture is as follows:
 First, given a fixed set of starting values, the program attempts to find method-of-moments estimates of the unknown parameters. If this fails (unusual), the program stops and continues to the next right truncation point t.
 The GOF is computed based on the moment-method estimates. If the GOF p-value falls below a user-specified threshold (default: p < 10^(-6)), the program stops and continues to the next right truncation point t.
 Using the moment-method estimates as starting values, the program searches for the MLE's. This process continues through a number of steps, and yields values for the MLE's that are as precise as the program is able to compute (ideally exactly correct).
 The GOF is then computed based on the MLE's. If the GOF p-value falls below a user-specified threshold (default: p < 10^(-3)), the program stops and continues to the next right truncation point t.
 The standard error is computed using the MLE's.
Once all computations are complete at all right truncation points t, the output is formatted and written to a user-specified text file, which can then be read into Excel or any other package for editing and display.
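The staged control flow at a single right truncation point can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' Maple implementation: the four callables (`moment_fit`, `mle_fit`, `gof_pvalue`, `std_error`) and the status strings are hypothetical placeholders, and the default thresholds assume the GOF cut-offs are the small p-values 10^(-6) and 10^(-3).

```python
def analyse_truncation_point(data, moment_fit, mle_fit, gof_pvalue, std_error,
                             moment_gof_min=1e-6, mle_gof_min=1e-3):
    """Run the staged analysis for one right-truncated dataset.

    Each stage is supplied by the caller; later stages are skipped as soon
    as an earlier one fails its check, so the caller can move on to the
    next truncation point.
    """
    start = moment_fit(data)  # method-of-moments estimates, or None on failure
    if start is None:
        return {"status": "moment estimates failed"}
    if gof_pvalue(data, start) < moment_gof_min:
        return {"status": "moment fit rejected"}
    mle = mle_fit(data, start)  # search started from the moment estimates
    p_value = gof_pvalue(data, mle)
    if p_value < mle_gof_min:
        return {"status": "MLE fit rejected"}
    return {"status": "ok", "mle": mle, "p_value": p_value,
            "se": std_error(data, mle)}
```

Returning a status for every truncation point, rather than raising an error, keeps one row per t available for the output file even when a stage is abandoned.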
Currently we have six parametric models. The output from each program is structured the same way. They are all mixed-Poisson models (see Basic Theory), with different mixing distributions:
 Poisson, with a point-mass mixing distribution, that is, the ordinary unmixed Poisson. Under this model the sampling intensity is constant or identical for all classes in the population.
 Negative binomial, or gamma-mixed Poisson. The mixing distribution, or the distribution of the sampling intensities, or the stochastic abundance distribution, is the gamma.
 Inverse Gaussian-mixed Poisson. The mixing distribution is the inverse Gaussian.
 Lognormal-mixed Poisson. The mixing distribution is the lognormal.
 Pareto-mixed Poisson. The mixing distribution is the Pareto.
 2-mixed-exponential-mixed Poisson. The mixing distribution is a mixture of 2 exponentials.
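To make the simplest of these models concrete, here is a minimal Python sketch of fitting the unmixed Poisson model at one right truncation point t. It is not a transcription of poisson.txt. It assumes the frequencies 1 to t are treated as a complete zero-truncated Poisson sample, that the unobserved count is estimated as c·p0/(1 − p0) with c the number of classes observed in the subset, and that classes with frequencies above t are simply added back to give the total. The function name and the bisection solver are choices made for this illustration.

```python
import math


def fit_truncated_poisson(freq_counts, t):
    """Fit a zero-truncated Poisson to the frequencies 1..t (illustrative)."""
    subset = {j: k for j, k in freq_counts.items() if 1 <= j <= t}
    c = sum(subset.values())                    # classes observed in the subset
    n = sum(j * k for j, k in subset.items())   # individuals in the subset
    c_above = sum(k for j, k in freq_counts.items() if j > t)
    if c == 0 or n <= c:
        raise ValueError("no positive MLE: the subset mean must exceed 1")
    mean = n / c
    # The MLE solves lam / (1 - exp(-lam)) = mean; the left side increases
    # in lam and the root lies in (0, mean), so bisection is safe.
    lo, hi = 1e-12, mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p0 = math.exp(-lam)           # estimated probability of being unobserved
    s0 = c * p0 / (1.0 - p0)      # estimated number of unobserved classes
    return {"lambda": lam, "p0": p0, "s0": s0,
            "subset_estimate": c + s0,
            "total_estimate": c + s0 + c_above}
```

The mixed models follow the same pattern in principle but need more parameters and a numerical likelihood search, which is where the method-of-moments starting values come in.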
We are continually searching for more families of mixing distributions that (i) have the potential to fit a wide variety of count data, particularly with high diversity and (some) large abundances; (ii) can be shown to satisfy the technical conditions required for the general theory (in particular asymptotic variances, i.e., standard errors) to be valid; and (iii) are feasibly computable. We will add programs for these as they become available.
The input and output files
 Your dataset file. This must be a text (ASCII) file, with two columns, tab delimited, with a carriage return/new paragraph mark at the end of each line (note that there must not be an extra return after the last line). The first column contains the frequencies, the second, the frequencies of frequencies. Here is a sample dataset, the same one discussed under Basic Theory. A file with this structure can be readily created using, e.g., Microsoft Excel.
 The fitted values file. The structure of this file is as follows. The first, leftmost column contains the integers from 1 up to the maximum frequency in the data, i.e., all (potentially) observed frequencies. The second column contains the actual observed frequency-of-frequency counts for each integer (some of these may be zero). Subsequent columns contain the values fitted by the model to the given frequency, from 1 to t; each column contains the fitted values for a given right truncation point t.

 The analysis output file. Each row or line in the analysis output file contains the results of a complete analysis at a given right truncation point t. For a description of the analysis results see Basic Theory. From left to right, the statistics are:
 the right truncation point t;
 the MLE's of the parameters of the distribution;
 the MLE of the "noncoverage," i.e., p_{0};
 the estimated number of unobserved species, i.e., s_{0};
 the estimated number of species based only on the data up to the right truncation point, that is, excluding the species with observed frequencies greater than the right truncation point;
 the estimated total number of species, that is, including the species with observed frequencies greater than the right truncation point;
 the standard error of the estimate of the number of species (the standard error for the estimate based on the subset and for the estimated total is the same);
 a lower bound for the standard error (an empirical version of the simple binomial SE; see Chao and Bunge (2002));
 the "naïve" p-value of the chi-squared goodness-of-fit test for the model, using all cells;
 the p-value for an asymptotically correct chi-squared goodness-of-fit test based on concatenating adjacent cells so that all expected cell counts are at least 5, to conform with asymptotic theory;
 the "program error report," which is actually a numeric code indicating the state of the program when it terminated (not necessarily an error).
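The cell concatenation behind the second goodness-of-fit p-value in the list above can be illustrated with a short Python sketch. The pooling rule used here (sweep left to right, close a cell once its expected count reaches 5, and fold any leftover into the last closed cell) is an assumption made for illustration and is not necessarily the rule the Maple programs apply. The function names are hypothetical, and the conversion of the statistic to a p-value is omitted.

```python
def pool_adjacent_cells(observed, expected, min_expected=5.0):
    """Merge neighbouring cells until each pooled cell has expected >= min_expected.

    Returns a list of (observed_total, expected_total) pairs.
    """
    pooled = []
    obs_sum = exp_sum = 0.0
    for obs, exp in zip(observed, expected):
        obs_sum += obs
        exp_sum += exp
        if exp_sum >= min_expected:
            pooled.append((obs_sum, exp_sum))
            obs_sum = exp_sum = 0.0
    if obs_sum > 0 or exp_sum > 0:
        if pooled:
            # Leftover cells are too small on their own: fold them into the
            # last pooled cell.
            last_obs, last_exp = pooled.pop()
            pooled.append((last_obs + obs_sum, last_exp + exp_sum))
        else:
            pooled.append((obs_sum, exp_sum))
    return pooled


def chi_squared_statistic(cells):
    """Pearson chi-squared statistic over (observed, expected) cells."""
    return sum((obs - exp) ** 2 / exp for obs, exp in cells)
```

For example, with made-up observed counts `[10, 4, 2, 0, 1]` and expected counts `[9.0, 5.0, 2.0, 0.7, 0.3]`, the last three cells are too small to stand alone and are merged into the second cell, leaving two pooled cells.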