Distribution Plotting & Analysis
Go to the DataPlus Console and open the file "smallseries". This is an
artificial data set generated by a program that creates random numbers
with a Gaussian distribution. Choose "Population" for the X-field and
leave the others as "None". The DataPlus Console should now look like this:
Then, click on the distribution analysis tool icon at the far left side of the toolbar. The distribution analysis window should open as follows:

The default view is a distribution histogram, which for a Gaussian (or
'Normal') distribution should show a bell-shaped curve. Depending on the
number of points, it will appear more or less 'lumpy'; as the number of
points increases, the histogram approaches a smooth bell shape. A lumpy
histogram can still come from a Gaussian distribution, and it can still
be described by the statistics that appear in the box on the left side
of the window. The data is fitted to an ideal Gaussian curve, and the
fitted curve is shown.
Points: The number of data points in the set.
Range: The smallest and largest values in the data set.
Mean: The mean (average) of all values in the data set.
S.D.: The standard deviation of the data.
+/- 2SD: The values that lie two standard deviations below and above the mean (this range covers about 95% of the points in a Gaussian distribution).
1/99 (5/95, etc): The values below which and above which 1% (5%, etc) of the data points lie, i.e. the 1st and 99th (5th and 95th, etc) percentiles.
Median value: The value that half the data points lie above and half lie below (the 50th percentile).
Mode Value(Gp): The most common value. Note that this will depend upon how the points are grouped (Gp).
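The statistics in this list can be reproduced outside DataPlus. The following is a minimal Python sketch, not the DataPlus implementation. The function name `summarize`, the use of the sample standard deviation, the "inclusive" percentile method, and the alignment of mode groups to multiples of the group size are all assumptions made for illustration. The sample (100 Gaussian random numbers with mean 100 and SD 10) is a hypothetical stand-in for "smallseries".

```python
import math
import random
import statistics
from collections import Counter

def summarize(values, group_size=5.0):
    """Return the summary figures from the statistics panel for a list of numbers."""
    data = sorted(values)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample SD (assumption; the tool may use the population form)
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    pct = statistics.quantiles(data, n=100, method="inclusive")
    # Grouped mode: count the points in each group, then report the fullest group
    groups = Counter(math.floor(v / group_size) for v in data)
    mode_group = max(groups, key=groups.get)
    return {
        "points": len(data),
        "range": (data[0], data[-1]),
        "mean": mean,
        "sd": sd,
        "plus_minus_2sd": (mean - 2 * sd, mean + 2 * sd),
        "p1_p99": (pct[0], pct[98]),
        "p5_p95": (pct[4], pct[94]),
        "median": statistics.median(data),
        "mode_group": (mode_group * group_size, (mode_group + 1) * group_size),
    }

random.seed(1)  # fixed seed so the hypothetical sample is repeatable
sample = [random.gauss(100.0, 10.0) for _ in range(100)]
for name, value in summarize(sample).items():
    print(name, value)
```

Calling `summarize` again with a different `group_size` changes only the `mode_group` entry, which is why the mode is labelled with "(Gp)".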
The distribution histogram is formed by taking the data and grouping it. The smaller the group size, the more 'fine-grained' the distribution will be, but it will also appear more uneven. Apart from the mode value, the group size does not affect the statistics or the fitted curve; it simply controls how the data is shown. Using the "Group Size" edit box on the "Distribution Statistics" panel, set the value to 5 and click the "Apply" button. The distribution should now look like this:
This recalculates the histogram by grouping the results into sets whose
values differ by no more than 5.0. For example, all data points between
90.0 and 95.0 are counted, and a bar is drawn whose height reflects the
number of data points in that range. Note that the distribution is more
ragged than before, since each bar represents fewer data points.
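The grouping step can be sketched in a few lines of Python. This is an illustration only: the function name `group_counts` is hypothetical, and it assumes that groups start at multiples of the group size and include their lower edge but not their upper edge, which may differ from how DataPlus places its bars. The sample data is again a made-up stand-in for "smallseries".

```python
import math
import random
from collections import Counter

def group_counts(values, group_size):
    """Count how many values fall in each group [start, start + group_size)."""
    counts = Counter(math.floor(v / group_size) * group_size for v in values)
    return dict(sorted(counts.items()))

random.seed(1)  # fixed seed so the hypothetical sample is repeatable
sample = [random.gauss(100.0, 10.0) for _ in range(100)]

# Draw a rough text histogram: one '#' per data point in the group
for start, count in group_counts(sample, 5.0).items():
    print(f"{start:6.1f} to {start + 5.0:6.1f} | {'#' * count}")
```

Running the loop with a group size of 10.0 instead of 5.0 gives fewer, taller bars from the same data, which is the smoothing effect described above.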
The data can be plotted in any of three formats: a histogram, a
percentile plot, or a plot representing the data values as standard
deviations from the mean. To change the plot type, select the desired
output from the "Graph" menu as shown:
If you choose "Percentile", the graph is redrawn as shown below:
Here, the data points have been sorted, and the value of each point is
plotted against its percentile ranking in the data set. For example,
this set contains 100 points, so in a list of the points sorted from low
to high, the 5th point represents the 5th percentile, the 20th point
represents the 20th percentile, and so on. Note that, although rough, the
curve has the expected "S" shape of a Gaussian, or "Normal",
distribution.
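The percentile ranking described here can be sketched as follows. The function name `percentile_points` is hypothetical, and the ranking rule (position in the sorted list divided by the number of points) is taken from the 100-point example above; DataPlus may handle ties or other set sizes differently.

```python
import random

def percentile_points(values):
    """Pair each sorted value with its percentile rank (position / n * 100)."""
    data = sorted(values)
    n = len(data)
    return [(100.0 * rank / n, value) for rank, value in enumerate(data, start=1)]

random.seed(1)  # fixed seed so the hypothetical sample is repeatable
sample = [random.gauss(100.0, 10.0) for _ in range(100)]
points = percentile_points(sample)

# With 100 points, the 5th and 20th sorted points sit at the 5th and 20th percentiles
for percentile, value in (points[4], points[19], points[49]):
    print(f"{percentile:5.1f}th percentile: {value:.2f}")
```

Plotting the value against the percentile for every pair in `points` gives the "S"-shaped curve.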
Another way of looking at the distribution is by plotting the deviation
of each value from the mean (similar to a "Probit" or "Z-Score"), as
shown below.
This has the advantage that, in a "Normal" distribution, the plot
should be a straight line, which makes it possible to fit the data.
The data fitting algorithm used in this application takes advantage of
this by performing a least-squares regression on the data points that
lie within +/- 2SD of the mean. This is not a very rigorous method, but
it works well if the distribution is relatively Gaussian.
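A straight-line fit of this kind might look like the sketch below. The details of the DataPlus algorithm are not given here, so several choices are assumptions: the function name `fit_gaussian`, the plotting position `(rank - 0.5) / n` used to get an expected Z-score for each sorted point, and the reading of "+/- 2SD" as "values within two sample standard deviations of the sample mean". Under those assumptions, the slope of the fitted line estimates the standard deviation and the intercept estimates the mean.

```python
import random
import statistics

def fit_gaussian(values, limit=2.0):
    """Estimate (mean, sd) from a straight-line fit of value against Z-score."""
    data = sorted(values)
    n = len(data)
    std_normal = statistics.NormalDist()  # mean 0, SD 1
    # Expected Z-score for each rank; (rank - 0.5) / n keeps the
    # probability strictly between 0 and 1 (assumed plotting position).
    z = [std_normal.inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]

    # Assumed reading of "+/- 2SD": keep points whose value lies within
    # `limit` sample standard deviations of the sample mean.
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    kept = [(zi, v) for zi, v in zip(z, data) if abs(v - mean) <= limit * sd]

    # Ordinary least squares for value = intercept + slope * z
    zs = [zi for zi, _ in kept]
    vs = [v for _, v in kept]
    z_bar = statistics.fmean(zs)
    v_bar = statistics.fmean(vs)
    sxx = sum((zi - z_bar) ** 2 for zi in zs)
    sxy = sum((zi - z_bar) * (v - v_bar) for zi, v in zip(zs, vs))
    slope = sxy / sxx
    intercept = v_bar - slope * z_bar
    return intercept, slope  # fitted mean, fitted SD

random.seed(1)  # fixed seed so the hypothetical sample is repeatable
sample = [random.gauss(100.0, 10.0) for _ in range(100)]
fitted_mean, fitted_sd = fit_gaussian(sample)
print(f"fitted mean {fitted_mean:.2f}, fitted SD {fitted_sd:.2f}")
```

Leaving out the points beyond two standard deviations keeps a few extreme values in the tails from dominating the fit, at the cost of ignoring real departures from a Gaussian shape in those tails.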
Now, go back to the DataPlus Console and choose "Append Data..." from the "File" menu. Select the file "tinyseries". The distribution icon should now be dimmed, so select the "Tiny Series" tab and set "Value_1" as the X-field.

Clicking the distribution icon should then give you a distribution histogram with two distributions. If you go to the "Distribution Statistics" panel and change the color of the "Values_2" set to red, you will have a graph like this:
This shows that the two sets are essentially identical except for the
number of points they contain.