OzData: Insurance Premiums

Keywords: Cubic spline regression

Description
Age-specific term life premium rates for a sum insured of $50,000 are given in the table. The first column is the age of the insured, the next two columns are the rates for male smokers and non-smokers, and the last two columns are the rates for female smokers and non-smokers. The four separate sets of points may be plotted and cubic spline regression used to fit them.

Age  M-Smoke  M-Non  F-Smoke  F-Non
33   130      100    110      95
34   135      105    110      95
35   140      105    115      100
36   145      110    120      100
37   155      110    125      105
38   160      115    130      105
39   170      120    140      110
40   180      125    145      115
41   195      130    155      120
42   210      140    165      130
43   230      145    175      135
44   250      155    190      145
45   270      170    205      155
46   295      180    225      165
47   325      200    245      180
48   360      215    265      195
49   395      235    290      210
50   435      260    320      230
51   485      285    350      250
52   535      315    380      275
53   590      350    420      305
54   650      390    460      335
55   715      435    505      370

Skin Cancer in Texas and Minnesota

Keywords: Logistic regression, Poisson regression.

Description
The data show the incidence of nonmelanoma skin cancer among women in Minneapolis-St Paul, Minnesota, and Dallas-Fort Worth, Texas. The towns are coded 0 for Minneapolis-St Paul and 1 for Dallas-Fort Worth. One would expect sun exposure to be greater in Texas than in Minnesota.

Source
Kleinbaum, D., Kupper, L., and Muller, K. (1989). Applied regression analysis and other multivariate methods. PWS-Kent, Boston, Massachusetts.
Hand, D. et al. (1994). A handbook of small data sets. Chapman and Hall, London.

Analysis
One can use logistic regression for Cases/Population, or treat Cases as a Poisson response with log(Population) as an offset.

This page maintained by Gordon Smyth, Department of Mathematics, University of Queensland. (c) 1997.
Last modified: 17 October 1998

Cases  Town  Age    Population
1      0     15-24  172675
16     0     25-34  123065
30     0     35-44  96216
71     0     45-54  92051
102    0     55-64  72159
130    0     65-74  54722
133    0     75-84  32185
40     0     85+    8328
4      1     15-24  181343
38     1     25-34  146207
119    1     35-44  121374
221    1     45-54  111353
259    1     55-64  83004
310    1     65-74  55932
65     1     85+    7583

Passengers on the Titanic

Keywords: binomial regression, contingency table

Description
The data give the survival status of passengers on the Titanic, together with their names, age, sex and passenger class. About half of the ages for the 3rd Class passengers are missing, although a good many of these could be filled in from the original source below.

Variable  Description
Name      Recorded name of passenger
Pclass    Passenger class: 1st, 2nd or 3rd
Age       Age in years
Sex       male or female
Survived  1 = Yes, 0 = No

Source
Hinde, Philip (1998). Encyclopedia Titanica.

Analysis
Age, sex and passenger class all have strong relationships with whether the individual survived. There should also be strong interactions of passenger class with the other two variables.

Last modified: 23 October 1998

Aboriginal Deaths in Custody

Keywords: binomial regression.

Description
The data give the number of deaths in prison custody in Australia in each of the six years 1990 to 1995, given separately for Aboriginal and Torres Strait Islanders (indigenous) and others (non-indigenous).

Variable    Description
Year        1990 through 1995
Indigenous  Yes = Aboriginal or Torres Strait Islander, No = Non-indigenous
Prisoners   Total number in prison custody
Deaths      Number of deaths in prison custody
Population  Adult population (15+ years)

The data were collected in response to the Royal Commission into Aboriginal Deaths in Custody, the final report of which was tabled in the Federal Parliament on 9 May 1991. The report of the Royal Commission has two streams.
One is concerned with the ninety-nine Aboriginal and Torres Strait Islander deaths in custody which occurred throughout Australia during the period 1 January 1980 to 31 May 1989. Issues around the causes of death, culpability of custodians and their employers, and the prevention of future deaths were addressed in depth.

The second stream concerned what the Royal Commission called the 'underlying issues': the social, cultural, and legal factors which, in the view of the Commissioners, had some bearing on the deaths. These underlying issues, as revealed from the chapter headings of the Royal Commission's National Report, included the Legacy of History, Aboriginal Society Today, Relations With the Non-Aboriginal Community, The Harmful Use of Alcohol and Other Drugs, Schooling, Employment, Unemployment and Poverty, Housing and Infrastructure, Land Needs, and Self-determination.

The link between the Royal Commission's discussion of the individual deaths investigated, the prevention of future deaths and the underlying issues, is its position on the over-representation of Indigenous people in custody in Australia. A central conclusion of the Royal Commission, illustrating this point, was as follows:

The work of the commission has established that Aboriginal people in custody do not die at a greater rate than non-Aboriginal people in custody. However, what is overwhelmingly different is the rate at which Aboriginal people come into custody, compared with the rate of the general community ... The ninety-nine who died in custody illustrate that over-representation and, in a sense, are the victims of it. The conclusions are clear. Aboriginal people die in custody at a rate relevant to their proportion of the whole population which is totally unacceptable and which would not be tolerated if it occurred in the non-Aboriginal community.
But this occurs not because Aboriginal people in custody are more likely to die than others in custody, but because the Aboriginal population is grossly over-represented in custody. Too many Aboriginal people are in custody too often (Johnston, 1991, Vol 1, p6).

Source
Indigenous deaths in custody 1989 - 1996 / a report prepared by the Office of the Aboriginal and Torres Strait Islander Social Justice Commissioner for the Aboriginal and Torres Strait Islander Commission. Aboriginal and Torres Strait Islander Commission, Canberra, 1996.

Analysis
    > p <- Deaths/Prisoners
    > glm.deaths <- glm(p~Indigenous*Year,family="binomial",weights=Prisoners)
    > anova(glm.deaths,test="Chi")
    Analysis of Deviance Table

    Binomial model

    Response: p

    Terms added sequentially (first to last)
                    Df Deviance Resid. Df Resid. Dev   Pr(Chi)
    NULL                               11   16.45645
    Indigenous       1 2.740514        10   13.71594 0.0978333
    Year             1 4.700794         9    9.01515 0.0301487
    Indigenous:Year  1 1.259585         8    7.75556 0.2617297

Last modified: 25 September 1998
http://www.maths.uq.edu.au/~gks/data/oz/custody.html

Heart Valves in Dogs on Different Exercise Regimens

Keywords: ordinal regression

Description
A new type of heart valve has been developed and is implanted in 63 dogs that have been raised on various levels of exercise. The numbers of valve implants that succeed are recorded. Is the proportion of successful implants the same for dogs on all exercise regimens? Is there a trend with amount of exercise in the proportion of successful implants?

Variable   Description
Exercise   Amount of exercise: 1=None, 2=Slight, 3=Moderate, 4=Vigorous
Implant    1=Successful, 0=Unsuccessful
Frequency  Number of dogs

Source
Zar, J. H. (1999). Biostatistical Analysis, Fourth Edition. Prentice-Hall International, Upper Saddle River, New Jersey. Exercise 24.20.
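The first question in the description (equal success proportions across regimens) can be explored directly from the frequency table. The following is a minimal Python sketch rather than the analysis suggested on this page: it computes the observed proportions and a Pearson chi-square statistic for homogeneity, which ignores the ordering of the exercise levels. The counts are copied from the Exercise/Implant/Frequency table on this page, and it is an assumption here that Implant code 1 marks a successful implant.

```python
# Counts copied from the Exercise / Implant / Frequency table on this page.
# Assumption: Implant code 1 = successful, the other code = unsuccessful.
counts = {
    1: (8, 7),    # None:     (successful, unsuccessful)
    2: (9, 3),    # Slight
    3: (17, 3),   # Moderate
    4: (14, 2),   # Vigorous
}


def success_proportions(table):
    """Proportion of successful implants at each exercise level."""
    return {level: s / (s + u) for level, (s, u) in table.items()}


def pearson_chi_square(table):
    """Pearson chi-square statistic for equal success proportions across rows."""
    total_s = sum(s for s, _ in table.values())
    total_u = sum(u for _, u in table.values())
    n = total_s + total_u
    stat = 0.0
    for s, u in table.values():
        row_total = s + u
        for observed, column_total in ((s, total_s), (u, total_u)):
            expected = row_total * column_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


proportions = success_proportions(counts)
chi_square = pearson_chi_square(counts)

for level, p in proportions.items():
    print(f"exercise level {level}: proportion successful = {p:.3f}")
print(f"Pearson chi-square on {len(counts) - 1} df = {chi_square:.3f}")
```

The statistic would be referred to a chi-square distribution on 3 degrees of freedom. Because it treats the four regimens as unordered categories, it does not address the trend question, which is where an ordinal or trend model becomes relevant.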
Analysis
This can be used as an example of ordinal logistic regression, with Exercise as the response and Implant as the explanatory variable.

Last modified: 03 March 1999

Exercise  Implant  Frequency
1         1        8
1         0        7
2         1        9
2         0        3
3         1        17
3         0        3
4         1        14
4         0        2

Width of Ore-Bearing Layer

Keywords: non-parametric regression, thin plate splines.

Description
Data were collected from a mine in Cobar, NSW, Australia. At each of 38 sampling points, several measurements were taken, one of which is the 'true-width' of an ore-bearing rock layer. Also given are the co-ordinates t1 and t2 of the data sites. Green and Silverman (1994) use this data set to illustrate thin-plate splines for fitting a smooth surface.

Source
O'Connor, D. P. H., and Leach, B. G. (1979). Geostatistical analysis of 18CC Stope block, CSA mine, Cobar, NSW. Estimation and statement of mineral reserves, pp. 145-153. Australian IMM, Melbourne.
Green, P. J., and Silverman, B. W. (1994). Nonparametric regression and generalized linear models. Chapman and Hall, London.

Last modified: 19 July 1998

t1   t2   Width
-16  -15  17.0
-14  -4   18.0
-13  4    17.5
-7   5    19.0
-6   -43  22.0
-6   -36  24.0
1    -50  17.4
2    -39  23.0
2    -8   23.5
2    -51  15.0
9    -16  23.5
9    -42  25.0
17   -37  16.5
18   -12  19.5
24   -57  12.0
25   -29  18.5
26   -40  18.0
32   -7   14.0
33   -35  19.0
40   4    13.5
40   -61  18.0
44   -29  19.4
48   -65  13.0
48   -7   14.0
49   -32  19.5
55   -71  16.0
56   -14  16.0
59   -38  19.0
62   7    19.0
62   -3   21.5
64   -29  22.0
69   -28  20.5
70   -72  11.0
77   -19  26.0
78   -53  22.0
79   -37  26.0
84   -52  16.0
84   -16  16.0

Prawn Trawling in the Great Barrier Reef

Keywords: regression, non-parametric regression.

Description
These data refer to a survey of the fauna on the sea bed lying between the coast of northern Queensland and the Great Barrier Reef.
The sampling region covered a zone which was closed to commercial fishing, as well as neighbouring zones where fishing was permitted. In view of the large numbers and types of species captured in the survey, the catch was summarized as a score, on a log weight scale, which combines information across species. Two such scores are available. The details of the survey, and a full analysis of the data, are in Poiner et al. (1997).

Variable   Description
Zone       an indicator for the closed (1) and open (0) zones
Year       an indicator of 1992 (0) or 1993 (1)
Latitude   latitude of the sampling position
Longitude  longitude of the sampling position
Depth      bottom depth
Score1     catch score 1
Score2     catch score 2

Source
Poiner, IR, Balber, SJM, Brewer, DT, Burrdige, CY, Caeser, D, Connell, M, Denniss, D, Dews, GD, Ellis, AN, Farmer, M, Fry, GJ, Glaister, J, Gribble, N, Hill, BJ, Long, BG, Milton, DA, Pitcher, CR, Proh D, Salini, JP, Thomas, MR, Toscas, P, Veronise, S, Wang, YG, Wassenberg, TJ (1997). The effects of prawn trawling in the far northern section of the Great Barrier Reef, CSIRO Division of Marine Research, Queensland Department of Primary Industries.
Bowman, A. W., and Azzalini, A. (1997). Applied smoothing techniques for data analysis. Clarendon Press, Oxford.

Last modified: 23 October 1998
http://www.maths.uq.edu.au/~gks/data/oz/reef.txt

Measurements on Babies

Keywords: analysis of covariance, spurious correlation.

Description
The data consist of measurements (x1, x2, Age in months) on 23 babies, collected in the Faculty of Medicine at the University of Hong Kong. It would be of great medical interest to find a relationship between x1 and x2. However, any correlation between them is likely to be spurious because both x1 and x2 vary with age. See Chris Lloyd's original mailing to the ANZStat mailing list discussion.

Source
Chris Lloyd, University of Hong Kong.
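The spurious-correlation concern raised in the description can be checked numerically. The following is a minimal Python sketch, not the analysis used on this page: it compares the raw correlation between x1 and x2 with the partial correlation after removing a linear Age effect from both. The 23 rows are copied from the x1/x2/Age table on this page; the function names are illustrative, and a linear adjustment for Age is a simplifying assumption.

```python
from math import sqrt

# (x1, x2, Age in months), copied from the data table on this page.
babies = [
    (0.729, 280.1, 3), (0.785, 402.2, 3), (0.625, 351.4, 3), (0.604, 315.5, 3),
    (0.701, 306.0, 3), (0.957, 315.0, 3), (0.664, 220.2, 3),
    (0.640, 223.6, 12), (0.464, 214.3, 12), (0.684, 224.5, 12), (0.517, 256.0, 12),
    (0.581, 285.4, 12), (0.814, 215.1, 12), (0.636, 231.0, 12), (1.051, 269.6, 12),
    (0.410, 222.5, 24), (0.701, 221.1, 24), (0.650, 208.9, 24), (0.234, 170.1, 24),
    (0.674, 254.5, 24), (0.545, 263.9, 24), (0.429, 249.1, 24), (0.358, 210.8, 24),
]


def pearson(u, v):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)


def partial_correlation(u, v, w):
    """Correlation of u and v after removing the linear effect of w from both."""
    r_uv, r_uw, r_vw = pearson(u, v), pearson(u, w), pearson(v, w)
    return (r_uv - r_uw * r_vw) / sqrt((1 - r_uw ** 2) * (1 - r_vw ** 2))


x1 = [row[0] for row in babies]
x2 = [row[1] for row in babies]
age = [row[2] for row in babies]

raw_r = pearson(x1, x2)
adjusted_r = partial_correlation(x1, x2, age)

print(f"raw correlation of x1 and x2:        {raw_r:.3f}")
print(f"partial correlation given linear Age: {adjusted_r:.3f}")
```

If the association between x1 and x2 is driven mainly by their shared dependence on Age, the partial correlation should be noticeably closer to zero than the raw correlation.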
Analysis
x2 is independent of x1 after adjustment for Age. Dependence of x2 on Age is approximately linear. There is some evidence of increasing variance, which can be handled by using gamma rather than normal regression. As the ANOVA below shows, x2 is independent of x1 given Age, and the dependence on Age is nearly linear.

    Analysis of Variance Table

    Response: x2

    Terms added sequentially (first to last)
                      Df Sum of Sq  Mean Sq  F Value     Pr(F)
    Age                1  25928.03 25928.03 15.68836 0.0010093
    as.factor(Age)     1   6420.35  6420.35  3.88479 0.0652303
    x1                 1   2098.36  2098.36  1.26966 0.2754847
    as.factor(Age):x1  2    515.02   257.51  0.15581 0.8569284
    Residuals         17  28095.76  1652.69

Last modified: 25 June 1998

x1     x2     Age
0.729  280.1  3
0.785  402.2  3
0.625  351.4  3
0.604  315.5  3
0.701  306    3
0.957  315    3
0.664  220.2  3
0.64   223.6  12
0.464  214.3  12
0.684  224.5  12
0.517  256    12
0.581  285.4  12
0.814  215.1  12
0.636  231    12
1.051  269.6  12
0.41   222.5  24
0.701  221.1  24
0.65   208.9  24
0.234  170.1  24
0.674  254.5  24
0.545  263.9  24
0.429  249.1  24
0.358  210.8  24

VISTA Analyses

Exploratory and Descriptive Data Analysis

Dynamic Exploratory Graphics include Spinplots, Scatterplots, Scatterplot Matrices, Histograms, Boxplots, Parallel Coordinate Plots, Mosaic Plots, Quantile Plots, Normal Probability Plots, Quantile-Quantile Plots, Diamond Plots, Dotplots, Biplots, and Guided Tour Plots. Plots support brushing and labeling, and are dynamically linked. Smoothers and Contours can be added to several plots.

Descriptive Statistics including Means, Standard Deviations, Variances, Ranges, Quartiles, Medians, Correlations, Covariances, Distances.

Univariate Analysis

Univariate Tests including T- and Z-tests (confidence intervals) for single sample, paired samples and two independent samples data, with Wilcoxon Signed-Rank and Mann-Whitney tests in appropriate situations.
ANOVA - Univariate Analysis of Variance for balanced and unbalanced, one or multi-way data (data must be complete). Model may or may not include two-way (but not higher-way) interactions. The model visualization is a spreadplot composed of a boxplot, diamond plot, quantile plot, quantile-quantile plot and effects plot.

Multiple Regression - Univariate regression includes simple, multiple, robust, and monotonic regression. The model visualization is a spreadplot composed of regression, added-variable, influence, leverage, and residuals plots. Weight plots are also included for robust and monotonic regression.

Multivariate Analysis

Multiple Regression - Multivariate Multiple Regression Analysis. The spreadplot consists of a biplot, spinplot, histogram and scatterplot-matrix.

Principal Component Analysis of correlations or covariances. The model visualization is a spreadplot composed of a biplot, spin-plot, scree-plot and scatterplot-matrix.

Multidimensional Scaling of one or more symmetric or asymmetric matrices. The model visualization is a spreadplot composed of a scatterplot, spin-plot, scree-plot and scatterplot-matrix. The spreadplot supports graphical re-estimation of model parameters.

Correspondence Analysis of two-way contingency tables. The model visualization is a spreadplot composed of a biplot, spinplot, residuals plot and scree-plot. The spreadplot supports graphical re-estimation of model parameters.

Copyright (c) 1998 by Forrest W. Young. All rights reserved.

FFGRID and DENSITY display unevenly distributed data in 2-D and 3-D as ordinary colour-coded, regularly spaced data. This is very useful when dealing with data which are difficult to view using plot3 (which, in my experience, is the case with all 3-D data that are not completely smooth). FFGRID is a Fast `n' Furious way to do the same job that griddata does for you. The difference is that there is no interpolation.
Empty points are left empty; no attempt is made to fill them from neighbouring points. Also, FFGRID has no problem with multiple points that fall in the same grid cell. Data are displayed using PCOLOR unless output arguments are specified, in which case the matrix of (regularly spaced) data is returned along with the vectors specifying the x- and y-dimensions. DENSITY is akin to HIST in that it displays a density distribution, but this time in 2-D rather than 1-D. BIN is a small M-file that is needed for FFGRID and DENSITY to do the job right.
Oyvind Breivik Oyvind.Breivik@gfi.uib.no 1/9/98

PPLOT is a graphical plot layout and design tool for both Matlab 4 and Matlab 5 (both PC and UNIX versions). PPLOT() is a substitute for the Matlab PLOT command, and PPLOT without arguments is a substitute for the Matlab FIGURE command. You can create legends and insert text, titles and labels. You can place, move and resize objects simply by 'click and drag'. You can change the properties of any object, such as colors, font, linewidth and linetype, and you can even rotate text. The original data are saved, so that complex data can still be analysed: you can make all kinds of calculations and analyses on the plotted data. Any number of figures containing any number of axes can be created. The plot goes to the active figure, and you can select the destination axes simply by clicking with the mouse. An unlimited undo makes it easy to test different layouts. PPLOT comes with a large number of plugins to plot Smith charts, draw arrows, filter a number of file formats, etc. - Everything you wanted to do with your plots but were afraid to ask...
See also: http://extwww.lulea.trab.se/users/joajoh/pplot/
Joachim Johansson Joachim.K.Johansson@telia.se 1/18/99

STACKFIGS is used to display multiple figures simultaneously by stacking all open figures.
STACKFIGS usage: stackfigs (no arguments)
Restriction: max number of figures = (screen_vertical_rez_in_pixels/20)
STACKFIGS has only been run under Matlab 5.
Charles Plum cplum@nichols.com 12/24/98

plots a 3-D surface of constant value: f(x,y,z) = const.
Ruslan L. Davidchack davidchack@kuphsx.phsx.ukans.edu http://weizen.chem.ukans.edu/ruslan 11/4/97

The TILEFIGS program is used to display multiple figures simultaneously by tiling the screen with all open figures.
TILEFIGS usage: tilefigs ([nrows ncols],border_in pixels)
Restriction: maximum of 100 figure windows
Without arguments, tilefigs will determine the closest N x N grid for all open figures. TILEFIGS has only been run under Matlab 5.
Charles Plum cplum@nichols.com 12/24/98

Numerical integration: quadg.m quad2dg.m
These functions are modified versions of the quadg.m and quad2dg.m files found in the NIT (Numerical Integration Toolbox). The code has been vectorized so that several sets of integration limits can be integrated quickly. As before, quadg and quad2dg only calculate one- and two-dimensional integrals, respectively, but you may specify several integration limits in a single call to the functions. It is also possible to integrate functions given directly as expressions enclosed in parentheses.

Example: integration from 0 to 2 and from 2 to 4 for x is done in a single call by:
    >> quadg('(x.^2)',[0 2],[2 4])
    ans =
        2.6667   18.6667
Similarly, integration from 0 to 2 and from 2 to 4 for both x and y is done in a single call by:
    >> quad2dg('(x.^2.*y)',[0 2],[2 4],[0 2],[2 4])
    ans =
        5.3333  112.0000

The files were tested under Matlab version 5.2. It should be noted that both quadg and quad2dg require the Numerical Integration Toolbox (NIT) to calculate the weights. Also note that quad2dg uses the distchk function in the Statistics Toolbox to check the integration limits and make sure they are of common size. This call is not strictly necessary and may be omitted if you do not have the Statistics Toolbox.
Per A.
Brodtkorb pab@marin.ntnu.no 02/17/99

Saves current variables in a delimited ASCII file: variable names first, horizontally, with the data for each below its name.
Usage/Input: save_ascii(loadname,savename,dataformat,delineator);
-"loadname" = filename of the *.mat file to save as ASCII
-"savename" = filename to save this text output to
-"dataformat" = format of 'double array' data (e.g. '%6f' for six digit fixed-point notation)
-"delineator" = what to delimit data blocks with (e.g. '\t' for tab)
e.g. save_ascii('data.mat','textfile.txt','%6f','\t');
Limitations: This script can only handle two data types, 'char' and 'double'. The 'char' types can only be one-dimensional (e.g. size = 1X15), and the 'double' arrays can be one- or two-dimensional (e.g. sizes = 52X1, 1X52, or 30X344).
Kirk Ireson kireson@ucsd.edu 3/9/1999

LETSROLL is a simple MATLAB script that demonstrates how a cycloid is made by tracing a point on a rolling circle. It is nothing flash, but a school teacher wanted a quick demo, and this is the result. I thought others might like it as well. There are two buttons: one named Let's Roll that starts the circle rolling, and another that quits. ENJOY!
Peter Dunn (dunnp@dpi.qld.gov.au / dunn@romulus.sci.usq.edu.au) 05 June 1997

hilbert.m
A .m-file which creates a square matrix with the indices of the Hilbert space-filling curve.
A .m-file which creates vectors containing the row and column coordinates of the Hilbert space-filling curve for an arbitrarily sized matrix.
Hlbrtcrv.m
The Hilbert space-filling curve has recently been introduced to digital halftoning as a scan order for spatial dithering. The advantage of using space-filling curves is that the error diffusion can be done in one dimension and the resulting patterns exhibit clustering. For related literature see works by Velho and Gomes in their book "Image processing for computer graphics."
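To illustrate what a matrix of Hilbert curve indices looks like, here is a minimal Python sketch. It is not a port of hilbert.m or Hlbrtcrv.m, whose code is not shown here; it uses the widely known iterative distance-to-coordinate construction and, as an assumption, handles only square grids whose side is a power of two. The function names are illustrative.

```python
def hilbert_d2xy(order, d):
    """Map distance d along the curve to (x, y) on a 2**order by 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the sub-square built so far so that it joins up correctly.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_index_matrix(order):
    """Square matrix whose entry [row][col] is the visiting order of that cell."""
    n = 1 << order
    matrix = [[0] * n for _ in range(n)]
    for d in range(n * n):
        x, y = hilbert_d2xy(order, d)
        matrix[y][x] = d
    return matrix


for row in hilbert_index_matrix(2):
    print(" ".join(f"{value:2d}" for value in row))
```

Scanning the cells in increasing index order visits every cell exactly once and always moves to a neighbouring cell, which is the property that makes the curve usable as a one-dimensional scan order for error diffusion.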
Daniel Leo Lau lau@eecis.udel.edu 6/26/98

group.m
Returns the summary of the columns of X grouped by the first column of X. 'FUNC' is the summary function. If 'FUNC' is omitted, 'mean' is used. 'length' can be used to give a count of data by group. If X contains more than 2 columns, each additional row of 'FUNC' may contain a function to use for each additional column. If only one function is given, it is used to summarize all columns of X. If the function names are not the same length, pad the strings with trailing spaces. With one output argument, the summaries are returned in a table, [G Xbar]. With two output arguments, the group vector and table of summaries are returned as two variables, G and Xbar.
Example: Summarize a set of data X grouped by measurement time TIME. The returned grouped table has columns: TIME, MEAN, STD, N
    group ([TIME X X X], ['mean  '; 'std   '; 'length']);
Don R. Maszle maze@sparky.berkeley.edu 1/25/99

str2strs.m
This function takes a delimited string s and breaks it up into sub-strings stored in a cell-array of strings. Spaces in the string are converted to underscore characters. It is currently set up for tab delimiters, though it may be modified for another character. The input string may be of any length, with any number of elements of any length. The char function can then be used to recover the individual elements of the cell-array output. Tested with the Student version of Matlab 5.
David Malicky University of Michigan malicky@umich.edu 5/11/98

This is a collection of Matlab files for time-frequency analysis. These programs are either a result of my research or something that I found useful enough to spend the time to implement (sometimes they even intersect).
Included are: a rigorous implementation of time-frequency distributions (Cohen class), some quartic time-frequency distributions, atomic decomposition based on maximum likelihood estimation, fractional Fourier transform, time-varying filtering, and other useful little utilities. A README file is included and information on each function is available through the MATLAB "help" command. There isn't a manual, but you can find details in my papers at http://www.eecs.umich.edu/~jeffo or you can send me an email.
Jeff O'Neill jeffo@eecs.umich.edu 9/3/98

FFTMSPEC Module Spectra of the Fourier Transform
FFTMSPEC(XT,T) plots the signal XT versus time T, and the absolute value of the discrete Fourier transform (DFT) of the signal vector XT versus a frequency vector F.
[XFM,F] = FFTMSPEC(XT,T) returns the Fourier transform of XT and the frequency range in vectors XFM and F.
Jesus A. Rojas Zavarce. jrojasz@telcel.net.ve 12/07/98

fourgraph.m is a demo of plotting a Fourier series for a given function of one variable; fourgraph.mat and draw.m are required for the demo to run.
Yaniv Hollander. aet1417@aerodyne.technion.ac.il 2/26/98
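The fourgraph.m code itself is not shown here, so the following is only a minimal Python sketch of the underlying idea: approximating a function of one variable by a truncated Fourier series. It assumes the function is considered on [-pi, pi], estimates the coefficients with a midpoint rule, and uses illustrative names (fourier_coefficients, partial_sum) that are not taken from the MATLAB demo.

```python
from math import cos, pi, sin


def fourier_coefficients(f, n_terms, samples=2000):
    """Approximate a_0..a_N and b_1..b_N of f on [-pi, pi] with the midpoint rule."""
    h = 2 * pi / samples
    xs = [-pi + (k + 0.5) * h for k in range(samples)]
    fx = [f(x) for x in xs]
    a = [sum(v * cos(n * x) for v, x in zip(fx, xs)) * h / pi
         for n in range(n_terms + 1)]
    # b[0] is a placeholder so that b[n] lines up with a[n].
    b = [0.0] + [sum(v * sin(n * x) for v, x in zip(fx, xs)) * h / pi
                 for n in range(1, n_terms + 1)]
    return a, b


def partial_sum(a, b, x):
    """Evaluate a_0/2 + sum over n of (a_n cos(nx) + b_n sin(nx)) at the point x."""
    return a[0] / 2 + sum(a[n] * cos(n * x) + b[n] * sin(n * x)
                          for n in range(1, len(a)))


# Example: the sawtooth f(x) = x, whose exact sine coefficients are 2*(-1)**(n+1)/n.
a, b = fourier_coefficients(lambda x: x, 50)
approximation = partial_sum(a, b, 1.0)
print(f"b_1 = {b[1]:.4f}, b_2 = {b[2]:.4f}")
print(f"50-term partial sum at x = 1: {approximation:.4f}")
```

Evaluating partial_sum on a grid of x values for increasing numbers of terms gives the sequence of curves that a plotting demo of this kind would draw against the original function.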