Datasets

(last revised: 23.11.00)

© Tom Backer Johnsen

Dataset 1 : Glasses

The dataset is from Weindling, A.M., Bamford, F.N. & Whittall, R.A. (1986) Health of juvenile delinquents, British Medical Journal, 292, 447.

"The following data come from a study comparing the health of juvenile delinquent boys and a non-delinquent control group. They relate to the subset of the boys who failed a vision test, and show the numbers that did and did not wear glasses. Are delinquents with poor eyesight more or less likely to wear glasses than are nondelinquents with poor eyesight? There is insufficient data for a chi-squared test, but Fisher's exact test is possible" (Hand et al., 1994)

1   5  |  6
8   2  | 10
-----------
9   7  | 16
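
Fisher's exact test, mentioned in the quotation, can be sketched in a few lines. This is a minimal sketch using only the Python standard library; the function name fisher_exact_two_sided is illustrative, and the two-sided p-value is taken here as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention among several.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo = max(0, row1 - (n - col1))   # smallest feasible top-left count
    hi = min(row1, col1)             # largest feasible top-left count
    p_obs = prob(a)
    # Add up every table that is at most as probable as the observed one
    # (the small factor guards against floating-point ties).
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# The table above, read row by row.
p_value = fisher_exact_two_sided(1, 5, 8, 2)
print(f"two-sided p = {p_value:.4f}")
```

With the margins of this table the result is 280/8008, about 0.035, and it does not depend on which margin is taken to be the delinquent group.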

Dataset 2 : Simple linear regression

Data for simple linear regression practical P3/10(A-D). Four different sets of 11 data points (x, y) are given, the x-values being the same for each set. See "Statistics: Problems and Solutions", by E.E. Bassett et al., problem 5A.3.

   x  y(A)  y(B)  y(C)  y(D)
10.0 21.26 22.40 20.70 22.94
 8.0 20.57 21.84 20.43 19.85
13.0 20.15 21.35 25.31 19.65
 9.0 22.28 22.15 20.56 20.25
11.0 21.36 22.27 20.80 22.38
14.0 22.30 20.43 21.19 23.84
 6.0 21.35 20.25 20.21 20.55
 4.0 18.81 17.66 19.90 19.53
12.0 23.63 21.90 20.95 21.55
 7.0 18.73 21.15 20.31 19.72
 5.0 20.01 19.05 20.09 20.19
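
As a first look at the four sets, the least-squares line can be fitted to each of them in turn. The following is a minimal sketch in plain Python; the helper name least_squares and the dictionary y_sets are illustrative choices, not part of the original practical.

```python
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y_sets = {
    "A": [21.26, 20.57, 20.15, 22.28, 21.36, 22.30, 21.35, 18.81, 23.63, 18.73, 20.01],
    "B": [22.40, 21.84, 21.35, 22.15, 22.27, 20.43, 20.25, 17.66, 21.90, 21.15, 19.05],
    "C": [20.70, 20.43, 25.31, 20.56, 20.80, 21.19, 20.21, 19.90, 20.95, 20.31, 20.09],
    "D": [22.94, 19.85, 19.65, 20.25, 22.38, 23.84, 20.55, 19.53, 21.55, 19.72, 20.19],
}

def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

fits = {name: least_squares(x, y) for name, y in y_sets.items()}
for name, (intercept, slope) in fits.items():
    print(f"set {name}: y = {intercept:.2f} + {slope:.2f} x")
```

On these data all four sets give the same fitted line (intercept about 18.43, slope about 0.28), so whatever distinguishes them only shows up in a plot or in the residuals.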

Dataset 3 : SIMPLE LINEAR REGRESSION - PRACTICAL

It is considered that a linear regression model may be appropriate for the analysis of the following data, consisting of 11 observations of a response variable y and an explanatory variable x.

Analyse the data and, in particular, find 95% confidence intervals for:

(a) the slope of the regression line;

(b) the expected value of y, given x = 10.

Plot the data, and comment on any significant features revealed.

  x    y
10.0 21.26 
 8.0 20.57 
13.0 20.15 
 9.0 22.28 
11.0 21.36 
14.0 22.30 
 6.0 21.35 
 4.0 18.81 
12.0 23.63 
 7.0 18.73 
 5.0 20.01
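
The two confidence intervals asked for can be computed directly from the usual formulas for simple linear regression. This is a sketch under the standard assumptions (independent normal errors with constant variance); the constant T_975_9DF is the tabulated 97.5% point of the t distribution with 9 degrees of freedom, hard-coded here to avoid a dependency on a statistics library, and all variable names are illustrative.

```python
from math import sqrt

x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [21.26, 20.57, 20.15, 22.28, 21.36, 22.30, 21.35, 18.81, 23.63, 18.73, 20.01]

T_975_9DF = 2.262  # t quantile for a 95% interval with n - 2 = 9 df

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation on n - 2 degrees of freedom.
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
s = sqrt(rss / (n - 2))

# (a) 95% confidence interval for the slope.
se_slope = s / sqrt(sxx)
slope_ci = (slope - T_975_9DF * se_slope, slope + T_975_9DF * se_slope)

# (b) 95% confidence interval for the expected value of y at x = 10.
x0 = 10.0
fit0 = intercept + slope * x0
se_fit0 = s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
mean_ci = (fit0 - T_975_9DF * se_fit0, fit0 + T_975_9DF * se_fit0)

print(f"slope = {slope:.3f}, 95% CI = ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"E[y | x = 10] = {fit0:.2f}, 95% CI = ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
```

The interval in (b) is for the mean response at x = 10, not a prediction interval for a single new observation; the latter would add 1 under the square root.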

Dataset 4 : SIMPLE LINEAR REGRESSION - PRACTICAL

It is considered that a linear regression model may be appropriate for the analysis of the following data, consisting of 11 observations of a response variable y and an explanatory variable x.

Analyse the data and, in particular, find 95% confidence intervals for (a) the slope of the regression line; (b) the expected value of y, given x = 10.

Plot the data, and comment on any significant features revealed.

  x    y
10.0 22.40
 8.0 21.84 
13.0 21.35 
 9.0 22.15 
11.0 22.27 
14.0 20.43 
 6.0 20.25 
 4.0 17.66 
12.0 21.90 
 7.0 21.15 
 5.0 19.05
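
One feature a plot of these data may reveal is curvature. A simple numerical check, sketched below, is to add a quadratic term to the straight-line fit and see how much the residual sum of squares drops. Because the x-values are equally spaced and symmetric about their mean, the quadratic term can be made orthogonal to the constant and linear terms, so no matrix algebra is needed. The variable names are illustrative.

```python
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [22.40, 21.84, 21.35, 22.15, 22.27, 20.43, 20.25, 17.66, 21.90, 21.15, 19.05]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
d = [xi - x_bar for xi in x]              # centred x
sxx = sum(di ** 2 for di in d)

# Quadratic contrast: orthogonal to 1 and to d because the centred
# x-values are symmetric about zero (their cubes sum to zero).
q = [di ** 2 - sxx / n for di in d]

slope = sum(di * yi for di, yi in zip(d, y)) / sxx
curvature = sum(qi * yi for qi, yi in zip(q, y)) / sum(qi ** 2 for qi in q)

rss_linear = sum((yi - y_bar - slope * di) ** 2 for di, yi in zip(d, y))
rss_quadratic = sum((yi - y_bar - slope * di - curvature * qi) ** 2
                    for di, qi, yi in zip(d, q, y))

print(f"slope = {slope:.3f}, curvature = {curvature:.4f}")
print(f"residual SS: linear = {rss_linear:.3f}, quadratic = {rss_quadratic:.4f}")
```

For this dataset the quadratic coefficient is negative and the quadratic term removes almost all of the residual variation left by the straight line, which suggests that the linear model is not appropriate here.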

Dataset 5 : SIMPLE LINEAR REGRESSION - PRACTICAL

It is considered that a linear regression model may be appropriate for the analysis of the following data, consisting of 11 observations of a response variable y and an explanatory variable x.

Analyse the data and, in particular, find 95% confidence intervals for (a) the slope of the regression line; (b) the expected value of y, given x = 10. Plot the data, and comment on any significant features revealed.

  x    y
10.0 20.70
 8.0 20.43
13.0 25.31
 9.0 20.56
11.0 20.80
14.0 21.19
 6.0 20.21
 4.0 19.90
12.0 20.95
 7.0 20.31
 5.0 20.09
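
A plot of these data is likely to draw attention to a single observation. The sketch below locates it numerically by scaling each residual from the least-squares line by the residual standard deviation; this is a rough screening device rather than a formal outlier test, and the names used are illustrative.

```python
from math import sqrt

x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [20.70, 20.43, 25.31, 20.56, 20.80, 21.19, 20.21, 19.90, 20.95, 20.31, 20.09]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
scaled = [r / s for r in residuals]       # residuals in units of s

# Index of the observation with the largest scaled residual.
worst = max(range(n), key=lambda i: abs(scaled[i]))
print(f"largest scaled residual {scaled[worst]:.2f} at x = {x[worst]}, y = {y[worst]}")
```

The observation picked out is the one at x = 13.0, whose y-value of 25.31 lies well above the others; refitting without it is a natural next step.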

Dataset 6 : SIMPLE LINEAR REGRESSION - PRACTICAL

It is considered that a linear regression model may be appropriate for the analysis of the following data, consisting of 11 observations of a response variable y and an explanatory variable x. Analyse the data and, in particular, find 95% confidence intervals for (a) the slope of the regression line; (b) the expected value of y, given x = 10.

Plot the data, and comment on any significant features revealed.

  x    y
10.0 22.94
 8.0 19.85
13.0 19.65
 9.0 20.25
11.0 22.38
14.0 23.84
 6.0 20.55
 4.0 19.53
12.0 21.55
 7.0 19.72
 5.0 20.19

Dataset 7 : SIMPLE LINEAR REGRESSION - PRACTICAL

It is considered that a linear regression model may be appropriate for the analysis of the following data, consisting of 11 observations of a response variable y and an explanatory variable x.

Analyse the data and, in particular, find 95% confidence intervals for (a) the slope of the regression line; (b) the expected value of y, given x = 10. Plot the data, and comment on any significant features revealed.

  x    y
 8.0 21.20
 8.0 18.74
 8.0 21.98
 8.0 19.70
 8.0 22.66
 8.0 19.32
 8.0 21.62
 8.0 20.91
 8.0 20.13
 8.0 20.44
19.0 23.75
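
The striking feature of this dataset is that ten of the eleven x-values coincide, so the fitted slope rests entirely on the single observation at x = 19. The sketch below makes this concrete by computing the leverage of each point, h = 1/n + (x - mean(x))^2 / Sxx, and by comparing the least-squares slope with the slope of the line joining the mean of the ten points at x = 8 to the point at x = 19. The variable names are illustrative.

```python
x = [8.0] * 10 + [19.0]
y = [21.20, 18.74, 21.98, 19.70, 22.66, 19.32, 21.62, 20.91, 20.13, 20.44, 23.75]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx

# Leverage of each observation in simple linear regression.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Slope of the line through (8, mean of the first ten y) and (19, last y).
mean_at_8 = sum(y[:-1]) / len(y[:-1])
slope_two_points = (y[-1] - mean_at_8) / (x[-1] - x[0])

print(f"least-squares slope = {slope:.3f}")
print(f"slope through the two x-positions = {slope_two_points:.3f}")
print(f"leverage of the point at x = 19: {leverage[-1]:.3f}")
```

The leverage of the last point equals 1, its maximum possible value, and the two slopes coincide: the fitted line passes exactly through that observation, so the data carry no information about whether the relation is linear.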

Dataset 8 : Horsekicks

This is one of the more absurd datasets I have come across in the literature. The source is:

Preece, D.A.; Ross, G.J.S.; and Kirby, S.P.J. (1988) Bortkewitsch's horse-kicks and the Generalized Linear Model, The Statistician, 37, 313-318.

Hand et al. (1994) note: "The 'Horse-kicks' are among the most well-known and least understood collections. They summarize the number of Prussian Militaerpersonen killed by kicks of a horse for each of the 14 corps in each of 20 successive years 1875-1894. In this paper the data are revisited.

Most often they appear as summary data (A: 196 deaths during 280 corps-years) and show moderately good agreement with the Poisson distribution. Bortkewitsch noted that four of the corps were less representative than the others. After removing these (G, I, VI, and XI) the Poisson agreement is very good indeed."

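All 14 corps (280 corps-years)
deaths per corps-year    0  1  2  3  4
number of corps-years  144 91 32 11  2

After removing four corps (200 corps-years)
deaths per corps-year    0  1  2  3  4
number of corps-years  109 65 22  3  1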
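
Rows are years. The 14 columns are the corps, in the order G, I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XIV, XV, followed by the yearly total; the last row gives the total for each corps.

--------------------------------------------------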
1875  0  0  0  0  0  0  0  1  1  0  0  0  1  0   3
1876  2  0  0  0  1  0  0  0  0  0  0  0  1  1   5
1877  2  0  0  0  0  0  1  1  0  0  1  0  2  0   7
1878  1  2  2  1  1  0  0  0  0  0  1  0  1  0   9
1879  0  0  0  1  1  2  2  0  1  0  0  2  1  0  10
1880  0  3  2  1  1  1  0  0  0  2  1  4  3  0  18
1881  1  0  0  2  1  0  0  1  0  1  0  0  0  0   6
1882  1  2  0  0  0  0  1  0  1  1  2  1  4  1  14
1883  0  0  1  2  0  1  2  1  0  1  0  3  0  0  11
1884  3  0  1  0  0  0  0  1  0  0  2  0  1  1   9
1885  0  0  0  0  0  0  1  0  0  2  0  1  0  1   5
1886  2  1  0  0  1  1  1  0  0  1  0  1  3  0  11
1887  1  1  2  1  0  0  3  2  1  1  0  1  2  0  15
1888  0  1  1  0  0  1  1  0  0  0  0  1  1  0   6
1889  0  0  1  1  0  1  1  0  0  1  2  2  0  2  11
1890  1  2  0  2  0  1  1  2  0  2  1  1  2  2  17
1891  0  0  0  1  1  1  0  1  1  0  3  3  1  0  12
1892  1  3  2  0  1  1  3  0  1  1  0  1  1  0  15
1893  0  1  0  0  0  1  0  2  0  0  1  3  0  0   8
1894  1  0  0  0  0  0  0  0  1  0  1  1  0  0   4
--------------------------------------------------
     16 16 12 12  8 11 17 12  7 13 15 25 24  8 196
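
The agreement with the Poisson distribution can be checked directly from the table. The sketch below tallies how many corps-years had 0, 1, 2, 3 or 4 deaths, first for all corps and then without corps G, I, VI and XI, and compares each tally with the frequencies expected under a Poisson distribution with the same mean. The corps labels are an assumption: they are not printed with the table, and the conventional order G, I, II, ..., XV is used here, which is consistent with the summary counts (109, 65, 22, 3, 1 after removal). Function and variable names are illustrative.

```python
from collections import Counter
from math import exp, factorial

# Assumed column order of the table (not stated in the table itself).
CORPS = ["G", "I", "II", "III", "IV", "V", "VI",
         "VII", "VIII", "IX", "X", "XI", "XIV", "XV"]

deaths = [  # one row per year, 1875-1894
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    [2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 2, 0],
    [1, 2, 2, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 2, 2, 0, 1, 0, 0, 2, 1, 0],
    [0, 3, 2, 1, 1, 1, 0, 0, 0, 2, 1, 4, 3, 0],
    [1, 0, 0, 2, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [1, 2, 0, 0, 0, 0, 1, 0, 1, 1, 2, 1, 4, 1],
    [0, 0, 1, 2, 0, 1, 2, 1, 0, 1, 0, 3, 0, 0],
    [3, 0, 1, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0, 1],
    [2, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 3, 0],
    [1, 1, 2, 1, 0, 0, 3, 2, 1, 1, 0, 1, 2, 0],
    [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 2, 2, 0, 2],
    [1, 2, 0, 2, 0, 1, 1, 2, 0, 2, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 3, 3, 1, 0],
    [1, 3, 2, 0, 1, 1, 3, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 2, 0, 0, 1, 3, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0],
]

def tally(columns):
    """Number of corps-years with 0, 1, 2, 3, 4 deaths in the given columns."""
    counts = Counter(row[j] for row in deaths for j in columns)
    return [counts[k] for k in range(5)]

def poisson_expected(freq):
    """Expected frequencies under a Poisson law with the observed mean."""
    n = sum(freq)
    mean = sum(k * f for k, f in enumerate(freq)) / n
    return [n * exp(-mean) * mean ** k / factorial(k) for k in range(len(freq))]

removed = {"G", "I", "VI", "XI"}
kept = [j for j, name in enumerate(CORPS) if name not in removed]

observed_all = tally(range(len(CORPS)))
observed_kept = tally(kept)

for label, obs in (("all corps", observed_all), ("reduced", observed_kept)):
    exp_freq = poisson_expected(obs)
    print(label, obs, [round(e, 1) for e in exp_freq])
```

The expected frequencies here cover only the classes 0 to 4, so they sum to slightly less than the number of corps-years; the small remainder is the Poisson probability of five or more deaths.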

References

Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., & Ostrowski, E. (1994). A Handbook of Small Data Sets. London: Chapman & Hall.