Data Treatment for Biologists
Paper, Order, or Assignment Requirements
Please follow the instructions given in the Assignment.
Explain the results of each case study in a paragraph after each one.
Modules 1-4 are included to help you work out how to solve these case studies.
PART B: don’t forget to write about it.
Do NOT copy anything from any source: many students are working on the same topic, and the TURNITIN similarity score should be zero!
Research Methods
Data Treatment for Biologists
Part A : Workshop
The aim of this workshop is to give you some experience using standard statistical treatments of data using statistical software. The package we will use for this workshop is Minitab. It is the package used as the standard statistical software by the Mathematics and Statistics courses at RMIT and Chemistry also has a site licence for it. Most of the analyses in this workshop can be done using Excel but Excel is not very user-friendly for statistical analyses. The analyses in Minitab can be done from simple pull-down menus and there is a good on-line help facility. You can copy-and-paste data from Excel into Minitab. For the assessment you should enter your results into the attached pro forma.
Before attempting this assignment you should try the examples in modules 1-4
Case Study 1
18% protein diet    5% protein diet
13.3                5.1
16.3                8.7
9.9                 8.7
9.3                 8.5
16.1                8.1
9.7                 6.9
9.7                 6.9
14.1                12.3
It is believed that nutritional deprivation affects various components of the immune system, such as the tuberculin skin reactivity. In this study a sample of 8 male rats was fed a normal diet of 18% protein. Another sample of rats was fed a diet of only 5% protein. After 4 weeks, the rats were given an intradermal injection of 25 µg of purified protein derivative of tuberculin. The above table gives the skin reactivity diameter of erythema and induration (in mm) for the 2 groups.
(a) Determine the mean, variance, standard deviation and 95% confidence interval for each data set.
(b) Verify the assumption that the two populations are (i) normally distributed and (ii) have equal variance.
(c) Display the data using a box-and-whisker plot.
(d) Use a t-test to determine if there is a significant difference between the tuberculin reactivity of normal and malnourished rats.
(e) In Excel, create a bar graph for each group to compare the means. Include ‘error bars’ showing the confidence intervals (there is a sample spreadsheet showing how to construct ‘error bars’ in Course Documents, Data Treatment folder).
Case Study 2
              Germinated    Did not germinate    Total
Old Strain    125           15                   140
New Strain    152           8                    160
Total         277           23                   300
The above table compares the germination rate of a new strain of a plant against an old strain of the same plant.
Test whether there is a significant difference between the germination rates of the two strains (at the 95% confidence level).
Case Study 3
        Fertilizer Blend
Farm    U       V       W       X       Y       Z
1       1130    1125    1350    1375    1225    1235
2       1115    1120    1375    1200    1250    1200
3       1145    1170    1235    1175    1225    1155
4       1200    1230    1140    1325    1275    1215
A trial of 6 different blends of fertilizer (U-Z) has been carried out on a linseed crop on 4 different farms (1-4). The crop yields of linseed are given in the table. Carry out a 2-way ANOVA:
(a) Is there a significant difference between farms?
(b) Is there a significant difference between fertilizers?
Case Study 4
SBP (y)    DBP (x)    SBP (y)    DBP (x)
112        63         156        100
120        69         124        82
135        70         99         56
142        82         105        65
132        76         124        73
115        67         144        89
119        71         134        76
128        73
Systolic arterial blood pressure (SBP) and diastolic arterial blood pressure (DBP) are tabulated above for 15 men aged 40-65.
Carry out linear regression on these data.
Give the 95% confidence intervals of the slope and intercept.
Test whether there is a significant relationship between SBP and DBP for this group.
Use the regression equation to estimate the expected SBP of a man aged 40-65 whose DBP is 75.
Part B
You are to carry out an evaluation of your project in terms of its data collection and treatment aspects. This is to be presented as a brief summary, set out as follows.
Project Overview
Give the project title (including supervisor). State the aims of the project – what do you want to achieve? Why is the study being carried out?
Define the response(s)
What is being measured? List your types of responses. Are these responses qualitative or quantitative? If qualitative, can they be turned into quantitative responses (e.g. by giving a score or rating)? Are they discrete or continuous?
Define the Factors
What factors (variables) affect your results (responses)?
Rank the factors – known to influence, suspect to influence, unknown effect
Divide the factors into controllable and uncontrollable
Identify sources of error
What are the sources of error in your study? How can they be minimised? You need to consider the effect of sampling – usually you cannot test the whole population so you want to take a sample of the population. How do you select the sample? How big should the sample be?
Notes for Part A:
Analysis: Basic Statistics (Question 1)
Open up Minitab (select from the Start Menu, Programs , under SAS)
When you open the program you will notice it is divided into two areas – the data area (lower screen) and the output area. Enter data from the above table in columns C1 and C2.
Warning: make sure you start entering data in row 1, NOT in the cell immediately below the column heading (C1 etc). That cell is reserved for column labels (you may put a label here like ‘18% diet’). Equally, make sure you don’t enter a column label in row 1, otherwise the whole column will be formatted as text (C1-T) and cannot be used for analysis. If this happens, delete the whole column and start again (clicking on ‘C1’ will highlight the whole column).
To get descriptive statistics click on Stat => Basic Statistics => Display Descriptive Statistics to get the basic statistics dialog box. Highlight C1 and C2 on the left and then click ‘Select’. Alternatively you can click in the Variable box and type C1 C2 . Click ‘Statistics’ then check ‘variance’. Then click OK and the output will appear in the output window. From the output data enter the values in the pro forma.
Confidence Intervals
The confidence intervals for the mean can be obtained as follows: Stat => Basic Statistics => 1-sample t. Select columns 1 and 2.
The confidence interval is of the form (low value, high value). To express the interval in the form of ‘mean +/- deviation’, calculate the deviation as 0.5*(high – low).
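As a cross-check on the Minitab output, the same quantities can be computed by hand. The following is a minimal Python sketch, not part of the workshop itself: it assumes the Case Study 1 data as tabulated above, and it takes the two-tailed 95% critical t value for 7 degrees of freedom (2.365) from standard t tables rather than computing it. The function name `summarise` is purely illustrative.

```python
import math
import statistics

# Case Study 1 data: skin reactivity diameters in mm.
diet_18 = [13.3, 16.3, 9.9, 9.3, 16.1, 9.7, 9.7, 14.1]
diet_5 = [5.1, 8.7, 8.7, 8.5, 8.1, 6.9, 6.9, 12.3]

# Assumption: two-tailed 95% critical t for n - 1 = 7 degrees of freedom,
# read from t tables rather than calculated here.
T_CRIT_7_DF = 2.365


def summarise(sample, t_crit):
    """Return the mean, sample variance, sample SD and 95% CI of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # uses the n - 1 denominator
    sd = math.sqrt(variance)
    half_width = t_crit * sd / math.sqrt(n)  # the 'deviation' in mean +/- deviation
    return {
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "half_width": half_width,
        "ci": (mean - half_width, mean + half_width),
    }


summary_18 = summarise(diet_18, T_CRIT_7_DF)
summary_5 = summarise(diet_5, T_CRIT_7_DF)

for label, s in (("18% diet", summary_18), ("5% diet", summary_5)):
    low, high = s["ci"]
    print(f"{label}: mean={s['mean']:.2f} var={s['variance']:.2f} "
          f"sd={s['sd']:.2f} CI=({low:.2f}, {high:.2f})")
```

Note that `half_width` is the same number as 0.5*(high – low) calculated from the interval that Minitab reports.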
Normally Distributed Data
A normality test can be carried out as follows: Stat => Basic Statistics => Normality test. Select the first column and accept the other defaults. Repeat the test for the second column. Examine the probability plot.
Test for Equal Variance – F test
Use Stat => Basic Statistics => 2 variances. Check ‘samples in different columns’. Select column C1 in the first and C2 in the second (note that for an F test the variable with the larger variance must be the first one selected). Use the defaults but under Graphs select boxplot (box-and-whisker plot).
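The variance ratio behind the F test can be sketched by hand as below. This is an illustrative Python calculation only, assuming the Case Study 1 data; the critical value (4.995, two-tailed 5% level, 7 and 7 degrees of freedom) is taken from F tables and is an assumption of the sketch, not something Minitab prints in this form.

```python
import statistics

# Case Study 1 data: skin reactivity diameters in mm.
diet_18 = [13.3, 16.3, 9.9, 9.3, 16.1, 9.7, 9.7, 14.1]
diet_5 = [5.1, 8.7, 8.7, 8.5, 8.1, 6.9, 6.9, 12.3]

var_18 = statistics.variance(diet_18)  # sample variance, n - 1 denominator
var_5 = statistics.variance(diet_5)

# The larger variance goes in the numerator so that F >= 1.
f_ratio = max(var_18, var_5) / min(var_18, var_5)

# Assumption: two-tailed 5% critical F for (7, 7) degrees of freedom,
# read from F tables.
F_CRIT_7_7 = 4.995

# If F is below the critical value, the data give no evidence against
# the equal-variance assumption.
equal_variance_plausible = f_ratio < F_CRIT_7_7

print(f"F = {f_ratio:.3f}, critical value = {F_CRIT_7_7}")
print("Equal variances plausible:", equal_variance_plausible)
```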
Hypothesis Testing
We now want to test whether the two diet samples differ significantly from each other. We first need to formulate the null hypothesis (Ho), which generally attributes any deviations to chance or experimental error. In statistical testing we then calculate the probability (p) of obtaining a result at least as extreme as the one observed if the null hypothesis were true. If this probability is low (usually p < 0.05, i.e. below 5%) we reject Ho and accept the alternative (H1). When a single mean is compared with a reference value, for example an analytical result against a certified value of 6.49, the null hypothesis is that the mean is not significantly different from that value. In question (d) our null hypothesis is that the two means are equal.
Comparison of Means
For question (d) we apply a t test to compare two means: Stat => Basic Statistics => 2-sample t. Click on ‘Samples in different columns’. Click the ‘First’ box and then double click on C1 in the variables column, and similarly for C2 as ‘Second’. Check ‘assume equal variances’. The 95% confidence interval given in the output is for the difference between the two means. The p value for the null hypothesis that this difference is zero is given at the end of the output.
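The equal-variance (pooled) two-sample t test can be reproduced by hand as a check. The sketch below is a minimal Python illustration assuming the Case Study 1 data; the critical value 2.145 (two-tailed 95%, 14 degrees of freedom) is read from t tables, so the sketch compares t with a tabulated value instead of giving the p value that Minitab reports.

```python
import math
import statistics

# Case Study 1 data: skin reactivity diameters in mm.
diet_18 = [13.3, 16.3, 9.9, 9.3, 16.1, 9.7, 9.7, 14.1]
diet_5 = [5.1, 8.7, 8.7, 8.5, 8.1, 6.9, 6.9, 12.3]

n1, n2 = len(diet_18), len(diet_5)
mean_diff = statistics.mean(diet_18) - statistics.mean(diet_5)

# Pooled variance: weighted average of the two sample variances.
pooled_var = (
    (n1 - 1) * statistics.variance(diet_18)
    + (n2 - 1) * statistics.variance(diet_5)
) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = mean_diff / std_err

# Assumption: two-tailed 95% critical t for n1 + n2 - 2 = 14 degrees of
# freedom, read from t tables.
T_CRIT_14_DF = 2.145

# 95% confidence interval for the difference between the means.
ci_diff = (mean_diff - T_CRIT_14_DF * std_err, mean_diff + T_CRIT_14_DF * std_err)
significant = abs(t_stat) > T_CRIT_14_DF

print(f"difference = {mean_diff:.2f}, t = {t_stat:.3f}")
print(f"95% CI for difference = ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
print("Reject Ho at the 5% level:", significant)
```

Rejecting Ho when |t| exceeds the tabulated value is equivalent to p < 0.05, and also to the confidence interval for the difference excluding zero.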
The Chi Squared Test (Case Study 2)
Enter the data into 2 columns in Minitab. Do not enter the totals – just the counts. You should have a 2×2 table.
Stat => Tables => Chi-Square Test (Two-Way Table in Worksheet)
Select the 2 columns of data, OK
How is the data analysed? The Chi2 statistic is calculated as:
Chi2 = sum over all cells of (observed cell count – expected cell count)^2 / expected cell count
But how do we calculate the expected cell count?
Expected cell count = (row total)*(column total)/grand total
The null hypothesis in this case is Ho: the proportions according to the row and column classifications are the same, e.g. if you were comparing the voting preferences of men and women, the proportion voting liberal would be the same for each sex.
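The expected counts and the Chi2 statistic described above can be worked through by hand. The following minimal Python sketch assumes the Case Study 2 counts (totals excluded) and applies no continuity correction; the critical value 3.841 (5% level, 1 degree of freedom) is taken from chi-squared tables and is an assumption of the sketch.

```python
# Case Study 2 counts: rows are strains (old, new),
# columns are (germinated, did not germinate). Totals are not entered.
observed = [
    [125, 15],
    [152, 8],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected cell count = (row total) * (column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi2 = sum of (observed - expected)^2 / expected over all cells.
chi_squared = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1) = 1 for a 2x2 table.
# Assumption: 5% critical value for 1 degree of freedom, from tables.
CHI2_CRIT_1_DF = 3.841
significant = chi_squared > CHI2_CRIT_1_DF

print("Expected counts:", [[round(e, 2) for e in row] for row in expected])
print(f"Chi2 = {chi_squared:.3f}, critical value = {CHI2_CRIT_1_DF}")
print("Reject Ho at the 5% level:", significant)
```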
2 way Analysis of Variance (Case Study 3)
Two Way Analysis of Variance
In this study there are two variables or factors: farm and fertilizer blend. The data needs to be set out as follows:
In one column enter all 24 crop yields (1130 …1215)
You also need two coding columns. Make one column the code for farm and give a code (1 – 4) for each farm.
Enter in a third column the code (1-6) for the fertilizer blend. Thus the first value (1130) will have 1130,1,1 in the three columns while the last value (1215) would have 1215,4,6 (i.e farm 4 and blend Z)
Carry out the two way ANOVA:- Stat=> ANOVA => 2-Way. In the response field enter the column for yields and enter the other two variables in the row and column boxes. Check the ‘display means’ boxes.
The output should be a typical ANOVA table (see the notes: Chemometrics Unit 1 for a full explanation of the ANOVA table). The key values are again the p values. Because there are two variables there is now a null hypothesis for each variable (e.g. no significant difference between farms, i.e. mean [yield] for farm 1 = mean [yield] for farm 2 …). As with all our previous testing, the p value is the probability of seeing differences at least this large if Ho were true; we reject Ho if p is low (< 0.05) and hence conclude there is a significant difference.
Minitab gives a diagram which can help in interpreting the results, showing each mean and confidence interval. Two results differ significantly if their CI’s don’t overlap. Note, however, that Minitab uses a pooled CI so they are all the same size. The diagram is thus just an indication but is still quite useful.
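In 2 way ANOVA the possibility of variable interaction can also be tested. An interaction means, for example, that blend differences depend on the farm. If we see blend differences with farm 1 but not farm 2 this would be an interaction effect. Note that interaction can only be tested when there is more than one observation for each farm-blend combination; with a single yield per combination, as in Case Study 3, the interaction cannot be separated from the error term. The diagram of means and CIs can be an indication of where differences occur.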
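To see where the entries of the ANOVA table come from, the sums of squares can be computed by hand. The sketch below is an illustrative Python calculation for a two-way layout with one observation per cell, assuming the Case Study 3 yields; the function name `two_way_anova` is hypothetical, and the critical F values (3.287 for 3 and 15 degrees of freedom, 2.901 for 5 and 15, both at the 5% level) are read from F tables rather than computed.

```python
# Case Study 3 yields: rows are farms 1-4, columns are blends U-Z.
yields = [
    [1130, 1125, 1350, 1375, 1225, 1235],
    [1115, 1120, 1375, 1200, 1250, 1200],
    [1145, 1170, 1235, 1175, 1225, 1155],
    [1200, 1230, 1140, 1325, 1275, 1215],
]


def two_way_anova(table):
    """Two-way ANOVA without replication (one value per row/column cell)."""
    n_rows, n_cols = len(table), len(table[0])
    n = n_rows * n_cols
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n  # correction term, (grand total)^2 / N

    ss_total = sum(v ** 2 for row in table for v in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    ss_cols = sum(t ** 2 for t in col_totals) / n_rows - correction
    ss_error = ss_total - ss_rows - ss_cols  # residual, includes any interaction

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_error": ss_error, "ss_total": ss_total,
        "df": (df_rows, df_cols, df_error),
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }


result = two_way_anova(yields)

# Assumption: 5% critical F values read from tables.
F_CRIT_FARMS = 3.287   # 3 and 15 degrees of freedom
F_CRIT_BLENDS = 2.901  # 5 and 15 degrees of freedom

print(f"Farms:  F = {result['f_rows']:.3f} (critical {F_CRIT_FARMS})")
print(f"Blends: F = {result['f_cols']:.3f} (critical {F_CRIT_BLENDS})")
```

An F value above the tabulated critical value corresponds to p < 0.05 in the Minitab output.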
Linear Regression (Case Study 4)
The analysis can be carried out as follows:-
Enter the data into 2 columns
Stat => Regression => Regression. Enter the Y column in the response box and the X column in the predictors box. Click on options and in the ‘prediction intervals for new responses’ enter ‘75’ (note if you have more than one X for prediction you can enter them in a new column and put the column in this box).
The output gives you the model (the regression equation), values of the intercept (constant) and gradient (predictor) with statistical information on these parameters. A full ANOVA table is also shown . For full interpretation of this output you should consult the ‘Calibration and Modelling of Data’ notes.
The t tests determine whether the gradient or the intercept are significantly non-zero.
(Again, check the p values)
The confidence intervals for the gradient and intercept can be determined as gradient +/- t(n-2, 0.05)*sb and intercept +/- t(n-2, 0.05)*sa. sb and sa are the standard deviations of the gradient and intercept respectively (in Minitab, called the ‘standard error’ in the regression table). t is the critical (two-tailed) t value for n-2 degrees of freedom (n = number of pairs of data) at the 0.05 significance level. This value can be obtained from t tables.
At the end of the output is the predicted Y when X = 75, along with the confidence (CI) and prediction (PI) intervals. The full meaning of these terms is explained in the regression notes.
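The regression quantities described above can be checked by hand. The following is a minimal Python sketch assuming the 15 SBP/DBP pairs of Case Study 4, with sb and sa denoting the standard deviations of the gradient and intercept as in the pro forma; the critical t value 2.160 (two-tailed 95%, 13 degrees of freedom) is read from t tables. It gives only the point prediction at DBP = 75, not the CI and PI that Minitab also reports.

```python
import math

# Case Study 4 data: x = DBP, y = SBP, for 15 men aged 40-65.
dbp = [63, 69, 70, 82, 76, 67, 71, 73, 100, 82, 56, 65, 73, 89, 76]
sbp = [112, 120, 135, 142, 132, 115, 119, 128, 156, 124, 99, 105, 124, 144, 134]

n = len(dbp)
mean_x = sum(dbp) / n
mean_y = sum(sbp) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - mean_x) ** 2 for x in dbp)
s_yy = sum((y - mean_y) ** 2 for y in sbp)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dbp, sbp))

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual standard deviation with n - 2 degrees of freedom.
ss_residual = s_yy - slope * s_xy
s_resid = math.sqrt(ss_residual / (n - 2))

# Standard deviations ('standard errors') of gradient and intercept.
sb = s_resid / math.sqrt(s_xx)
sa = s_resid * math.sqrt(sum(x ** 2 for x in dbp) / (n * s_xx))

# Assumption: two-tailed 95% critical t for n - 2 = 13 degrees of freedom,
# read from t tables.
T_CRIT_13_DF = 2.160
slope_ci = (slope - T_CRIT_13_DF * sb, slope + T_CRIT_13_DF * sb)
intercept_ci = (intercept - T_CRIT_13_DF * sa, intercept + T_CRIT_13_DF * sa)

# t statistic for Ho: gradient = 0.
t_slope = slope / sb

predicted_sbp = intercept + slope * 75

print(f"SBP = {intercept:.2f} + {slope:.3f} * DBP")
print(f"gradient CI  = ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"intercept CI = ({intercept_ci[0]:.2f}, {intercept_ci[1]:.2f})")
print(f"t for gradient = {t_slope:.2f}, predicted SBP at DBP 75 = {predicted_sbp:.2f}")
```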
Name………………………… Student Number ………………………
Case Study 1
(a)
Basic Statistics 18% Diet 5% Diet
Mean
Standard Deviation
Variance
Confidence Interval (95%)
(b) Are both data sets normally distributed? ……………….. Reason?…………………..
Equal variance?……………………………….. Reason?…………………………..
(c) paste a copy of the box-and-whisker plot
(d)
Test
Null Hypothesis p Significant?
Reactivity difference between normal and malnourished rats
(e) paste a copy of the Excel graph
Case Study 2
Null Hypothesis(Ho) ……………………………………
Alternative Hypothesis(H1) ………………………………………..
p ……………… significant? ……………………………………
Case Study 3:
F p Significant?
Fertilizer Blend
Farm
Case Study 4
Predicted equation (model)
p for hypothesis (gradient = 0)
Significant? i.e. is the gradient non-zero?
Standard deviation of slope (sb)
t (from tables)
Confidence intervals for gradient (+/- tsb)
Standard deviation of intercept (sa)
Confidence intervals for intercept (+/- tsa)
Predicted SBP for DBP = 75