During a clinical trial, a large amount of laboratory data has been collected and hand-entered into a database. There are 50 different lab tests and approximately 1000 values for each test, so there are about 50,000 data points in the database. To ensure the accuracy of these data, a sample must be taken and compared against source documents (i.e., printouts of the data) provided by the laboratories that performed the analyses.
The study manager for the trial can allocate resources to check up to 15% of the data, and he wants the QC efforts to be focused on checking outlier values so that clinically improbable or impossible values may be identified and reviewed. He suggests that the sample consist of the 75 highest and 75 lowest values for each lab test, since 150 of roughly 1000 values per test represents about 15% of the data. However, he would be delighted if there were a way to select less than 15% of the data and thus free up resources for other study tasks.
The study statistician is consulted. He suggests calculating the mean and standard deviation for each lab test and including in the sample only the values that are more than 3 standard deviations from the mean.
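The two selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `extreme_k` and `beyond_3sd` are hypothetical, and the values are synthetic stand-ins for a single lab test (roughly normal data plus one made-up data-entry error), not data from the trial.

```python
import random
import statistics


def extreme_k(values, k=75):
    """Manager's rule: the k lowest and k highest values (always 2*k picks)."""
    ordered = sorted(values)
    return ordered[:k] + ordered[-k:]


def beyond_3sd(values, n_sd=3.0):
    """Statistician's rule: values more than n_sd standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > n_sd * sd]


# ASSUMPTION: synthetic data for one lab test, 999 roughly normal values
# plus one hypothetical entry error (e.g. a misplaced decimal point).
rng = random.Random(0)
values = [rng.gauss(100.0, 10.0) for _ in range(999)] + [1000.0]

by_rank = extreme_k(values)
by_sd = beyond_3sd(values)

print("values selected by the 75 highest / 75 lowest rule:", len(by_rank))
print("values selected by the 3 standard deviation rule:", len(by_sd))
```

The rank-based rule always selects the same number of values per test, while the number selected by the standard deviation rule depends on the data, including on how much any extreme values shift the mean and standard deviation themselves.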
Given that the study manager wants the QC efforts to be focused on selecting outlier values, whose method is a better way of selecting the sample? Why? Using what you have learned about measures of central tendency and dispersion, how would your answer change if you knew the data were not normally distributed? Explain your reasoning and answers.
Part 2: Box Plot (1 Point)
Draw a box plot using the Weight data (at screening) from the Lipids data set located in the Group Project assignment in Canvas (Week 9 Module). Provide the values for the 5-number summary, upper and lower fences, and comment on the data (skewness, outliers, etc.) in your post. Copy/paste box plots directly into your comments. If it is necessary to attach a file of your box plot then use a PDF file.
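As a rough illustration of the quantities requested above, the sketch below computes a 5-number summary and fences in Python. Several things here are assumptions rather than part of the assignment: the fences use the common 1.5 × IQR convention, the quartiles use the "inclusive" method of `statistics.quantiles` (textbooks and software packages differ on quartile rules, so results may not match another tool exactly), and the `weights` list is a small made-up example, not the Lipids data. The box plot itself is not drawn here.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    # ASSUMPTION: "inclusive" quartile method; other conventions give
    # slightly different Q1 and Q3 on small samples.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def fences(q1, q3, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles (k=1.5 assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical weights in kg, for illustration only (not the Lipids data).
weights = [60, 62, 65, 68, 70, 72, 75, 80, 120]

minimum, q1, median, q3, maximum = five_number_summary(weights)
lower_fence, upper_fence = fences(q1, q3)
outliers = [w for w in weights if w < lower_fence or w > upper_fence]

print("5-number summary:", (minimum, q1, median, q3, maximum))
print("fences:", (lower_fence, upper_fence))
print("values outside the fences:", outliers)
```

Values falling outside the fences are the ones a box plot would typically draw as individual points, and the position of the median within the box gives a first indication of skewness.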