# RStudio

# LAB 7

# DATE: 10/19/2023

# Set your directory:

setwd("/Users/HP/Desktop/sem 1 fall 2023/political science/LAB 7") # make sure to change it to your directory

# Load the dataset with the population

beads_population = read.csv("beads.csv")

# Let’s create a new variable indicating whether the bead is red

beads_population$red = ifelse(beads_population$color == "red bead", 1, 0)

# Let’s plot the distribution of red and blue beads

hist(beads_population$red) # Is this distribution normal?
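
# A histogram bins a 0/1 variable awkwardly, so a bar plot of proportions can be
# easier to read. This is only a minimal sketch: demo_population is a made-up
# population (300 red, 700 blue), not the beads.csv data.

demo_population = c(rep(1, 300), rep(0, 700)) # hypothetical: 1 = red, 0 = blue

demo_proportions = prop.table(table(demo_population)) # share of 0s and of 1s

barplot(demo_proportions, names.arg = c("blue (0)", "red (1)"), ylab = "Proportion")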

#————————————

# Let’s draw a few random samples from the population

# Sample 1

sample_1 = sample(beads_population$red, 100) # randomly selects 100 beads from the population

table(sample_1) # number of red (1) and blue (0) beads

mean(sample_1) # proportion of red beads

hist(sample_1) # plot the distribution of red (1) and blue (0) beads

# Sample 2

sample_2 = sample(beads_population$red, 100) # randomly selects 100 beads from the population

table(sample_2) # number of red (1) and blue (0) beads

mean(sample_2) # proportion of red beads

hist(sample_2) # plot the distribution of red (1) and blue (0) beads

# Sample 3

sample_3 = sample(beads_population$red, 100) # randomly selects 100 beads from the population

table(sample_3) # number of red (1) and blue (0) beads

mean(sample_3) # proportion of red beads

hist(sample_3) # plot the distribution of red (1) and blue (0) beads
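
# The three samples above change on every run because sample() is random.
# A minimal sketch of making a draw reproducible with set.seed(), assuming a
# made-up population (demo_population and the seed value 123 are hypothetical):

demo_population = c(rep(1, 300), rep(0, 700)) # hypothetical: 1 = red, 0 = blue

set.seed(123) # fix the random number generator
demo_sample_a = sample(demo_population, 100)

set.seed(123) # reset to the same seed
demo_sample_b = sample(demo_population, 100)

identical(demo_sample_a, demo_sample_b) # TRUE: same seed, same sample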

#————————————

# According to the central limit theorem, sample means are approximately

# normally distributed when each sample is large enough. Let’s check if our

# sample means are normally distributed:

means_sample_1_2_and_3 = c(mean(sample_1), mean(sample_2), mean(sample_3))

hist(means_sample_1_2_and_3)

#————————————

# Note that the sampling distribution is what we would see if we were to select

# an INFINITE number of samples; the central limit theorem says that its shape is

# approximately normal. In the previous example, we only had three samples.

# Let’s draw a larger number of samples (1000 samples):

sampling_distribution_1000 = data.frame(sample_number = paste0("Sample ", 1:1000), mean_selected_sample = rep(NA, 1000))

for (i in 1:1000) {

  selected_sample = sample(beads_population$red, 100) # draw 100 beads at random

  sampling_distribution_1000$mean_selected_sample[i] = mean(selected_sample) # store the proportion of red beads

  print(paste0("The mean for sample ", i, " is ", mean(selected_sample)))

}

hist(sampling_distribution_1000$mean_selected_sample) # Is this distribution normal?
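
# The same simulation can be written more compactly with replicate(), which
# repeats an expression and collects the results in a vector. A minimal sketch,
# assuming a made-up population (demo_population is hypothetical, not beads.csv):

demo_population = c(rep(1, 300), rep(0, 700)) # hypothetical: 1 = red, 0 = blue

set.seed(1) # hypothetical seed, only so the sketch is reproducible

demo_means = replicate(1000, mean(sample(demo_population, 100))) # 1000 sample means

length(demo_means) # one mean per sample

hist(demo_means)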

# Let’s draw an even larger number of samples (10000 samples):

sampling_distribution_10000 = data.frame(sample_number = paste0("Sample ", 1:10000), mean_selected_sample = rep(NA, 10000))

for (i in 1:10000) {

  selected_sample = sample(beads_population$red, 100) # draw 100 beads at random

  sampling_distribution_10000$mean_selected_sample[i] = mean(selected_sample) # store the proportion of red beads

  print(paste0("The mean for sample ", i, " is ", mean(selected_sample)))

}

hist(sampling_distribution_10000$mean_selected_sample) # Is this distribution normal? It certainly looks like a normal distribution
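
# One way to judge "looks normal" is to compare the simulated means with the
# normal curve the theory predicts. A minimal sketch, assuming a made-up
# population (demo_population is hypothetical). Because sample() draws without
# replacement by default, the standard error below includes the finite
# population correction.

demo_population = c(rep(1, 300), rep(0, 700)) # hypothetical: 1 = red, 0 = blue

N = length(demo_population) # population size
n = 100 # sample size
p = mean(demo_population) # population proportion of red beads

set.seed(1) # hypothetical seed, only so the sketch is reproducible

demo_means = replicate(10000, mean(sample(demo_population, n)))

theoretical_se = sqrt(p * (1 - p) / n) * sqrt((N - n) / (N - 1))

sd(demo_means) # should be close to theoretical_se

hist(demo_means, freq = FALSE) # density scale, so the curve is comparable

curve(dnorm(x, mean = p, sd = theoretical_se), add = TRUE) # predicted normal curve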

# The simulation is consistent with the central limit theorem! With

# 10000 samples, the distribution of the sample means is very close to

# a normal distribution.

#————————————

# The sampling distribution is also centered on the population mean, so the average

# of our sample means should converge to the population mean as the number of samples goes to infinity.

mean(beads_population$red)

mean(sampling_distribution_10000$mean_selected_sample) # this value is very close to the population mean!
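
# To see the convergence itself, we can track the average of the sample means
# as more samples are added. A minimal sketch, assuming a made-up population
# (demo_population is hypothetical, not beads.csv):

demo_population = c(rep(1, 300), rep(0, 700)) # hypothetical: 1 = red, 0 = blue

set.seed(1) # hypothetical seed, only so the sketch is reproducible

demo_means = replicate(10000, mean(sample(demo_population, 100)))

running_mean = cumsum(demo_means) / seq_along(demo_means) # average after 1, 2, 3, ... samples

plot(running_mean, type = "l", xlab = "Number of samples", ylab = "Average of sample means")

abline(h = mean(demo_population), lty = 2) # dashed line at the population mean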

#————————————

# Based on the distribution of means, we can find out the likely

# percentage of red beads in the population.

# How likely is it to draw a sample whose mean is 0.45 or more? Smaller than 1%, provided 0.45 lies above the 99th percentile.

quantile(sampling_distribution_10000$mean_selected_sample, 0.95)

quantile(sampling_distribution_10000$mean_selected_sample, 0.99)

# How likely is it to draw a sample whose mean is 0.15 or less? Smaller than 1%, provided 0.15 lies under the 1st percentile.

quantile(sampling_distribution_10000$mean_selected_sample, 0.01)

quantile(sampling_distribution_10000$mean_selected_sample, 0.05)
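
# Instead of reading the answer off the quantiles, the share of simulated
# samples beyond a cutoff can be computed directly. A minimal sketch, assuming
# a made-up population with 30% red beads (demo_population is hypothetical, so
# the shares below describe that population, not beads.csv):

demo_population = c(rep(1, 300), rep(0, 700)) # hypothetical: 1 = red, 0 = blue

set.seed(1) # hypothetical seed, only so the sketch is reproducible

demo_means = replicate(10000, mean(sample(demo_population, 100)))

share_high = mean(demo_means >= 0.45) # share of samples with a mean of 0.45 or more

share_low = mean(demo_means <= 0.15) # share of samples with a mean of 0.15 or less

share_high

share_low
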

POSC 3003: Introduction to Political Analysis

Independent Data Project

Stage Four – Analyzing the Data

1. In one paragraph, describe your theory.

2. Indicate your hypothesis (i.e., the direction of the relationship between the two variables).

3. Indicate the two variables you believe could make the relationship between your dependent and independent variables spurious (these are the two Z variables you have added to your multiple regression). Importantly, indicate why you believe they could be confounding variables.

4. Add your regression table. Remember to rename each variable and edit the table’s title. For example, a professional table should not contain underscores (“_”).

5. Based on the regression results presented in your table, are you able to reject the null hypothesis in favor of your hypothesis? Make sure to interpret each coefficient (including whether they are statistically significant) and the R-squared.

6. Add your plot with the predicted values. What can we learn from the plot? Provide a full description.

7. Given your empirical results, are you confident in your theory? Explain why.
