p-value

A p-value is the probability of observing data at least as favorable to $H_A$ as our current data set, if in fact $H_0$ were true.

If the p-value is low (below the chosen significance level, usually 5%), we reject $H_0$ in favor of $H_A$.

Calculating a p-value

The p-value can be calculated as a tail area of the normal distribution, given the observed $\bar{x}$, the mean $\mu$ under $H_0$, and the standard deviation $\sigma$:

$$P(\bar{x}>9.7 \mid \mu=8, \sigma=0.5)=0.0003$$

We can also represent this in terms of Z:

$$P(Z>3.4)=0.0003$$
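
The Z value here comes from standardizing $\bar{x}$ with the same $\mu$ and $\sigma$ as in the previous expression:

$$Z=\frac{\bar{x}-\mu}{\sigma}=\frac{9.7-8}{0.5}=3.4$$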

We can also compute this in Julia or R using `pnorm`, the normal CDF (built into R; in Julia it has to be loaded from a package):

# pnorm gives P(X <= x), so subtract from 1 to get the upper tail P(X > x).
1-pnorm(9.7,8,0.5)
0.0003369292656768552
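
As a rough sanity check that needs no extra packages, the same tail probability can be approximated by drawing from the normal distribution and counting how often a draw exceeds 9.7. This is only a sketch: the function name `tail_estimate`, the number of draws, and the seed are illustrative choices, not part of the original calculation.

```julia
using Random  # standard library; used only to seed the generator

# Estimate P(x̄ > threshold) for x̄ ~ Normal(mu, sigma) by drawing samples.
# `tail_estimate`, `draws`, and the seed are hypothetical, illustrative choices.
function tail_estimate(threshold, mu, sigma; draws=1_000_000, rng=MersenneTwister(1))
    hits = count(_ -> mu + sigma * randn(rng) > threshold, 1:draws)
    return hits / draws
end

estimate = tail_estimate(9.7, 8, 0.5)
println(estimate)  # should land near the exact value of about 0.000337
```

Because the event is rare, many draws are needed before the estimate settles; with far fewer draws it may well come out as exactly zero.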

Simulating for a p-value

We can also estimate a p-value by simulation: generate many data sets under $H_0$ and measure how often they are at least as extreme as the observed result:

# Randomly split `success_count` successes ('A') and `fail_count` failures ('B')
# between two groups, assigning each outcome by a fair coin flip.
function simulate(success_count, fail_count)
    g1 = Char[]
    g2 = Char[]

    for _ in 1:success_count
        push!(rand(Bool) ? g1 : g2, 'A')
    end
    for _ in 1:fail_count
        push!(rand(Bool) ? g1 : g2, 'B')
    end

    return (g1, g2)
end

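# Proportion of successes ('A') in a group.
success_rate(g) = count(==('A'), g) / length(g)

# Difference in success rates between the two groups, one value per simulation.
simulation_count = 10000
differences = Float64[]
for _ in 1:simulation_count
    g1, g2 = simulate(35, 13)
    push!(differences, success_rate(g1) - success_rate(g2))
end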

using Plots  # provides gr(), histogram, and plot!

gr()
histogram(differences, bins=:scott, label="difference")
plot!(title = "Frequency of Difference over $(simulation_count) simulations")

# Calculate the two-sided p-value: the share of simulated differences
# that are at least as extreme as 0.3 in either direction.
count(x -> abs(x) >= 0.3, differences) / length(differences)
0.0229
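
Since the simulation is random, rerunning it gives a slightly different p-value each time. A rough way to gauge that variation, assuming the simulations are independent, is the usual standard error of a proportion. The variable names `p_hat` and `standard_error` below are illustrative; the two inputs are the values from the run above.

```julia
# Values taken from the simulation run above.
p_hat = 0.0229
simulation_count = 10_000

# Standard error of a simulated proportion: sqrt(p(1-p)/n).
standard_error = sqrt(p_hat * (1 - p_hat) / simulation_count)
println(round(standard_error, digits=4))  # prints 0.0015
```

So with 10000 simulations the estimate is good to roughly the third decimal place, which is enough to tell it apart from the 5% threshold.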