Imagine that you’re working at an eCommerce company that is making changes to its pricing page. The team wants to know whether these changes will increase or decrease the average time a user spends on the page.

Your task is to monitor the pricing page and report any statistically significant change in the average time spent on page.


The information we have in hand is that the population average time spent on the page (this is an assumption) is 5 minutes and 30 seconds, or 330 seconds. We also recorded the time spent on the page by 10 new visitors. Is the average time spent on the page by the new visitors the same as our population average, or is it different? Let’s write our hypotheses:

H0: the mean time spent on the page is 330 seconds (mu = 330).
H1: the mean time spent on the page is not 330 seconds (mu ≠ 330).

Now let’s perform the t-test in R:

data <- c(95, 167, 556, 345, 272, 361, 381, 470, 239, 264)
plot(density(data))
data.mean <- mean(data)
t.test(x = data, mu = 330, alternative = "two.sided")
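To make the conclusion explicit, here is a minimal sketch of how you could pull the relevant numbers out of the `t.test` result object programmatically (the variable name `result` is my own choice, not from the original post):

```r
# Same data and null hypothesis as above.
data <- c(95, 167, 556, 345, 272, 361, 381, 470, 239, 264)
result <- t.test(x = data, mu = 330, alternative = "two.sided")

result$estimate  # sample mean: 315 seconds
result$p.value   # well above 0.05
# Since the p-value exceeds 0.05, we fail to reject H0 at the 5% level.
```

The sample mean (315 seconds) is close to the hypothesized 330 seconds, and the p-value is far above 0.05.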

Since the p-value is well above 0.05, we fail to reject H0, and we can say that there’s no statistically significant change in the average time spent on the new page vs the old page.

To make sure this analysis is valid, there are 2 conditions that we need to verify:

- The observations are independent: we assume they are, but in practice a user could visit the page multiple times, which would violate this assumption.
- The residuals are normally distributed:

residuals <- data - data.mean
qqnorm(residuals)
qqline(residuals)
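If you prefer a formal check to eyeballing the Q-Q plot, one common option (not used in the original post) is the Shapiro-Wilk test from base R, whose null hypothesis is that the data come from a normal distribution:

```r
# Shapiro-Wilk normality test on the residuals.
# H0: the residuals are normally distributed.
data <- c(95, 167, 556, 345, 272, 361, 381, 470, 239, 264)
residuals <- data - mean(data)
shapiro_result <- shapiro.test(residuals)
shapiro_result  # a small p-value would be evidence against normality
```

Keep in mind that with only 10 observations this test has little power, so a visual check of the Q-Q plot is still worthwhile.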

As you can see from the Q-Q plot, the residuals don’t look normal at all. This is a problem.

You might want to try transforming the data to fix this. I’ll let you play with that, but in the meantime we can’t trust the conclusion.
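Another route, beyond what the post covers: when normality is doubtful and no transformation helps, a standard nonparametric fallback is the one-sample Wilcoxon signed-rank test, which tests the median rather than the mean and makes no normality assumption:

```r
# One-sample Wilcoxon signed-rank test.
# H0: the median time spent on the page is 330 seconds.
data <- c(95, 167, 556, 345, 272, 361, 381, 470, 239, 264)
wilcox_result <- wilcox.test(data, mu = 330, alternative = "two.sided")
wilcox_result
```

Note the trade-off: the hypotheses now concern the median, so this is not a drop-in replacement for the t-test, but it is more defensible when the residuals look non-normal.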

Still, the goal was to show you one way to approach such a problem. Enjoy 🙂