\[\newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bff}{\mathbf{f}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bi}{\mathbf{i}} \newcommand{\bj}{\mathbf{j}} \newcommand{\bk}{\mathbf{k}} \newcommand{\bN}{\mathbf{N}} \newcommand{\bn}{\mathbf{n}} \newcommand{\bo}{\mathbf{0}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bs}{\mathbf{s}} \newcommand{\bT}{\mathbf{T}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\nv}{\mathbf{0}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \newcommand{\rA}{\mathrm{A}} \newcommand{\rB}{\mathrm{B}} \newcommand{\rC}{\mathrm{C}} \newcommand{\rD}{\mathrm{D}} \newcommand{\rE}{\mathrm{E}} \newcommand{\rF}{\mathrm{F}} \newcommand{\rG}{\mathrm{G}} \newcommand{\rH}{\mathrm{H}} \newcommand{\rI}{\mathrm{I}} \newcommand{\rJ}{\mathrm{J}} \newcommand{\rK}{\mathrm{K}} 
\newcommand{\rL}{\mathrm{L}} \newcommand{\rM}{\mathrm{M}} \newcommand{\rN}{\mathrm{N}} \newcommand{\rO}{\mathrm{O}} \newcommand{\rP}{\mathrm{P}} \newcommand{\rQ}{\mathrm{Q}} \newcommand{\rR}{\mathrm{R}} \newcommand{\rS}{\mathrm{S}} \newcommand{\rT}{\mathrm{T}} \newcommand{\rU}{\mathrm{U}} \newcommand{\rV}{\mathrm{V}} \newcommand{\rW}{\mathrm{W}} \newcommand{\rX}{\mathrm{X}} \newcommand{\rY}{\mathrm{Y}} \newcommand{\rZ}{\mathrm{Z}} \newcommand{\pv}{\overline} \newcommand{\iu}{\mathrm{i}} \newcommand{\ju}{\mathrm{j}} \newcommand{\im}{\mathrm{i}} \newcommand{\e}{\mathrm{e}} \newcommand{\real}{\operatorname{Re}} \newcommand{\imag}{\operatorname{Im}} \newcommand{\Arg}{\operatorname{Arg}} \newcommand{\Ln}{\operatorname{Ln}} \DeclareMathOperator*{\res}{res} \newcommand{\re}{\operatorname{Re}} \newcommand{\im}{\operatorname{Im}} \newcommand{\arsinh}{\operatorname{ar\,sinh}} \newcommand{\arcosh}{\operatorname{ar\,cosh}} \newcommand{\artanh}{\operatorname{ar\,tanh}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\diag}{\operatorname{diag}} \newcommand{\proj}{\operatorname{proj}} \newcommand{\rref}{\operatorname{rref}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\Span}{\operatorname{span}} \newcommand{\vir}{\operatorname{span}} \renewcommand{\dim}{\operatorname{dim}} \newcommand{\alg}{\operatorname{alg}} \newcommand{\geom}{\operatorname{geom}} \newcommand{\id}{\operatorname{id}} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\tp}[1]{#1^{\top}} \renewcommand{\d}{\mathrm{d}} \newcommand{\sij}[2]{\bigg/_{\mspace{-15mu}#1}^{\,#2}} \newcommand{\abs}[1]{\lvert#1\rvert} \newcommand{\pysty}[1]{\left[\begin{array}{@{}r@{}}#1\end{array}\right]} \newcommand{\piste}{\cdot} \newcommand{\qedhere}{} \newcommand{\taumatrix}[1]{\left[\!\!#1\!\!\right]} \newenvironment{augmatrix}[1]{\left[\begin{array}{#1}}{\end{array}\right]} \newenvironment{vaugmatrix}[1]{\left|\begin{array}{#1}}{\end{array}\right|} \newcommand{\trans}{\mathrm{T}} \newcommand{\EUR}{\text{\unicode{0x20AC}}} 
\newcommand{\SI}[3][]{#2\,\mathrm{#3}} \newcommand{\si}[2][]{\mathrm{#2}} \newcommand{\num}[2][]{#2} \newcommand{\ang}[2][]{#2^{\circ}} \newcommand{\meter}{m} \newcommand{\metre}{\meter} \newcommand{\kilo}{k} \newcommand{\kilogram}{kg} \newcommand{\gram}{g} \newcommand{\squared}{^2} \newcommand{\cubed}{^3} \newcommand{\minute}{min} \newcommand{\hour}{h} \newcommand{\second}{s} \newcommand{\degreeCelsius}{^{\circ}C} \newcommand{\per}{/} \newcommand{\centi}{c} \newcommand{\milli}{m} \newcommand{\deci}{d} \newcommand{\percent}{\%} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Corr}{\operatorname{Corr}} \newcommand{\Tasd}{\operatorname{Tasd}} \newcommand{\Ber}{\operatorname{Ber}} \newcommand{\Bin}{\operatorname{Bin}} \newcommand{\Geom}{\operatorname{Geom}} \newcommand{\Poi}{\operatorname{Poi}} \newcommand{\Hyperg}{\operatorname{Hyperg}} \newcommand{\Tas}{\operatorname{Tas}} \newcommand{\Exp}{\operatorname{Exp}} \newcommand{\tdist}{\operatorname{t}} \newcommand{\rd}{\mathrm{d}}\]

Testing a proportion

Assume that \(X\sim\Bin(n,p)\), where \(p\) is the probability of success in a single trial of an \(n\)-trial experiment. It was shown earlier that \(\hat{P} = \frac{1}{n}X\) is an unbiased estimator of the proportion \(p\). The expected value and variance of \(\hat{P}\) are

\[\rE(\hat{P})= p \ \ \textrm{ and } \ \ \Var(\hat{P}) = \frac{p(1 - p)}{n}.\]

By the central limit theorem,

\[\hat{P} \stackrel{.}{\sim} \rN\left(p, \frac{p(1 - p)}{n}\right),\]

when the sample size \(n\) is sufficiently large, and the standardized proportion satisfies

\[Z = \frac{\hat{P} - p}{\sqrt{p(1 - p)/n}} \stackrel{.}{\sim} \rN(0, 1).\]
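The quality of the normal approximation can be checked numerically. The following is a minimal sketch in R that compares the exact binomial probability \(P(X \le x)\) with its normal approximation. The values of `n`, `p` and `x` are assumptions chosen only for illustration.

```r
# Illustrative values (assumptions, not part of the theory above)
n <- 200
p <- 0.35
x <- 82                                   # threshold count, so p-hat = x / n

# Exact probability P(X <= x) for X ~ Bin(n, p)
exact <- pbinom(x, size = n, prob = p)

# Normal approximation P(P-hat <= x / n) with mean p and variance p(1 - p)/n
approx <- pnorm(x / n, mean = p, sd = sqrt(p * (1 - p) / n))

c(exact = exact, approx = approx, difference = abs(exact - approx))
```

With a sample of this size the two probabilities should be close; for small \(n\), or for \(p\) near 0 or 1, the difference can be noticeably larger.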

Often the quantity of interest is the proportion \(p\) and changes in it. Set the null hypothesis to \(H_0: p=p_0\). A suitable test statistic for this hypothesis is

\[Z = \frac{\hat{P} - p_0}{\sqrt{p_0(1 - p_0)/n}} \stackrel{.}{\sim} \rN(0, 1)\]

and the realized value of the test statistic in the sample is

\[z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}.\]

The critical regions and \(p\)-values corresponding to the different alternative hypotheses are summarized in the following table. In it, the realized value of the test statistic \(Z\) is denoted by \(z\), and \(z_{\alpha}\) and \(z_{\alpha/2}\) denote the numbers for which \(\Phi(z_{\alpha}) = 1 - \alpha\) and \(\Phi(z_{\alpha/2}) = 1 - \frac{\alpha}{2}\), where \(\Phi\) is the cumulative distribution function of the standard normal distribution.

\[\begin{split}\begin{array}{c c c}\hline H_1 & \text{critical region} & p\text{-value} \\\hline p < p_0 & (-\infty, -z_{\alpha}) & \Phi(z) \\ p > p_0 & (z_{\alpha}, \infty) & 1 - \Phi(z) \\ p \not= p_0 & (-\infty, -z_{\alpha/2}) \cup (z_{\alpha/2}, \infty) & 2\min\{\Phi(z), 1 - \Phi(z)\} \\\hline \end{array}\end{split}\]
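The table can be turned into a small function. The sketch below is one possible way to do it in R; the function name `prop_z_test`, its arguments, and the data in the usage line are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of base R): z-test for a single proportion.
prop_z_test <- function(x, n, p0,
                        alternative = c("two.sided", "less", "greater"),
                        alpha = 0.05) {
  alternative <- match.arg(alternative)
  p_hat <- x / n
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # realized test statistic

  # p-value column of the table
  p_value <- switch(alternative,
    less      = pnorm(z),
    greater   = 1 - pnorm(z),
    two.sided = 2 * min(pnorm(z), 1 - pnorm(z)))

  # critical-region column of the table
  reject <- switch(alternative,
    less      = z < -qnorm(1 - alpha),
    greater   = z >  qnorm(1 - alpha),
    two.sided = abs(z) > qnorm(1 - alpha / 2))

  list(z = z, p_value = p_value, reject = reject)
}

# Usage with made-up data: 45 successes in 100 trials, H0: p = 0.5
res <- prop_z_test(x = 45, n = 100, p0 = 0.5, alternative = "two.sided")
```

Both decision rules are computed so that they can be compared: rejecting when \(z\) falls in the critical region is equivalent to rejecting when the \(p\)-value is below \(\alpha\).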

Example 6.3.1

A company's market share has previously been 35%. The company conducts a survey in which 82 of 200 respondents say they use the company's services. Can we conclude from this result that the market share has grown?

The null hypothesis is that the market share has not changed, and the alternative hypothesis is that it has grown:

\[H_0: p=0.35,\quad H_1: p> 0.35\]

Assuming the null hypothesis is true, the test statistic is

\[Z = \frac{\hat{P} - 0.35}{\sqrt{0.35(1 - 0.35)/200}} \stackrel{.}{\sim} \rN(0, 1)\]

The estimate of the market share computed from this sample (the survey) is \(\hat{p}=82/200=0.41\), and the resulting value of the test statistic is \(z= 1.779\).
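Written out step by step, with intermediate values rounded, the arithmetic is

\[z = \frac{0.41 - 0.35}{\sqrt{0.35\cdot 0.65/200}} = \frac{0.06}{\sqrt{0.0011375}} \approx \frac{0.06}{0.03373} \approx 1.779.\]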

Because \(H_1: p> 0.35\), the \(p\)-value is computed from the right tail of the distribution, and it is \(1-\Phi(1.779)=0.0376\). The problem statement does not specify a significance level. If the significance level were chosen as \(\alpha = 0.05\), the conclusion would be to reject the null hypothesis, that is, the data would support the claim that the market share has grown. At the significance level \(\alpha = 0.01\) the null hypothesis would not be rejected.

When the critical region is used to reach the conclusion, the significance level \(\alpha = 0.05\) gives the critical region \(C=(1.645, \infty)\), since \(P(Z<1.645)=0.95\). The lower bound \(1.645\) is obtained from a table, in Matlab with norminv(0.95), or in R with qnorm(0.95). The value of the test statistic, \(z=1.779\), falls in the critical region, so the null hypothesis is rejected at the significance level 0.05.
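The critical-region decision can also be checked in R. This is a minimal sketch using the numbers of this example; the variable names are illustrative.

```r
alpha <- 0.05
z <- (0.41 - 0.35) / sqrt(0.35 * (1 - 0.35) / 200)  # test statistic of the example

z_crit <- qnorm(1 - alpha)     # lower bound of the critical region, about 1.645
reject <- z > z_crit           # TRUE: z lies in the critical region

# Same check at the stricter level alpha = 0.01
z_crit_01 <- qnorm(1 - 0.01)
reject_01 <- z > z_crit_01     # FALSE: the null hypothesis is not rejected
```

The two results agree with the conclusions drawn from the \(p\)-value at the levels 0.05 and 0.01.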

With software, the \(p\)-value is obtained by repeating the calculations. In Matlab:

   z = (0.41-0.35)/sqrt(0.35*(1-0.35)/200) % value of the test statistic
   p = 1 - normcdf(z) % p-value

and in R:

   z <- (0.41-0.35)/sqrt(0.35*(1-0.35)/200) # value of the test statistic
   p <- 1 - pnorm(z) # p-value

In R, the function binom.test performs a test for a proportion. It is available in the base stats package, and the mosaic package provides a function with the same name. The command

   binom.test(x=82, n=200, p=0.35, conf.level=0.95,
              alternative="greater")

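prints, among other things, the number of successes, the \(p\)-value of the test, a confidence interval for the proportion, and the sample estimate. The \(p\)-value differs slightly from the one computed in this example, because binom.test performs an exact binomial test instead of using the normal approximation.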
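The difference between the two \(p\)-values can be examined directly. The sketch below uses only functions from the base stats package; the variable names are illustrative.

```r
x <- 82; n <- 200; p0 <- 0.35

# Normal approximation used in this example, about 0.0376
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_normal <- 1 - pnorm(z)

# Exact one-sided p-value P(X >= 82) under H0: X ~ Bin(200, 0.35)
p_exact <- 1 - pbinom(x - 1, size = n, prob = p0)

# binom.test reports the exact value
p_binom_test <- binom.test(x, n, p = p0, alternative = "greater")$p.value

# prop.test without continuity correction reproduces the normal approximation
p_prop_test <- prop.test(x, n, p = p0, alternative = "greater",
                         correct = FALSE)$p.value

c(normal = p_normal, exact = p_exact,
  binom.test = p_binom_test, prop.test = p_prop_test)
```

If the goal is to reproduce the hand calculation, prop.test with correct = FALSE is the closer match; binom.test is preferable when the exact distribution is wanted.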