## Alan Agresti, Barbara Finlay

## Chapter 3

## Descriptive Statistics

Chapter Questions

According to the Bureau of the Census (Current Population Reports), in 1994 in the United States there were 23.6 million households with one person, 31.2 million with two persons, 16.9 million with three persons, 15.1 million with four persons, 6.7 million with five persons, 2.2 million with six persons, and 1.4 million with seven or more persons.

a) Construct a relative frequency distribution.

b) Construct a histogram. What is its shape?

c) Using a score of 8 for the final category, find the mean number of persons per household.

d) Report and interpret the median and mode of household size.
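Parts (a), (c), and (d) can be sketched in a few lines of Python. This is only an illustration: it assumes the three-person count is 16.9 million, and the names `counts` and `weighted_median` are made up for the example.

```python
# Household counts in millions, keyed by household size.
# The key 8 is the score assigned to "seven or more persons" in part (c).
# Assumption: the three-person count is 16.9 million.
counts = {1: 23.6, 2: 31.2, 3: 16.9, 4: 15.1, 5: 6.7, 6: 2.2, 8: 1.4}

total = sum(counts.values())

# (a) Relative frequency of each household size.
rel_freq = {size: n / total for size, n in counts.items()}

# (c) Mean household size, weighting each score by its count.
mean_size = sum(size * n for size, n in counts.items()) / total

# (d) Mode: the size with the largest count.
mode_size = max(counts, key=counts.get)


def weighted_median(freqs):
    """Return the smallest value whose cumulative count reaches half the total."""
    half = sum(freqs.values()) / 2
    running = 0.0
    for value in sorted(freqs):
        running += freqs[value]
        if running >= half:
            return value


median_size = weighted_median(counts)

for size, share in rel_freq.items():
    print(f"{size}: {share:.3f}")
print(f"mean = {mean_size:.2f}, median = {median_size}, mode = {mode_size}")
```

The mean exceeds the median here, which is what a right-skewed histogram would lead one to expect.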

According to News America Syndicate, in 1986 the numbers of followers of the world's major religions were 835 million for Christianity, 420 million for Islam, 322 million for Hinduism, 300 million for Confucianism, 210 million for Buddhism, 79 million for Shinto, 50 million for Taoism, and 12 million for Judaism.

a) Construct a relative frequency distribution for these data.

b) Construct a bar graph for these data.

c) Can you calculate a mean, median, or mode for these data? If so, do so and interpret.

Refer to Table 3.1. Use software to construct a histogram for these data, using its default method of forming intervals. Describe the shape of the distribution, and construct the corresponding relative frequency distribution.

Table 3.11 shows the number (in millions) of the foreign-born population of the United States in 1990, by place of birth.

a) Construct a relative frequency distribution.

b) Plot the data in a bar graph.

c) Is "Place of birth" quantitative or qualitative? How, if at all, can you describe these data using numerical measures?

A researcher in an alcoholism treatment center, interested in summarizing the length of stay in the center for first-time patients, randomly selects ten records of individuals institutionalized within the previous two years. The lengths of stay in the center, in days, are as follows: 11, 6, 20, 9, 13, 4, 39, 13, 44, and 7.

a) Construct a stem and leaf plot.

b) Find the mean, and interpret.

c) Find the median, and interpret.

d) Find the standard deviation, and interpret.

e) For a similar study 25 years ago at the same institution, lengths of stay for ten sampled individuals were $32, 18, 55, 17, 24, 31, 20, 40, 24$, and 15 days. Compare results to those in the new study using (i) a back-to-back stem and leaf plot, (ii) the mean, (iii) the median, (iv) the standard deviation. Interpret any differences you find.

f) Actually, the new study also selected one other record. That patient is still institutionalized after 40 days. Thus, that patient's length of stay is at least 40 days, but the actual value is unknown. Can you calculate the mean or median for the complete sample of size 11 including this partial observation? Explain. (An observation such as this is said to be censored, meaning that the measured value is "cut short" of its true, unknown value.)
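The numerical parts of this exercise can be checked with Python's standard `statistics` module. The sketch below reads the two samples as ten values each (11, 6, 20, 9, 13, 4, 39, 13, 44, 7 and 32, 18, 55, 17, 24, 31, 20, 40, 24, 15); the helper name `summarize` is illustrative. For part (f) it tries two placeholder values for the censored observation to show that the median does not depend on which one is used.

```python
import statistics

# Lengths of stay in days.
new_study = [11, 6, 20, 9, 13, 4, 39, 13, 44, 7]
old_study = [32, 18, 55, 17, 24, 31, 20, 40, 24, 15]


def summarize(days):
    """Mean, median, and sample standard deviation (n - 1 denominator)."""
    return {
        "mean": statistics.mean(days),
        "median": statistics.median(days),
        "sd": statistics.stdev(days),
    }


new_summary = summarize(new_study)
old_summary = summarize(old_study)

# Part (f): the 11th stay is only known to be at least 40 days.
# The median depends on the ordering alone, so any value >= 40 gives
# the same answer; the mean cannot be computed without the true value.
median_if_40 = statistics.median(new_study + [40])
median_if_huge = statistics.median(new_study + [10_000])

print(new_summary)
print(old_summary)
print(median_if_40, median_if_huge)
```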

The 1994 General Social Survey asked respondents "How often do you read the newspaper?" The possible responses were (every day, a few times a week, once a week, less than once a week, never), and the counts in those categories were (969, 452, 261, 196, 76).

a) Identify the median response.

b) Identify the mode.

c) Consider the variable $Y=$ number of times reading the newspaper in a week, measured as described above. Can you calculate $\bar{Y}$? Why? What would you need to do to approximate its value?
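For an ordinal variable like this one, the median is the category containing the middle observation once the categories are put in their natural order. A minimal sketch, with the helper name `ordinal_median` chosen only for illustration:

```python
# Ordered response categories and their counts.
categories = [
    "every day",
    "a few times a week",
    "once a week",
    "less than once a week",
    "never",
]
counts = [969, 452, 261, 196, 76]


def ordinal_median(labels, freqs):
    """Return the first category whose cumulative count reaches half of n."""
    half = sum(freqs) / 2
    running = 0
    for label, freq in zip(labels, freqs):
        running += freq
        if running >= half:
            return label


n = sum(counts)
median_response = ordinal_median(categories, counts)
mode_response = categories[counts.index(max(counts))]

print(n, median_response, mode_response)
```

No mean appears in the sketch because the categories carry no numerical scores; part (c) asks what would have to be assumed to obtain one.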

Table 3.12 summarizes responses of 1250 subjects in the 1994 General Social Survey to the question, "About how often did you have sex during the last 12 months?"

a) Construct a bar graph, and interpret.

b) Report the median and the mode. Interpret.

c) Treat this scale in a quantitative manner by assigning the scores $0, .1, 1.0, 2.5, 4.3, 10.8$, and 17 to the categories, representing approximate monthly frequency. Calculate the sample mean and interpret.

The 1991 General Social Survey asked respondents, "How many sex partners have you had in the last 12 months?" Table 3.13 shows results for 637 respondents.

a) Calculate and interpret the median and the mode.

b) For the highest 11 values, we know only an interval within which the observation fell. To approximate these values, we could use midpoint scores. For instance, for the interval 5-10, use $(5+10)/2=7.5$. We must choose an arbitrary score over 100 for the final interval. Using 120 for that observation, calculate the mean response. Compare to the median, and interpret.

c) Suppose the highest two observations were misrecorded, and the actual values were 4 for each. Recompute the mean and median, and use this example to describe potential effects of outliers on these measures.

For 1992, the statewide number of abortions per 1000 women 15 to 44 years of age, for states in the Pacific region of the United States, were: Washington, 33; Oregon, 16; California, 304; Alaska, 2; and Hawaii, 11 (Statistical Abstract of the United States, 1994).

a) Calculate the mean.

b) Calculate the median. Why is it so different from the mean?

For 1993, Table 9.1 in Chapter 9 shows data on the statewide violent crime rate per 100,000 population. In this exercise, do not use the observation for D.C.

a) Using the intervals $0-100, 100-200, 200-300$, and so forth, tally the frequencies and construct a frequency distribution.

b) Find the relative frequencies.

c) Sketch a histogram. How would you describe the shape of the distribution?

d) Drop the final digit of each crime rate. Then, construct a stem and leaf plot on this set of modified values. How does this plot compare to the histogram in (c)?

Refer to the preceding problem. Table 3.14 shows part of a computer printout for analyzing the data; the first column refers to the entire data set, and the second column deletes the observation for D.C.

a) Report and interpret the mean and median of the first set of crime rates. Explain what their relative values suggest about the shape of the distribution.

b) For each statistic reported, evaluate the effect of including the outlying observation for D.C.

In 1992 in the United States, the median family income was $\$38,909$ for white families, $\$21,161$ for black families, and $\$23,901$ for Hispanic families (U.S. Bureau of the Census, Current Population Reports, P-60-184). In constant 1992 dollars, the median family incomes in 1975 were $\$35,619$ for white families, $\$21,916$ for black families, and $\$23,844$ for Hispanic families. Interpret the medians in 1992 and the changes in their values between 1975 and 1992.

Table 3.15 shows 1994 female economic activity for countries in South America.

a) Construct a back-to-back stem and leaf plot of these values contrasted with those from Eastern Europe in Table 3.6. What is your interpretation?

b) Compare the means for the two sets of nations, and interpret.

c) Compare the medians, and interpret.

According to the U.S. Bureau of the Census, Current Population Reports, in 1994 the median household income was $\$32,368$ for whites and $\$18,660$ for blacks, whereas the mean household income was $\$40,708$ for whites and $\$25,409$ for blacks. Does this suggest that the distribution of income is symmetric, or skewed to the right, or skewed to the left? Explain.

Refer to the previous exercise. The results refer to 57.9 million white households and 8.0 million black households.

a) Find the overall mean income.

b) If the mean income equals $\$30,291$ for 5.9 million Hispanic families, find the overall mean for the three groups combined.
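An overall mean for combined groups is a weighted average of the group means, with the group sizes as weights. The sketch below works part (a) under the reading that the mean household incomes are \$40,708 (white) and \$25,409 (black); `combined_mean` is a hypothetical helper name. Part (b) would pass a third mean and a third size to the same function.

```python
def combined_mean(means, sizes):
    """Weighted average of group means, weighting each by its group size."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    total_size = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_size


# Part (a): group means in dollars, group sizes in millions of households.
overall_mean = combined_mean([40_708, 25_409], [57.9, 8.0])
print(f"overall mean = ${overall_mean:,.0f}")
```

The result sits much closer to the white mean than to the black mean because the white group carries about 88% of the total weight.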

For towns with population size 2500 to 4599 in the U.S. Northeast in 1994, the mean salary of chiefs of police was $\$37,527$, and the median was $\$30,500$ (The Municipal Year Book 1995, Washington, D.C.: International City/County Management Association, 1995). Does this suggest that the distribution of salary was skewed to the left, symmetric, or skewed to the right? Explain.

According to the National Association of Home Builders, the U.S. nationwide median selling price of homes sold in 1995 was $\$118,000$.

a) Would you expect the mean to be larger, smaller, or equal to $\$118,000$? Explain.

b) Which of the following is the most plausible value for the standard deviation: (i) -15,000, (ii) 1,000, (iii) 45,000, (iv) 1,000,000? Why?

The 1990 General Social Survey asked respondents, "During the past 12 months, how many people have you known personally that were victims of homicide?" Table 3.16 shows a computer printout from analyzing responses for 1370 subjects.

a) Report the relative frequency distribution.

b) Sketch a histogram. Is the distribution bell-shaped, skewed to the right, or skewed to the left?

c) Calculate the mean, median, and mode, and interpret their values.

d) Report and interpret the standard deviation. Does the Empirical Rule apply to this distribution? Why or why not?

The Human Development Index (HDI) has three components: life expectancy at birth, educational attainment, and income. It ranges from 0 to 1, with higher values representing greater development. In 1992, the HDI ratings for eight Central American countries were .884 for Belize, .884 for Costa Rica, .579 for El Salvador, .591 for Guatemala, .578 for Honduras, .842 for Mexico, .611 for Nicaragua, and .856 for Panama.

a) Construct a stem and leaf plot. Drop the final digit, and split the values into two parts; that is, have two lines for responses with first digit 8, putting entries with second digit 0 to 4 on one line and 5 to 9 on the second, have two lines for 7, two lines for 6, and two lines for 5. What is the shape of the distribution?

b) Calculate and interpret the mean, median, and range.
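Part (b) can be verified with a short script. The sketch reads the eight ratings as .884, .884, .579, .591, .578, .842, .611, and .856; the dictionary name `hdi` is illustrative.

```python
import statistics

# 1992 HDI ratings for the eight Central American countries.
hdi = {
    "Belize": 0.884,
    "Costa Rica": 0.884,
    "El Salvador": 0.579,
    "Guatemala": 0.591,
    "Honduras": 0.578,
    "Mexico": 0.842,
    "Nicaragua": 0.611,
    "Panama": 0.856,
}

values = sorted(hdi.values())
hdi_mean = statistics.mean(values)
hdi_median = statistics.median(values)  # average of the 4th and 5th values
hdi_range = values[-1] - values[0]

print(f"mean = {hdi_mean:.3f}, median = {hdi_median:.4f}, range = {hdi_range:.3f}")
```

The two middle values (.611 and .842) are far apart, so the median falls in a gap where no country actually lies; that is one sign the ratings cluster into a low group and a high group.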

According to Statistical Abstract of the United States, 1995, average salary (in dollars) of secondary school classroom teachers in 1994 in the United States varied among states with a five-number summary of:

$$
\begin{array}{cc}
100\% \text{ Max} & 51,700 \\
75\% \text{ Q3} & 38,500 \\
50\% \text{ Med} & 33,900 \\
25\% \text{ Q1} & 29,800 \\
0\% \text{ Min} & 25,300
\end{array}
$$

a) Find and interpret the range and the interquartile range.

b) Construct a box plot.

c) Based on (b), predict the direction of skew for this distribution. Explain.

d) If the distribution, though skewed, is approximately bell-shaped, which of the following values would you expect for the standard deviation: (i) 100, (ii) 1000, (iii) 6000, (iv) 15,000? Explain.
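The quantities needed for parts (a) through (c) follow directly from the five-number summary. The sketch below also computes outlier fences using the common 1.5 × IQR convention, which is an assumption here since this exercise does not state a criterion; all variable names are illustrative.

```python
# Five-number summary of average teacher salaries by state (dollars).
minimum, q1, median, q3, maximum = 25_300, 29_800, 33_900, 38_500, 51_700

salary_range = maximum - minimum
iqr = q3 - q1

# Assumed convention: an observation is a potential outlier if it lies
# more than 1.5 * IQR below Q1 or above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

# Comparing the two tails gives a hint about the direction of skew.
lower_tail = q1 - minimum
upper_tail = maximum - q3

print(f"range = {salary_range}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"low outlier: {has_low_outlier}, high outlier: {has_high_outlier}")
print(f"lower tail = {lower_tail}, upper tail = {upper_tail}")
```

The upper tail is nearly three times as long as the lower tail, and under the assumed rule the maximum lies just beyond the upper fence, so a box plot built this way would flag it separately.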

Consider the data in Table 3.8 on the number of people you know who have committed suicide. The mean equals .145, and the standard deviation equals .457. From the results reported in the table, what percentage of measurements fall within one standard deviation of the mean? Is the Empirical Rule appropriate for this distribution? Why or why not?

Why is the median sometimes preferred over the mean as a measure of central tendency? Give an example to illustrate your answer.

Why is the mean sometimes preferred over the median? Give an example to illustrate your answer.

Give an example of a variable for which the mode applies, but not the mean or median.

A group of high school students takes an exam. The mean score for the boys is 65, and the median is 75. Both the mean and the median score for the girls is 70. How can you explain the large difference between the two summary measures for the boys?

During the spring semester of 1995 at the University of Florida, computer usage of students having accounts on a mainframe computer at the university was summarized by a mean of 1921 and a standard deviation of 11,495 kilobytes of drive usage.

a) Does the Empirical Rule apply to this distribution? Why?

b) Would you expect this distribution to be symmetric, skewed to the right, or skewed to the left? Explain.

c) What could cause the standard deviation to be so large compared to the mean? (Data supplied by Dr. Michael Conlon, University of Florida.)

Refer to Problem 3.26. The five-number summary of these data was minimum $=4$, $Q1=256$, median $=530$, $Q3=1105$, and maximum $=320,000$. What does this suggest about the shape of the distribution? Why?

Residential electrical consumption in March 1994 in Gainesville, Florida, had a mean of 780 and a standard deviation of 506 kilowatt-hours (Kwh). The minimum usage was 3 Kwh and the maximum was 9390 Kwh. (Data supplied by N. Todd Kamhoot, Gainesville Regional Utilities.)

a) What shape do you expect this distribution to have? Why?

b) Do you expect this distribution to have any outliers? Explain.

Residential water consumption in March 1994 in Gainesville, Florida, had a mean of 7.1 and a standard deviation of 6 (thousand gallons). What shape do you expect this distribution to have? Why? (Data supplied by N. Todd Kamhoot, Gainesville Regional Utilities.)

For each of the following, sketch roughly what you expect a histogram to look like, and explain whether the mean or the median would be greater. Also sketch box plots for cases (a) and (c) that are consistent with the histograms.

a) The selling price of new homes in 1997

b) The number of children ever born per woman age 40 or over

c) The score on an easy exam (mean $=88$, standard deviation $=10$, maximum possible $=100$)

d) The number of cars owned per family

e) Number of months in which subject drove a car last year

For each of the following variables, indicate whether you would expect its relative frequency histogram to be bell-shaped, U-shaped, skewed to the right, or skewed to the left. For parts (a), (b), and (g), sketch a box plot that would be plausible for that variable.

a) Exam score (scores fall between 0 and 100, with a mean of 90 and a standard deviation of 10)

b) IQ

c) Number of times arrested in past year

d) Time needed to complete difficult exam (maximum time is 1 hour)

e) Assessed value of home

f) Age at death

g) Weekly church contribution (median is $\$10$ and mean is $\$17$)

h) Number of years lived in present home (mode $=0$ to 1 year)

i) Attitude toward legalization of abortion

Give examples of social science variables having a distribution that you would expect to be

a) Approximately symmetric

b) Skewed to the right

c) Skewed to the left

d) Bimodal

e) Skewed to the right, with a mode and median of 0 but a positive mean

A recent Roper organization survey asked, "How far have environmental protection laws and regulations gone?" For the possible responses not far enough, about right, and too far, the percentages of responses were $51\%, 33\%$, and $16\%$.

a) Which response is the mode?

b) Can you compute a mean or a median for these data? If so, do so; if not, explain why not.

A company conducts a study of the number of miles traveled using public transportation by its employees during a typical day. A random sample of ten employees yields the following values (in miles):

$$
0, 0, 4, 0, 0, 0, 10, 0, 6, 0
$$

a) Calculate and interpret the mean, median, mode, range, variance, and standard deviation of these measurements.

b) The next person sampled lives in a different city and travels 90 miles a day on public transport. Recompute the mean, median, and standard deviation, and note the effect of this outlying observation.
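The effect of the outlier in part (b) is easy to see numerically. The sketch below reads the ten values as 0, 0, 4, 0, 0, 0, 10, 0, 6, 0 and uses the sample variance and standard deviation (denominator $n-1$); the helper name `describe` is illustrative.

```python
import statistics

# Miles traveled on public transportation by the ten sampled employees.
miles = [0, 0, 4, 0, 0, 0, 10, 0, 6, 0]
miles_with_outlier = miles + [90]  # part (b): the 11th person


def describe(values):
    """Summary measures asked for in part (a)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # n - 1 denominator
        "sd": statistics.stdev(values),
    }


original = describe(miles)
extended = describe(miles_with_outlier)

for name, summary in (("ten employees", original), ("with outlier", extended)):
    print(name, {key: round(val, 2) for key, val in summary.items()})
```

The median stays at 0 while the mean and standard deviation change several-fold, which is the contrast part (b) is after.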

To measure variation:

a) Why is the standard deviation s usually preferred over the range?

b) The IQR is sometimes preferred to s when there are some extreme outliers. Why?

In the mid-1980s, the General Social Survey asked respondents how many close friends they had. For a sample of size 1467, the mean was 7.4 and the standard deviation was 11.0. The distribution had a median of 5 and a mode of 4. Based on these statistics, what would you surmise about the shape of the distribution? Why?

In 1994 the General Social Survey asked, "On the average day, about how many hours do you personally watch television?" Of 1964 responses, the mode was 2, the median was 2, the mean was 2.8, and the standard deviation was 2.4. Based on these statistics, what would you surmise about the shape of the distribution?

For an exam given to a class, the students' scores ranged from 35 to 98, with a mean of 74. Which of the following is the most realistic value for the standard deviation: 1, 12, 60, -10? Why?

The sample mean for a data set equals 80. Which of the following is an impossible value for the standard deviation: 200, 0, -20?

According to a recent report from the U.S. National Center for Health Statistics, females with age between 25 and 34 years have a bell-shaped distribution on height, with mean of 65 inches and standard deviation of 3.5 inches.

a) Give an interval within which about $95 \%$ of the heights fall.

b) What is the height for a female who is three standard deviations below the mean in height? Would this be a rather unusual height? Why?
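Both parts apply the Empirical Rule for bell-shaped distributions: about 95% of observations fall within two standard deviations of the mean, and nearly all fall within three. A minimal sketch, assuming the standard deviation is 3.5 inches (the function name `within_k_sd` is illustrative):

```python
def within_k_sd(mean, sd, k):
    """Interval from k standard deviations below to k above the mean."""
    return mean - k * sd, mean + k * sd


mean_height = 65.0  # inches
sd_height = 3.5     # inches (assumed reading of the stated value)

# (a) About 95% of heights fall within two standard deviations.
interval_95 = within_k_sd(mean_height, sd_height, 2)

# (b) Height three standard deviations below the mean.
three_sd_below = mean_height - 3 * sd_height

print(f"about 95% of heights: {interval_95[0]} to {interval_95[1]} inches")
print(f"three SDs below the mean: {three_sd_below} inches")
```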

In a large northern city, monthly payments to people on welfare last year were observed to have approximately a bell shape with mean $\$700$ and standard deviation $\$100$. Give a range of values within which all or nearly all the payments fell.

For the WWW data on number of times a week reading a newspaper, referred to in Problem 1.7, Figure 3.18 shows a computer printout of the stem and leaf plot and the box plot.

a) From the box plot, identify the minimum, lower quartile, median, upper quartile, and maximum.

b) Identify these five numbers using the stem and leaf plot.

c) Do the data appear to contain any outliers? If so, identify.

d) Based on the box plot, indicate the approximate value of the mean. The standard deviation is one of the following values: .5, 3, 10, 20. Which do you think it is, and why?

Suppose the distribution of the prices of new homes built in the United States in 1996 was approximately bell-shaped, with a mean of $\$120,000$ and a standard deviation of $\$40,000$.

a) Describe the distribution using properties of the standard deviation.

b) If your new house was priced half a standard deviation above the mean in 1996, how much did it cost?

c) If the distribution is not actually bell-shaped, what shape would you expect it to have? Why?

In 1993, the five-number summary for the statewide percentage of people without health insurance had a minimum of $8.7\%$ (Wisconsin), $Q1=11.9$, Med $=13.4$, $Q3=17.8$, and maximum of $23.9\%$ (Louisiana) (Statistical Abstract of the United States, 1995).

a) Construct a box plot for these data.

b) Do you think that the distribution is symmetric, skewed to the right, or skewed to the left? Why?

c) Which of the following is the most plausible value for the standard deviation of this distribution: $0, 4, 13, 22$? Why?

Refer to Problem 3.20. Construct a box plot for these data. Are there any apparent outliers?

The distribution of high school graduation rates in the United States in 1993 had a minimum value of 64.3 (Mississippi), lower quartile of 73.9, median of 76.75, upper quartile of 80.1, and maximum value of 86.6 (Alaska) (Statistical Abstract of the United States, 1995).

a) Report and interpret the 50th percentile.

b) Report the range and the interquartile range.

c) Sketch a box plot. Are there any outliers?

d) Provide a guess for the standard deviation. Justify.

In your library, find the percentage of the vote that Bill Clinton received in each state in the 1996 presidential election.

a) Prepare a stem and leaf plot. Are there any apparent outliers?

b) Construct a box plot. Are there any outliers?

c) Construct back-to-back stem and leaf plots or side-by-side box plots for Northeastern and West Coast states versus other states. Interpret.

Refer to Problem 3.10.

a) Using the data set without D.C., find the quartiles and the interquartile range.

b) According to the definition of an outlier in terms of the IQR, are any of the observations outliers?

c) Construct a box plot for the distribution.

d) Repeat the analyses, including the D.C. observation, and compare results.

What is the difference between the descriptive measures symbolized by

a) $\bar{Y}$ and $\mu$?

b) $s$ and $\sigma$?

For the WWW data file (Problem 1.7), use computer software to conduct graphical and numerical summaries for a) distance from home town, b) weekly hours of TV watching, c) weekly number of times reading a newspaper, and d) number of HIV-AIDS victims known. Describe the shapes of the distributions, and summarize your findings.

Refer to the data file you created in Problem 1.7. For variables chosen by your instructor, conduct descriptive statistical analyses. Prepare a report, interpreting and summarizing your findings.

Refer to the data in Table 9.1 on poverty rates. Using methods of this chapter, summarize these data. Prepare a report, graphically displaying the data and summarizing the central tendency and variation. In your report, discuss whether there are any outliers, and if there are, analyze their influence on the results.

The number of therapeutic abortions in 1988 in Canada, per 100 live births, is shown in Table 3.17. Using methods of this chapter, present a descriptive statistical analysis of these data, interpreting your results.

Refer to Problem 3.19. Table 3.18 shows the HDI ratings for African countries. Using graphical and numerical methods of this chapter, summarize HDI for these countries, and compare to the distribution of HDI for Central American countries.

Obtain data on statewide murder rates from the latest edition of Statistical Abstract of the United States.

a) Analyze the data using the graphical and numerical methods of this chapter.

b) Use graphical and numerical methods to compare the murder rate distribution to the one for the data in Table 3.1.

During the strike of professional baseball players in 1994, two quite different numbers were reported for the central tendency of players' annual salaries. One was $\$1.2$ million and the other was $\$500,000$. One of these was the median and one was the mean. Which value do you think was the mean? Why?

In 1986, the U.S. Federal Reserve sampled about 4000 households to estimate overall net worth of a family. Excluding some outliers of extremely wealthy individuals, they reported the summaries $\$44,000$ and $\$145,000$. One of these was the mean, and one was the median. Which do you think was the median? Why?

According to a recent report from the U.S. National Center for Health Statistics, for males with age 25-34 years, $2\%$ of their heights are 64 inches or less, $8\%$ are 66 inches or less, $27\%$ are 68 inches or less, $39\%$ are 69 inches or less, $54\%$ are 70 inches or less, $68\%$ are 71 inches or less, $80\%$ are 72 inches or less, $93\%$ are 74 inches or less, and $98\%$ are 76 inches or less. These are called cumulative percentages.

a) Find the median height.

b) Nearly all the heights fall between 60 and 80 inches, with less than $1\%$ falling outside that range. If the heights are approximately bell-shaped, give a rough approximation for the standard deviation of the heights. Explain your reasoning.
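With cumulative percentages, the median is the smallest listed value at which the cumulative percentage reaches 50%. For part (b), one rough approach treats the 60 to 80 inch span as about six standard deviations wide, since nearly all of a bell-shaped distribution lies within three standard deviations of the mean. The sketch below follows that reasoning; the function name `percentile_from_cumulative` is illustrative.

```python
# (height in inches, cumulative percentage at or below that height)
cumulative = [
    (64, 2), (66, 8), (68, 27), (69, 39), (70, 54),
    (71, 68), (72, 80), (74, 93), (76, 98),
]


def percentile_from_cumulative(table, p):
    """Smallest listed value whose cumulative percentage is at least p."""
    for value, cum_pct in table:
        if cum_pct >= p:
            return value
    raise ValueError("p exceeds the largest cumulative percentage listed")


# (a) Median = 50th percentile.
median_height = percentile_from_cumulative(cumulative, 50)

# (b) Rough SD: a span of about six standard deviations covers nearly all values.
rough_sd = (80 - 60) / 6

print(f"median = {median_height} inches, rough SD = {rough_sd:.1f} inches")
```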

Grade point averages of graduating seniors at the University of Rochester are approximately bell-shaped in distribution, ranging from 2.0 to 4.0 with a mean of about 3.0. Using the fact that all or nearly all measurements for this form of distribution occur within three standard deviations of their mean, give an approximation for the value of the standard deviation.

For the following two multiple-choice items, select the correct response(s).

In Canada in 1981, for the categories Catholic, Protestant, Eastern Orthodox, Jewish, None, Other for religious affiliation, the relative frequencies were $47.3\%, 41.2\%, 1.5\%, 1.2\%, 7.3\%, 1.5\%$ (Canada Year Book, 1992).

a) The median religion is Protestant.

b) The distribution is bimodal.

c) Only $2.7\%$ of the subjects fall within one standard deviation of the mean.

d) The mode is Catholic.

e) The "Other" response is an outlier.

The 1991 General Social Survey asked whether having sex before marriage is always wrong, almost always wrong, wrong only sometimes, not wrong at all. The response counts in these four categories were $274, 98, 186, 435$. This distribution is

a) Skewed to the right.

b) Approximately bell-shaped.

c) Bimodal.

d) Shape does not make sense, since the variable is nominal.

Ten families are randomly selected in Florida and another ten families are randomly selected in Alabama. Table 3.19 provides summary information on mean family income. The mean is higher in Alabama both in rural areas and in urban areas. Which state has the larger overall mean income? (The reason for this apparent paradox is that mean urban incomes are larger than mean rural incomes for both states and the Florida sample has a higher proportion of urban residents than the Alabama sample.)

Refer to Problem 3.10. Explain why the mean of these 50 measurements is not necessarily the same as the violent crime rate for the entire U.S. population.

The mean and standard deviation of a sample may change if data are rescaled. For a sample with mean $\bar{Y}$, adding a constant $c$ to each observation changes the mean to $\bar{Y}+c$, and the standard deviation $s$ is unchanged. Multiplying each observation by $c$ changes the mean to $c \bar{Y}$ and the standard deviation to $|c| s$.

a) Scores on a difficult exam have a mean of 57 and a standard deviation of 20. The teacher boosts all the scores by 20 points before awarding grades. Report the mean and standard deviation of the boosted scores.

b) Suppose that annual income has a mean of $\$39,000$ and a standard deviation of $\$15,000$. Values are converted to British pounds for presentation to a British audience. If one British pound equals $\$1.50$, report the mean and standard deviation in British currency.
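The two rescaling rules translate into two small functions. The sketch below applies them to parts (a) and (b) and then confirms both rules on a tiny made-up sample; the function names and the sample `[40, 57, 74]` are hypothetical and are not data from the exercise.

```python
import statistics


def add_constant(mean, sd, c):
    """Adding c to every observation shifts the mean and leaves the SD alone."""
    return mean + c, sd


def multiply_by(mean, sd, c):
    """Multiplying every observation by c scales the mean by c and the SD by |c|."""
    return c * mean, abs(c) * sd


# (a) Every exam score is boosted by 20 points.
boosted_mean, boosted_sd = add_constant(57, 20, 20)

# (b) Dollars to pounds: one pound = $1.50, so multiply by 1 / 1.5.
pound_mean, pound_sd = multiply_by(39_000, 15_000, 1 / 1.5)

# Check of both rules on a hypothetical three-value sample.
sample = [40, 57, 74]
shifted = [y + 20 for y in sample]
scaled = [y / 1.5 for y in sample]
shift_keeps_sd = abs(statistics.stdev(shifted) - statistics.stdev(sample)) < 1e-9
scale_divides_sd = abs(statistics.stdev(scaled) - statistics.stdev(sample) / 1.5) < 1e-9

print(boosted_mean, boosted_sd)
print(round(pound_mean), round(pound_sd))
print(shift_keeps_sd, scale_divides_sd)
```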

The results of the study described in Problem 3.34 are to be reported in a French newspaper. The ten measurements are converted to kilometer units (1 mile $=1.6$ kilometers). Report the mean and standard deviation of the converted measurements.

The crude death rate is the number of deaths in a year, per size of the population, multiplied by 1000. According to the U.S. Bureau of the Census, in 1995 Mexico had a crude death rate of 4.6 (i.e., 4.6 deaths per 1000 population) while the United States had a crude death rate of 8.4. Explain how this overall death rate could be higher in the United States even if the United States had a lower death rate than Mexico for people of each specific age.

The sample means for $k$ sets of data with sample sizes $n_1, n_2, \ldots, n_k$ are $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_k$. Show that the overall sample mean for the combined data set is

$$
\bar{Y}=\frac{n_1 \bar{Y}_1+n_2 \bar{Y}_2+\cdots+n_k \bar{Y}_k}{n_1+n_2+\cdots+n_k}
$$

Interpret $\bar{Y}$ as a weighted average of $\bar{Y}_1, \bar{Y}_2, \ldots, \bar{Y}_k$.

Show that $\sum\left(Y_i-\bar{Y}\right)$ must equal 0 for any collection of measurements $Y_1, Y_2, \ldots, Y_n$.

The Russian mathematician Tchebysheff proved that for any real number $k>1$, the proportion of the measurements that fall more than $k$ standard deviations from the mean can be no greater than $1/k^2$. Moreover, this holds for any distribution, not just bell-shaped ones.

a) Find the upper bound for the proportion of measurements falling (i) more than two standard deviations from the mean, (ii) more than three standard deviations from the mean, (iii) more than ten standard deviations from the mean.

b) Compare the upper bound for $k=2$ to the approximate proportion falling more than two standard deviations from the mean in a bell-shaped distribution. Why is there a difference?
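The bound in part (a) is a one-line computation, and part (b) can be made concrete by comparing it with the two-sided tail of a normal distribution, used here as a stand-in for "bell-shaped". The sketch relies on `statistics.NormalDist` from the Python standard library (3.8 or later); the function name `chebyshev_bound` is illustrative.

```python
from statistics import NormalDist


def chebyshev_bound(k):
    """Upper bound 1/k^2 on the proportion more than k SDs from the mean."""
    if k <= 1:
        raise ValueError("the bound is stated for k > 1")
    return 1 / k**2


# (a) Bounds for k = 2, 3, and 10.
bounds = {k: chebyshev_bound(k) for k in (2, 3, 10)}

# (b) Proportion more than two SDs from the mean under a normal curve.
normal_tail_2 = 2 * (1 - NormalDist().cdf(2))

for k, bound in bounds.items():
    print(f"k = {k}: at most {bound:.4f}")
print(f"normal distribution, k = 2: about {normal_tail_2:.4f}")
```

The bound for $k=2$ is much larger than the normal-curve proportion because it has to hold for every possible distribution, including ones far from bell-shaped.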

The least squares property of the mean states that the data fall closer to $\bar{Y}$ than to any other real number $c$, in the sense that the sum of squares of deviations of the data about their mean is smaller than the sum of squares of their deviations about $c$. That is,

$$
\sum\left(Y_i-\bar{Y}\right)^2<\sum\left(Y_i-c\right)^2
$$

If you have studied calculus, prove this property by treating $f(c)=\sum\left(Y_i-c\right)^2$ as a function of $c$ and deriving the value of $c$ that provides a minimum.
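A numerical check is not a proof, but it can make the property concrete before attempting the calculus. The sketch below evaluates $f(c)$ over a grid of candidate values and confirms that the minimum lands on the sample mean. The data are the ten lengths of stay from the alcoholism treatment exercise earlier in this set, read as 11, 6, 20, 9, 13, 4, 39, 13, 44, 7; the names `sum_sq` and `grid` are illustrative.

```python
import statistics

# Ten lengths of stay (days), reused from an earlier exercise.
stays = [11, 6, 20, 9, 13, 4, 39, 13, 44, 7]


def sum_sq(data, c):
    """f(c): sum of squared deviations of the data about c."""
    return sum((y - c) ** 2 for y in data)


y_bar = statistics.mean(stays)
f_at_mean = sum_sq(stays, y_bar)

# Candidate values c = 0.0, 0.1, ..., 50.0.
grid = [step / 10 for step in range(0, 501)]
best_c = min(grid, key=lambda c: sum_sq(stays, c))

print(f"mean = {y_bar}, f(mean) = {f_at_mean:.1f}, grid minimizer = {best_c}")
```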

