### Representing data by distributions and charts

The goal of this lab is to learn to create frequency distributions from data and to represent the results graphically.
SAVE EACH WORKSHEET - THEY WILL BE NEEDED IN THE NEXT LAB.

PROBLEM 1. In this problem you will make a frequency distribution and present the distribution in a graphical forms. In cell A3 enter the word "DATA". In cells A4 to A33 enter the following 30 scores on a 10-point quiz:

 6, 8, 6, 10, 4, 8, 1, 3, 8, 6, 5, 1, 9, 8, 3, 6, 7, 6, 8, 5, 3, 6, 3, 8, 5, 3, 2, 9, 3, 6,

a. In cell B3 enter the label "Bin". Then in cells B4 to B13 enter the possible scores, 1 to 10 respectively. (The "bin" identifies the values into which the data will be sorted.) To sort the data into the bins we use the HISTOGRAM tool. From the Excel Ribbon select the Data tab, then Data Analysis from the Analysis section, and finally HISTOGRAM from the list that appears. This brings up a Histogram Dialog Box. Enter the data, A4..A33, for the Input Range, B4..B13 for the Bin Range, and D3 for the Output Range. Click OK and the frequency distribution should appear in columns D and E. (Another method is demonstrated in the YouTube Video.)

b. To represent the distribution as a bar (column) graph highlight the Frequency values, E4..E13, and select a column graph from Charts available on the Insert tab. (See Lab 1.)
Move the chart if it covers the frequency distribution. To make the columns "touch" right click one of the columns and choose Format Data Series from the pop-up menu and change the Gap width to 0.

In a text box explain, for an arbitrary bin value say x, how to determine from the table the percentage of scores that are at least x.
Print the worksheet.

NOTE: If not many categories are involved, a Pie graph is used instead of a Column graph. In the next problem you are to make both types.

PROBLEM 2. The distribution of sales (in \$) for a week for XYZ company are as follows.

 Sales Amt(\$) #Sales 0-999 2 1000-1999 7 2000-2999 14 3000-3999 7

a. Start with a new sheet. Place the label "Sales Amt" in A3 and the four sales ranges given above in cells A4..A7. Place the label "#Sales" in cell B3 and the distribution count in cells B4..B7. Make a column graph of the distribution as you did in Problem 1(b). (Note that, in this problem, the step 1(a) is unnecessary because the 30 sales have already been grouped into "bins".)

b. Make a Pie graph for the same distribution and place this chart below the column graph. Add the amount labels and percentages to the pie graph. It is non-obvious how to do this. First, right click the pie graph and select "Add Data Labels" from the pop-up menu. The default labels are all wrong. Right click a data label and select "Format Data Labels" from the pop-up menu. Make the correct selection.

c. Using information from the graph to determine the percentage of sales with a value under \$2000. Express you answer as a sentence in a text box.
Turn in one printout that includes all the parts of Problem 2.

PROBLEM 3. In this problem unemployment rates for the years 1970-1999 will be analyzed. Start with a new sheet.
a. Enter the following data in cells A4 to A33.

 4.9, 5.9, 5.6, 4.9, 5.6, 8.5, 7.7, 7.1, 6.1, 5.8 7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3 5.6, 6.8, 7.5, 6.9, 6.1, 5.6, 5.4, 4.9, 4.5, 4.2

b. Construct a grouped frequency distribution table using intervals of length 0.5 starting at 4.0. Do this the same way you did Problem 1.a. except you should place the UPPER LIMITs of the intervals in the bins -- that is, start in cell B4 with 4.5, then place 5.0 in cell B5, etc. The Bin/Frequency feature of the spreadsheet then groups the 30 numbers into classes (the first class being (4.0,4.5], the second being (4.5,5.0], etc.

c. Create a column graph for the grouped unemployment data in column E. In this case use the midpoints of the intervals for the X-axis labels of the column graph. You should place the midpoints in column C next to the Bin values.

NOTE: The mid-point of the first class (4.0,4.5] is 4.25, of the second class is 4.75, etc. The reason for using the mid-points is that, if for example there are three points in the interval (7.0,7.5], there is no way of knowing from the bin distribution what the original numbers were (these have been lost from the original data by virtue of placing them into classes); therefore one "assumes" that each of them is equal to 7.25. Note that this was already done in Problem 2 before you even started.