Representation of Data

Representing data

Once a research question has been developed, we collect data. The next step is to classify and organise the data, and then to summarise it using measures of central tendency and spread. Just as importantly, the data that has been collected, classified, organised and summarised needs to be represented and interpreted graphically. This section explains the following types of graphical representation: stemplots, pie charts, single bar graphs, histograms, line graphs and broken line graphs.

Stem & Leaf

In this section, we will consider the stemplot (or stem-and-leaf plot) which can be used to arrange, analyse and interpret numerical data.


Stemplots (Stem-and-Leaf Plot)

A stemplot is a device used to group a small data set (up to about 50 data values).  It arranges the data set in ascending order while retaining all the original data values.  This enables us to find the first quartile, median and the third quartile readily.  The stemplot is useful to obtain information about the centre, spread, shape and outliers of the distribution.


Constructing a Stemplot

In a stemplot (i.e. stem-and-leaf plot), each data value is considered to have two parts, a stem and a leaf.  The leading digit(s) of a data value form the stem, and the trailing digit(s) form the leaf.
Three examples of a stemplot follow:
  • Data values 64, 69 and 73 are recorded as shown below:

      6 | 4 9
      7 | 3

Note that 6 | 4 represents the data value 64.
  • Data values 348, 365 and 479 are recorded as shown below:

      3 | 48 65
      4 | 79

Note that 3 | 48 represents the data value 348.
  • Data values 34.8, 35.2 and 35.9 are recorded as shown below:

      34 | 8
      35 | 2 9

Note that 34 | 8 represents the data value 34.8.

Note:
To construct a stemplot, we:
  • enter the stems to the left of a vertical dividing line and the leaf to the right of the vertical dividing line for each data value;
  • record each data value as listed in the data set to construct an unordered stemplot.  Then we construct an ordered stemplot from the unordered version by arranging the leaves in ascending order.
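
The two construction steps above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `stemplot` is our own, and it assumes non-negative whole numbers where the units digit is the leaf and the remaining leading digits are the stem. The sample values are the data values 64, 69 and 73 used earlier, listed in a different order so that the unordered and ordered versions differ.

```python
from collections import defaultdict


def stemplot(values, ordered=True):
    """Return the rows of a stemplot as a list of strings.

    Assumption: values are non-negative whole numbers; the leaf is the
    units digit and the stem is made up of the remaining leading digits.
    """
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)  # leaves are kept in data-set order

    lines = []
    # List every stem from the lowest to the highest, even if it has no leaves
    for stem in range(min(rows), max(rows) + 1):
        leaves = sorted(rows[stem]) if ordered else rows[stem]
        lines.append(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
    return lines


data = [69, 73, 64]

print("Unordered stemplot:")
print("\n".join(stemplot(data, ordered=False)))

print("Ordered stemplot:")
print("\n".join(stemplot(data)))
```

Sorting only the leaves within each stem is enough to put the whole data set in ascending order, because the stems are already listed from lowest to highest.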


Example 5


Solution:
Lowest score = 20
Highest score = 73
A stemplot for the scores that range from 20 to 73 is as follows:
This stemplot is not ordered.
An ordered stemplot is obtained by arranging the leaves in order, as shown below.

Note:
For each value of the data, the stem is the tens digit and the leaf is the units digit.


Example 6


Solution:
a.  Lowest score = 3
     Highest score = 126
An ordered stemplot for the scores that range from 3 to 126 is given below.  Stems 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 are formed by the tens digits, whereas stems 10, 11 and 12 are formed by the hundreds and tens digits.  The leaves are formed by the units digits of the scores.

We notice that the scores 3, 115 and 126 are separated from the main body of the data, so 3, 115 and 126 are outliers.  The stemplot for the data, including the outliers, can be displayed as follows:
So, the median of the data set is 41.

Pie charts

Pie charts are best used to compare parts of a whole. They do not show changes over time. A pie chart is divided into sectors, usually labelled with percentages, and each sector represents a particular proportion of the data. Information can be read easily from a pie chart, but it is difficult to identify a trend from one. For example, although it may be possible to use a pie chart to show the monthly rainfall of a particular town, it would be difficult to identify trends in the rainfall pattern.

Exercise 1: Understanding pie charts


Consider the pie chart below and answer the questions that follow:
Image
  1. Which level did most of the learners obtain?
  2. What was the percentage of learners who obtained this level?
  3. Few learners achieved level 7. What was the percentage of learners at this level?
  4. If there were 120 learners who wrote the examination, how many learners achieved level 4?
  5. Write the ratio of learners who achieved level 3 to those at level 2 in its simplest form.


  1. Level 2
  2. 40% of the learners obtained this level.
  3. 2%
  4. 15% of 120 learners = 18 learners who obtained level 4
  5. 20 : 40 = 1 : 2.
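
The calculations in answers 4 and 5 can be checked with a short Python sketch. The helper names `percent_of` and `simplify_ratio` are our own, chosen for illustration; the percentages (15%, 20% and 40%) and the total of 120 learners are taken from the answers above.

```python
from math import gcd


def percent_of(percent, total):
    """Return the given percentage of a total."""
    return percent * total / 100


def simplify_ratio(a, b):
    """Write the ratio a : b in its simplest form."""
    common = gcd(a, b)  # greatest common divisor of the two parts
    return a // common, b // common


# Answer 4: 15% of the 120 learners achieved level 4
print(percent_of(15, 120))

# Answer 5: level 3 (20%) compared with level 2 (40%)
print(simplify_ratio(20, 40))
```

Dividing both parts of a ratio by their greatest common divisor always gives the simplest form, which is why 20 : 40 reduces to 1 : 2.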

Bar graphs

Bar graphs are used to display data that is compared across categories. For example, every month for five months, a new store keeps count of how many customers visit the store. We can represent this on a bar graph as shown below:
Image

Histograms

Histograms are different from bar graphs in that they usually represent continuous data. Data that is displayed on a histogram is also grouped. Consider the following table of the foot lengths of learners in a class, and the frequency of those lengths.
Foot length       Frequency
22 - 23,9 cm      3
24 - 25,9 cm      10
26 - 27,9 cm      8
28 cm and longer  4
We can represent this information on a histogram as follows:
Image
Notice that unlike in the bar graph in the previous section, in the histogram, the bars are drawn next to each other - there are no spaces between them. This is to indicate that the data is continuous.

Example 1: Drawing bar graphs and histograms

Question

  1. The school tuckshop keeps track of how many hot dogs, sandwiches, salads and burgers they sell at one break time. They have the data given in the table below. Draw a bar graph to represent this data.
    Item        Frequency
    Hot dogs    15
    Sandwiches  35
    Salads      10
    Burgers     12
  2. Lwanda measures the length of his school books (in cm) and draws up the frequency table below. Draw a histogram to represent this data.
    Length of Book     Frequency
    20 - 23,9 cm       4
    24 - 26,9 cm       7
    27 - 29,9 cm       5
    Longer than 30 cm  3

Answer

  1. Image
  2. Image
Note from the previous worked example that when we draw both bar graphs and histograms, it is important to use an appropriate interval for the vertical axis. For example, using an interval of 100 would not be appropriate if our largest frequency is only 15, and using an interval of 1 would not be appropriate if we had a maximum frequency of 500 - it would make our graph very large and hard to read!
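
One way to make the idea of an "appropriate interval" concrete is the rule of thumb sketched below. It is an assumption of ours and not a fixed rule: the function `axis_interval` (a hypothetical name) picks the smallest round interval (1, 2 or 5 times a power of ten) that reaches the largest frequency in at most ten steps.

```python
def axis_interval(max_frequency, max_steps=10):
    """Suggest an interval for the vertical axis.

    Assumption: a readable axis uses a round interval (1, 2, 5, 10, 20,
    50, ...) and needs no more than max_steps intervals to reach the
    largest frequency.
    """
    power = 1
    while True:
        for multiplier in (1, 2, 5):
            step = multiplier * power
            if step * max_steps >= max_frequency:
                return step
        power *= 10


# Largest frequencies mentioned in the note above, plus the tuckshop data
for largest in (15, 35, 500):
    print(largest, "->", axis_interval(largest))
```

With this rule, a largest frequency of 15 gets an interval of 2 and a largest frequency of 500 gets an interval of 50, which avoids both extremes described in the note.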

Line graphs

In data handling we use line graphs to show the relationship between two quantities. The horizontal axis often represents time, as these kinds of graphs are particularly useful for showing changes over time.
For example, we can plot how the temperature of water in a pot increases as it is heated, with the temperature measured every 30 seconds.
Image
Time (s)          0   30  60  90
Temperature (°C)  20  40  60  80
Image

Example 2: Representing data on a line graph

Question

The table below shows the average number of minutes per day that Jabu spent watching TV in each month from January to November last year.
Month                        Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov
Daily TV viewing time (min)  108  103  108  120  115  122  116  105  110  105  104
Image
  1. Plot this data on a set of axes.
  2. Can you observe any trends or patterns in the data? Give some possible reasons for these trends.
  3. Would you be able to represent this data in a bar graph?
  4. What is the advantage of using a line graph to show this information?

Answer

  1. The points are plotted and connected with line segments.
Image
  2. You can see that Jabu's viewing time increases in March, April, June and September (perhaps due to school holidays), with the largest increase in April. We also see decreases in his viewing time during February, May, July, August, October and November. These could be times when he was preparing for tests and exams.
  3. Yes, it would be possible to represent this data in a bar graph; the number of minutes would be plotted as a bar for each month.
  4. A line graph helps us to see trends because we can easily see the increasing or decreasing slope of each line segment in the graph.
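
The slope of each line segment corresponds to the change from one month to the next, so the trends in Jabu's data can be checked numerically. The sketch below uses the values from the table in the question; the function name `monthly_changes` is our own.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov"]
minutes = [108, 103, 108, 120, 115, 122, 116, 105, 110, 105, 104]


def monthly_changes(labels, values):
    """Pair each month (from the second one on) with its change
    from the previous month."""
    return [(labels[i], values[i] - values[i - 1])
            for i in range(1, len(values))]


changes = monthly_changes(months, minutes)
for month, change in changes:
    # A positive change is a rising line segment, a negative one is falling
    print(f"{month}: {change:+d} min")

increases = [month for month, change in changes if change > 0]
decreases = [month for month, change in changes if change < 0]
print("Increases:", increases)
print("Decreases:", decreases)
```

A positive change means the line segment ending at that month slopes upwards, and a negative change means it slopes downwards.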

Exercise 2: Representing data


There are 300 learners at a school sports day. There are four sports teams, represented by red, blue, green and yellow. Someone records the colours of the T-shirts the learners are wearing.
Colour of T-shirt  Red  Blue  Green  Yellow
Frequency          75   93    78     54
Represent this data in a pie chart.

Image
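
To draw the pie chart by hand, each frequency must first be converted into a sector angle. The sketch below shows one way to do this, using the fact that the whole circle is 360°; the function name `sector_angles` is our own, and the frequencies are those in the table above.

```python
def sector_angles(frequencies):
    """Convert each frequency into the angle (in degrees) of its sector."""
    total = sum(frequencies.values())
    # Each category gets the same fraction of 360 degrees as of the total
    return {label: count * 360 / total
            for label, count in frequencies.items()}


tshirts = {"Red": 75, "Blue": 93, "Green": 78, "Yellow": 54}
angles = sector_angles(tshirts)

for colour, angle in angles.items():
    percentage = tshirts[colour] * 100 / sum(tshirts.values())
    print(f"{colour}: {percentage:.0f}% of learners, {angle:.1f} degrees")
```

For example, the 75 learners in red make up a quarter of the 300 learners, so the red sector is a quarter of the circle, which is 90°.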

The following list gives the weights of the learners in a class in kilograms.
64; 83; 74; 77; 65; 55; 58; 61; 63; 98; 97; 53; 54; 102; 78; 82; 86; 95; 67; 72
  1. Draw a frequency table to order the data, grouping it into 10 kg intervals.
  2. Use the frequency table to draw a histogram of the data.

  1. Interval (in kg)  Frequency
     50 - 59           4
     60 - 69           5
     70 - 79           4
     80 - 89           3
     90 - 99           3
     100 - 109         1
  2. Image
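
The frequency table above can be reproduced with a short Python sketch that groups the weights into 10 kg intervals. The function name `frequency_table` is our own, and the sketch assumes whole-number data, so that an interval such as 50 - 59 contains every value from 50 up to 59.

```python
from collections import Counter

weights = [64, 83, 74, 77, 65, 55, 58, 61, 63, 98,
           97, 53, 54, 102, 78, 82, 86, 95, 67, 72]


def frequency_table(values, width=10):
    """Count how many values fall in each interval of the given width.

    Assumption: values are whole numbers, so the interval starting at
    `low` runs from low to low + width - 1.
    """
    # Map each value to the lower limit of its interval, then count
    counts = Counter((value // width) * width for value in values)
    return {
        f"{low} - {low + width - 1}": counts[low]
        for low in range(min(counts), max(counts) + width, width)
    }


table = frequency_table(weights)
for interval, frequency in table.items():
    print(f"{interval:<10} {frequency}")
```

The frequencies add up to 20, the number of learners in the list, which is a useful check that no data value has been left out or counted twice.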

The following table gives the maximum temperature (in °C) for each month in a year.
Month          Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Max Temp (°C)  27   28   25   21   20   17   18   19   20   24   25   27
  1. Draw a line graph of this data.
  2. Describe the trends you see in the data.

  1. Image
  2. The maximum temperature falls from 28 °C in February to a low of 17 °C in June, then rises again to 27 °C by December. This is what we would expect with winter falling in the middle of the year.

Answer the questions below about the following pie chart. The pie chart shows the favourite fruit juice flavours of a group of 120 learners.
Image
  1. Calculate how many learners chose each type of juice.
  2. In what way does the pie chart work better than a bar graph to represent this data?
  3. What information would a bar graph give you that this pie chart does not?

  1. 45% of 120 learners = 54 learners who chose fruit cocktail. 30% of 120 = 36 learners who chose litchi. 12,5% of 120 learners = 15 learners who chose grape. 12,5% of 120 = 15 learners who chose apple juice.
  2. The pie chart is a simple, visual representation that works well for representing percentages. A pie chart allows us to see at a glance the relative proportions of the learners who prefer each flavour.
  3. The number of learners who prefer each flavour.

Look at the bar graph below and answer the questions that follow.
Image
  1. Does this graph tell us how many Grade 10 learners there are in total?
  2. Can we assume that none of the learners who take Accounting take Geography?
  3. A pie chart of this data would not make sense. Explain why.

  1. No. It may look like there are 140 learners in total but we have no way of knowing if that is correct or just an arbitrary number. Also, learners take more than one subject, so we can't use the numbers of learners per subject to determine how many learners there are altogether.
  2. No, we cannot assume this.
  3. Learners do not only take one subject, therefore the data cannot be split into discrete percentages per subject and represented using a pie chart.
