Displayr: Data Visualization in Market Research (English edition, 71 pages)
Table of Contents

STANDARD TECHNIQUES
1 Good colors
2 Sort
3 Avoid overplotting
4 Label
5 Represent accurately
6 Appropriate type
7 De-clutter
8 Attract attention

FORMATTING
9 Create mnemonics
10 Redundant encoding
11 Reduce eye movement
12 Show norms
13 Emphasize
14 Reduce color
15 Form rectangles
16 Create symmetry

RESHAPING
17 Small multiples
18 Banking to 45°
19 Decompose
20 Force contrasts
21 Order by context
22 Diagonalize
23 Simplify the data
24 Supernormalize

Software
Summary
About the author
Goal and overview of this book

This book is intended as a resource to help you create visualizations that allow users to quickly discover and remember the key stories in data. It is designed to help present and make the most of research findings, using foundational (and some maverick) ideas as well as insights into how the target audience (reader or viewer) processes and takes in information.

The book focuses primarily on market research examples using survey data. A few iconic non-market research examples are included when they are the most effective way of communicating a key idea. Twenty-four techniques for improving visualizations are illustrated with almost one hundred examples of the good, the bad, and the downright ugly.

[Figure: the three groups of techniques: Standard Techniques, Formatting, Reshaping]

Although many of the examples in this book can be provided as interactive visualizations, the focus will be on examples that can be shown in static documents (e.g., PowerPoint). The constraints of static reporting on visualization are best appreciated through an example. The heatmap below would be a better visualization if the numbers were removed and revealed only interactively by the viewer (e.g., by moving a mouse over a region of the
heatmap). However, most market research visualizations ultimately end up in a PowerPoint slide. A PowerPoint slide with no numbers is of minimal use, because the end user often needs numbers; merely knowing that the darkest blue stands for Aldi and low prices is not enough.

STANDARD TECHNIQUES

The standard techniques discussed in this section are widely known and practiced. If you are experienced in data visualization, skip this section and go to Formatting.

1 Good colors

Choosing the best colors for a visualization involves:
- Choosing between a qualitative, sequential, or diverging color scheme.
- Choosing complementary colors that look good together and permit easy discrimination by viewers, including those who are color-blind.

Qualitative, sequential, and diverging color schemes

A qualitative color scheme uses a distinct color for each series of data, and there is no ordering of the colors. For example, in this stream graph, the role of colors is to allow the viewer to easily disambiguate the different TV shows. Our goals with a qualitative color scheme are to have colors that are distinct but complementary.

[Figure: stream graph of TV shows: Game of Thrones, Stranger Things, The Walking Dead, The Crown, True Detective, Silicon Valley, House of Cards, Homeland, Handmaid's Tale]

Sequential color schemes order the colors in a meaningful way. Typically, they are created from two colors, with gradations between them. In the choropleth below, darker blues denote higher levels of
approval for President Trump in March 2018, and gradations are between light blue and dark blue.

A diverging color scheme is created by combining two sequential color schemes. In the example below, gray is used to represent an approval of 50%, with redder colors for higher approval and bluer
for lower approval. In this example, the color scheme also has a qualitative component, with red signifying majority approval and being the color of the Republican Party.

Choosing qualitative colors

Some colors work better together than others. Numerous color theories have been developed to assist in choosing colors that work well together. Fortunately, there are lots of great online tools that can be used to help choose complementary colors, such as https://coolors.co and https:/. These tools even allow you to select colors complementary to any that you are required to use (e.g., your brand's colors).

Other things to do include:
- Avoid using pure, bright, or strong colors if they will appear as large swathes in a visualization. Such colors are best used in small areas. Use light grays and other muted colors as background. [1]
- Choose colors that work for those with color-blindness.
- Avoid placing bright colors mixed with white next to each other. [2]
- Use colors that are in some way associated with the data. If representing Coca-Cola, try to use a red; if Pepsi, a blue; trees as green, etc.

[1] Edward R. Tufte (1990), Envisioning Information, Graphics Press LLC.
[2] Eduard Imhof (2007), Cartographic Relief Presentation, Esri Press
Classics.

Around 4% of people have some degree of color-blindness. Most are men. There are various types of color-blindness, all of which have some effect on color choice, but the one with the biggest impact is protanopia. If you can see green and red clearly, you can easily tell them apart in the visualization on the left. If, however, you have protanopia, they both appear as the nearly indistinguishable green-brown shown on the right.

The first step in addressing color-blindness in data visualization is to use colors that are more readily seen by people with color-blindness. The rigorous way to do this is to review color palettes using specialty online tools. A simpler hack is to use blues instead of greens and oranges instead of reds: the blue is still largely visible, and the lightness of the orange turns into a lighter green-brown color. The second technique to deal with color-blindness
is to use redundant coding, as discussed in Chapter 10, Redundant encoding.

[Figure: color swatches: green and red, and the same pair as seen with protanopia; orange and blue, and the same pair as seen with protanopia]

Most of these tools are designed to allow you to choose five or fewer colors. If you need to have more qualitative colors, the most straightforward solution is to use
the color palette(s) that come with your visualization software. If you wish to create a new color palette with a large number of colors but do not have access to a designer, you can:
- Use one of the tools to choose an initial small palette. Ideally, try to choose one with many colors on the same side of the color wheel (this instruction will make sense when you are in one of the apps).
- Choose complementary colors from the other side of the palette.
- Choose lighter or darker versions of the same color.
- Add in various shades of gray.

[Figure: stream graph of TV shows: Game of Thrones, Stranger Things, The Walking Dead, The Crown, True Detective, Silicon Valley, House of Cards, Homeland, Handmaid's Tale]

Choosing sequential and diverging color schemes

Choosing sequential and diverging color schemes is a little more challenging than choosing qualitative color schemes. The goal is to choose colors that have a natural gradation, where the degree of gradation is defined by perception as opposed to technical measures such as the percentage of blue or transparency. A widely used resource for choosing sequential and diverging color schemes is http://colorbrewer2.org.

When choosing sequential color schemes and diverging color schemes, key decisions are:
- What color to start with and which to end with. Ideally these are colors that have some meaning in the domain of interest.
- What color to have in the middle (if using a diverging color scheme). For example, light gray was used in the earlier example; sometimes white is appropriate and other times black, depending on the background.
- Whether to use a stepped color scheme (e.g., only five unique colors) or as many colors as there are unique data values.
- Whether to shade based on order, or with a mapping from the actual values of the data to the colors. For example, if the largest number is 100 and the second largest is 5, you can represent 5 as a marginally lighter blue, or by a blue that is close to 5% of the perceptual strength of the blue represented by 100.
- What value to assign to the start and end colors. For example, in the choropleth below, the pure blue was set to 0% and the coral (pink) to 100%, which allows the viewer to see that there are no states with extreme values. Alternatively, the pure blue could be set to the lowest observed value and the pure coral to the highest value, which would make the gradations between the states clearer.

Color naming systems

There are many ways of representing colors in software; this section describes the most common ones. The key thing to appreciate is that if the colors are named in one system, it is usually possible to convert to another naming.

RED GREEN BLUE (RGB)
Most software programs represent colors using the RGB system, where each color is represented as a mixture of the colors red, green, and blue. For example, in PowerPoint, the Custom tab for color shows that the pink is made up of 255 Red, 102 Green, and 204 Blue (see screenshot at right). These numbers are measured on a 256-point scale, from 0 to 255. Some software converts these to proportions, where, for example, a value of 0.5 is equivalent to 128. In addition to the RGB specification, there is also often a Transparency setting, which is provided as a percentage. Alternatively, it can be specified as an alpha value, which is 100% minus the transparency.

HEXADECIMAL ("HEX")
When colors are represented in computer code they are usually represented as strings (i.e., text). For example, the pink above is represented as #ff66cc or as #ff66ccff. This is just another way of writing the RGB scheme, using hexadecimal (base 16 rather than the normal base 10). That is, in hexadecimal, ff means 255, 66 means 102, cc means 204, and the last two characters, which are not always required, show the alpha value, which in this case is 100% (i.e., 0% transparency).

WORD REPRESENTATIONS (E.G., "BLUE")
Sometimes software can understand word representations of colors, such as red and blue.

HSL AND HSV
The HSL and HSV models are alternatives to RGB, designed with the goal of representing colors in a way that is more intuitive when making design choices, as they separate the notion of the qualitative color (be it red, blue, or green) from notions of saturation and lightness (i.e., extent of mixing with white and black).

CMYK
CMYK (which stands for cyan, magenta, yellow, and black) is the color model used in physical printing (e.g., laser printers).

PMS
Used by professional designers, the Pantone Matching System is a commercial color system in which many pre-made colors are identified and printed in books and on cards for easy reference and precise comparison.

2 Sort

Sorting the data in a visualization, typically from highest to lowest, improves almost all visualizations.

Many visualizations show the data sorted alphabetically by the category labels, as shown below. Where there
is a natural order for the visualization (e.g., age categories), this is typically the best presentation. However, in almost all other situations, it is better to sort the data from highest to lowest.

Sorting is useful for several reasons. Sometimes the viewer is interested in the ranking of the items, so sorting saves them time. Sorting also reduces the potential error when drawing conclusions about order. In the visualization above, it seems that Coke Zero is bigger than Pepsi Max, and that Diet Coke is bigger than Pepsi. In the visualization below this conclusion is unmissable. With more data, this benefit
increases. Sorting helps the user by providing redundant encoding, a topic revisited in Chapter 10, Redundant encoding.

The principle of sorting is also applicable when there are multiple series, in which case the data can be sorted either by one of those series, or by the difference between series (examples of this are shown in Chapter 20, Force contrasts).

[Figure: bar charts of cola brands: Coca-Cola, Coke Zero, Diet Coke, Diet Pepsi, Pepsi, Pepsi Max]

3 Avoid overplotting

Overplotting is where data or labels in a visualization overlap, making it impossible to interpret the visualization correctly.

The scatterplot below shows age by income. This visualization exhibits the telltale sign of overplotting, which is that the data appears in neat rows and columns. There is no way to determine from this visualization if, say, there is only one person aged 60 with income of $50,000 or more.

The simplest solution to overplotting is to replace the gray dots with partially transparent dots or circles, and then add small random numbers to the data. This is known as jittering. See the example below. An alternative option is to use tiles, where the area of the tiles is proportional to the data.

Overplotting also occurs with labels. The overplotting in the pie chart below makes it difficult to discern much (an improved version is shown in Chapter 10, Redundant encoding). A variety of solutions to overplotting have been developed. The two traditional solutions are to shorten descriptions (e.g., substitute one or two letters for a longer description, with a key on the side of the visualization), and to use legends (e.g., show the colors on the pie chart with a legend explaining their meaning). Neither of these solutions is desirable, as they make the visualization harder to read (
see Chapter 11, Reduce eye movement).

The bubble chart below presents an example of a better solution to the problem of overplotting. The bubbles are permitted to overlap, with transparency making this overlap intelligible. The labels have been automatically positioned by the software, so they don't overlap and are still close to the relevant bubbles. Where it is not possible to position the label in an optimal position, lines are drawn connecting the labels to their bubbles. Another modern strategy is to use hover effects, where few or even no labels are shown but appear when a mouse pointer is moved
over the visualization.

4 Label

A common mistake when presenting data in a visualization is to provide insufficient information, leaving the viewer unable to understand what the data means. This is fixed by adding labels.

In the stacked bar chart below, the numbers are not defined, so beyond establishing that Matt seems to be winning in satisfaction, it is unclear what the data means. An improved version is shown in the next chapter.

Often context can provide an adequate explanation to compensate for a paucity of labels. In this book, we have used minimal labeling because we wanted to emphasize visual aspects of visualization; but in other contexts we tend to label much more, because a visualization that cannot be read is of little value to anyone. A simple checklist for ensuring that the visualization is labeled appropriately is:
- Label axes.
- Show the units (e.g., kg, liters, $, USD).
- Show where the data comes from.
- Have a title that summarizes the data.

[Figure: stacked bar chart titled "Satisfaction" with one bar per team member: Matt, Dazza, Boris, Nelson, Peter, Chris, Neal, Tim]

5 Represent accurately

It is possible inadvertently to create visualizations that misrepresent data. A diligent viewer can usually discern the real story, but the imperative is to create visualizations that can be accurately read by even the lazy viewer.

The visualization directly below is highly misleading. The data to the right of the road shows that fuel economy is 27.5/18 - 1 = 53% better in 1985 than 1978, but the matching line is almost eight times longer. [3] An instinctive reading of such a visualization will always be wrong. The only hope of a clear reading is if the viewer ignores the visual.

[3] Edward R. Tufte (1983), The Visual Display of Quantitative Information, Graphics Press, p. 5

A more prosaic example is shown below. A glance at this stacked bar chart would lead to the conclusion that Matt is leading in terms of satisfaction. A more careful reading registers that Matt has the second lowest satisfaction rate: the longer bar means he addresses more customer support issues.

[Figure: stacked bar chart titled "Satisfaction" showing the number of support issues (axis from 0 to 40), satisfied versus unsatisfied, for Matt, Dazza, Boris, Nelson, Peter, Chris, Neal, Tim]

The standard solution to this problem is to express the data as percentages, as shown below. Now we can see that Boris has the lowest satisfaction level, and Matt the second lowest. The revised visualization is better but still misleading. One problem is that it fails to reveal how many support issues each person dealt with. This is important: the more support issues somebody resolves, the higher the chances that at least one user will report being unsatisfied. This is addressed below both in the row labels and the inclusion of more informative titles and a footer. A second problem with the chart
above is that unsatisfied data is redundant in an unhelpful way. This is also rectified in the visualization below, which shows just the proportion of satisfied users.

[Figure: stacked bar chart titled "Satisfaction" showing the proportion of support issues (0% to 100%), satisfied versus unsatisfied, for the same eight team members]

[Figure: bar chart titled "98% of users are satisfied with support" for Matt (39), Dazza (13), Boris (13), Nelson (11), Peter (7), Chris (5), Neal (5), Tim (4); Matt is at 97%, Boris at 92%, and the others at 100%. Subtitle: "Number of support issues addressed is shown in (brackets)". Footer: "There are no significant differences between team members"]

Another type of misrepresentation is shown in the chart below. The bar for Boris implies that he is much worse than the data shows. The problem here is that the horizontal axis intersects the vertical axis at 80%, rather than the more appropriate 0%.

[Figure: the same chart redrawn with the axis running from 80% to 100%]

Unfortunately, problems with market research data are usually not this clear. In the example above, the data is on a ratio scale, which is to say that it is meaningful to compare the ratio of one number to another (e.g., 97% is about 97%/92% - 1 = 5.4% higher than 92%). However, most market research visualizations do not have ratio-scale data, which makes the size of bars more arbitrary.

Consider the chart above, which shows net promoter scores (NPS) for two banks over four months. At first glance this seems a sensible visualization, with column heights proportional to the values. However, the NPS is measured on a scale of -100% to 100%, so arguably a more accurate representation is the one below. In this case the increased accuracy seems unhelpful. There are many other similar situations in market research (e.g., when
showing average importance scales, it is rarely ideal to set the axes to start at the lowest possible average and finish at the highest).

[Figure: two column charts titled "Net Promoter Score" for Bank A and Bank B, January to April, one of them with the vertical axis running from -100% to 60%; data labels: 32%, 12%, 32%, 32%, 35%, 29%, 29%, 23%]

The area size issue (in particular, with circles)

The circle packing (or bubble cloud) visualization shown below uses the sizes of the circles to communicate the different values. This raises a problematic issue: how does one define the size of a circle? To the mathematically minded, it is obvious that size should mean area. But most people's ability to read size accurately from the area of an image is poor. [4] Consequently, it is not uncommon to find visualizations where the value is represented by the height of the circle. To the math whiz, this is obviously an error. However, the mere existence of such errors demonstrates that the more mathematically oriented view may itself be incorrect: if many people equate size with height, encoding the values into the area will lead to more errors.

The key point here is that when using visual elements whose area could be used by the viewer to make inferences, it is advisable to provide redundant encodings, as area on its own is not sufficient. Area is perhaps best interpreted as a way of showing relative orderings of values, as opposed to precisely communicating actual values.

[4] Jacques Bertin (1967), Sémiologie Graphique. Les diagrammes, les réseaux, les cartes. Translation 1983: Semiology of Graphics, by William J. Berg.

6 Appropriate type

Some visualizations are intrinsically better for showing certain types of data than others. This chapter covers a few very basic rules of thumb regarding how to select an appropriate visualization. Much of the rest of this book explores this theme in more detail.

Use line charts to show and compare trends

Perhaps the only widely agreed-upon rule is that line charts are usually best for comparisons of trend data, making the visualization below on the right preferable to the one on the left.

[Figure: "Net Promoter Score" for Bank A and Bank B, January to April, shown as a column chart (left) and as a line chart (right)]

Bar and column charts are good for comparisons

Bar charts and column charts are ideally suited for comparing values where the data has a ratio scale (i.e., where there is a meaningful 0 value for the bar to commence). Column charts are perhaps read a bit more accurately, [5] but in practice bar charts tend to be more generally useful, as they make it relatively easy to create charts with long labels without the need to wrap.
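
The importance of a meaningful zero for bars can be checked with a little arithmetic. The following Python sketch is not from the book; it is an illustration under stated assumptions. It takes the satisfaction figures read off the support chart in Chapter 5 (Matt 97%, Boris 92%, Tim 100%) and computes how long each bar would be drawn with the baseline at 0% and then at 80%. The function name bar_lengths and the 40-character width are hypothetical choices made for this example.

```python
def bar_lengths(values, baseline=0.0, width=40):
    """Return the drawn length (in characters) of each bar.

    Every bar starts at `baseline`; the largest value is drawn
    `width` characters long and the others are scaled linearly.
    """
    top = max(values.values())
    span = top - baseline
    return {name: round((v - baseline) / span * width)
            for name, v in values.items()}


# Satisfaction percentages taken from the chart described in the text.
satisfaction = {"Matt": 97, "Boris": 92, "Tim": 100}

from_zero = bar_lengths(satisfaction, baseline=0)   # axis starts at 0%
from_80 = bar_lengths(satisfaction, baseline=80)    # axis starts at 80%

for label, lengths in (("baseline 0%", from_zero), ("baseline 80%", from_80)):
    print(label)
    for name, n in lengths.items():
        print(f"  {name:<6}{'#' * n} {satisfaction[name]}%")
```

With the baseline at 0%, the bar for Boris is 37 characters against 39 for Matt, which is close to the true ratio of 92/97. With the baseline at 80%, the same bars are 24 and 34 characters, so Boris appears to score roughly 70% of Matt's value. Only the zero baseline keeps bar lengths in proportion to the data.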
[5] Jacques Bertin (1967), Sémiologie Graphique. Les diagrammes, les réseaux, les cartes. Translation 1983: Semiology of Graphics, by William J. Berg.

[Figure: bar chart of cola brands: Coca-Cola 44%, Coke Zero 18%, Diet Coke 16%, Diet Pepsi 11%, Pepsi 9%, Pepsi Max 3%]

(Sometimes) use pie charts to show cumulative proportions

Pie charts are useful for showing cumulative proportions. For example, the nested pie chart below shows that Chrome 48 is by far the biggest browser, and that Chrome accounts for more than half the market. Despite this obvious strength, there is far from complete agreement about the strengths of pie charts and the closely related donut charts (pie charts with a hole in the middle). Some visualization experts criticize them heavily. However, this criticism is frequently unwarranted. This is when it is useful to go back to the original scientific research, as the conclusion often attributed to it, that pie charts are never useful, goes further than the study itself.

Cleveland and McGill's study [6] makes a compelling case that pie charts like the one on the left below are typically inferior to column charts like the one on the right, as people's estimates of the size of the bars are considerably more accurate than estimates of pie slices. Presumably back in 1984, when the study was published, it was commonplace for people to create pie charts like the one on the left, and we can credit the study with having led to the eradication of such visualizations.

A more relevant comparison for today is to compare the charts below. As the values are shown on the visualizations, and the categories
are sorted, it is difficult to believe that the finding of the original study would recur if the study were repeated with these more modern designs.

[6] William S. Cleveland and Robert McGill (1984), "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods", Journal of the American Statistical Association, Vol. 79, No. 387 (September), pp. 531-554.

Does that mean that one visualization is as good as the other? Not really. The column chart is usually preferable when the goal is to compare one category with another. However, if we are instead trying to draw conclusions about cumulative proportions, the pie chart is usually the better choice. The pie chart allows us to see readily that together D and B account for more than 50%. This cannot be seen on the column chart; instead, the user is forced to look at the value labels and do the math.

One case in which the pie is better than the bar is when comparing two proportions. Where the data align to halves, quarters, or eighths, the pie chart will likely outperform a bar chart, as shown in the image below. These two visualizations show the same information, but only the pie chart makes the conclusion obvious.

[Figure: a bar chart captioned "It is hard to compare these lengths precisely" beside a pie chart captioned "It is easy to see that 25% is removed"]

The previous section qualified the recommendation to use pie charts to show part/whole relationships with "sometimes". This is because, among some in the visualization community, it is an article of faith that pie charts are bad. Most who hold this view do not appreciate the irony that their views are based on poor data. Most books dealing with visualization advise against pie charts, either citing the original research or seeking to demonstrate the unsuitability of pie charts by presenting poorly constructed pie charts. For example, The Big Book of Dashboards presents the example shown below, [7] which is unsorted, shows no value labels, uses poor colors, and is not the type of pie chart that even a semi-numerate pie-chart devotee would use.

Use waffle charts when presenting to pie chart-haters

In situations where a pie chart is desirable but not practical due to the audience, an acceptable alternative is often to use a waffle chart. [8] The waffle has two advantages over the pie: the elements are countable, which provides a useful form of redundant encoding, and, perhaps most importantly, nobody famous has criticized them.

[7] Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave (2017), The Big Book of Dashboards: Visualizing Your Data Using Real-World Scenarios, p. 32.
[8] For example, waffle charts are used throughout Cole Nussbaumer Knaflic (2015), Storytelling with Data, and pie charts are described as being "evil" (p. 61).

[Figure: chart titled "Preferred Cola": Coca-Cola, Coke Zero, Pepsi Max, Diet Coke, Pepsi, Diet Pepsi]

Start with heatmaps when viewing large tables of comparable numbers

Most visualizations involve a transformation of data, from a representation as a table of numbers into a visual with some other coordinate system. For example, the pie chart converts numbers into angles and slices of a circle, and the bar chart converts numbers into a two-dimensional space with numbers as rectangles. Of the widely used visualizations, heatmaps change the structure the least, retaining the table's original structure of rows and columns, with the only change being the replacement of numbers by colors. Consequently, heatmaps are usually the best visualization for communicating a basic understanding of patterns when the data consists of a large table of comparable numbers. As an example, the visualization below, showing visits to a website over a year, plots 360 data points in a simple way that allows us to see that the visits have been steadily rising, with most of the traffic on weekdays.

[Figure: heatmap titled "Website visits"; axis label: "Week commencing"]

When in doubt, use line, column, or bar charts

Column charts, bar charts, and line charts are safe default charts. They are widely understood and for many problems they are the optimal
visualization, as their encodings of data are typically the most accurate. For example, while the heatmap above does a great job of communicating the basic story in the data, its encoding, color, is not one in which humans excel at finding patterns. [9] Basic questions, such as the percentage growth that has occurred over the past year, cannot really be read from such a visualization. The line chart below provides us with much more precision about the quantities of the data above (and shows two extra years of data at the beginning). Note, though, that this line chart is better than the heatmap only because of the knowledge we gained by first plotting the heatmap (i.e., that the variation we see in the line chart between days is due to the effect of weekends).

[9] Jacques Bertin (1967), Sémiologie Graphique. Les diagrammes, les réseaux, les cartes. Translation 1983: Semiology of Graphics, by William J. Berg.

[Figure: line chart titled "Daily website traffic"; vertical axis: "Website visitors"]

Other rules

There are other rules of thumb, but there are many exceptions to these rules. Other rules include:
- Use radar charts (spider charts) to compare cycles, such as usage by hour across days.
- Use heatmaps to show data where there are patterns in two dimensions (e.g., by month and within month).
- Use violin plots to compare distributions (see Chapter 16, Create symmetry).

7 De-clutter

By reducing the amount of clutter on a visualization, we increase the chance that the user can see the key patterns in the visualization.

A core principle of visualization is to avoid distracting the user. We want the signal to be clear. This requires us to minimize the amount of noise. Statisticians sometimes quantify the amount of clutter as the data-ink ratio, where data is anything that cannot be erased without altering the meaning. [11] For example, most of the ink in the table below is uninformative, whereas the next table has a much higher data-ink ratio.

[11] Edward R. Tufte (1983), The Visual Display of Quantitative Information, Graphics Press.

This same technique can also be applied to charts. In the visualization below, for example, there are at least two kinds of noise. The background is
pure noise: chart junk, to use the pejorative favored by statisticians. And most of the columns themselves are noise: we could erase everything except their tops and still interpret the data correctly. By contrast, the line chart shown at right is much cleaner. Only the beginning and end values are shown for each series, as the intermediate numbers add nothing. Lines are used instead of bars, thus revealing much more whitespace. The background is plain white so that nothing distracts the viewer.

[Figure: "Net Promoter Score" for Bank A and Bank B, January to April, shown as a cluttered column chart and as a decluttered line chart]

8 Attract attention

Visualizations are often created with the intent of attracting attention. However, this technique is often at odds with other standard techniques (e.g., de-cluttering).

A visualization can only be interpreted if it is noticed, which is why graphic artists
often put a lot of work into making visualizations more appealing. A famous and controversial example of this is shown below. Statistician and artist Edward Tufte describes this visualization as "unsavory ... chockablock with cliché and stereotype, coarse humor, and a content-empty third dimension ... contempt both for information and for the audience ... who would trust a chart that looks like a video game?" [12] His conclusion is that the simpler visualization is better, like the one shown below, in which all chart junk has been removed.

[12] Edward R. Tufte (1990), Envisioning Information, p. 34

However, as discussed
in the next chapter, the sexist visualization is the superior one: a lot smarter than Tufte acknowledges.

A simple way to gain attention is to use pictures and images to construct pictographs, such as the cola market-share chart shown to the left. When creating such visualizations there is a conflict between the accuracy with which the data can be perceived and the goal of attracting attention. Among the ultraconservative (such as Tufte) there are also issues of credibility.

Nevertheless, it is possible to create visualizations that are both attractive and have desirable perceptual properties. Even pictographs can be laudable: for example, the chart to the right. It is hard to imagine a more traditional visualization doing a better job. The use of rows of 10 makes the data countable, which reinforces the data, crystallizing the somewhat abstract concept of a portion of a percent. The part/whole relationship
of a percentage is communicated by the gray. The overall design makes the comparison between the numbers straightforward. A bar chart of the same data, like the one below, has much less impact.

[Figure: bar chart titled "Case fatality rate"; categories: SARS, Pertussis, Smallpox, Malaria, Influenza A; data labels: 11.0%, 2.7%, 1.0%, 0.3%, 0.1%]

FORMATTING

The previous eight chapters have reviewed the standard and widely known techniques used for creating visualizations. This section focuses on lesser-known techniques for improving visualizations through formatting.

9 Create mnemonics

A mnemonic is something that helps viewers recall information. A visualization that helps users recall the story in the data is preferable to a forgettable one.

In addition to attracting attention, "Diamonds were a girl's best friend" operates as a mnemonic. It plants in our memory that diamond prices trace the line from the reclining woman's buttock, up to her knee, and down to her instep. In doing so it associates the trajectory of diamond prices (something we didn't know prior to seeing the visualization) with the shape of a bent leg (something we know well). Because our memories work through association, this improves our chance of remembering the visualization. Sexual associations are
known to be particularly strong,[13] which further improves the quality of this visualization as a mnemonic.

[13] Joshua Foer (2012), Moonwalking with Einstein: The Art and Science of Remembering Everything, Penguin Books.

Another classic example by the same designer, Nigel Holmes, albeit one that clearly
fails in terms of accurately representing the data, is shown below. A study has found that visualizations like this are more likely to be recalled after three weeks.[14] Considerable skill and time are required to create visualizations like these, but a more straightforward approach is the liberal use of
icons to create pictographs, such as in the simple visualization to the right. Although it is paint-by-numbers creativity compared to the earlier examples, it still is likely to be more memorable than a more conventional visualization.[15]

[14] Scott Bateman, Regan L. Mandryk, Carl Gutwin, Aaron Genest, David McDine, Christopher Brooks (2010), “Useful Junk? The Effects of Visual Embellishment on Comprehension and Memorability of Charts”, ACM Conference on Human Factors in Computing Systems (CHI).
[15] This visualization was inspired by a similar waffle chart in Cole Nussbaumer Knaflic (2015): Storytelling with data, Wiley. The data is from https:/.au/warren-buffett-berkshire-hathaway-historical-returns-2017-5.

When a visualization attracts attention, it is more likely to be remembered. People can't recall things they don't notice. However, the technique of creating mnemonics is about more than just attracting attention: it
involves creating some relevant association between the data and the visualization. In “Diamonds were a girl's best friend” the association relates to the shape. In “Monstrous costs” the association relates to the monstrous implication. With the visualization above, associations are created by the image of a wallet and the color of money.

Some general advice from the designer of the first two visualizations is to use only images that are instantly recognizable. Generally useful ideas include:[16]

Sports. If the goal is to show performance, sports imagery such as jumping over, getting to a line first, hurdling, diving, or scoring a goal can be successful.
Tools. Where the goal is to show force of some kind (such as crack down, apply pressure, cut, saw, squeeze, or measure precisely), a relevant tool works well.
Domestic appliances. Refrigerators freezing prices, vacuum cleaners sucking away profits, beds for sick economies, plants for growth, windows for looking through to the future, etc.

[16] Nigel Holmes (1984), Designer's Guide to Creating Charts & Diagrams, Watson-Guptill Publications.

10 Redundant encoding

Encoding refers to how data is represented in a visualization. Redundant encoding is when the same information is represented in multiple ways. With a few exceptions, it is generally desirable to use redundant encoding.

The heatmap below shows a rudimentary example of redundant encoding. The data is encoded both by the numbers shown in the cells and by the colors. In other words, the information appears twice. The benefit
in this is straightforward: color allows us to see the pattern easily, and numbers allow us to quantify the patterns.

In academic visualizations, dot plots like the one below are often presented as being good practice. They maximize the data-ink ratio, which, as discussed in Chapter 7, De-clutter, is a way of reducing extraneous information. Dot plots are rarely used outside academic and government statistics. The bar chart is much more popular. With the dot plot, the viewer must deduce that the dots are in two-dimensional space. With a typical bar chart, such an interpretation is also possible if the viewer
looks at the ends of the bars. However, there are two additional encodings that are not present in a dot plot: the width of the bar and the area of the bar. It is also possible to add more redundant encodings to a bar chart. In this example, the color of the bars also encodes the key findings, as do the order of the bars, the position of the labels, and the labels themselves. By encoding equivalent information in lots of ways, the bar chart makes it almost impossible for the viewer to miss its key patterns, which makes it the default chart of choice.

[Figure: two charts titled Preferred Cola. Labels: Coca-Cola, Coke Zero, Diet Coke, Diet Pepsi, Pepsi, Pepsi Max. Values: 44%, 18%, 16%, 11%, 9%, 3%.]

It has been argued that redundant encoding is itself an example of clutter and should be avoided.[17] However, there is no data to support this conclusion: the data shows either that redundant encoding is desirable and “almost always a benefit”[18] or that it
makes no difference at all.[19]

In Chapter 5, Represent accurately, we examined the stacked bar chart on the previous page, concluding that showing both the unsatisfied and satisfied data on the same visualization was poor because the information was redundant. Redundancy in that context meant something different. The problem with using a stacked bar chart to represent only two categories is that the two series represent the same information in opposite ways. As a result, the two series cancel each other out and do not add any more information. Even with the addition of a legend, the chart can only be interpreted with some effort and time on the part of the user.

[Figure: stacked bar chart titled Satisfaction, showing the proportion of support issues (0% to 100%) marked Satisfied or Unsatisfied for Matt, Dazza, Boris, Nelson, Peter, Chris, Neal, and Tim.]

[17] Edward R. Tufte (1983), The Visual Display of Quantitative Information, Graphics Press.
[18] Colin Ware (2012), Information Visualization: Perception for Design, 3rd Edition, Morgan Kaufmann, Kindle Edition, p. 159.
[19] Russell Chun (2017), “Redundant Encoding in Data Visualizations: Assessing Perceptual Accuracy and Speed”, Visual Communication Quarterly, Volume 24(3), pp. 135-148.

The pie chart above is a car crash; presumably it is examples like this which have given pie charts such a poor reputation among visualization pundits. By contrast, the donut chart below is extremely effective, in part because it has been designed to avoid the overplotting of labels. However, it also benefits from the use of redundant encoding. In the pie chart above, data
is encoded in three ways: by the slice angles at the origin, by the slice area, and by the length of the arcs on the outside of each slice. In the donut chart below, the same information is also encoded by order, font size, and value labels.

We can introduce further redundancy by coloring the slices proportional to the values in the data, as done below. This visualization best communicates the overall dominance of Chrome 48. But it is not as pretty as the one above, which raises the question of how well it will attract attention.

The use of cans in the scatterplot below is also an example of redundant encoding, as these cans, like most branded products, employ extensive redundant encoding (i.e., the branding is communicated by name, colors, font, and overall design).

One strategy for redundant encoding that typically also pays dividends in terms of attracting attention is the
use of icons or images for countable data. In addition to the redundancy achieved via the branding, the pictograph below can be counted by the viewer.

[Figure: pictograph, “Coca-Cola per day: 12.5”.]

The strategy of using countable pictographs is achievable with many different types of visualizations, from pie and bar charts all
the way through to small multiples of treemaps, as in Otto Neurath's visualization of the First World War, in which the number of soldiers and casualties of the armies can be compared based on area and number of icons.

11 Reduce eye movement

In an ideal world, a visualization can be instinctively understood with a glance. Usually, however, the user must repeatedly scan the visualization looking for key information. For example, they may have to look up the meaning of a color in a legend. All else being equal, the fewer eye movements required for a viewer to interpret a visualization, the better.

Consider
the mosaic chart (also known as a Marimekko or “mekko” chart) below. What does the gold cell in the left column mean? To work this out, the viewer must read to the top to work out that it means “at home”, then trace the light orange to the right to deduce it is Diet Pepsi. Because there is no number in the
cell, they then need to compare its area to the other areas to deduce its size. It is not surprising that mosaic charts typically need to be explained.

[Figure: mosaic chart of cola brand by Occasion (at home, out and about). Brands: Coca-Cola, Coke Zero, Diet Coke, Diet Pepsi, Pepsi, Pepsi Max. Cell labels shown: 30%, 33%, 32%, 23%, 19%, 14%, 5%, 4%, 8%, 14%, 17%.]

By contrast, the treemap below arranges
the tiles in such a way that there is room to fit in labels, so the user needs less eye movement to deduce that 1% of consumption is of Diet Pepsi “at home.” An added advantage of this is that color is no longer needed to assist in looking up the brand, so it can instead be used to show both occasion (color) and consumption (intensity of color). That is, in this example, the treemap has more redundant encoding than the mosaic chart.

The pictograph below also reduces eye movement. By using an image of President Trump to create the pictograph, one need not even label the visualization, eliminating the need for the
viewer's eyes to move from the image to the title and back.

[Figure: pictograph, “43% approval”.]

The pictograph below shows data for a seven-point rating scale. In this case, the number of images performs the role traditionally performed by the user having to look up numbers on an axis or a methodological note describing the number of scale points.

A simple mechanism for reducing eye movement is to reduce the size of any visualization. The smaller the visualization, the less time taken to scan it. The goal to strive for is to make all distinctions as small as possible but still clear to the viewer.[20]

[Figure: pictograph of ratings on a seven-point scale, shown twice. Foreign policy 4.5, Domestic policy 6.2, Lifestyle 2.3.]

Lean back away from the screen and look at the image below.[21] What do you see? A woman? Now lean in and look at the image again. What do you see? You will likely notice her mouth is a gremlin and the right side of her hair a toucan. We
can view an image as a whole only if it falls within about a 6° focal arc from our eye.[22] If we are 50 cm (20 inches) from an image, that means the image needs to be less than 5.2 cm (2 inches) wide and high (2 × 50 cm × tan 3° ≈ 5.2 cm). If the image is larger, we need to move our eyes to assemble the image within memory. Small visualizations are better.

[20] Edward R. Tufte (1997), Visual Explanations, Graphics Press.
[21] From Colin Ware (2012), Information Visualization: Perception for Design, 3rd Edition, Morgan Kaufmann, Kindle Edition, p. 294.
[22] From Colin Ware (2012), Information Visualization: Perception for Design, 3rd Edition, Morgan Kaufmann, Kindle Edition.

12 Show norms

Many visualizations can be improved by showing norms (e.g., typical, average, median, or “normal” results).

The most widely used norm in visualization is a line of best fit, such as in the visualization below. This allows the viewer to get an appreciation for both the overall trend and the difference of individual data points from that trend.

Density plots, like the one shown below, are commonly used for displaying the distribution of data. Although they can do a good job of showing the overall shape of a distribution, along with modes and the shape of tails, they
cannot communicate simple statistics, such as medians and quartiles. One solution is to overlay heatmap shading. In the example, we can see that the typical number of days since the trial started is a little over 50, which is not at all clear from the standard density chart.

In the decision tree below, which shows trust in the European Union Parliament by UK people prior to Brexit, there is no normative data. To interpret the tree, the viewer needs to read and remember all its content. By contrast, when the tree is represented as a Sankey diagram, with the branches colored according to the average values,
interpretation becomes much more straightforward. In this example, the average level of trust is shown in gray and the width of the branches shows the proportion of people. So, we can quickly see that the key divide in the UK relates to age, with older and unhappier people having lower levels of trust in
the EU Parliament.

13 Emphasize

Use visual cues and comments to make the intended focus clear to the viewer (e.g., summaries, callout boxes, highlighting, arrows, color).

It is almost painful to count the threes below, but much easier when they are emphasized.[23]

[Figure: the digit block 756395068473658663037576860372658602846589107830, shown twice.]

The simplest way to add emphasis to a visualization is by adding commentary.

[23] Adapted from Cole Nussbaumer Knaflic (2015): Storytelling with data, Wiley, p. 103.

[Figure: Supermarket Associations, with the commentary “Aldi is better on price, but much worse on everything else than Coles and Woolworths”. Sample size from 343 to 450; total sample size 500; 157 missing.]

              Price   Access   Range   Fresh   Quality
Aldi          78%     40%      25%     36%     45%
Coles         49%     63%      68%     60%     71%
Woolworths    40%     61%      70%     64%     69%

Adding additional graphical devices to emphasize key contrasts is also effective, as in the three examples below.

[Figure: three variants of the Supermarket Associations chart above.]

With more technical audiences, emphasis can be added by using automated statistical tests, as in the two examples below.

[Figure: two variants of the Supermarket Associations chart with significance tests at the 95% confidence level. Columns are lettered Aldi (A), Coles (B), and Woolworths (C), and cells are annotated with column letters, e.g., “78% B C” for Aldi on Price, “49% c” for Coles on Price, and “A” on the Coles and Woolworths values for Access, Range, Fresh, and Quality.]

14 Reduce color

Remove as much color as you can. Where possible, make everything gray (exceptions to this rule to follow). Anything you would ordinarily do in black will look better in dark gray. Any peripheral information should be in a light gray so that it does not detract.

The word cloud above sets the font sizes proportional to approval of President Trump as measured in March 2018. An unfortunate aspect of this word cloud is that the most interesting result, that the District of Columbia has the lowest approval, is the hardest insight to find. This is due partly to our eyes focusing on larger words over smaller, but it is also due to how color has been used. In a word cloud, color is used to help the viewer disambiguate words. However, this results in a riot of color that makes it hard to focus on the meaning of the words. The circle packing visualization is much more effective, in part because it is substantially less colorful. It is also because
circle packing uses a norm (gray is 50% approval) and lots of redundant encoding, with approval ratings indicated by the size of the circles, the color, and their order.

A subtler example of reducing color is the labeled bubble charts below. In some ways the version at the top is more appealing. However, the black framing around the edge of the visualization draws attention away from the bubbles, which is mitigated by making them all gray and aligning them in rectangles (the focus of the next chapter). More examples of reducing color are presented in the remaining chapters.

15 Form rectangles

A visualization can often be improved by placing subjectively located elements and spaces so that they form rectangles. While this is a useful hack, a professional designer will be able to improve on this, as they have a much richer repertoire of design principles from which to draw.

What makes the column chart
below[24] pleasing? Yes, the colors and fonts are nice. Less obvious is that the elements have been laid out to form rectangles.

[24] http:/.au/2015/06/investing-in-brand-beats-advertising-study-branding-agency/

The same visualization with elements aligned into rectangles and colors directing eye movement is
now more appealing and less distracting.[25]

The commentary below makes the resulting visualization look quite messy. The mess is a problem on two levels. First, who wants a mess? Second, what we perceive as mess distracts us from looking at the data.

[25] Adapted from Cole Nussbaumer Knaflic (2015): Storytelling with data, Wiley, pp. 81-82.

16 Create symmetry

Forcing visualizations to have symmetry can be surprisingly effective.

The stacked area chart below is interesting to look at but tells us little other than the peak times of popularity for Game of Thrones and Stranger Things. This is because any spikes
in the lower series make the upper series harder to interpret.

Stream graphs improve on area charts by making the visualization symmetrical around its middle. This does not affect our ability to interpret the cumulative data (which is now shown at the top and the bottom) but does make all the other series easier to read. For example, it is now clear from this visualization that The Walking Dead has relatively continuous appeal throughout the data. The reason that the stream graph is often better than the area chart is that by making the chart symmetrical, the height of the spikes is halved, which in turn makes the degree of distortion on the other series smaller.

[Figure: stream graph with labeled series Game of Thrones, Stranger Things, and The Walking Dead.]

These benefits are also the reason why many market researchers summarize rating scale data using stacked column charts with positive and negative series.

[Figure: stacked column chart of Promoters, Passive, and Detractors (from “Definitely recommend” to “Definitely not recommend”) for: Ease of getting to the store; Milk; Cheese; Checkout area; Drinks (cola, water, coffee, etc.); Snacks (chips, biscuits, etc.); Frozen foods; Cleaning products; Fruit & veg; Bakery; Meat.]

The visualization below seeks to improve upon density plots (see Chapter 12, Show norms) by overlaying a form of box plot over them, where the dot shows the mean, the bar the median, and the thick black line the quartiles. The additional information somehow makes the visualization less
appealing. However, by mirroring the density plot and optionally flipping it onto its side, we obtain the more popular violin plot. Here the symmetry is ultimately redundant: it provides no real information but nevertheless makes the visualization more appealing.

Reshaping

This section focuses on techniques that improve visualizations by fundamentally altering the shape that appears in the data.

17 Small multiples

A small multiple design creates a separate visualization for each series of data. They are best for revealing the range of variation in the data and for comparing series of data.

The
visualization below shows the TV data from the previous chapter. Even though each data series is now given less than 10% of the space from the previous visualization, the visualization is substantially more revealing. The data for Game of Thrones is almost as detailed here, but the data for all the other
series is displayed significantly better.

Small multiples of area charts are shown below. Even though each part of the visualization is very small, the key patterns are much easier to see. This is also aided by reduced color, added emphasis, and sorting. This visualization now makes it easy to compare the banks in terms of their overall levels, as well as their trends. The visualization also reveals the downside of small multiples: the number, size, and neatness of the labels.

Both the stacked area chart and the stream graph suffer from obvious problems when it comes to representing the data for different series, due to the distorting effect caused by the stacking. No such distortions exist in the line chart shown below. Nevertheless, this visualization is still very difficult to read.

Radar charts, also known as spider charts, are problematic when there are more than a small number of series being plotted. In the example below, which plots only seven series, we can extract insight, but it takes effort. By contrast, in the small multiples version it is much easier to compare the different brands' strengths and weaknesses. The use of small multiples has allowed us to improve the visualization by sorting, showing the norm (the averages, represented by the gray), reducing color, and emphasizing a result that is of interest to the target viewer.

In the examples on the left, small multiples have been used as a way of disentangling more complicated visualizations. They can also be used more generally. For
example, the table below shows the output from a model looking at brand preferences, where the norm, 0, is shown by the change from red to blue. This visualization allows the viewer to see readily that Apple is the most divisive of brands, with people who variously dislike it, are ambivalent, and like it more
than any other brand, whereas Google is widely liked but less loved than Apple.

Each of the examples in this chapter illustrates the effectiveness of small multiples. It is a powerful technique. However, the use of small multiples does involve trade-offs. As discussed earlier, the small size can make the
labels messy and difficult to read. A second limitation is that they trade an improvement in ease of seeing the pattern for a decrease in ease of comparing series. In situations where there is minimal overlap between series and no need to use color to disambiguate series, small multiples will tend to detract. For example, data in the multiple-line chart shown in the next chapter would be less accessible if rendered as small multiples.

18 Banking to 45°

The vertical or horizontal orientation of visualizations should be modified so that the average “slope” is around 45°.

The line chart below shows the number of sunspots per month for the last 270 years.

The same data is shown below, except that the visualization's aspect ratio (height to width) has been changed. The new visualization is vastly superior. We can still see all the patterns evident in the larger visualization, but one new pattern is now much easier to spot: the rate at which sunspot activity increases is typically much faster than that at which it decreases; that is, most of the spikes are skewed, with the peak to the left.[26]

[26] Both the idea of banking, and this application, come from William S. Cleveland (1994), The Elements of Graphing Data, Hobart Press.

For some reason, our brains find it easy to grasp patterns that are near to 45° angles. In the sunspot data, the patterns relate to cyclicality rather than individual data points; by changing the aspect ratio so that the average is around 45°, we make the trend much easier to see. While it is
possible to use algorithms to attempt to bank the data optimally,[27] eyeballing seems both easier and likely better, due to the difficulty in defining which aspect of the data to bank. In the sunspot data, for example, we do not want to bank the actual line but rather its cycle aspect. Also, the “45° rule” is
not based on rock-solid science,[28] so the quick and easy approach of changing the aspect ratio to accommodate the eye is sufficient.

[27] Jeffrey Heer, Maneesh Agrawala (2006), “Multi-Scale Banking to 45 Degrees”, IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 12(5), 701-708.
[28] Justin Talbot, John Gerth, Pat Hanrahan (2012), “An Empirical Model of Slope Ratio Comparisons”, IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis).

The line chart above shows attitudes to various institutions over time.[29] It is redrawn below with the data banked to 45°, less color, clear emphasis, and a smaller size.

[29] European Social Survey, Data file edition 2.1. NSD - Norwegian Centre for Research Data.

[Figure: line chart of average trust in the institution (from 0 to 10) by ESS round, for: country's parliament, the legal system, the police, politicians, political parties, the European Parliament, the United Nations. Sample size from 11235 to 15570; total sample size 15667; 4432 missing. Shown in its original and redrawn forms.]

19 Decompose

Where there are multiple different patterns of interest, it is often useful to break down the data into parts, so that the viewer can concentrate on each part separately.

The most well-known type of decomposition in data visualization is the seasonal decomposition. In the example below, Australian beer production[30] is shown by month in the plot at the top. The three plots underneath are the components of beer sales, which collectively add up to the data shown at the top. By isolating the different components of the original data, interesting results (the precise point where growth stopped, and the various step points of decline) become easier to see.

[30] Australian Bureau of Statistics. Cat. 8301.0.55.001.

The visualization below is a decomposition of the banking net promoter score previously examined in Chapter 17, Small multiples. In this visualization, each column represents the most recent month's scores, as these were of greatest interest to the users of the visualization because they were used to determine bonuses. The sparklines below each column show the trend data. Statistically significant results have been emphasized, showing Bank C's most recent month as significantly higher than the preceding month, and Bank A as the worst and in freefall.

[Figure: Net Promoter Score (NPS).]

The decompositions shown so far have all been quite technical in their design, but the basic principle can be used in more attractive visualizations, such as the one shown below.

The visualization below decomposes President Trump's approval in terms of overall trend and trend by state.

[Figure: President Trump Approval.]

20 Force contrasts

Contrasts are the key differences relevant to the viewer. Visualizations are improved by making these contrasts obvious.

The line chart on the right shows the same data as the one on the left, but each year's data has been shown as a separate line, making the comparisons between different years easier. The visualization
on the right makes two contrasts much clearer to the viewer:

The main growth occurred from 2013 to 2014.
There is strong seasonality, with Nike interest peaking in March/April, August, and at Christmas.

The visualization below is a palm trees visualization, showing the concerns that Americans have about different countries when travelling abroad. The height of each palm tree shows the average concern across the countries, with cost being the biggest concern, followed by safety. The length of the fronds shows the extent of association for each country. For example, Cost is a concern regarding all countries
208、 except Mexico,and Safety is a concern in Mexico,Egypt and,to a lesser extent,China.102101Although these palm trees are an interesting visualization,we must work reasonably hard to extract insight from it,because it shows poor contrasts.Below the visualization has been created after first transposin
209、g the data,making the contrasts easier to see.In the visualization below,contrasts between the countries are much more accessible.The height of each tree shows us the levels of concerns about each country:Egypt and China,followed by Mexico,are the countries where people have the greatest concerns.Th
210、e shapes formed by the fronds allow us to contrast the countries.Concerns about Australia and Great Britain are almost identical,and largely relate to Cost,whereas France is constrained both by Cost and concerns about Friendliness and Not being understood.The clustered bar chart below is an excellen
211、t example of how not to show contrasts. The key comparison between consulting and fieldwork is shown by the difference in length of the red and green bars. That is, the key thing that the viewer should look at is communicated by the absence of a visual element: instead, attention is focused on a wall of
212、 bright, uninteresting color. By contrast, the dumbbell plot below highlights and draws focus to the differences, which are also shown in a way that accommodates color-blindness. Another way to emphasize contrasts is to use a slope chart, where steepness of slope is proportional to size of difference.
213、 However, a practical problem with slope graphs is that they often suffer from severe overplotting problems. A solution to the overplotting is to plot the ranks rather than the original values. The resulting visualization is known as a bump chart. These can be made more exciting, and original numbers can
214、 be shown as well as rank. The heatmap below shows a year's worth of website visitation data. It reveals a growth in website traffic, with the overall level of blue linked to each month increasing from left to right. [Figure: Website visits] The same website data is shown in the heatmap here, but the data has b
215、een reorganized to focus on days of the week rather than days of the month. This visualization makes obvious something that is hidden in the monthly visualization: the strong day-of-week effects, with low visitations on Saturday and especially Sunday. An alternative decomposition of the beer production
216、 data in the previous chapter is provided by the cycle plot below. This plot emphasizes the level of sales by quarter, showing us that the peak is in Q4 (which seems a bit odd, as Q1 is the hottest quarter in Australia). This pattern cannot readily be discerned from the seasonal decomposition. Whether or n
217、ot this is better than the earlier seasonal decomposition depends on the question of interest: the process of choosing a contrast is determined by the story to be communicated. [Figure labels: Website visits; Week commencing; Australian Beer Production; Week] Rearrange the values or series of data, using additional variables
218、 that explain or contextualize the results. Chapter 21, Order by context. Where data is geographic in nature, context can be added by plotting the values directly onto a map, as shown below. Chapter 12, Show norms, illustrated the use of packed circles to show President Trump's approval by state. In the visualizat
219、ion below, the circles have been replaced by squares, ordered to align roughly with the geographical positions of the states. This is known as a state bin, and it solves a problem of accuracy of representation in choropleths: that the emphasis given to a state is determined by that state's geographical si
220、ze. In the visualization below, the net promoter scores for different departments of a supermarket are communicated using a traffic-light system and overlaid on an image representing a supermarket. In some instances, the associations are literal, such as with frozen foods, while in others the correspondenc
221、e between the data and the visual elements seems tenuous at best. The point of providing this kind of pseudo-context is that it offers the viewer a framework to coordinate the data. Further, associating data with a visual map is another useful strategy for aiding memory.31 Another type of context is to
222、 order the data chronologically or by process, such as with funnels like the one below. 31 For example, a standard strategy used in memory competitions is to associate specific items with rooms in a “memory palace” or some other geographic area. Joshua Foer (2012), Moonwalking with Einstein: The Art and Scien
223、ce of Remembering Everything, Penguin Books. Ordering by context can be effectively combined with small multiples, as was shown in Chapter 19, Decompose, and also in the coplot (conditioning plot) below.32 32 From http:/ rows removed. Reordering the data so that key patterns appear as diagonal lines makes
224、 the visualization more accessible. Chapter 22, Diagonalize. Consider the visualization below. How does Coca-Cola compare to the other brands? The visualization below shows the same data, but with color reduced, redundant encoding added, and the rows and columns diagonalized, so it is dramatically easier to see how Coca-Cola differs. Diagonal
225、ization,33 like banking to 45°, is powerful and easy to do: rearrange the rows and columns so that something approximating a diagonal appears. 33 Jacques Bertin (1967), Sémiologie Graphique. Les diagrammes, les réseaux, les cartes. Translation 1983: Semiology of Graphics, by William J. Berg. The technique can also
226、 be applied with tables and heatmaps, although in huge tables it can be useful to use algorithms to assist in the process.34 The heatmap below is for a correlation matrix. The pattern that appears is typically one of “steps”, where each step can be interpreted as a factor from factor analysis. The loading
227、s from a factor analysis (principal components analysis) are shown below. Here, diagonalization has been combined with the use of bars (redundant encoding) to clarify both the factor structure and the ambiguity of the structure for the fourth component. 34 Ordering by the first eigenvector (the scores on the first dimension) from corresponden
228、ce analysis or factor analysis often provides a satisfactory solution. Visualizations can be simplified and usually improved by reducing the quantity of information displayed through aggregating, smoothing, or filtering. Chapter 23, Simplify the data. The visualization above shows the results of 25 years of o
229、pinion polls about the Australian prime minister. The simpler visualization below has been de-cluttered and banked to 45°, and the data has been smoothed. Although the heatmap below is simple to interpret, we must work to extract meaning from it. We need first to recognize the pattern and then try t
230、o deduce what the pattern means by looking at the numbers and the associated row and column labels. By contrast, the moon plot below is simpler to interpret because it shows much less information. Visualizations work best when they resemble familiar shapes and patterns. The more we can exaggerate the vi
231、sualizations to match those shapes, the better. Chapter 24, Supernormalize. Supernormalization is: 1. A meta-technique, subsuming all the other techniques. 2. An invented word. 3. Newish, to be treated as a useful framework rather than a rock-solid theory. 4. Defined as: creating visualizations that use shape and colo
232、r in ways that tap into our instinctive skills at interpreting visual stimuli. 5. The last of the techniques presented in this book. The Dutch Nobel laureate Niko Tinbergen noticed that when chicks hatch, they seem to instinctively recognize their mother. He conducted a seemingly heartless experimen
233、t to understand this phenomenon. He took the mothers away and showed the chicks a cardboard cutout with a gull's face painted on. The chicks still pecked at it. Tinbergen then showed the chicks a two-beaked monster. They pecked at that, too. Tinbergen's experiment tells us both that birds are born wit
234、h an instinctive understanding of “mother” (or perhaps “food”), and that this understanding is somewhat fuzzy. His excellent visualization of the results (which both reduces eye movement and attracts attention) illustrates a key finding: a red rod with three white marks was pecked at more than a plaster cas
235、t of a herring gull's head. Why? While evolution gifts chicks an instinctive understanding of their mother, the rod is both adequate and less detailed, and so easier to learn. This basic idea should not come as a surprise. Toys and cartoons are usually supernormal in much the same way. When viewing an
236、ything, if we can quickly get the gist of what we are looking at, we can also then quickly work out how best to explore and understand what we are looking at.35 Any experienced driver can sit in a new car and usually work out very quickly how to drive it, because everything follows a logic that is clea
237、r to them. They know where to look. By contrast, if you put a computer-illiterate person in front of a computer, they have no framework with which even to deduce where to look. When we see Michelangelo's The Creation of Adam we “get” it. An art expert might provide a much richer explanation, but we can instin
238、ctively interpret much of its design. 35 Colin Ware (2012), Information Visualization: Perception for Design, 3rd Edition, Morgan Kaufmann, Kindle Edition. By contrast, our instincts tell us little about how to interpret Mondrian's Composition with Gray and Light Brown. Our instincts tell us so little that we
239、 cannot even work out which way is up. As discussed at the beginning of this book, a great visualization is one that can be quickly understood. Without casting aspersions on Mondrian, to be successful in data visualization we need to be more like Michelangelo. A good visualization is one that makes it eas
240、y for the viewer to extract meaning. This requires that it use shapes and colors in familiar ways. At a crude level we can achieve this by using standard visualizations, such as the line, bar, and pie charts people have been trained to view. However, more generally, we should create visualizations that cont
241、ain patterns and shapes, and use colors in ways that tap into the viewer's instincts. If viewers can work out the gist of what they are looking at, they can much more quickly extract meaning. Techniques like diagonalization, redundant encoding, ordering by context, and small multiples all work by leading to t
242、he creation of visualizations that are easier to interpret, because the shapes are recognizable and comprehensible. The line chart from Chapter 17, Small multiples, is reproduced above. What makes it such a poor visualization? A basic problem is that it does not look like anything familiar. At best i
243、t looks like some colorful broken spaghetti. We humans didn't evolve looking for patterns in spaghetti, so it is no surprise that we find the visualization unsatisfactory. By contrast, the shapes of the small multiples are more recognizable. We instinctively understand the concept of solid objects having
244、 height, which makes the small multiple of area charts easily understandable. The profiles of the tops and bottoms of each of the bars are familiar shapes that we know how to interpret. The lines that we see on the tops and bottoms of these small multiples, which are banked to about 45°, reflect the hills and mo
245、untains of the natural world. Our brains have evolved to understand such shapes, as it was necessary for our basic survival: Can we run up that hill when escaping the lion? Or is it too steep to climb? The traditional radar chart shown below looks like a witch's hat. Nothing in our evolution has prim
246、ed our brains for looking at patterns in witches' hats. We have, however, evolved a good understanding of how to distinguish between shapes of objects. For example, without the ability to tell apart different shapes of leaves, we would have great difficulty in avoiding poisoning ourselves. So, when we const
247、ruct small multiples of radar charts, we can be confident that the viewer will be able to interpret them by tapping into their instinctive ability to tell shapes apart. Evolution has gifted us excellent spatial awareness. It is for this reason that the standard correspondence analysis biplots, such as t
248、he one shown below, cause so much difficulty. Our instincts tell us that things placed close together are associated in some way. It is obvious to somebody new to correspondence analysis that the plot below implies that Food and China are associated. It is precisely for this reason that the moon plot is
249、 superior to the biplot. The viewer can safely rely on proximity without needing to understand linear algebra and the concept of linear projection. The use of color and shape in the visualization below is unnatural. The color scale of gray to bright pink to white to green does not exist in nature
250、. With the right cues, we can interpret such a color scale with some ease (for example, see the heated density plot in Chapter 12), but the resulting visualization here is more Mondrian than Michelangelo. It takes effort to extract any meaning. The basic conception of this visualization is not the problem.
251、 The colors are natural colors, which we can instinctively understand as they relate to important distinctions in nature such as those between flowers, leaves, and bark. And the basic idea of having different tiles organized in such a visualization (treemaps) is also often very effective. We have evolved to
252、 be excellent at understanding proportions and size. As children, nobody needs to explain to us when half a cake has been eaten, or that a cupcake is smaller than a cake. We get it instinctively. We instinctively understand the concept of proportionality, which is why we can interpret pie charts. The visua
253、lization below uses essentially the same design and coloring as the earlier one, but to much greater effect. Color is used to disambiguate large regions, much as occurs in nature. The degree of color is also much more consistent with how we perceive such intensity in nature (e.g., to signify the depth of
254、 water and the amount of rain in a cloud). Bill Gates loves this visualization (below) because “it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down.” Visualization writer Stephen Few hates it: “This is an important messag
255、e and a noble goal. But how well does the graph above tell this story? Not very well, actually.”36 He hates it so much he created his own version (below), the superiority of which he explains as follows: “By using bar graphs, we've made it easier to interpret and compare the data, so that it's easy to focus o
256、n the stories contained in the data, rather than struggling to decode an inappropriate and ineffectively designed display.” The original and the revised visualization are trying to achieve very different things. If the viewer has a commitment to understanding the detail of causes of death, then the seco
257、nd visualization is the better one, because it represents the data more accurately. However, if the goal is to educate people, the original visualization is much better. By using color and shape as they appear in nature, the visualization engages the viewer and makes them much more likely to take the time
258、 to extract information. Don't take anyone's word for this: spend ten seconds looking at each and see what you learn. There's a good chance you just get bored by the second one and move on. 36 https:/ diagrams, such as the one above showing switching between different car brands, are fabulous to look at, but
259、 what do they mean? We can instinctively work out that they show proportionality, much like a pie chart, and that Ford, Rover, and GM are “big”. However, working out the meaning of all the “chords” of the chord diagram is harder. It looks like the Great Pit of Carkoon from Return of the Jedi. Fascinating, yes, but
260、 not somewhere we expect to find a pattern. Charles Joseph Minard's 1869 chart (now known as a Sankey diagram) is sometimes described as the greatest visualization of all time. However, it needs to be explained to be understood. We do not instinctively get it. What do you see? A tree branch? Not an obje
261、ct from which insight is commonly extracted. Minard's visualization shows Napoleon's ill-fated invasion of Russia in 1812. Once that has been explained it is appreciated more readily. The width of the line is proportional to the size of the Grande Armée and has been superimposed over its route, affording u
262、s a compelling and precise image of the army's decimation through attrition over time and space rather than actual war. William Playfair's Universal Commercial has a yet more ambitious goal: to summarize 3,500 years of the world economy. It breaks many rules governing the accurate representation of data
263、 but nevertheless succeeds in communicating a vast amount of information. Each country's history appears as a mountain with smoothed silhouette/profile, allowing us to discern the pattern easily. For example, we can see that the USA's economy halved during the Revolutionary War, but by 1804 it was stronger
264、 than ever before. The diagonalization facilitates an easy comparison across the world, making clear just how dark the Dark Ages really were, and also directing our attention to many largely forgotten early European economies, such as the Hanseatic League and Flanders. Creating visualizations like t
265、hose of Minard and Playfair is rarely practical in real-world market research, but the same principles of story-telling apply. An ugly but magnificent visualization is the one below from LinkedIn, detailing industry performance during the Great Recession.37 The small multiples have been reordered to fo
266、rm a wave pattern, something our brains recognize instantly and can use to search for information. It is easy to work out from this visualization which industries declined and then grew because we understand the pattern instinctively, allowing us to look in the right places. 37 https:/ visualizations ar
267、e ones which tap into our instincts. The viewer should not have to work to find patterns. The patterns should jump out at us and make it easy for us to draw conclusions. Each of the preceding chapters has illustrated various tools for improving visualizations, but the ultimate technique is the one descr
268、ibed in this chapter, of creating visualizations that are in line with the types of patterns we have evolved to see in nature. Software: All of the computer-generated visualizations in this book can be created using either: Displayr(), or Q(www.q-) in conjunction with PowerPoint.
269、 Summary: The goal when creating visualizations is to allow people quickly to discover and remember the key stories in data. We do this by creating supernormal shapes, using the techniques below. [Figure: STANDARD TECHNIQUES, FORMATTING, RESHAPING] Tim Bock is the founder of Displayr. Tim is a data scientist, who has consul
270、ted, published academic papers, and won awards for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielse
271、n, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q, a data science product designed for survey research, which is used by all the world's seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia's leading research university), where he was an adjunct member of staff for 15 years. About the author