{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "YPX5n8G3GbPl",
   "metadata": {
    "id": "YPX5n8G3GbPl"
   },
   "source": [
    "# Lecture 6 - Intro to Visualization: When and Why; Visualization Aesthetics"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29cffd01",
   "metadata": {
    "id": "29cffd01"
   },
   "source": [
    "## Announcements\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4915ef5",
   "metadata": {
    "id": "e4915ef5"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdce7509",
   "metadata": {
    "id": "bdce7509"
   },
   "source": [
    "#### Goals\n",
    "* Understand the importance of visualization as a tool for understanding data.\n",
    "* Know some of the different settings in which visualization is used.\n",
    "* Understand some principles of how to make good visualizations:\n",
    "    * Maximize data-ink ratio\n",
    "    * Minimize lie factor\n",
    "    * Minimize chartjunk\n",
    "    * Use scales and labeling well\n",
    "    * Use color well\n",
    "    * Use repetition well"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19a59567",
   "metadata": {
    "id": "19a59567"
   },
   "source": [
    "## Big Idea: Why visualize?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23291585",
   "metadata": {
    "id": "23291585"
   },
   "source": [
    "Consider **Anscombe's Quartet**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9fc713cd-fc44-40a1-86ac-80b467cd64df",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "executionInfo": {
     "elapsed": 4397,
     "status": "ok",
     "timestamp": 1674497377647,
     "user": {
      "displayName": "Scott Wehrwein",
      "userId": "11327482518794216604"
     },
     "user_tz": 480
    },
    "id": "5f5e0f2d",
    "outputId": "48ae85c3-3546-48f5-fe69-9d049bd33b20"
   },
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "sns.set_theme(style=\"ticks\")\n",
    "\n",
    "# Load the example dataset for Anscombe's quartet\n",
    "df = sns.load_dataset(\"anscombe\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f5e0f2d",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "executionInfo": {
     "elapsed": 4397,
     "status": "ok",
     "timestamp": 1674497377647,
     "user": {
      "displayName": "Scott Wehrwein",
      "userId": "11327482518794216604"
     },
     "user_tz": 480
    },
    "id": "5f5e0f2d",
    "outputId": "48ae85c3-3546-48f5-fe69-9d049bd33b20"
   },
   "outputs": [],
   "source": [
    "# If you want to look at the raw data, uncommenting these two lines pivots it into a wider, easier-to-read shape:\n",
    "# df[\"idx\"] = df.groupby(\"dataset\").cumcount()\n",
    "# df.pivot(index=\"idx\", columns=\"dataset\").swaplevel(0, 1, axis=1).sort_index(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06c623f9",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 302
    },
    "executionInfo": {
     "elapsed": 368,
     "status": "ok",
     "timestamp": 1674497426284,
     "user": {
      "displayName": "Scott Wehrwein",
      "userId": "11327482518794216604"
     },
     "user_tz": 480
    },
    "id": "06c623f9",
    "outputId": "7dbeef8f-20c4-48bd-917e-42955d7803b8"
   },
   "outputs": [],
   "source": [
    "df.groupby(\"dataset\").describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b622c1ec",
   "metadata": {
    "id": "b622c1ec"
   },
   "source": [
    "Hey, the means and standard deviations are (nearly) all the same, so the four datasets must look alike! ...right? Let's confirm by visualizing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c940709",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 599
    },
    "executionInfo": {
     "elapsed": 3882,
     "status": "ok",
     "timestamp": 1674497484955,
     "user": {
      "displayName": "Scott Wehrwein",
      "userId": "11327482518794216604"
     },
     "user_tz": 480
    },
    "id": "8c940709",
    "outputId": "988a8bb7-dfcf-4501-d469-571b28ce6d5b"
   },
   "outputs": [],
   "source": [
    "# Show a scatter plot with a regression line for each dataset\n",
    "sns.lmplot(x=\"x\", y=\"y\", col=\"dataset\", hue=\"dataset\", data=df,\n",
    "           col_wrap=2, ci=None, palette=\"muted\", height=4,\n",
    "           scatter_kws={\"s\": 50, \"alpha\": 1})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04fa5a10",
   "metadata": {
    "id": "04fa5a10"
   },
   "source": [
    "Hmm, that didn't come out how I thought it would.\n",
    "\n",
    "**Takeaway:** visualization is often the best (and sometimes the only) way to understand a dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48741ff6",
   "metadata": {
    "id": "48741ff6"
   },
   "source": [
    "## When should you visualize?\n",
    "\n",
    "* When **exploring** data\n",
    "  * For me, this often looks like `df.plot.*`\n",
    "  * Goal: show you what's going on; answer questions for yourself.\n",
    "* When **presenting** data\n",
    "  * For me, this often looks like `sns.*(...)` along with a bunch of matplotlib code to fine-tune the appearance.\n",
    "  * Goal: show your reader what's going on; tell a story about the data, clearly and faithfully.\n",
    "* When providing **interactive** visualization tools for consumers of your data; examples:\n",
    "  * https://www.mountwashington.org/experience-the-weather/current-summit-conditions.aspx\n",
    "  * https://pudding.cool/projects/vocabulary/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1ed6eb2",
   "metadata": {
    "id": "d1ed6eb2"
   },
   "source": [
    "## What makes a good visualization?\n",
    "Two concerns:\n",
    "* Telling the truth\n",
    "* Telling it clearly and with style\n",
    "\n",
    "This is like asking what makes a good painting: it requires a sense of aesthetics."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57dc70be-8261-4fdf-94a1-c2dd77fac1cd",
   "metadata": {
    "id": "ab9d7154"
   },
   "source": [
    "Some principles to live by, based on the work of visualization pioneer Edward Tufte:\n",
    "\n",
    "#### Maximize data-ink ratio\n",
    "\n",
    "The **data-ink ratio** is the amount of \"ink\" used to represent data divided by the total amount of \"ink\" in the graphic:\n",
    "\n",
    "$$ \\frac{\\textrm{ink used to represent data}}{\\textrm{total ink in the graphic}}$$\n",
    "\n",
    "![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_26s/lectures/L06/di.png)"
   ]
  },
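  {
   "cell_type": "markdown",
   "id": "dataink-sketch-md",
   "metadata": {},
   "source": [
    "A minimal sketch of the idea, using made-up values (the data and the styling choices below are assumptions for illustration, not taken from the figure above): the same bar chart drawn twice, once with extra non-data ink (heavy grid, background fill, hatching) and once with that ink stripped away."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dataink-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Hypothetical data, for illustration only\n",
    "labels = [\"a\", \"b\", \"c\"]\n",
    "values = [3, 7, 5]\n",
    "\n",
    "fig, (busy, lean) = plt.subplots(1, 2, figsize=(9, 3.5))\n",
    "\n",
    "# Left: lots of ink that encodes no data\n",
    "busy.bar(labels, values, edgecolor=\"black\", hatch=\"//\")\n",
    "busy.grid(True, linewidth=1.5)\n",
    "busy.set_facecolor(\"lightgray\")\n",
    "busy.set_title(\"Lower data-ink ratio\")\n",
    "\n",
    "# Right: the same data with the non-data ink removed\n",
    "lean.bar(labels, values)\n",
    "for side in (\"top\", \"right\"):\n",
    "    lean.spines[side].set_visible(False)\n",
    "lean.set_title(\"Higher data-ink ratio\")"
   ]
  },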
  {
   "cell_type": "markdown",
   "id": "640a224c-c27e-4e5b-bbd3-0f6e026bb682",
   "metadata": {
    "id": "ab9d7154"
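   },
   "source": [
    "#### Minimize lie factor\n",
    "The **lie factor** is the ratio between the size of the effect in your graphic and the size of the effect in the data:\n",
    "\n",
    "$$ \\frac{\\textrm{size of effect in the graphic}}{\\textrm{size of effect in the data}}$$\n",
    "\n",
    "A lie factor of 1 means the graphic shows the effect faithfully; values far from 1, in either direction, distort it. \"Minimizing\" the lie factor really means keeping it close to 1.\n",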
    "\n",
    "![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_26s/lectures/L06/fuel_economy.jpg)"
   ]
  },
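  {
   "cell_type": "markdown",
   "id": "liefactor-sketch-md",
   "metadata": {},
   "source": [
    "A small worked example of the formula above. It assumes that \"size of effect\" is measured as relative change (one common reading), and the numbers are hypothetical, not measured from the figure above: a value rises from 10 to 15 while the bar that represents it grows from 1 cm to 4 cm. The helper name `lie_factor` is made up for this sketch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "liefactor-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def lie_factor(data_start, data_end, drawn_start, drawn_end):\n",
    "    # Relative change in the underlying data\n",
    "    effect_in_data = (data_end - data_start) / data_start\n",
    "    # Relative change in the drawn size (length, area, ...)\n",
    "    effect_in_graphic = (drawn_end - drawn_start) / drawn_start\n",
    "    return effect_in_graphic / effect_in_data\n",
    "\n",
    "# Hypothetical numbers: data +50% (10 -> 15), drawn bar +300% (1 cm -> 4 cm)\n",
    "lie_factor(10, 15, 1, 4)  # 6.0: the graphic exaggerates the effect sixfold"
   ]
  },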
  {
   "cell_type": "markdown",
   "id": "1f13d09e-46a9-46f1-86f8-fb7e23dce8ba",
   "metadata": {
    "id": "ab9d7154"
   },
   "source": [
    "#### Minimize chartjunk\n",
    "**Chartjunk** is loosely defined as extraneous visual elements that do not further the purpose of the graphic.\n",
    "\n",
    "![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_26s/lectures/L06/cj.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c4f67d7-dc58-46e1-a669-9b2724ca306b",
   "metadata": {
    "id": "ab9d7154"
   },
   "source": [
    "#### Use scales and labeling well\n",
    "\n",
    "* Fill the available space with data (without increasing the lie factor)\n",
    "* Use clear labels"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eaa03f39-ebc0-4e10-8aa5-c438fb3ada53",
   "metadata": {
    "id": "ab9d7154"
   },
   "source": [
    "#### Use color and shading well\n",
    "\n",
    "* Colors can be used to differentiate categorical or numerical values.\n",
    "* For numerical/continuous values, use perceptually uniform colormaps (e.g., matplotlib's `viridis`).\n",
    "* Avoid large areas of bright colors; small areas of sharp color contrast can be powerful visual elements.\n",
    "\n",
    "![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_26s/lectures/L06/color.png)"
   ]
  },
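  {
   "cell_type": "markdown",
   "id": "colormap-sketch-md",
   "metadata": {},
   "source": [
    "A quick way to see the difference, sketched on synthetic data (the random points and the choice of `viridis` versus `jet` are assumptions for illustration): `viridis` is one of matplotlib's perceptually uniform colormaps, while `jet` is not, so with `jet` equal steps in the data do not look like equal steps in color."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "colormap-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Synthetic points; the color encodes a continuous value z\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.normal(size=200)\n",
    "y = rng.normal(size=200)\n",
    "z = x + y\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(9, 4), sharey=True)\n",
    "for ax, cmap in zip(axes, [\"viridis\", \"jet\"]):\n",
    "    points = ax.scatter(x, y, c=z, cmap=cmap)\n",
    "    ax.set_title(cmap)\n",
    "    fig.colorbar(points, ax=ax, label=\"z\")"
   ]
  },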
  {
   "cell_type": "markdown",
   "id": "ab9d7154",
   "metadata": {
    "id": "ab9d7154"
   },
   "source": [
    "#### Use repetition well\n",
    "\n",
    "* **Reuse** the cognitive effort your reader puts in to understand one plot\n",
    "* Small multiples: many small charts of the same kind, e.g., one per category\n",
    "    * Example: `sns.pairplot`\n",
    "* Multiple time series on a single set of axes\n",
    "\n",
    "![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_26s/lectures/L06/sm.png)"
   ]
  },
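  {
   "cell_type": "markdown",
   "id": "smallmult-sketch-md",
   "metadata": {},
   "source": [
    "A minimal small-multiples sketch on a synthetic dataset (the `toy` DataFrame and its column names are made up for illustration). `sns.relplot` with `col=` draws one panel per category, with shared axes by default, so the reader only has to learn how to read the plot once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "smallmult-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "# Synthetic data: four groups of 50 points each\n",
    "rng = np.random.default_rng(0)\n",
    "toy = pd.DataFrame({\n",
    "    \"group\": np.repeat([\"a\", \"b\", \"c\", \"d\"], 50),\n",
    "    \"x\": rng.normal(size=200),\n",
    "})\n",
    "toy[\"y\"] = 0.5 * toy[\"x\"] + rng.normal(scale=0.5, size=200)\n",
    "\n",
    "# One small scatter plot per group, laid out in a 2x2 grid\n",
    "sns.relplot(data=toy, x=\"x\", y=\"y\", col=\"group\", col_wrap=2, height=3)"
   ]
  },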
  {
   "cell_type": "markdown",
   "id": "J49QD4HPwXC3",
   "metadata": {
    "id": "J49QD4HPwXC3"
   },
   "source": [
    "Activity: analyze a [plot](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_26s/lectures/L06/vis_examples.pdf)!\n",
    "\n",
    "Write:\n",
    "* Your plot number\n",
    "* The names of your group members\n",
    "* Analysis of the plot with respect to at least three of the above principles\n",
    "    * Maximize data-ink ratio\n",
    "    * Minimize lie factor\n",
    "    * Minimize chartjunk\n",
    "    * Use scales and labeling well\n",
    "    * Use color well\n",
    "    * Use repetition well\n",
    "* Be prepared to share the most pertinent principle with the class in 1 minute or less."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
