Aimia from R's perspective

Analytics Off-site Meeting - April 2015

Ali Arsalan Kazmi
Insights Executive

What is this presentation about?

Tools

Tools determine

  • What can be done
  • How it can be done
  • By when it can be done

Tools - Considerations

  • Over-use
  • Under-use
  • Incorrect use


Are we over-utilising, under-utilising, or incorrectly using tool(s)?

R

A Data Analyst's workflow

Reproducibility...

Reproducibility - Why is it Important?

  • Quality Checking
  • Reusable when disseminating knowledge
  • Dynamic/Reactive Documents

Reproducibility

  • Differing intra/inter-sheet Organisation
  • Difficult to Replicate
  • Difficult to check Quality
  • Difficult for a newcomer to follow

Reproducibility - The R way

  • Command history available
    • Enables documentation of all steps
    • Quality Checks can be included in script
    • Comments to elaborate on the logic of script
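
As a minimal sketch of the points above, a script can carry its own quality checks and comments. The data frame and column names here (transactions, member_id, spend) are hypothetical and invented for illustration only.

```r
# Hypothetical transactions; in practice these might come from read.csv()
# or a database query.
transactions <- data.frame(
  member_id = c(101, 102, 103, 104),
  spend     = c(25.50, 40.00, 12.75, 60.25)
)

# Quality checks: the script stops with an error if any check fails.
stopifnot(
  !anyDuplicated(transactions$member_id),  # one row per member
  !anyNA(transactions$spend),              # no missing spend values
  all(transactions$spend >= 0)             # spend cannot be negative
)

# Analysis step: average spend per member.
mean_spend <- mean(transactions$spend)
print(mean_spend)
```

Because the checks live in the script, they run again every time the analysis is re-run on new data.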

Reproducibility - The R way

  • knitr and R Markdown
    • A single document containing analyses, script, and results
    • Preserves contextual narrative for any analyses
    • Document adjusts to any changes in data/script automatically
    • Output: HTML, PDF, Word, and Tufte-style or LaTeX-style documents
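
A rough sketch of the single-document idea, assuming the knitr package is installed. The report text and the numbers in it are hypothetical. The chunk fence is built with strrep() only so that this example can itself be shown inside a code block; a real .Rmd file would contain the fence literally.

```r
# Build a tiny R Markdown source in memory.
fence <- strrep("`", 3)
rmd_source <- c(
  "# Spend summary",
  "",
  "Narrative text sits next to the code that produces the numbers.",
  "",
  paste0(fence, "{r spend-summary}"),
  "spend <- c(25.50, 40.00, 12.75, 60.25)  # hypothetical data",
  "mean(spend)",
  fence
)

# knit() runs the chunk and weaves its output into the document text.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitted <- knitr::knit(text = rmd_source, quiet = TRUE)
  cat(knitted)
} else {
  message("knitr is not installed; install.packages('knitr') to run this sketch")
}
```

If the data in the chunk changes, re-knitting regenerates the result in place. For a file on disk, rmarkdown::render() on a hypothetical report.Rmd would produce the HTML, PDF, or Word output.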

Visualisation...

Visualisation

  • Typically: Excel
  • Tableau

Visualisation

In Excel...

  • Inflexible
  • Difficult to Automate
  • Time-consuming and Inefficient
  • Basic Graphs are not Ideal

In R...

  • Very Flexible
  • Easily Automated
  • Efficient with large data sets
  • Charting package - ggplot2 - is based on the Grammar of Graphics
  • Greater Charting capability
  • Interactivity
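
To illustrate the layered, grammar-based approach, here is a small sketch that assumes the ggplot2 package is installed. The monthly figures are invented for this example.

```r
# Hypothetical monthly figures, invented for this sketch.
monthly <- data.frame(
  month  = factor(month.abb[1:6], levels = month.abb[1:6]),
  visits = c(120, 135, 150, 160, 155, 170)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # A chart is built from layers: data, aesthetic mappings, then geoms.
  p <- ggplot(monthly, aes(x = month, y = visits, group = 1)) +
    geom_line() +
    geom_point() +
    labs(title = "Monthly visits (hypothetical data)",
         x = "Month", y = "Visits")
  print(p)
} else {
  message("ggplot2 is not installed; install.packages('ggplot2') to run this sketch")
}
```

Since the chart is defined in code, the same lines can be re-run on another data frame with the same columns, which is one way charting might be automated.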

Visualisation

[Figure: side-by-side chart examples, "In Excel..." and "In R..."]

Visualisation

  • Dashboard Demo

Thank you