What is the Datasaurus Dozen, and why is it relevant to data science?

By TodayMag, February 6, 2022


Summary statistics are useful because they condense a huge number of observations into a single figure that is simple to understand and share. This property explains why averages and correlations are so widely used, from introductory statistics courses to newspaper stories to scholarly papers. The caveat is that they often fail to describe the whole picture, as a collection of datasets known as the “Datasaurus Dozen” shows.

There’s a reason data scientists spend so much time exploring data with visualisations. It’s risky to rely solely on summaries like means, variances, and correlations, because vastly different datasets can produce nearly identical values. Statistics lectures have demonstrated this for decades with Anscombe’s Quartet: four scatterplots whose x and y variables have almost the same means, variances, and correlation, despite being qualitatively distinct. (You can check this in R by loading the data with data(anscombe).) What you might not realise is that bivariate data with a given mean, standard deviation, and correlation can be generated in any shape you want, even a dinosaur.
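For readers who prefer Python to R, the same check can be sketched with the standard library alone. The values below are transcribed from Anscombe’s published quartet; the helper names (`pearson`, `summarise`, `SUMMARIES`) are made up for this illustration and are not part of any library.

```python
from statistics import mean, variance

# Anscombe's quartet; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarise(xs, ys):
    """Collect the summary statistics discussed in the text for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows printed should be close to one another even though the scatterplots look nothing alike, which is the whole point of the quartet.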

What is the Datasaurus Dozen?

Alberto Cairo created the initial datasaurus as a toy example to highlight the necessity of charting data. There are only two variables in the dataset (x and y), and the summary statistics aren’t particularly noteworthy.

In their research paper “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing”, Justin Matejka and George Fitzmaurice present 13 datasets (the Datasaurus and 12 others) that share the same summary statistics (x/y mean, x/y standard deviation, and Pearson’s correlation) to two decimal places, yet look vastly different when plotted. The paper is significant because it explains the method the authors used to produce these datasets, as well as others like them.

Methodology 

The key idea behind the authors’ method is that, while it is difficult to create a dataset with specific statistical properties from scratch, it is comparatively simple to take an existing dataset, tweak it a little, and keep those properties intact. The researchers do this by picking a random point, shifting it slightly, and then confirming that the set’s statistics have not drifted outside acceptable bounds. In this case, that means the means, standard deviations, and correlation must remain the same to two decimal places.
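The perturbation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ code: the function names, the step size of 0.1, the random starting cloud, and the use of rounding to test “same to two decimal places” are all choices made for this example.

```python
import math
import random


def summary(points, places=2):
    """Means, sample standard deviations and Pearson correlation, rounded."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    values = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
              sxy / math.sqrt(sxx * syy))
    return tuple(round(v, places) for v in values)


def perturb(points, target_stats, rng, step=0.1, max_tries=1000):
    """Nudge one random point; keep the nudge only if the rounded stats are unchanged."""
    for _ in range(max_tries):
        i = rng.randrange(len(points))
        x, y = points[i]
        candidate = list(points)
        candidate[i] = (x + rng.gauss(0, step), y + rng.gauss(0, step))
        if summary(candidate) == target_stats:
            return candidate
    return points  # no acceptable nudge found: leave the dataset as it was


# Hypothetical starting data: 100 uniformly random points.
rng = random.Random(0)
start = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
stats = summary(start)

current = start
for _ in range(1000):
    current = perturb(current, stats, rng)

if __name__ == "__main__":
    print("stats before:", stats)
    print("stats after: ", summary(current))
```

After the loop, the points have wandered away from where they started, yet the rounded summary statistics are the same as before.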

Repeat this tiny “perturbation” enough times and a completely different dataset emerges. To be useful for demonstrating why data should be plotted, however, the resulting datasets must also look visibly distinct from one another. This is accomplished by biasing the random point movements toward a specific target shape.
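One way to bias the movements, in the spirit of the simulated annealing named in the paper’s title, is to accept a nudge when it brings the point closer to the target shape, and otherwise only with a probability (the “temperature”) that shrinks over time. The sketch below is an assumption-laden illustration rather than the authors’ implementation: the circular target, the linear cooling from 0.4 to 0.01, the step size, and all function names are hypothetical.

```python
import math
import random

CENTER, RADIUS = (50.0, 50.0), 40.0  # hypothetical target shape: a circle


def summary(points, places=2):
    """Means, sample standard deviations and Pearson correlation, rounded."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    values = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
              sxy / math.sqrt(sxx * syy))
    return tuple(round(v, places) for v in values)


def dist_to_circle(p):
    """Distance from a point to the circumference of the target circle."""
    return abs(math.hypot(p[0] - CENTER[0], p[1] - CENTER[1]) - RADIUS)


def anneal(points, rng, iterations=5000, step=0.3, t_max=0.4, t_min=0.01):
    """Shape-biased perturbation that keeps the rounded statistics fixed."""
    target_stats = summary(points)
    current = list(points)
    for k in range(iterations):
        # Linear cooling: moves away from the shape are tolerated less over time.
        temperature = t_max - (t_max - t_min) * k / (iterations - 1)
        i = rng.randrange(len(current))
        old = current[i]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = dist_to_circle(new) < dist_to_circle(old)
        if not (closer or rng.random() < temperature):
            continue  # reject: the move is worse and the temperature did not allow it
        current[i] = new
        if summary(current) != target_stats:
            current[i] = old  # revert: the move would change the statistics
    return current


def mean_distance(points):
    return sum(dist_to_circle(p) for p in points) / len(points)


rng = random.Random(1)
start = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
result = anneal(start, rng)
before, after = mean_distance(start), mean_distance(result)

if __name__ == "__main__":
    print("mean distance to circle:", round(before, 2), "->", round(after, 2))
    print("stats unchanged:", summary(result) == summary(start))
```

With only a few thousand iterations the cloud merely drifts toward the circle; many more iterations would be needed before the shape becomes clearly visible.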

Source: https://www.autodesk.com/research/publications/same-stats-different-graphs 

How was the Datasaurus Dozen generated?

The researchers designed 12 target shapes to steer the points toward, producing the Datasaurus Dozen. Each of the resulting charts, and indeed every intermediate frame, has the same summary statistics as the original Datasaurus. The strategy isn’t restricted to particular shapes: any collection of line segments can serve as a target. Iterating through the datasets one after another, one can watch the data points morph from one shape to the next while the summary statistics stay the same to two decimal places throughout.
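To see why any collection of line segments can act as a target, note that all the method needs is a way to measure how far a point is from the shape. A minimal sketch follows, assuming a shape is stored as a list of endpoint pairs; the function names and the example “X” shape are hypothetical.

```python
import math


def dist_to_segment(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a single point
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_shape(p, segments):
    """Distance from p to the nearest segment of the target shape."""
    return min(dist_to_segment(p, a, b) for a, b in segments)


# Hypothetical target: an "X" made of two diagonals across a 100 x 100 square.
X_SHAPE = [((0, 0), (100, 100)), ((0, 100), (100, 0))]

if __name__ == "__main__":
    print(dist_to_shape((50, 50), X_SHAPE))
    print(dist_to_shape((50, 0), X_SHAPE))
```

A distance function like this can be used to decide whether a nudged point has moved closer to the target, so swapping in a different list of segments changes the shape the points are drawn toward.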


