---
title: Summary Statistics
sidebar_label: Summary Statistics
pagination_prev: demos/index
pagination_next: demos/frontend/index
---

import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

export const bs = ({borderStyle:"none", background:"none", textAlign:"left" });

Summary statistics help people quickly understand datasets and make informed decisions. Many interesting datasets are stored in spreadsheet files.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing data from spreadsheets.

This demo uses SheetJS to process data in spreadsheets. We'll explore how to extract spreadsheet data and how to compute simple summary statistics. This demo will focus on two general data representations:

- ["Arrays of Objects"](#arrays-of-objects) simplifies processing by translating from the SheetJS data model to a more idiomatic data structure.
- ["Dense Worksheets"](#dense-worksheets) directly analyzes SheetJS worksheets.

:::tip pass

The [Import Tutorial](/docs/getting-started/examples/import) is a guided example of extracting data from a workbook. It is strongly recommended to review the tutorial first.

:::

:::note Tested Deployments

This browser demo was tested in the following environments:

| Browser     | Date       |
|:------------|:-----------|
| Chrome 119  | 2024-01-06 |

:::

## Data Representations

Many worksheets include one header row followed by a number of data rows. Each row is an "observation" and each column is a "variable".

:::info pass

The "Arrays of Objects" explanations use more idiomatic JavaScript patterns. This approach is suitable for smaller datasets.

The "Dense Worksheets" approach is more performant, but the code patterns are reminiscent of C. The low-level approach is only encouraged when the idiomatic patterns are prohibitively slow.

:::

### Arrays of Objects

The idiomatic JavaScript representation of the dataset is an array of objects.
Variable names are typically taken from the first row. Those names are used as keys in each observation.

<table><thead><tr><th>Spreadsheet</th><th>JS Data</th></tr></thead><tbody><tr><td>

![`pres.xlsx` data](pathname:///pres.png)

</td><td>

```js
[
  { Name: "Bill Clinton", Index: 42 },
  { Name: "GeorgeW Bush", Index: 43 },
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 },
  { Name: "Joseph Biden", Index: 46 }
]
```

</td></tr></tbody></table>

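
The mapping in the table can be sketched in plain JavaScript. This is a minimal sketch, assuming the rows have already been read into an array of arrays with the header row first (the shape that SheetJS `XLSX.utils.sheet_to_json(ws, { header: 1 })` returns). The `rowsToObjects` helper is hypothetical and only illustrates how the first row becomes the object keys; in practice, `XLSX.utils.sheet_to_json(ws)` builds the array of objects directly.

```javascript
// Hypothetical helper for illustration only (not a SheetJS function).
// Converts an array of rows, where the first row holds the variable names,
// into an array of objects keyed by those names.
function rowsToObjects(rows) {
  const [header, ...body] = rows;
  return body.map((row) => {
    const obj = {};
    header.forEach((key, col) => {
      // Omit keys for cells that are missing in this row
      if (row[col] !== undefined) obj[key] = row[col];
    });
    return obj;
  });
}

// One header row followed by data rows, as in `pres.xlsx`
const rows = [
  ["Name", "Index"],
  ["Bill Clinton", 42],
  ["GeorgeW Bush", 43],
  ["Barack Obama", 44],
  ["Donald Trump", 45],
  ["Joseph Biden", 46],
];

const data = rowsToObjects(rows);
console.log(data[0]); // { Name: 'Bill Clinton', Index: 42 }
```

Cells that are missing in a row are left out of that row's object rather than stored as `undefined`, which is similar to the default `sheet_to_json` behavior when the `defval` option is not set.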

$$
\begin{aligned}
M[x;m+1] &= \frac{1}{m+1}\sum_{i=1}^{m+1} x_i \\
&= \frac{1}{m+1}\sum_{i=1}^{m} x_i + \frac{x_{m+1}}{m+1} \\
&= \frac{m}{m+1}\left(\frac{1}{m}\sum_{i=1}^{m} x_i\right) + \frac{x_{m+1}}{m+1} \\
&= \frac{m}{m+1}M[x;m] + \frac{x_{m+1}}{m+1} \\
&= \left(1 - \frac{1}{m+1}\right)M[x;m] + \frac{x_{m+1}}{m+1} \\
&= M[x;m] + \frac{x_{m+1}}{m+1} - \frac{1}{m+1}M[x;m] \\
&= M[x;m] + \frac{1}{m+1}\left(x_{m+1}-M[x;m]\right) \\
\text{new\_mean} &= \text{old\_mean} + (x_{m+1}-\text{old\_mean}) / (m+1)
\end{aligned}
$$
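
The final update rule translates into a single-pass loop. The following is a minimal sketch, assuming the values of one variable have been collected into an array (for example, by mapping over the array of objects). The `incrementalMean` function name and the choice to skip non-numeric entries such as blank cells are assumptions for this example, not SheetJS APIs.

```javascript
// Incremental mean, following the update rule derived above:
//   new_mean = old_mean + (x_{m+1} - old_mean) / (m + 1)
// Assumption: non-numeric entries (e.g. blank cells) are skipped.
function incrementalMean(values) {
  let mean = 0;
  let m = 0; // number of numeric observations processed so far
  for (const x of values) {
    if (typeof x !== "number") continue; // skip blanks and text
    mean += (x - mean) / (m + 1);
    m += 1;
  }
  // The mean of zero observations is undefined
  return m > 0 ? mean : NaN;
}

// Example: mean of the `Index` variable in the presidents dataset
const data = [
  { Name: "Bill Clinton", Index: 42 },
  { Name: "GeorgeW Bush", Index: 43 },
  { Name: "Barack Obama", Index: 44 },
  { Name: "Donald Trump", Index: 45 },
  { Name: "Joseph Biden", Index: 46 },
];

const avg = incrementalMean(data.map((row) => row.Index));
console.log(avg); // 44
```

Unlike summing every value and dividing once at the end, the running update never holds the full sum, which can help limit precision loss when the sum grows large.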