docs.sheetjs.com/docz/docs/07-csf/07-features/index.md
2023-05-18 05:21:08 -04:00

Spreadsheet Features

import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';

Even for basic features like date storage, the official Excel formats store the same content in different ways. The parsers are expected to convert from the underlying file format representation to the Common Spreadsheet Format. Writers are expected to serialize SheetJS workbooks in the underlying file format.

The following topics are covered in sub-pages:

    {useCurrentSidebarCategory().items.map((item, index) => { const cP = item.customProps; const listyle = (cP?.icon) ? { listStyleImage: `url("${cP.icon}")` } : {}; return (
  • {item.label}{cP?.summary && (" - " + cP.summary)}
  • ); })}

Row and Column Properties

Format Support (click to show)

Row Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS

Column Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM

Row and Column properties are not extracted by default when reading from a file and are not persisted by default when writing to a file. The option cellStyles: true must be passed to the relevant read or write function.
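
As a minimal sketch, the option can be kept in plain options objects that are passed to the read and write calls. The `bookType` and `type` values are illustrative assumptions, and the commented calls assume the library is loaded as `XLSX` with the file bytes in a hypothetical `data` variable:

```javascript
/* options for reading: request row and column properties */
const readOpts = { cellStyles: true };

/* options for writing: persist row and column properties */
/* `bookType` and `type` are illustrative choices, not requirements */
const writeOpts = { bookType: "xlsx", type: "array", cellStyles: true };

/* with the library loaded as XLSX and the file bytes in `data`: */
// const wb = XLSX.read(data, readOpts);
// const out = XLSX.write(wb, writeOpts);
```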

Column Properties

The !cols array in each worksheet, if present, is a collection of ColInfo objects which have the following properties:

type ColInfo = {
  /* visibility */
  hidden?: boolean; // if true, the column is hidden

  /* column width is specified in one of the following ways: */
  wpx?:    number;  // width in screen pixels
  width?:  number;  // width in Excel "Max Digit Width", width*256 is integral
  wch?:    number;  // width in characters

  /* other fields for preserving features from files */
  level?:  number;  // 0-indexed outline / group level
  MDW?:    number;  // Excel "Max Digit Width" unit, always integral
};
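
A short sketch of how the array might be populated. The worksheet here is a hand-built stand-in (real worksheets normally come from a read function), and the example assumes entries are indexed by column position, so entry 0 describes column A:

```javascript
/* stand-in worksheet object with three columns */
const wsCols = {
  "!ref": "A1:C1",
  A1: { t: "s", v: "Name" },
  B1: { t: "s", v: "Notes" },
  C1: { t: "n", v: 1 }
};

/* one ColInfo entry per column, in column order (A, B, C) */
wsCols["!cols"] = [
  { wch: 20 },       // column A: 20 characters wide
  { hidden: true },  // column B: hidden
  { wpx: 120 }       // column C: 120 screen pixels wide
];
```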

Row Properties

The !rows array in each worksheet, if present, is a collection of RowInfo objects which have the following properties:

type RowInfo = {
  /* visibility */
  hidden?: boolean; // if true, the row is hidden

  /* row height is specified in one of the following ways: */
  hpx?:    number;  // height in screen pixels
  hpt?:    number;  // height in points

  level?:  number;  // 0-indexed outline / group level
};
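
A matching sketch for rows. As with columns, the worksheet is a hand-built stand-in and the example assumes entries are indexed by row position, so entry 0 describes the first row:

```javascript
/* stand-in worksheet object with three rows */
const wsRows = {
  "!ref": "A1:A3",
  A1: { t: "s", v: "Header" },
  A2: { t: "n", v: 1 },
  A3: { t: "n", v: 2 }
};

/* one RowInfo entry per row, in row order */
wsRows["!rows"] = [
  { hpt: 24 },       // first row: 24 points tall
  { hidden: true },  // second row: hidden
  { hpx: 40 }        // third row: 40 screen pixels tall
];
```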

Outline / Group Levels Convention

The Excel UI displays the base outline level as 1 and the max level as 8. Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base outline level is 0 and the max level is 7.
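
The offset between the two conventions can be sketched with a pair of conversion helpers. The function names, including `groupRows`, are hypothetical and not part of the library:

```javascript
/* Excel UI level (1..8) -> stored level (0..7) */
function uiLevelToStored(uiLevel) {
  if (!Number.isInteger(uiLevel) || uiLevel < 1 || uiLevel > 8)
    throw new RangeError("UI outline level must be an integer from 1 to 8");
  return uiLevel - 1;
}

/* stored level (0..7) -> Excel UI level (1..8) */
function storedLevelToUi(level) {
  if (!Number.isInteger(level) || level < 0 || level > 7)
    throw new RangeError("stored outline level must be an integer from 0 to 7");
  return level + 1;
}

/* hypothetical helper: assign a stored level to rows first..last (0-indexed) */
function groupRows(ws, first, last, level) {
  const rows = ws["!rows"] || (ws["!rows"] = []);
  for (let r = first; r <= last; ++r) {
    /* keep any existing properties such as heights */
    rows[r] = Object.assign({}, rows[r], { level });
  }
}
```

For example, `groupRows(ws, 1, 3, 1)` places the second through fourth rows one level below the base, which the Excel UI would label as level 2.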

Why are there three width types? (click to show)

There are three different width types corresponding to the three different ways spreadsheets store column widths:

SYLK and other plain text formats use raw character count. Contemporaneous tools like VisiCalc and Multiplan were character based. Since the characters had the same width, it sufficed to store a count. This tradition was continued into the BIFF formats.

SpreadsheetML (2003) tried to align with HTML by standardizing on screen pixel count throughout the file. Column widths, row heights, and other measures use pixels. When the pixel and character counts do not align, Excel rounds values.

XLSX internally stores column widths in a nebulous "Max Digit Width" form. The Max Digit Width is the width of the largest digit when rendered (generally the "0" character is the widest). The internal width is expressed in units of the Max Digit Width and must be an integer multiple of 1/256. ECMA-376 describes a formula for converting between pixels and the internal width. This represents a hybrid approach.

Read functions attempt to populate all three properties. Write functions will try to convert the specified values to the type required by the output format. To avoid potential conflicts, code that changes one width property should first delete the others. For example, when changing the pixel width, delete the wch and width properties.
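
The advice about deleting the other properties can be sketched as a small helper. The name `setColumnPixelWidth` is hypothetical, and the helper assumes `!cols` entries are indexed by 0-based column position:

```javascript
/* hypothetical helper: set the pixel width of column `c` (0-indexed) */
function setColumnPixelWidth(ws, c, wpx) {
  const cols = ws["!cols"] || (ws["!cols"] = []);
  const col = cols[c] || (cols[c] = {});

  /* remove the competing width fields so only one value remains */
  delete col.wch;
  delete col.width;

  col.wpx = wpx;
  return col;
}
```

Other fields on the entry, such as `hidden` or `level`, are left untouched.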

Implementation details (click to show)

Row Heights

Excel internally stores row heights in points. The default resolution is 72 DPI or 96 PPI, so the pixel and point size should agree. For different resolutions they may not agree, so the library separates the concepts.

Even though all of the information is made available, writers are expected to follow the priority order:

  1. use hpx pixel height if available
  2. use hpt point height if available
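
The priority order for row heights can be sketched as follows. `pickRowHeight` is a hypothetical name, and the returned `{ unit, value }` shape is an assumption made for illustration:

```javascript
/* hypothetical helper: choose the height a writer would use for a RowInfo */
function pickRowHeight(row) {
  if (row == null) return null;
  /* 1. pixel height wins when present */
  if (row.hpx != null) return { unit: "px", value: row.hpx };
  /* 2. otherwise fall back to the point height */
  if (row.hpt != null) return { unit: "pt", value: row.hpt };
  /* no explicit height */
  return null;
}
```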

Column Widths

Given the constraints, it is possible to determine the MDW without actually inspecting the font! The parsers guess the pixel width by converting from width to pixels and back, repeating for all possible MDW and selecting the value that minimizes the error. XLML actually stores the pixel width, so the guess works in the opposite direction.
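
The search described above can be sketched as follows. This is not the library's code: the conversion formulas are transcribed from the column width description in ECMA-376 as best understood here and should be checked against the specification, and the candidate range of 1 to 15 pixels is an assumption.

```javascript
/* ECMA-376 style conversions; `mdw` is the Max Digit Width in pixels */
function width2px(width, mdw) {
  return Math.floor(((256 * width + Math.floor(128 / mdw)) / 256) * mdw);
}
function px2char(px, mdw) {
  return Math.floor(((px - 5) / mdw) * 100 + 0.5) / 100;
}
function char2width(chr, mdw) {
  return Math.floor(((chr * mdw + 5) / mdw) * 256) / 256;
}

/* convert width -> pixels -> characters -> width */
function cycleWidth(width, mdw) {
  return char2width(px2char(width2px(width, mdw), mdw), mdw);
}

/* try every candidate MDW and keep the one with the smallest round-trip error */
function guessMDW(width, minMDW = 1, maxMDW = 15) {
  let best = minMDW, bestErr = Infinity;
  for (let mdw = minMDW; mdw <= maxMDW; ++mdw) {
    const err = Math.abs(width - cycleWidth(width, mdw));
    if (err < bestErr) { bestErr = err; best = mdw; }
  }
  return best;
}
```

Several candidates can tie, so the guess is a value that reproduces the stored width, which is not necessarily the MDW of the font used by the original author.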

Even though all of the information is made available, writers are expected to follow the priority order:

  1. use width field if available
  2. use wpx pixel width if available
  3. use wch character count if available
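
The same priority order for column widths, as a sketch. `pickColumnWidth` is a hypothetical name, and the returned `{ unit, value }` shape is an assumption made for illustration:

```javascript
/* hypothetical helper: choose the width a writer would use for a ColInfo */
function pickColumnWidth(col) {
  if (col == null) return null;
  /* 1. internal "Max Digit Width" based width */
  if (col.width != null) return { unit: "width", value: col.width };
  /* 2. pixel width */
  if (col.wpx != null) return { unit: "px", value: col.wpx };
  /* 3. character count */
  if (col.wch != null) return { unit: "ch", value: col.wch };
  /* no explicit width */
  return null;
}
```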

Sheet Visibility

Format Support (click to show)

Hidden Sheets: XLSX/M, XLSB, BIFF8/BIFF5 XLS, XLML

Very Hidden Sheets: XLSX/M, XLSB, BIFF8/BIFF5 XLS, XLML

Excel enables hiding sheets in the lower tab bar. The sheet data is stored in the file but the UI does not readily make it available. Standard hidden sheets are revealed in the "Unhide" menu. Excel also has "very hidden" sheets which cannot be revealed in the menu. They are only accessible in the VB Editor!

The visibility setting is stored in the Hidden property of the corresponding entry in the sheet properties array (wb.Workbook.Sheets).

| Value | Definition  | VB Editor "Visible" Property |
|:-----:|:------------|:-----------------------------|
|   0   | Visible     | -1 - xlSheetVisible          |
|   1   | Hidden      | 0 - xlSheetHidden            |
|   2   | Very Hidden | 2 - xlSheetVeryHidden        |

If the respective Sheet entry does not exist or if the Hidden property is not set, the worksheet is visible.
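
A defensive lookup that applies this default might look like the following sketch. Both helper names are hypothetical:

```javascript
/* hypothetical helper: 0 = visible, 1 = hidden, 2 = very hidden */
function sheetVisibility(wb, index) {
  const entry = wb.Workbook && wb.Workbook.Sheets && wb.Workbook.Sheets[index];
  /* a missing entry or an unset Hidden property means the sheet is visible */
  return (entry && entry.Hidden) || 0;
}

/* hypothetical helper: names of the sheets that are visible */
function visibleSheetNames(wb) {
  return wb.SheetNames.filter((_, i) => sheetVisibility(wb, i) === 0);
}
```

Workbooks without a `Workbook` property at all are treated as fully visible.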

List all worksheets and their visibility settings

wb.Workbook.Sheets.map(function(x) { return [x.name, x.Hidden] })
// [ [ 'Visible', 0 ], [ 'Hidden', 1 ], [ 'VeryHidden', 2 ] ]

Check if worksheet is visible

Non-Excel formats do not support the Very Hidden state. The best way to test if a sheet is visible is to check that the Hidden property is falsy:

wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
// [ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]

Live Example (click to show)

This test file has three sheets:

  • "Visible" is visible
  • "Hidden" is hidden
  • "VeryHidden" is very hidden

Screenshot

Live demo

function Visibility(props) {
  const [sheets, setSheets] = React.useState([]);
  const names = [ "Visible", "Hidden", "Very Hidden" ];

  /* the effect callback is synchronous; the async work runs in an inner function */
  React.useEffect(() => { (async() => {
    const f = await fetch("/files/sheet_visibility.xlsx");
    const ab = await f.arrayBuffer();
    const wb = XLSX.read(ab);

    /* State will be set to the `Sheets` property array */
    setSheets(wb.Workbook.Sheets);
  })(); }, []);

  return (<table>
    <thead><tr><th>Name</th><th>Value</th><th>Hidden</th></tr></thead>
    <tbody>{sheets.map((x,i) => (<tr key={i}>
      <td>{x.name}</td>
      <td>{x.Hidden} - {names[x.Hidden]}</td>
      <td>{!x.Hidden ? "No" : "Yes"}</td>
    </tr>))}</tbody></table>);
}

VBA and Macros

Format Support (click to show)

VBA Modules: XLSM, XLSB, BIFF8 XLS

VBA Macros are stored in a special data blob that is exposed in the vbaraw property of the workbook object when the bookVBA option is true. They are supported in XLSM, XLSB, and BIFF8 XLS formats. The supported format writers automatically insert the data blob if it is present in the workbook and associate it with the worksheet names.

The vbaraw property stores raw bytes. SheetJS Pro offers a special component for extracting macro text from the VBA blob, editing the VBA project, and exporting new VBA blobs.

Round-tripping Macro Enabled Files

In order to preserve macros when reading and writing files, the bookVBA option must be set to true when reading and when writing. In addition, the output file format must support macros. XLSX notably does not support macros, and XLSM should be used in its place:

/* Reading data */
var wb = XLSX.read(data, { bookVBA: true }); // read file and distill VBA blob
var vbablob = wb.vbaraw;
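
The snippet above covers the reading half. A sketch of the full round trip, following the requirements stated in this section, is shown below. The function name `roundTripMacros` is hypothetical, the library object is passed in as a parameter so the sketch does not depend on how it is loaded, and the `type: "array"` output type is an illustrative choice:

```javascript
/* `XLSX` is the loaded library object; `data` holds the bytes of a macro-enabled file */
function roundTripMacros(XLSX, data) {
  /* read file and distill VBA blob */
  const wb = XLSX.read(data, { bookVBA: true });

  /* write to XLSM, a format that supports macros, with bookVBA set again */
  return XLSX.write(wb, { bookType: "xlsm", bookVBA: true, type: "array" });
}
```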

Code Names

Excel will use ThisWorkbook (or a translation like DieseArbeitsmappe) as the default Code Name for the workbook. Each worksheet will be identified using the default Sheet# naming pattern even if the worksheet names have changed.

A custom workbook code name will be stored in wb.Workbook.WBProps.CodeName. For exports, assigning the property will override the default value.

Worksheet and Chartsheet code names are in the worksheet properties object at wb.Workbook.Sheets[i].CodeName. Macrosheets and Dialogsheets are ignored.

The readers and writers preserve the code names, but they have to be manually set when adding a VBA blob to a different workbook.
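
Setting the code names manually might look like the following sketch. The helper name `copyVBA` is hypothetical, and matching worksheets by position is an assumption that only holds when both workbooks have the same sheet layout:

```javascript
/* hypothetical helper: copy the VBA blob and code names from `src` to `dst` */
function copyVBA(src, dst) {
  dst.vbaraw = src.vbaraw;

  const srcWB = src.Workbook || {};
  if (!dst.Workbook) dst.Workbook = {};

  /* workbook code name */
  const wbCodeName = srcWB.WBProps && srcWB.WBProps.CodeName;
  if (wbCodeName != null) {
    if (!dst.Workbook.WBProps) dst.Workbook.WBProps = {};
    dst.Workbook.WBProps.CodeName = wbCodeName;
  }

  /* sheet code names, matched by position (assumption) */
  const srcSheets = srcWB.Sheets || [];
  if (!dst.Workbook.Sheets) dst.Workbook.Sheets = [];
  dst.SheetNames.forEach((name, i) => {
    /* make sure every sheet has an entry so the array has no holes */
    if (!dst.Workbook.Sheets[i]) dst.Workbook.Sheets[i] = { name };
    const codeName = srcSheets[i] && srcSheets[i].CodeName;
    if (codeName != null) dst.Workbook.Sheets[i].CodeName = codeName;
  });
  return dst;
}
```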

Macrosheets

Older versions of Excel also supported a non-VBA "macrosheet" sheet type that stored automation commands. These are exposed in objects with the !type property set to "macro".

Under the hood, Excel treats Macrosheets as normal worksheets with special interpretation of the function expressions.

Detecting Macros in Workbooks

The vbaraw field will only be set if macros are present. Macrosheets will be explicitly flagged. Combining the two checks yields a simple function:

function wb_has_macro(wb/*:workbook*/)/*:boolean*/ {
  if(!!wb.vbaraw) return true;
  const sheets = wb.SheetNames.map((n) => wb.Sheets[n]);
  return sheets.some((ws) => !!ws && ws['!type']=='macro');
}