---
title: Sheets in Ghidra
sidebar_label: Ghidra
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
sidebar_custom_props:
  summary: Generate spreadsheets from Ghidra-generated bitfield tables
---

import current from '/version.js'; import CodeBlock from '@theme/CodeBlock';

Ghidra is a software reverse engineering platform with a robust Java-based extension system.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

The Complete Demo uses SheetJS to export data from a Ghidra script. We'll create an extension that loads the V8 JavaScript engine through the Ghidra.js[^1] integration and uses the SheetJS library to export a bitfield table from Apple Numbers to an XLSX workbook.

:::note Tested Deployments

This demo was tested by SheetJS users in the following deployments:

| Architecture | Ghidra | Date       |
|:-------------|:-------|:-----------|
| darwin-arm   | 11.1.2 | 2024-10-13 |

:::

Integration Details

Ghidra natively supports scripts written in Java. Scripts written in JavaScript require a JavaScript engine with Java bindings.

Ghidra.js[^1] is a Ghidra integration for RhinoJS, GraalJS and V8. The current version uses the Javet V8 binding.

Loading SheetJS Scripts

The SheetJS NodeJS module can be loaded in Ghidra.js scripts using require:

const XLSX = require("xlsx");

:::caution pass

SheetJS NodeJS modules must be installed in a folder in the Ghidra script path!

:::

Bitfields and Sheets

Binary file formats commonly use bitfields to compactly store a set of Boolean (true or false) flags. For example, in the XLSB file format, the BrtRowHdr record[^2] encodes row properties. Bit offsets 91-96 are interpreted as flags marking if a row is hidden or if it is collapsed.

Assembly Implementation

Functions that parse bitfields typically test each bit sequentially:

            CASE_1c
41 0f ba e5 1c  BT         R13D,0x1c
73 69           JNC        CASE_1d

;; .... Do some work here (bit offset 28)

            CASE_1d
41 0f ba e5 1d  BT         R13D,0x1d
73 69           JNC        CASE_1e

;; .... Do some work here (bit offset 29)

:::note pass

The assembly is approximated by the following TypeScript snippet:

/* R13 is a 64-bit register */
declare let R13: bigint;
/* NOTE: asm R13D is technically a live binding */
let R13D: number = Number(R13 & 0xFFFFFFFFn);

if((R13D >> 28) & 1) {
  // .... Do some work here (bit offset 28)
}

if((R13D >> 29) & 1) {
  // .... Do some work here (bit offset 29)
}

:::
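The same sequential test can be written as a loop. The following is a minimal sketch and not part of the original demo; the helper name `setBitOffsets` is hypothetical. It lists the offsets of every set bit in a 32-bit value:

```javascript
/* hypothetical helper: list the offsets of every set bit in a 32-bit value */
function setBitOffsets(value) {
  const offsets = [];
  for (let offset = 0; offset < 32; ++offset) {
    /* same idea as the BT / JNC pair: shift right, then isolate the low bit */
    if ((value >> offset) & 1) offsets.push(offset);
  }
  return offsets;
}

/* bits 28 and 29 set, matching the two cases in the listing above */
const exampleFlags = (1 << 28) | (1 << 29);
console.log(setBitOffsets(exampleFlags)); // offsets 28 and 29
```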

Array of Objects

A bitmask or bit offset can be paired with a description in a JavaScript object.

For example, in the BrtRowHdr record, bit offset 92 indicates whether the row is hidden (if the bit is set) or visible (if the bit is not set). The offset and description can be stored as fields in an object:

const metadata_92 = { Offset: 92, Description: "Hidden flag" };

Each object can be stored in an array:

const metadata = [
  { Offset: 91, Description: "Collapsed flag" },
  { Offset: 92, Description: "Hidden flag" },
  // ...
];

This is an "Array of Objects". The SheetJS json_to_sheet method[^3] can generate a SheetJS worksheet object from the array:

const ws = XLSX.utils.json_to_sheet(metadata);
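As a rough mental model, the keys of the objects become the header row and each object becomes one data row. The sketch below illustrates that mapping in plain JavaScript without loading SheetJS. The `toRows` helper is hypothetical and is not how the library is implemented:

```javascript
/* hypothetical helper: flatten an array of objects into header + data rows */
function toRows(records) {
  /* collect keys in first-seen order */
  const header = [];
  for (const record of records)
    for (const key of Object.keys(record))
      if (!header.includes(key)) header.push(key);

  /* one row per object; fields missing from an object become null */
  const body = records.map(record =>
    header.map(key => (key in record ? record[key] : null))
  );
  return [header, ...body];
}

const sampleMetadata = [
  { Offset: 91, Description: "Collapsed flag" },
  { Offset: 92, Description: "Hidden flag" },
];

/* first row holds the keys, the remaining rows hold the values */
console.log(toRows(sampleMetadata));
```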

The SheetJS book_new method[^4] generates a SheetJS workbook object that can be written to the filesystem using the writeFile method[^5]:

const wb = XLSX.utils.book_new(ws, "Offsets");
XLSX.writeFile(wb, "SheetJSGhidra.xlsx");

Java Binding

Ghidra.js exposes a number of globals for interacting with Ghidra, including:

  • currentProgram: information about the loaded program.
  • JavaHelper: Java helper to load classes.

Ghidra.js automatically bridges instance methods to Java method calls. It also handles the plugin and file extension details.

Launching the Decompiler

ghidra.app.decompiler.DecompInterface is the primary Java interface to the decompiler. In Ghidra.js, JavaHelper.getClass will load the class.

Java

import ghidra.app.script.GhidraScript;
import ghidra.app.decompiler.DecompInterface;
import ghidra.program.model.listing.Program;

public class SheetZilla extends GhidraScript {
  @Override public void run() throws Exception {
    DecompInterface ifc = new DecompInterface();
    boolean success = ifc.openProgram(currentProgram);
    /* ... do work here ... */
  }
}

Ghidra.js

const DecompInterface = JavaHelper.getClass('ghidra.app.decompiler.DecompInterface');
const decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);

Identifying a Function

The getGlobalSymbols method of a symbol table instance will return an array of symbols matching the given name:

/* name of function to find */
const fname = 'MyMethod';

/* find symbols matching the name */
// highlight-next-line
const fsymbs = currentProgram.getSymbolTable().getGlobalSymbols(fname);

/* get first result */
const fsymb = fsymbs[0];

The getFunctionAt method of a function manager instance will take an address and return a reference to a function:

/* get address */
const faddr = fsymb.getAddress();

/* find function */
// highlight-next-line
const fn = currentProgram.getFunctionManager().getFunctionAt(faddr);
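The two lookups can be combined into one guarded helper. This is a sketch under stated assumptions: `findFunction` is a hypothetical name, and the code assumes the bridged symbol list exposes `length` and index access as the snippets above suggest. It returns `null` instead of throwing when the name is not found:

```javascript
/* hypothetical helper: resolve a function by name, or null if it is not found */
function findFunction(program, name) {
  const symbols = program.getSymbolTable().getGlobalSymbols(name);

  /* assumption: the bridged result exposes `length` and index access */
  if (!symbols || symbols.length === 0) return null;

  /* use the first matching symbol, as in the snippets above */
  const address = symbols[0].getAddress();

  /* normalize a missing function to null */
  return program.getFunctionManager().getFunctionAt(address) || null;
}
```

In a Ghidra.js script, the helper would be called as `findFunction(currentProgram, 'MyMethod')`.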

Decompiling a Function

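The decompileFunction method attempts to decompile the referenced function. The second argument is the timeout in seconds and the third is an optional task monitor (null in this example):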

/* decompile function */
// highlight-next-line
const decomp = decompiler.decompileFunction(fn, 10000, null);

Once decompiled, it is possible to retrieve the decompiled C code:

/* get generated C code */
const src = decomp.getDecompiledFunction().getC();
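The C source is an ordinary string, so bit tests can be located with a regular expression. The sketch below is an illustration only. The sample source text, the `findBitTests` helper and the pattern are all assumptions; they do not reflect the actual decompiler output or the logic of the demo script:

```javascript
/* stand-in for the string returned by getC(); real decompiler output will differ */
const sampleSrc = `
  if ((flags >> 0xd & 1) != 0) { /* ... */ }
  if ((flags >> 0xe & 1) != 0) { /* ... */ }
`;

/* hypothetical helper: collect bit offsets from `>> N & 1` expressions */
function findBitTests(src) {
  const pattern = />>\s*(0x[0-9a-fA-F]+|\d+)\s*&\s*1\b/g;
  const offsets = [];
  for (const match of src.matchAll(pattern)) {
    /* Number() understands both decimal and 0x-prefixed literals */
    offsets.push(Number(match[1]));
  }
  return offsets;
}

console.log(findBitTests(sampleSrc)); // offsets 13 and 14
```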

Complete Demo

In this demo, we will inspect the _TSTCellToCellStorage method within the TSTables framework of Apple Numbers 14.2. This particular method handles serialization of cells to the NUMBERS file format.

The implementation has a number of blocks which look like the following script:

if(flags >> 0x0d & 1) {
  const field = "numberFormatID";
  const current_value = cell[field];
  // ... check if current_value is set, do other stuff
}

Based on the bit offset and the field name, we will generate the following row:

const mask = 1 << 0x0d; // = 8192 = 0x2000
const name = "number format ID";
const row = { Mask: "0x" + mask.toString(16), "Internal Name": name };

Rows will be generated for each block and the final dataset will be exported.
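The row construction can be wrapped in a small helper. This is a minimal sketch: `humanize` and `makeRow` are hypothetical names, and the rule for turning a camelCase field into a readable name is an assumption inferred from the single example above:

```javascript
/* hypothetical helper: "numberFormatID" -> "number format ID" */
function humanize(field) {
  return field
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split at lower-to-upper boundaries
    .split(" ")
    .map(word => (word === word.toUpperCase() ? word : word.toLowerCase())) // keep acronyms
    .join(" ");
}

/* hypothetical helper: build one row from a bit offset and a field name */
function makeRow(offset, field) {
  /* `>>> 0` keeps the mask unsigned when the offset is 31 */
  const mask = (1 << offset) >>> 0;
  return { Mask: "0x" + mask.toString(16), "Internal Name": humanize(field) };
}

/* Mask is "0x2000" and Internal Name is "number format ID" */
console.log(makeRow(0x0d, "numberFormatID"));
```

An array of such rows has the "Array of Objects" shape, so it can be passed to json_to_sheet.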

System Setup

  1. Install Ghidra, Xcode, and Apple Numbers.
Installation Notes

On macOS, Ghidra was installed using Homebrew:

brew install --cask ghidra
  1. Add the base Ghidra folder to the PATH variable. The following shell command adds to the path for the current zsh or bash session:
export PATH="$PATH":$(dirname $(realpath `which ghidraRun`))
  1. Install ghidra.js globally:
npm install -g ghidra.js

:::note pass

If the install fails with a permissions error, run the install as the root user:

sudo npm install -g ghidra.js

:::

Program Preparation

  1. Create a temporary folder to hold the Ghidra project:
mkdir -p /tmp/sheetjs-ghidra
  1. Copy the TSTables framework to the current directory:
cp /Applications/Numbers.app/Contents/Frameworks/TSTables.framework/Versions/Current/TSTables .
  1. Create a "thin" binary by extracting the x86_64 part of the framework:
lipo TSTables -thin x86_64 -output TSTables.macho

:::info pass

When this demo was last tested, the headless analyzer did not support Mach-O fat binaries. lipo creates a new binary with support for one architecture.

:::

  1. Analyze the program:
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -import TSTables.macho

:::note pass

This process may take a while and print a number of Java stacktraces. The errors can be ignored.

:::

SheetJS Integration

  1. Download sheetjs-ghidra.js:
curl -LO https://docs.sheetjs.com/ghidra/sheetjs-ghidra.js
  1. Install the SheetJS NodeJS module:

npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz

Here `${current}` stands for the current SheetJS version, which this page imports from /version.js.

  1. Run the script:
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -process TSTables.macho -noanalysis -scriptPath `pwd` -postScript sheetjs-ghidra.js
  1. Open the generated SheetJSGhidraTSTCell.xlsx spreadsheet.

[^1]: The project does not have a website. The source repository is publicly available.

[^2]: BrtRowHdr is defined in the MS-XLSB specification.

[^3]: See json_to_sheet in "Utilities".

[^4]: See book_new in "Utilities".

[^5]: See writeFile in "Writing Files".