---
title: Sheets in Ghidra
sidebar_label: Ghidra
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
sidebar_custom_props:
  summary: Generate spreadsheets from Ghidra-generated bitfield tables
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

[Ghidra](https://ghidra-sre.org/) is a software reverse engineering platform
with a robust Java-based extension system.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

The [Complete Demo](#complete-demo) uses SheetJS to export data from a Ghidra
script. We'll create an extension that loads the [V8](/docs/demos/engines/v8)
JavaScript engine through the Ghidra.js[^1] integration and uses the SheetJS
library to export a bitfield table from Apple Numbers to an XLSX workbook.

:::note Tested Deployments

This demo was tested by SheetJS users in the following deployments:

| Architecture | Ghidra   | Date       |
|:-------------|:---------|:-----------|
| `darwin-arm` | `11.1.2` | 2024-10-13 |

:::

## Integration Details

Ghidra natively supports scripts that are run in Java. JS extension scripts
require a [JavaScript engine](/docs/demos/engines/) with Java bindings.

Ghidra.js[^1] is a Ghidra integration for [RhinoJS](/docs/demos/engines/rhino),
[GraalJS](/docs/demos/engines/graaljs) and [V8](/docs/demos/engines/v8#java).
The current version uses [the Javet V8 binding](https://www.caoccao.com/Javet).

### Loading SheetJS Scripts

The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be
loaded in Ghidra.js scripts using `require`:

```js title="Loading SheetJS scripts in Ghidra.js"
const XLSX = require("xlsx");
```

:::caution pass

SheetJS NodeJS modules must be installed in a folder in the Ghidra script path!

:::

### Bitfields and Sheets

Binary file formats commonly use bitfields to compactly store a set of Boolean
(true or false) flags. For example, in the XLSB file format, the `BrtRowHdr`
record[^2] encodes [row properties](/docs/csf/features/rowprops). Bit offsets
91-96 are interpreted as flags marking if a row is hidden or if it is collapsed.

#### Assembly Implementation

Functions that parse bitfields typically test each bit sequentially:

```nasm title="x86_64 sample assembly with mnemonics"
CASE_1c
  41 0f ba e5 1c   BT     R13D,0x1c
  73 69            JNC    CASE_1d

  ;; .... Do some work here (bit offset 28)

CASE_1d
  41 0f ba e5 1d   BT     R13D,0x1d
  73 69            JNC    CASE_1e

  ;; .... Do some work here (bit offset 29)
```

:::note pass

The assembly is approximated by the following TypeScript snippet:

```typescript title="Approximate TypeScript"
/* R13 is a 64-bit register */
declare let R13: bigint;
/* NOTE: asm R13D is technically a live binding */
let R13D: number = Number(R13 & 0xFFFFFFFFn);

if((R13D >> 28) & 1) {
  // .... Do some work here (bit offset 28)
}

if((R13D >> 29) & 1) {
  // .... Do some work here (bit offset 29)
}
```

:::
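
The shift-and-mask test carries over to JavaScript numbers only for bit offsets 0 through 31, since the bitwise operators work on 32-bit integers and the shift count wraps (`1 << 32` evaluates to `1`). For larger offsets, such as 91 and 92 in `BrtRowHdr`, one option is to index into the raw bytes. The following is a minimal sketch, not code from the demo script: the `bitAt` helper and the sample buffer are hypothetical, and it assumes bits are numbered starting from the least significant bit of each byte.

```js
/* hypothetical helper: test one bit in a byte array (LSB-first numbering assumed) */
function bitAt(bytes, offset) {
  const byte = bytes[offset >> 3];          // 8 bits per byte
  return ((byte >> (offset & 7)) & 1) === 1;
}

/* made-up 13-byte record with only bit offset 92 set */
const record = new Uint8Array(13);
record[92 >> 3] |= 1 << (92 & 7);           // byte 11, bit 4

const collapsed = bitAt(record, 91);        // false
const hidden = bitAt(record, 92);           // true
```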

#### Array of Objects

A bitmask or bit offset can be paired with a description in a JavaScript object.

For example, in the `BrtRowHdr` record, bit offset 92 indicates whether the row
is hidden (if the bit is set) or visible (if the bit is not set). The offset and
description can be stored as fields in an object:

```js title="Sample metadata for BrtRowHdr offset 92"
const metadata_92 = { Offset: 92, Description: "Hidden flag" };
```

Each object can be stored in an array:

```js title="Array of sample metadata for BrtRowHdr"
const metadata = [
  { Offset: 91, Description: "Collapsed flag" },
  { Offset: 92, Description: "Hidden flag" },
  // ...
];
```

This is an ["Array of Objects"](/docs/api/utilities/array#arrays-of-objects).
The SheetJS `json_to_sheet` method[^3] can generate a SheetJS worksheet object
from the array:

```js title="Generating a worksheet from the metadata"
const ws = XLSX.utils.json_to_sheet(metadata);
```
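
Conceptually, the resulting worksheet has a header row built from the object keys, followed by one row per object. The sketch below reproduces that layout as an array of arrays in plain JavaScript. It illustrates the expected shape and is not the SheetJS implementation: the `toRows` helper is hypothetical, and it assumes the header lists keys in first-seen order.

```js
const metadata = [
  { Offset: 91, Description: "Collapsed flag" },
  { Offset: 92, Description: "Hidden flag" },
];

/* hypothetical helper: header row followed by one row per object */
function toRows(objects) {
  const header = [];
  for (const obj of objects)
    for (const key of Object.keys(obj))
      if (!header.includes(key)) header.push(key);
  return [header, ...objects.map(obj => header.map(key => obj[key]))];
}

const rows = toRows(metadata);
// rows[0] is the header row: ["Offset", "Description"]
// rows[1] is the first data row: [91, "Collapsed flag"]
```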

The SheetJS `book_new` method[^4] generates a SheetJS workbook object that can
be written to the filesystem using the `writeFile` method[^5]:

```js title="Exporting the worksheet to file"
const wb = XLSX.utils.book_new(ws, "Offsets");
XLSX.writeFile(wb, "SheetJSGhidra.xlsx");
```

### Java Binding

Ghidra.js exposes a number of globals for interacting with Ghidra, including:

- `currentProgram`: information about the loaded program.
- `JavaHelper`: Java helper to load classes.

Ghidra.js automatically bridges instance methods to Java method calls. It also
handles the plugin and file extension details.

#### Launching the Decompiler

`ghidra.app.decompiler.DecompInterface` is the primary Java interface to the
decompiler. In Ghidra.js, `JavaHelper.getClass` will load the class.

_Java_

```java title="Launch decompiler process in Java (snippet)"
import ghidra.app.script.GhidraScript;
import ghidra.app.decompiler.DecompInterface;
import ghidra.program.model.listing.Program;

public class SheetZilla extends GhidraScript {
  @Override public void run() throws Exception {
    DecompInterface ifc = new DecompInterface();
    boolean success = ifc.openProgram(currentProgram);
    /* ... do work here ... */
  }
}
```

_Ghidra.js_

```js title="Launch decompiler process in Ghidra.js"
const DecompInterface = JavaHelper.getClass('ghidra.app.decompiler.DecompInterface');
const decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);
```

#### Identifying a Function

The `getGlobalSymbols` method of a symbol table instance will return an array of
symbols matching the given name:

```js
/* name of function to find */
const fname = 'MyMethod';

/* find symbols matching the name */
// highlight-next-line
const fsymbs = currentProgram.getSymbolTable().getGlobalSymbols(fname);

/* get first result */
const fsymb = fsymbs[0];
```
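
If no symbol matches, indexing the result yields `undefined` and later calls fail with a less obvious error. A small guard can fail early instead. This is a sketch under assumptions: `firstSymbol` is a hypothetical helper, and it assumes the bridged result can be indexed and exposes a `length` like a JavaScript array. The stub array stands in for a real `getGlobalSymbols` result so the snippet can run outside Ghidra.

```js
/* hypothetical helper: return the first match or fail with a clear message */
function firstSymbol(symbols, name) {
  if (!symbols || symbols.length === 0) {
    throw new Error(`No global symbol named ${name}`);
  }
  return symbols[0];
}

/* stub standing in for the result of `getGlobalSymbols(fname)` */
const stubSymbols = [{ getName: () => "MyMethod" }];
const found = firstSymbol(stubSymbols, "MyMethod");
```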

The `getFunctionAt` method of a function manager instance will take an address
and return a reference to a function:

```js
/* get address */
const faddr = fsymb.getAddress();

/* find function */
// highlight-next-line
const fn = currentProgram.getFunctionManager().getFunctionAt(faddr);
```

#### Decompiling a Function

The `decompileFunction` method attempts to decompile the referenced function:

```js
/* decompile function */
// highlight-next-line
const decomp = decompiler.decompileFunction(fn, 10000, null);
```

Once decompiled, it is possible to retrieve the decompiled C code:

```js
/* get generated C code */
const src = decomp.getDecompiledFunction().getC();
```
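
Decompilation can time out or fail, in which case there is no decompiled function to read. The Ghidra `DecompileResults` class provides `decompileCompleted` and `getErrorMessage` for this situation. The sketch below wraps the check in a hypothetical `getSource` helper. The stub objects imitate only the methods used here so that the snippet can run outside Ghidra.

```js
/* hypothetical helper: return the C source or throw with the decompiler message */
function getSource(results) {
  if (!results.decompileCompleted()) {
    throw new Error("Decompilation failed: " + results.getErrorMessage());
  }
  return results.getDecompiledFunction().getC();
}

/* stubs imitating the methods used above */
const okResults = {
  decompileCompleted: () => true,
  getErrorMessage: () => "",
  getDecompiledFunction: () => ({ getC: () => "void f(void) {}" }),
};
const failedResults = {
  decompileCompleted: () => false,
  getErrorMessage: () => "timeout",
};

const source = getSource(okResults);
```

In a Ghidra.js script, `getSource(decomp)` could take the place of the direct `getDecompiledFunction().getC()` call.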

## Complete Demo

In this demo, we will inspect the `_TSTCellToCellStorage` method within the
`TSTables` framework of Apple Numbers 14.2. This particular method handles
serialization of cells to the NUMBERS file format.

The implementation has a number of blocks which look like the following script:

```js
if(flags >> 0x0d & 1) {
  const field = "numberFormatID";
  const current_value = cell[field];
  // ... check if current_value is set, do other stuff
}
```
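
One way to locate blocks of this shape in text is a regular expression that pairs each shift amount with the next string literal. The sketch below is an assumption-heavy illustration and is not taken from the demo script: the `sample` text mirrors the simplified block above rather than real decompiler output, and the second field name is made up.

```js
/* sample text shaped like the simplified block; "exampleFieldName" is hypothetical */
const sample = `
if(flags >> 0x0d & 1) {
  const field = "numberFormatID";
}
if(flags >> 0x0e & 1) {
  const field = "exampleFieldName";
}
`;

/* match `>> 0xNN & 1`, then lazily skip ahead to the next quoted string */
const re = />>\s*0x([0-9a-f]+)\s*&\s*1\b[\s\S]*?"([^"]+)"/gi;

const blocks = [];
for (const m of sample.matchAll(re)) {
  blocks.push({ offset: parseInt(m[1], 16), field: m[2] });
}
// blocks[0] is { offset: 13, field: "numberFormatID" }
```

Real decompiled C will not look exactly like this, so the pattern would need to be adapted to the actual output.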

Based on the bit offset and the field name, we will generate the following row:

```js
const mask = 1 << 0x0d; // = 8192 = 0x2000
const name = "number format ID";
const row = { Mask: "0x" + mask.toString(16), "Internal Name": name };
```
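
The same row can be derived from the raw field name. The following sketch assumes camelCase field names with all-caps acronyms. Both helpers are hypothetical and may differ from what the demo script does.

```js
/* hypothetical helper: "numberFormatID" -> "number format ID" */
function humanize(field) {
  return field
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")                  // split at lower-to-upper boundaries
    .split(" ")
    .map(w => (w === w.toUpperCase() ? w : w.toLowerCase())) // keep acronyms as-is
    .join(" ");
}

/* hypothetical helper: build one row from a bit offset and a field name */
function makeRow(offset, field) {
  const mask = 2 ** offset; // avoids the 32-bit wraparound of `1 << offset`
  return { Mask: "0x" + mask.toString(16), "Internal Name": humanize(field) };
}

const generated = makeRow(0x0d, "numberFormatID");
// generated is { Mask: "0x2000", "Internal Name": "number format ID" }
```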

Rows will be generated for each block and the final dataset will be exported.

### System Setup

0) Install Ghidra, Xcode, and Apple Numbers.

<details>
  <summary><b>Installation Notes</b> (click to show)</summary>

On macOS, Ghidra was installed using Homebrew:

```bash
brew install --cask ghidra
```

</details>

1) Add the base Ghidra folder to the PATH variable. The following shell command
adds to the path for the current `zsh` or `bash` session:

```bash
export PATH="$PATH":$(dirname $(realpath `which ghidraRun`))
```

2) Install `ghidra.js` globally:

```bash
npm install -g ghidra.js
```

:::note pass

If the install fails with a permissions issue, install with the root user:

```bash
sudo npm install -g ghidra.js
```

:::

### Program Preparation

3) Create a temporary folder to hold the Ghidra project:

```bash
mkdir -p /tmp/sheetjs-ghidra
```

4) Copy the `TSTables` framework to the current directory:

```bash
cp /Applications/Numbers.app/Contents/Frameworks/TSTables.framework/Versions/Current/TSTables .
```

5) Create a "thin" binary by extracting the `x86_64` part of the framework:

```bash
lipo TSTables -thin x86_64 -output TSTables.macho
```

:::info pass

When this demo was last tested, the headless analyzer did not support Mach-O fat
binaries. `lipo` creates a new binary with support for one architecture.

:::

6) Analyze the program:

```bash
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -import TSTables.macho
```

:::note pass

This process may take a while and print a number of Java stacktraces. The errors
can be ignored.

:::

### SheetJS Integration

7) Download [`sheetjs-ghidra.js`](pathname:///ghidra/sheetjs-ghidra.js):

```bash
curl -LO https://docs.sheetjs.com/ghidra/sheetjs-ghidra.js
```

8) Install the [SheetJS NodeJS module](/docs/getting-started/installation/nodejs):

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz`}
</CodeBlock>

9) Run the script:

```bash
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -process TSTables.macho -noanalysis -scriptPath `pwd` -postScript sheetjs-ghidra.js
```

10) Open the generated `SheetJSGhidraTSTCell.xlsx` spreadsheet.

[^1]: The project does not have a website. The [source repository](https://github.com/vaguue/Ghidra.js) is publicly available.
[^2]: `BrtRowHdr` is defined in the [`MS-XLSB` specification](/docs/miscellany/references)
[^3]: See [`json_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-objects-input)
[^4]: See [`book_new` in "Utilities"](/docs/api/utilities/wb)
[^5]: See [`writeFile` in "Writing Files"](/docs/api/write-options)