Data Support

Orbitron’s io/pipelines crate provides a unified loader (load_scene, load_trajectory, load_frequency_data) that automatically selects the correct parser. The table below summarises the formats tested in production:

| Format | Extensions | Highlights |
|---|---|---|
| XYZ | .xyz | Multi-frame trajectories streamed via the canonical helpers; attaches the final-frame structure plus an optional positions attachment |
| Gaussian LOG/FCHK/CUBE | .log, .fchk, .cube, .gjf, .com | Optimization trajectories, vibrational modes, orbital/NBO metadata, stage summaries and boundary loaders; canonical bundles capture MO and vibrational attachments |
| NWChem OUT/NW + companions | .out, .nw, .movecs, .hess, .cube | Multi-task support (OPT/FREQ/SP) and byte-ranged loaders; canonical extras summarise tasks and attach MO coefficients/trajectory shards. The Sources tab also loads .movecs (Fortran-binary MO coefficients) for full nbf × nmo coverage and .hess (Cartesian Hessian) for follow-up frequency synthesis when the .out doesn’t already carry the freq block. NWChem .out parsing extracts the basis-set definition into a GaussianBasisSet, so .out + .movecs together render molecular orbitals as 3D isosurfaces in the Surfaces tab. |
| Molpro OUT (+ XML) | .out, optional XML sidecars | Task summaries (SCF/OPT/FREQ/CASPT2/CI/MRCC/MP2), correlated-energy capture, canonical extras seeded from XML manifests |
| Molcas/OpenMolcas OUT + companions | .out, .molden | Canonical documents expose module tasks (extras.molcas.tasks), optimisation energy profiles, frequency mode counts, and RASSCF/CASPT2 diagnostics. The Sources tab loads .molden siblings (SCF / RASSCF / Guess / MP2 flavors) and renders MOs as 3D isosurfaces in the Surfaces tab; a MOLDEN file ships geometry, basis, and coefficients together, so no extra companion is needed. Sibling matching auto-detects Molcas-style task suffixes (acrolein.scf.molden matches acrolein.out). |
| DIRAC | .out (+ HDF5 checkpoint) | Relativistic QC output: geometry and run/energy summary. MO coefficients load from the companion HDF5 checkpoint via Analysis → Sources for orbital isosurfaces. Content-detected (no fixed extension). |
| Quantum ESPRESSO | .out, .in, .xml, .xsf, .dos, .pdos_*, .bands, .UPF | Periodic structures from .out text output and .in input decks; structured data-file-schema.xml (eigenvalues, occupations, Fermi level); Cube-formatted .xsf densities (pp.x output_format=6) and proper XSF; DOS/PDOS/bands plot summaries under extras.qe; relax energy profiles; inline band samples |
| VASP | vasprun.xml, POSCAR, CONTCAR, OUTCAR, INCAR, KPOINTS, POTCAR, DOSCAR, EIGENVAL, PROCAR, CHGCAR/CHG/PARCHG, XDATCAR, DYNMAT, ACF.dat | Open any canonical filename (no extension required); Orbitron auto-switches to Analysis → Sources, which lists the rest of the directory. Periodic geometry plus DOS/band/forces/magmom/Bader data. CHGCAR exposes total and spin-density isosurfaces. XDATCAR loads as a multi-frame trajectory. The POSCAR ↔ CONTCAR Compare overlay shows the relaxation. Sources replaces the old .zip/.tar archive-import flow, which has been retired. |
| CIF | .cif | Symmetry + periodic unit cells with provenance tags |
| PDB | .pdb | Biomolecular chains/residues and inferred bonds |
| SDF/MOL | .sdf, .mol | Bond orders, connection tables, canonical structure + raw-source attachments |
| Standalone NBO summaries | .nbo + optional .47 (FILE47) | Natural population tables and optional FILE47 geometry/basis metadata for NBO7; canonical documents capture population extras and raw logs |
| Volumetric CUBE files | .cube | Single grids with metadata + volumetric attachments |
| Volumetric directories | directory of .cube files | Ensures all cubes share a geometry; emits per-grid attachments and dataset metadata |
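How the loader picks a parser is not spelled out here. As a rough sketch, assuming dispatch by exact canonical filename first and by extension second, a name-based detector covering part of the table might look like this. `SourceFormat` and `detect_by_name` are hypothetical names, not the io/pipelines API, and `.out`/`.xml` files (shared by several programs) are deliberately left to a content-sniffing step that is not shown.

```rust
use std::path::Path;

/// Hypothetical format tags; the real io/pipelines types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceFormat {
    Xyz,
    Gaussian,
    NwChem,
    QuantumEspresso,
    Vasp,
    Cif,
    Pdb,
    Sdf,
    Nbo,
    Cube,
    /// Extension shared by several programs: needs content sniffing.
    NeedsSniffing,
}

/// Canonical VASP file names, which usually carry no extension.
const VASP_NAMES: &[&str] = &[
    "vasprun.xml", "POSCAR", "CONTCAR", "OUTCAR", "INCAR", "KPOINTS", "POTCAR",
    "DOSCAR", "EIGENVAL", "PROCAR", "CHGCAR", "CHG", "PARCHG", "XDATCAR", "DYNMAT", "ACF.dat",
];

/// Name-based detection only (hypothetical sketch).
pub fn detect_by_name(path: &Path) -> Option<SourceFormat> {
    let name = path.file_name()?.to_str()?;
    // Exact canonical names win over extension matching.
    if VASP_NAMES.contains(&name) {
        return Some(SourceFormat::Vasp);
    }
    // projwfc.x-style names such as `si.pdos_atm#1(Si)_wfc#1(s)`.
    if name.contains(".pdos_") {
        return Some(SourceFormat::QuantumEspresso);
    }
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    let fmt = match ext.as_str() {
        "xyz" => SourceFormat::Xyz,
        "log" | "fchk" | "gjf" | "com" => SourceFormat::Gaussian,
        "nw" | "movecs" | "hess" => SourceFormat::NwChem,
        "in" | "dos" | "bands" | "xsf" | "upf" => SourceFormat::QuantumEspresso,
        "cif" => SourceFormat::Cif,
        "pdb" => SourceFormat::Pdb,
        "sdf" | "mol" => SourceFormat::Sdf,
        "nbo" | "47" => SourceFormat::Nbo,
        "cube" => SourceFormat::Cube,
        // NWChem, Molpro, Molcas, DIRAC, and QE all write `.out`;
        // QE and Molpro both produce XML.
        "out" | "xml" => SourceFormat::NeedsSniffing,
        _ => return None,
    };
    Some(fmt)
}
```

A real detector would then read the start of each `NeedsSniffing` file and look for program banners; that step, and formats such as MOLDEN, are omitted from this sketch.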

Loaders apply provenance tags, infer covalent bonds (SceneBuilder::infer_covalent_bonds), and attach task metadata. Remote loading uses the same pipeline over SSH/SFTP (see §6).
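Covalent-bond inference is commonly a distance test: atoms i and j are bonded when their separation is at most a tolerance times the sum of their covalent radii. The sketch below implements only that criterion. The radii (approximate values in Å for four elements), the tolerance, and the name `infer_bonds` are assumptions and may differ from what SceneBuilder::infer_covalent_bonds actually does.

```rust
/// Approximate covalent radii in ångström (assumed values, small subset).
fn covalent_radius(symbol: &str) -> Option<f64> {
    match symbol {
        "H" => Some(0.31),
        "C" => Some(0.76),
        "N" => Some(0.71),
        "O" => Some(0.66),
        _ => None,
    }
}

/// Returns index pairs (i, j) with i < j whose separation is within
/// `tolerance` × (r_i + r_j). Hypothetical sketch: O(n²), no periodic images.
pub fn infer_bonds(symbols: &[&str], positions: &[[f64; 3]], tolerance: f64) -> Vec<(usize, usize)> {
    let mut bonds = Vec::new();
    for i in 0..positions.len() {
        for j in (i + 1)..positions.len() {
            let (Some(ri), Some(rj)) = (covalent_radius(symbols[i]), covalent_radius(symbols[j])) else {
                continue; // unknown element: skip rather than guess a radius
            };
            // Compare squared distances to avoid a square root per pair.
            let d2: f64 = (0..3).map(|k| (positions[i][k] - positions[j][k]).powi(2)).sum();
            let cutoff = tolerance * (ri + rj);
            if d2 <= cutoff * cutoff {
                bonds.push((i, j));
            }
        }
    }
    bonds
}
```

The pairwise loop ignores periodic images, so as written it suits isolated molecules rather than the unit cells loaded from CIF, VASP, or QE; a production version would typically add a spatial grid and minimum-image distances.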

Interested in adding a new format? See the checklist in the Developer Guide (§5.1 Adding a New Chemistry Format) for a turnkey parser skeleton, naming conventions, and test scaffolding.

2.1 Quantum ESPRESSO support

Orbitron’s QE ingestion handles most of the files a typical QE project produces:

  • SCF / relax / nscf outputs (*.out): periodic structures + unit cells, SCF energetics, relax profiles, and task summaries (extras.qe).
  • Input decks (*.in): parses the &CONTROL, &SYSTEM, ATOMIC_SPECIES, ATOMIC_POSITIONS, and CELL_PARAMETERS sections. The lattice is derived from ibrav (0–14, including the centred orthorhombic and monoclinic variants and the triclinic case) + celldm(1..6), or from explicit CELL_PARAMETERS for ibrav=0. Atomic positions are accepted in alat, bohr, angstrom, or crystal units. Open a .in directly to view the structure before running the calculation.
  • Structured XML output (<prefix>.xml / <prefix>.save/data-file-schema.xml): parsed via roxmltree. Surfaces atomic_structure, per-k-point eigenvalues + occupations, Fermi level, total energy, convergence status, exit status, and spin / SOC flags. More reliable than scraping .out because the schema is version-stable.
  • XSF volumetric (*.xsf): handles both proper XSF (CRYSTAL / MOLECULE / ATOMS) and QE’s “Cube-as-xsf” flavour (pp.x output_format=6 writes Gaussian Cube content with an .xsf extension). Cube-formatted XSF routes through the existing cube parser so pp.x densities, individual orbitals, and transition densities flow through the orbital-dataset registry alongside native .cube files.
  • DOS / PDOS / bands (*.dos, *.pdos_tot, *.pdos_atm#*, *.bands, *.bands.gnu): parsed into canonical-document extras.qe blocks with metadata for atom / WFC / orbital labels. The Sources panel lists each file under its own role.
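To illustrate the ibrav + celldm route for input decks, the sketch below builds lattice vectors for the three cubic cases using the vector conventions from QE’s pw.x input documentation, where celldm(1) is the lattice parameter a in bohr. The function names are hypothetical, the other ibrav values are omitted, and this is not Orbitron’s implementation.

```rust
const BOHR_TO_ANGSTROM: f64 = 0.529_177_210_903;

/// Lattice vectors (rows, in Å) for ibrav = 1 (sc), 2 (fcc), 3 (bcc),
/// following the pw.x input conventions. `celldm1` is a in bohr.
pub fn cubic_lattice(ibrav: i32, celldm1: f64) -> Option<[[f64; 3]; 3]> {
    let a = celldm1 * BOHR_TO_ANGSTROM;
    let h = a / 2.0;
    match ibrav {
        1 => Some([[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]),
        2 => Some([[-h, 0.0, h], [0.0, h, h], [-h, h, 0.0]]),
        3 => Some([[h, h, h], [-h, h, h], [-h, -h, h]]),
        _ => None, // remaining ibrav values are out of scope for this sketch
    }
}

/// Cell volume |v1 · (v2 × v3)| in Å³.
pub fn cell_volume(v: &[[f64; 3]; 3]) -> f64 {
    let [a, b, c] = v;
    let cross = [
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    ];
    (a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]).abs()
}
```

The volume function doubles as a sanity check: the fcc and bcc primitive cells should enclose a³/4 and a³/2 respectively.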

Current limitations:

  • A dedicated QE Spectra panel that plots DOS / PDOS / band structure (mirroring the VASP equivalent) is future work: the parsers and Sources loading are in place, but the plotting UI is not yet wired up.
  • .UPF pseudopotentials are surfaced informationally in Sources but not parsed (no viewer feature consumes them today).
  • atomic_proj.xml (projwfc.x output) and the <prefix>.save/ binary checkpoint hierarchy are not parsed.
  • Lattices derived from ibrav are always primitive cells; the -12 / ±13 monoclinic centred variants are treated as primitive, with a warning that the centring isn’t expanded.

2.2 NBO FILE47 support

Orbitron can enrich standalone .nbo summaries with optional FILE47 (.47) sidecars. When a .47 file is present next to the .nbo log, Orbitron uses it to reconstruct the NBO7 basis/geometry payload so orbital and population views align with the original analysis. If the .47 file is missing or malformed, Orbitron still loads the .nbo summary and continues with population-only data.
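The fallback rule (optional sidecar; population-only data when it is missing or malformed) can be sketched as below. The types, the decision to validate only a `$COORD ... $END` block, and the assumption that the .nbo populations arrive already parsed are hypothetical simplifications, not Orbitron’s FILE47 reader.

```rust
#[derive(Debug, PartialEq)]
pub struct File47 {
    pub coord_lines: Vec<String>,
}

#[derive(Debug)]
pub struct NboDocument {
    pub populations: Vec<f64>,
    pub file47: Option<File47>,
    pub warnings: Vec<String>,
}

/// Extracts the `$COORD ... $END` block; any structural problem is an error.
fn parse_file47(text: &str) -> Result<File47, String> {
    let start = text.find("$COORD").ok_or("missing $COORD section")?;
    let body = &text[start + "$COORD".len()..];
    let end = body.find("$END").ok_or("unterminated $COORD section")?;
    let coord_lines: Vec<String> = body[..end]
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(String::from)
        .collect();
    if coord_lines.is_empty() {
        return Err("empty $COORD section".into());
    }
    Ok(File47 { coord_lines })
}

/// A missing or malformed sidecar downgrades to population-only data
/// and records a warning instead of failing the whole load.
pub fn load_nbo(populations: Vec<f64>, sidecar: Option<&str>) -> NboDocument {
    let mut warnings = Vec::new();
    let file47 = match sidecar.map(parse_file47) {
        Some(Ok(f)) => Some(f),
        Some(Err(e)) => {
            warnings.push(format!("ignoring FILE47 sidecar: {e}"));
            None
        }
        None => None,
    };
    NboDocument { populations, file47, warnings }
}
```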

2.3 Streaming defaults

Orbitron now prefers streaming parsers whenever possible. Gaussian and NWChem log readers walk the file once to build a run summary, then seek straight to the requested stage/task when you load individual geometries, trajectories, or frequency sets. CLI/TUI/GUI all call the same boundary helpers:

  • gaussian_stage_scene_by_boundary / gaussian_stage_trajectory_by_boundary
  • gaussian_stage_frequency_by_boundary
  • nwchem_task_scene_by_boundary / nwchem_task_trajectory_by_boundary

If you script against the Rust API, use these helpers to avoid re-reading the full file.
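The walk-once-then-seek pattern behind the boundary helpers can be sketched with the standard library alone. The `StageBoundary` type, the line-prefix `marker`, and both function names are hypothetical; the real Gaussian/NWChem helpers recognise stages from program-specific markers, and their signatures are not reproduced here.

```rust
use std::io::{self, BufRead, Read, Seek, SeekFrom};

/// Byte range of one stage, recorded during the single summary pass.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StageBoundary {
    pub start: u64, // offset of the stage's first line
    pub end: u64,   // offset one past its last byte
}

/// Summary pass: every line starting with `marker` opens a stage, which
/// runs until the next marker (or EOF for the last one).
pub fn scan_boundaries<R: BufRead>(mut reader: R, marker: &[u8]) -> io::Result<Vec<StageBoundary>> {
    let mut starts = Vec::new();
    let mut offset = 0u64;
    let mut line = Vec::new();
    loop {
        line.clear();
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break; // EOF
        }
        if line.starts_with(marker) {
            starts.push(offset);
        }
        offset += n as u64;
    }
    let ends = starts.iter().skip(1).copied().chain(std::iter::once(offset));
    Ok(starts.iter().copied().zip(ends).map(|(start, end)| StageBoundary { start, end }).collect())
}

/// Later loads seek straight to one stage instead of re-reading the file.
pub fn read_stage<R: Read + Seek>(reader: &mut R, b: StageBoundary) -> io::Result<String> {
    reader.seek(SeekFrom::Start(b.start))?;
    let mut buf = vec![0u8; (b.end - b.start) as usize];
    reader.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because boundaries are stored as byte offsets, loading stage N costs one seek plus a read of that stage only, however large the log is.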

Note: The orbitron_services::streaming trait is still a roadmap item; current services load whole trajectories before playback. Background loaders keep the UI responsive, but true frame-by-frame streaming is pending future work.

2.4 Parser reliability improvements (2026)

Orbitron’s parsing infrastructure was reworked in early 2026 around a shared set of parsing utilities, so every supported format now handles input edge cases the same way:

Key improvements:

  • Consistent error handling: All parsers now use shared utilities (parsing_utils.rs) that handle edge cases uniformly. Scientific notation (including Fortran D-format such as 1.23D+05), malformed data, whitespace variations, and boundary conditions are treated the same way across Gaussian, NWChem, Molpro, Molcas, VASP, QE, and structural formats.

  • Battle-tested reliability: The parsing utilities are verified by 116 unit tests covering common parsing patterns and edge cases. The tests exercise format detection, tokenization, float parsing, unit conversions, and coordinate extraction, including on unusual or malformed input files.

  • Better diagnostics: When parsing fails, error messages are more actionable and consistent. Failed float parsing, missing delimiters, and malformed coordinates return structured errors instead of panicking.

  • Zero-copy performance: Many operations now borrow string slices (&str) instead of allocating owned strings, improving performance on large files (multi-GB logs, long trajectories, extensive VASP outputs) without sacrificing safety or error handling.

What this means for users:

  • Format detection is more accurate and handles files with unusual formatting or missing headers
  • Parsing is more forgiving of whitespace variations and non-standard numeric formats
  • Error messages clearly indicate what went wrong during parsing (e.g., “expected float after ‘=’ delimiter on line 42” instead of generic parse failures)
  • Large files load faster due to reduced memory allocations during tokenization
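As an illustration of the D-exponent, borrowing, and diagnostics points above, here is a minimal sketch of float parsing with a structured error. `ParseError`, `parse_fortran_f64`, and `float_after` are hypothetical names, not the parsing_utils.rs API.

```rust
use std::fmt;

/// Structured error carrying the context a user needs (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line: usize,
    pub message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} on line {}", self.message, self.line)
    }
}

impl std::error::Error for ParseError {}

/// Parses a float, accepting Fortran `D`/`d` exponents such as 1.23D+05.
/// Works on a borrowed slice; allocates only when a D exponent must be rewritten.
pub fn parse_fortran_f64(token: &str) -> Option<f64> {
    let t = token.trim();
    let is_d = |c: char| c == 'D' || c == 'd';
    if t.contains(is_d) {
        t.replace(is_d, "E").parse::<f64>().ok()
    } else {
        t.parse::<f64>().ok()
    }
}

/// Parses the first token after `delim`, e.g. "Energy = -76.5D+00 Hartree".
pub fn float_after(line: &str, line_no: usize, delim: char) -> Result<f64, ParseError> {
    let err = |message: String| ParseError { line: line_no, message };
    let (_, rest) = line
        .split_once(delim)
        .ok_or_else(|| err(format!("missing '{delim}' delimiter")))?;
    let token = rest.split_whitespace().next().unwrap_or("");
    parse_fortran_f64(token).ok_or_else(|| err(format!("expected float after '{delim}' delimiter")))
}
```

Returning `Result` leaves the decision to the caller: a loader can skip the value, record a warning, or abort, rather than panicking mid-file.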

These improvements affect all file formats supported by Orbitron, making the entire I/O pipeline more robust and maintainable. For technical details about the parsing utilities and their adoption, see the Developer Guide (§6.4 Parsing Utilities Reference).

2.5 Per-atom population coverage

The 2026 redesign of the Analysis panel (§3.6) made per-atom charges first-class: every supported format now surfaces them through the same UI (the “Charges” tab plus the Atom Coloring halo overlay). Coverage by format:

Format Mulliken Löwdin Natural (NBO) APT
NWChem (.out) ✓ (modern + property-module headers)
Gaussian (.log) ✓ (when pop=full / NBO) ✓ (frequency jobs)
Gaussian (.fchk)
Molpro (.out)
Molcas / OpenMolcas (.out) ✓ (column-oriented blocks)

For optimization trajectories (NWChem, Gaussian), Orbitron backfills earlier frames with the converged populations, so the Charges tab and halos stay visible whichever step you are parked on. Previously, navigating to step 0 of an optimization job dropped the charges entirely.
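The backfill rule can be sketched as follows, assuming each frame holds an `Option<Vec<f64>>` of per-atom charges and that only frames without their own table are filled. The function name and the data layout are hypothetical.

```rust
/// Fills frames that lack populations with the converged set, taken to be
/// the last frame that has one. Frames with their own charges are untouched.
pub fn backfill_populations(frames: &mut [Option<Vec<f64>>]) {
    let Some(converged) = frames.iter().rev().find_map(|f| f.clone()) else {
        return; // no frame carries populations: nothing to backfill
    };
    for frame in frames.iter_mut().filter(|f| f.is_none()) {
        *frame = Some(converged.clone());
    }
}
```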

For property-module runs (e.g. NWChem task scf property with the mulliken keyword) the parser returns the last Mulliken table emitted in the file — the converged value, not the preliminary atomic-guess Mulliken that some packages print early in the run.
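Picking the converged table amounts to searching from the end of the file rather than the start. The sketch below assumes a hypothetical layout (a header line, then one row per atom with the charge in the last column, ended by a blank line); the function name and that layout are illustrative and do not reproduce the NWChem parser’s actual table grammar.

```rust
/// Returns the charges from the last table introduced by `header`.
/// Rows whose last column is not a number (column titles, dashes) are skipped.
pub fn last_population_table(text: &str, header: &str) -> Option<Vec<f64>> {
    let start = text.rfind(header)?; // last occurrence = converged table
    let body = &text[start + header.len()..];
    let charges: Vec<f64> = body
        .lines()
        .skip(1) // remainder of the header line itself
        .map(str::trim)
        .take_while(|l| !l.is_empty()) // a blank line ends the table
        .filter_map(|l| l.split_whitespace().last()?.parse::<f64>().ok())
        .collect();
    if charges.is_empty() { None } else { Some(charges) }
}
```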