Datasets

Data within mlipx is always represented as a list of ASE Atoms objects. You can provide this data to a workflow in several ways, depending on your requirements.

Local Data Files

The simplest way to use data in the workflow is by providing a local data file, such as a trajectory file.

(.venv) $ cp /path/to/data.xyz .
(.venv) $ dvc add data.xyz
Local data file (main.py)
import zntrack
import mlipx

DATAPATH = "data.xyz"

project = mlipx.Project()

with project.group("initialize"):
    # Load the frames from the local, DVC-tracked file.
    data = mlipx.LoadDataFile(path=DATAPATH)

Remote Data Files

Since mlipx integrates with DVC, it can easily handle data from remote locations. You can manually import a remote file:

(.venv) $ dvc import-url https://url/to/your/data.xyz data.xyz

Alternatively, you can use the zntrack interface for automated management. This allows evaluation of datasets such as mptraj and supports filtering to select relevant configurations. For example, the following code selects all structures containing F and B atoms.

Importing online resources (main.py)
import zntrack
import mlipx

mptraj = zntrack.add(
   url="https://github.com/ACEsuit/mace-mp/releases/download/mace_mp_0b/mp_traj_combined.xyz",
   path="mptraj.xyz",
)

project = mlipx.Project()

with project:
    # Load the downloaded file, then filter its frames by element.
    raw_data = mlipx.LoadDataFile(path=mptraj)
    data = mlipx.FilterAtoms(data=raw_data.frames, elements=["B", "F"])
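
The selection rule can be sketched in plain Python. This is an illustration of the idea only, not the implementation of mlipx.FilterAtoms. It reads "containing F and B atoms" as "contains at least one B and at least one F, with other elements allowed", which is an assumption. The helper contains_all_elements and the toy frames are hypothetical, and each frame is reduced to its list of chemical symbols.

```python
from typing import Iterable, Sequence


def contains_all_elements(symbols: Iterable[str], elements: Sequence[str]) -> bool:
    """Return True if every requested element occurs at least once in symbols."""
    return set(elements).issubset(symbols)


# Toy frames, each given only by its chemical symbols (hypothetical data).
frames = [
    ["B", "F", "F", "F"],            # contains B and F: kept
    ["Li", "F"],                     # no B: dropped
    ["B", "N"],                      # no F: dropped
    ["K", "B", "F", "F", "F", "F"],  # extra elements are allowed: kept
]

selected = [f for f in frames if contains_all_elements(f, ["B", "F"])]
print(len(selected), "of", len(frames), "frames selected")
```

For real ASE Atoms objects, atoms.get_chemical_symbols() provides the symbol list used here.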

Materials Project

You can also search and retrieve structures from the Materials Project.

Querying Materials Project (main.py)
import mlipx

project = mlipx.Project()

with project.group("initialize"):
    # Retrieve the structure with the given Materials Project ID.
    data = mlipx.MPRester(search_kwargs={"material_ids": ["mp-1143"]})

Note

Querying the Materials Project requires an API key. Set the MP_API_KEY environment variable to your key.
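
A missing key is easier to diagnose if it is checked before the workflow runs. The sketch below uses only the Python standard library. The helper require_env is hypothetical and not part of mlipx.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Example check; prints a message instead of failing so the sketch always runs.
try:
    require_env("MP_API_KEY")
    print("MP_API_KEY found")
except RuntimeError as err:
    print(err)
```

In a real main.py, the call could be placed at the top without the try/except so that a missing key stops the script early.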

Generating Data

Another approach is generating data dynamically. In mlipx, you can build molecules or simulation boxes from SMILES strings. For instance, the following code generates a simulation box containing 10 ethanol molecules:

Using SMILES (main.py)
import mlipx

project = mlipx.Project()

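with project.group("initialize"):
    # Generate 10 conformers of ethanol (SMILES "CCO").
    confs = mlipx.Smiles2Conformers(smiles="CCO", num_confs=10)
    # Pack 10 molecules into a simulation box at the given density.
    data = mlipx.BuildBox(data=[confs.frames], counts=[10], density=789)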

Note

The BuildBox node requires Packmol and rdkit2ase. If you do not need a simulation box, you can use confs.frames directly.
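
To get a feel for the box produced by the ethanol example, the following back-of-the-envelope calculation estimates the edge length of a cubic box that holds 10 ethanol molecules. It assumes that density=789 is given in kg/m^3 (the density of liquid ethanol is about 789 kg/m^3) and that the box is cubic. Neither assumption is confirmed by the text above, and the calculation does not call mlipx.

```python
# Estimate the edge length of a cubic box for 10 ethanol molecules.
AVOGADRO = 6.02214076e23  # particles per mole
MOLAR_MASS_ETHANOL = 2 * 12.011 + 6 * 1.008 + 15.999  # g/mol for C2H6O

count = 10       # number of molecules, as in counts=[10]
density = 789.0  # ASSUMPTION: kg/m^3

mass_kg = count * MOLAR_MASS_ETHANOL / AVOGADRO / 1000.0  # g -> kg
volume_m3 = mass_kg / density
volume_a3 = volume_m3 * 1e30  # 1 m^3 = 1e30 cubic angstrom
edge_angstrom = volume_a3 ** (1.0 / 3.0)

print(f"Box volume: {volume_a3:.0f} cubic angstrom")
print(f"Box edge:   {edge_angstrom:.2f} angstrom")
```

Under these assumptions the box edge comes out just under 10 Å, which is small for most simulations. Larger boxes follow from increasing counts.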