¶Data Extraction
An Adonis ML license is required to use this feature.
The adnml.scripts.houdini.data_extraction module provides a Python function, extract, that gathers training data for AdonisML models from a processed Adonis simulation setup.
Data extraction gathers the input and output data used by the AdonisML training workflow. The extracted data is used to train ML models that can support AdnMLDeformer workflows and, when muscle paths are provided, AdnSmartTissue workflows.
The data extraction script allows users to:
- Extract joint transform input data from a KineFX joint source.
- Extract mesh displacement output data from a rest skin and a simulated skin.
- Optionally extract muscle activation data from Adonis muscle solvers.
- Record data over the full playback range or over specific frame windows.
- Skip frames to reduce redundant poses in the extracted dataset.
- Recook frames for stabilization before recording data.
- Export the resulting dataset files to a target directory.
The extraction script is the Python API equivalent of the workflow exposed through the AdnMLDataExtraction TOP HDA. In most production workflows, the data should first be prepared with the AdnMLDataProcessing HDA, and then extracted either through the TOP HDA or by calling this script directly.
The extraction process writes the generated dataset into the target folder specified by save_directory_path. The main exported files are inputs.csv, outputs.csv, joints.json, and extraction_config.json.
¶Requirements
Before running data extraction, an AdnMLDataProcessing HDA must be configured in the Houdini scene. We recommend appending Null nodes to the processing outputs and naming them OUT_SKIN and OUT_JOINTS. The sim_skin_path argument should point to OUT_SKIN, and the joints_path argument should point to OUT_JOINTS.
The underlying AdnMLDataProcessing HDA output plugs are named ADN_OUT_SKIN and ADN_OUT_ML_JOINTS, but data extraction should reference the final SOP nodes in the network, such as the recommended OUT_SKIN and OUT_JOINTS Null nodes.
If the scene has not been prepared with an AdnMLDataProcessing HDA, the extraction will fail because the script expects the simulation skin and joint source to contain the data generated by the processing step. Advanced users may automate this preparation themselves with a custom script, but the resulting SOP outputs must still provide equivalent processed skin and ML joint data.
To run data extraction, the following arguments must be provided:
- rest_skin_path: Path to the rest skin SOP node.
- sim_skin_path: Path to the processed simulated skin SOP node.
- joints_path: Path to the processed ML joint source SOP node.
- joint_names_list: List of Houdini/KineFX joint names to extract.
- save_directory_path: Target folder where the extracted dataset will be saved.
The rest skin and simulated skin must have the same topology. They must have matching point counts and matching point order so the extraction process can compute displacement data correctly.
The joint source geometry must contain valid KineFX joint transform data, and every requested joint name must exist in the joint source geometry.
¶Extract Data
To extract ML training data from Python, run this command in Houdini:
from adnml.scripts.houdini import data_extraction

data_extraction.extract(
    rest_skin_path="/obj/geo1/REST_SKIN",
    sim_skin_path="/obj/geo1/OUT_SKIN",
    joints_path="/obj/geo1/OUT_JOINTS",
    joint_names_list=["root", "spine", "neck", "head"],
    save_directory_path="path/to/export/folder",
)
The extraction process writes the dataset files to the folder specified by save_directory_path.
¶Extract Data With Optional Settings
Additional extraction settings can be provided to control frame sampling, stabilization, muscle activation extraction, and overwrite behavior.
from adnml.scripts.houdini import data_extraction

data_extraction.extract(
    rest_skin_path="/obj/geo1/REST_SKIN",
    sim_skin_path="/obj/geo1/OUT_SKIN",
    joints_path="/obj/geo1/OUT_JOINTS",
    joint_names_list=["root", "spine", "neck", "head"],
    save_directory_path="path/to/export/folder",
    muscles_paths="/obj/geo1/OUT_MUSCLES",
    skip_frames=2,
    stabilization_frames=5,
    frame_windows=[[1, 120], [220, 360]],
    force_overwrite=False,
)
The optional arguments are:
- muscles_paths: Muscle SOP paths used to extract muscle activation data. This can be a single string, a list of strings, or a UI-style string separated by spaces, commas, tabs, or newlines. The recommended setup is to point to a merge node that combines all ADN_OUT_ Null nodes coming from the muscle nodes. This is supported and keeps the extraction input centralized. The paths can also point to individual AdnMuscle SOPs, muscle geometry SOPs, or ADN_OUT_ nodes.
- skip_frames: Number of frames to skip between recorded poses. This helps reduce redundant pose data and extract more diverse samples.
- stabilization_frames: Number of times to recook a frame before recording displacement data. This parameter damps the motion inertia in the recorded poses.
- frame_windows: List of frame windows to record. If empty, the entire playback range is recorded.
- force_overwrite: If enabled, existing dataset files in the target folder can be overwritten.
¶Frame Windows
By default, the extraction process uses the current Houdini playback range.
To extract only specific parts of the animation, provide the frame_windows argument in the extraction call.
Each entry defines a frame range using the format [start_frame, end_frame].
from adnml.scripts.houdini import data_extraction

data_extraction.extract(
    rest_skin_path="/obj/geo1/REST_SKIN",
    sim_skin_path="/obj/geo1/OUT_SKIN",
    joints_path="/obj/geo1/OUT_JOINTS",
    joint_names_list=["root", "spine", "neck", "head"],
    save_directory_path="path/to/export/folder",
    frame_windows=[[1, 120], [220, 360], [500, 720]],
)
Frame windows are useful when only specific animation ranges should contribute to the training dataset.
¶Skip Frames
The skip_frames argument controls how many frames are skipped between recorded poses.
This can be used to reduce redundant pose data and extract more diverse samples.
Lower values are recommended for fast animations, while higher values can be used for slower animations. Typical suggested values for normal animation speeds are between 2 and 5.
Skipping is computed from the start of each frame window, which ensures that the starting frame of each window is always recorded in the dataset.
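The sampling described above can be sketched in plain Python. This is only an illustration, not the extraction script's actual implementation: frames_to_record is a hypothetical helper, and it assumes that skipping N frames means stepping N + 1 frames between recorded poses and that window end frames are inclusive.

```python
def frames_to_record(frame_windows, skip_frames):
    """Hypothetical helper: list the frames that would be recorded.

    Assumption: skip_frames=N means N frames are skipped between recorded
    poses, so the step is N + 1. End frames are treated as inclusive.
    """
    if skip_frames < 0:
        raise ValueError("skip_frames must be >= 0")
    step = skip_frames + 1
    frames = []
    for start, end in frame_windows:
        # Sampling restarts at each window, so its first frame is always kept.
        frames.extend(range(start, end + 1, step))
    return frames


print(frames_to_record([[1, 10], [20, 26]], skip_frames=2))
# [1, 4, 7, 10, 20, 23, 26]
```

Under these assumptions, a quick count of the returned list gives a rough idea of how many poses a given combination of frame_windows and skip_frames would contribute to the dataset.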
¶Stabilization Frames
The stabilization_frames argument controls how many times each recorded frame should be recooked before displacement data is computed and written.
This is used to stabilize the simulation dynamics before recording data. Higher values make each recorded pose lose more of its dynamics and converge toward a static silhouette. Well-stabilized data is required for good ML deformation training.
Typical suggested values for normal animation speeds are between 5 and 10. Faster animations may require more stabilization frames.
Increasing stabilization_frames will increase the total extraction time.
¶Muscle Activation Data
To extract muscle activation data, provide the muscles_paths argument.
from adnml.scripts.houdini import data_extraction

data_extraction.extract(
    rest_skin_path="/obj/geo1/REST_SKIN",
    sim_skin_path="/obj/geo1/OUT_SKIN",
    joints_path="/obj/geo1/OUT_JOINTS",
    joint_names_list=["root", "spine", "neck", "head"],
    save_directory_path="path/to/export/folder",
    muscles_paths="/obj/geo1/OUT_MUSCLES",
)
The recommended setup is to connect all ADN_OUT_ Null nodes coming from the muscle nodes into a merge node, then use that merge node as the muscles_paths input. This keeps the extraction setup centralized and makes it easier to provide all muscle outputs consistently.
The muscles_paths argument can be:
- A single SOP path pointing to a merge node containing all muscle ADN_OUT_ Null nodes.
- A single SOP path pointing to one muscle output.
- A list of SOP paths.
- A string containing multiple paths separated by spaces, commas, tabs, or newlines.
The paths can point to AdnMuscle SOPs, muscle geometry SOPs, or ADN_OUT_ nodes.
When muscle paths are provided, the extracted data supports training an ML model to predict material properties for AdnSmartTissue. If this input is not provided, the extracted data supports training models for AdnMLDeformer only.
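When the paths are assembled in a pipeline script, it can help to normalize them into a list before passing them on. The sketch below shows one way to handle the accepted input forms listed above. normalize_muscle_paths is a hypothetical helper and the node names in the usage line are made up; the extraction script's own parsing may differ.

```python
import re


def normalize_muscle_paths(muscles_paths):
    """Hypothetical helper: turn any accepted muscles_paths form into a list."""
    if muscles_paths is None:
        return []
    if isinstance(muscles_paths, str):
        # Split on any run of spaces, tabs, newlines, or commas.
        candidates = re.split(r"[\s,]+", muscles_paths)
    else:
        candidates = [str(path).strip() for path in muscles_paths]
    # Drop empty entries left by leading or trailing separators.
    return [path for path in candidates if path]


# Illustrative node names only.
print(normalize_muscle_paths("/obj/geo1/ADN_OUT_biceps, /obj/geo1/ADN_OUT_triceps"))
```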
¶Force Overwrite
By default, the extraction process will not overwrite existing dataset files.
If inputs.csv or outputs.csv already exist in the target export folder, the extraction will raise an error unless force_overwrite is enabled.
from adnml.scripts.houdini import data_extraction

data_extraction.extract(
    rest_skin_path="/obj/geo1/REST_SKIN",
    sim_skin_path="/obj/geo1/OUT_SKIN",
    joints_path="/obj/geo1/OUT_JOINTS",
    joint_names_list=["root", "spine", "neck", "head"],
    save_directory_path="path/to/export/folder",
    force_overwrite=True,
)
Enabling force_overwrite will replace any existing extraction data in the target folder. Use it with caution.
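Before launching a long extraction, a script can check whether the target folder already holds the files that trigger the overwrite error. The following is a minimal sketch using only the standard library. existing_dataset_files is a hypothetical helper and is not part of the adnml package.

```python
from pathlib import Path

# File names taken from the overwrite rule described above.
PROTECTED_FILES = ("inputs.csv", "outputs.csv")


def existing_dataset_files(save_directory_path):
    """Hypothetical helper: return protected dataset files already in the folder."""
    folder = Path(save_directory_path)
    return [name for name in PROTECTED_FILES if (folder / name).is_file()]


found = existing_dataset_files("path/to/export/folder")
if found:
    print("Extraction would fail without force_overwrite:", ", ".join(found))
else:
    print("No existing dataset files found.")
```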
¶Exported Files
After a successful extraction, the following files are written to the target export folder:
- inputs.csv: Input data containing the selected joint transforms.
- outputs.csv: Output data containing the extracted mesh displacement data and optional data required for AdnSmartTissue material properties prediction.
- joints.json: Joint hierarchy data exported in the same order used by inputs.csv.
- extraction_config.json: Configuration data describing the extraction settings used for the exported dataset.
The exported joint data uses Houdini/KineFX joint names internally. The joints.json file stores full hierarchy paths built from the KineFX primitive hierarchy, in the same order used for inputs.csv.
The sim_skin_path and joints_path should come from the processed outputs of an AdnMLDataProcessing HDA. We recommend referencing the final SOP Null nodes named OUT_SKIN and OUT_JOINTS.
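After an extraction finishes, the export folder can be sanity-checked against the file list above. This sketch only tests that the four files exist and loads extraction_config.json as generic JSON. Both helper names are hypothetical, and no assumption is made about the keys or columns inside the exported files.

```python
import json
from pathlib import Path

EXPECTED_FILES = ("inputs.csv", "outputs.csv", "joints.json", "extraction_config.json")


def missing_dataset_files(save_directory_path):
    """Hypothetical helper: return the expected files that are not in the folder."""
    folder = Path(save_directory_path)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


def load_extraction_config(save_directory_path):
    """Load extraction_config.json as plain JSON; its keys are not assumed here."""
    config_path = Path(save_directory_path) / "extraction_config.json"
    with open(config_path, encoding="utf-8") as handle:
        return json.load(handle)


missing = missing_dataset_files("path/to/export/folder")
print("Missing files:", ", ".join(missing) if missing else "none")
```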
¶Validation
The extraction script validates the provided data before extraction starts.
The script checks that:
- The rest skin path is valid.
- The simulation skin path is valid.
- The joint source path is valid.
- The save directory path is valid.
- The rest skin and simulation skin are not the same object.
- The rest skin and simulation skin have matching topology.
- The rest skin and simulation skin contain point positions.
- The joint source geometry contains valid KineFX joint transform data.
- Every requested joint name exists in the joint source geometry.
- The skip frame value is a valid integer greater than or equal to 0.
- The stabilization frame value is a valid integer greater than or equal to 0.
- The frame windows are valid.
- Existing dataset files are not overwritten unless force_overwrite is enabled.
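Some of these checks can be mirrored in a pipeline script to catch mistakes before Houdini starts cooking. The sketch below covers only the frame settings. check_frame_settings is a hypothetical helper, and since the documentation does not define what makes a frame window valid, the rule used here (a [start, end] pair with start not greater than end) is an assumption.

```python
def check_frame_settings(skip_frames, stabilization_frames, frame_windows):
    """Hypothetical pre-flight check mirroring some of the rules above."""
    for name, value in (("skip_frames", skip_frames),
                        ("stabilization_frames", stabilization_frames)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be an integer >= 0, got {value!r}")
    for window in frame_windows:
        # Assumption: a valid window is a [start, end] pair with start <= end.
        if len(window) != 2 or window[0] > window[1]:
            raise ValueError(f"Invalid frame window: {window!r}")


# Passes silently for the settings used in the earlier example.
check_frame_settings(2, 5, [[1, 120], [220, 360]])
```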
¶Errors
The extract function may raise the following errors:
- ValueError: Raised when an input path, frame setting, geometry, or joint list is invalid.
- FileExistsError: Raised when dataset files already exist and force_overwrite is disabled.
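A calling script may want to handle the two error types separately. The sketch below wraps an extract-like callable and returns a short status message. run_extraction is a hypothetical wrapper, and the function to call is passed in as a parameter so the sketch can run outside Houdini. Inside Houdini, data_extraction.extract would be passed as extract_fn.

```python
def run_extraction(extract_fn, **kwargs):
    """Hypothetical wrapper: call an extract-like function and report failures.

    In Houdini, extract_fn would be data_extraction.extract, called with the
    same keyword arguments shown in the examples on this page.
    """
    try:
        extract_fn(**kwargs)
    except FileExistsError as error:
        # Dataset files already exist and force_overwrite is disabled.
        return f"Dataset already exists: {error}"
    except ValueError as error:
        # An input path, frame setting, geometry, or joint list is invalid.
        return f"Invalid extraction input: {error}"
    return "Extraction finished"
```

Returning a message instead of re-raising is a design choice for batch scripts that should continue with the next asset. Re-raising after logging is equally reasonable when a failure should stop the job.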