API documentation#

Random Policy and Algorithm#

class ray_prover.random_algorithm.PatchedRandomPolicy(*args, **kwargs)#

RandomPolicy from Ray examples misses a couple of methods.

Compute actions for the current policy.

Parameters:

obs_batch – Batch of observations.
state_batches – List of RNN state input batches, if any.
prev_action_batch – Batch of previous action values.
prev_reward_batch – Batch of previous rewards.
kwargs – Forward compatibility placeholder

Returns:

actions: Batch of output actions, with shape like: [BATCH_SIZE, ACTION_SHAPE].
state_outs (List[TensorType]): List of RNN state output: batches, if any, each with shape [BATCH_SIZE, STATE_SIZE].
info (List[dict]): Dictionary of extra feature batches, if any,: with shape like {“f1”: [BATCH_SIZE, …], “f2”: [BATCH_SIZE, …]}.

learn_on_loaded_batch(offset: int = 0, buffer_index: int = 0) → dict#

Don’t learn anything and return empty results.

Returns:: empty dictionary (no metrics computed)

load_batch_into_buffer(batch: SampleBatch, buffer_index: int = 0) → int#

Don’t load anything anywhere.

Returns:: always zero (no samples loaded)

class ray_prover.random_algorithm.RandomAlgorithm(config: AlgorithmConfig | None = None, env=None, logger_creator: Callable[[], Logger] | None = None, **kwargs)#

Algorithm taking random actions and not learning anything.

classmethod get_default_policy_class(config: AlgorithmConfig) → RandomPolicy#

We created PatchedRandomPolicy exactly for this algorithm.

Returns:: patched random policy

Training Helper#

class ray_prover.training_helper.TrainingHelper(test_run: bool = False, storage_path: str | None = None)#

Training helper abstract base class.

abstract env_creator(env_config: Dict[str, Any]) → Env#

Return a configured environment.

Parameters:: env_config – an environment config

property env_id: str#

Environment label.

Returns:: environment name set previously
Raises:: ValueError – if not set previously

abstract get_algorithm_config() → AlgorithmConfig#: Return an algorithm config.

parse_args(arguments_to_parse: List[str] | None = None) → None#

Parse command line arguments.

Parameters:: arguments_to_parse – command line arguments (or explicitly set ones)

train_algorithm(arguments_to_parse: List[str] | None = None) → None#

Train a reinforcement learning algorithm.

Parameters:: arguments_to_parse – command line arguments (or explicitly set ones)

PPO example#

class ray_prover.ppo_prover.PPOProver(test_run: bool = False, storage_path: str | None = None)#

PPO-based prover experiments helper.

>>> storage_path = getfixture("tmp_path")
>>> from gym_saturation.constants import MOCK_TPTP_PROBLEM
>>> test_arguments = ["--prover", "Vampire", "--max_clauses", "1",
...     "--problem_filename", MOCK_TPTP_PROBLEM]
>>> PPOProver(True, storage_path).train_algorithm(
...     test_arguments + ["--random_baseline"])
╭─...
...
╭──────────────────────────────────────────╮
│ Training result                          │
├──────────────────────────────────────────┤
│ episodes_total                         2 │
│ num_env_steps_sampled                  2 │
│ num_env_steps_trained                  2 │
│ sampler_results/episode_len_mean       1 │
│ sampler_results/episode_reward_mean    0 │
╰──────────────────────────────────────────╯
...
>>> PPOProver(True, storage_path).train_algorithm(test_arguments)
╭─...
...
╭──────────────────────────────────────────╮
│ Training result                          │
├──────────────────────────────────────────┤
│ episodes_total                         2 │
│ num_env_steps_sampled                  2 │
│ num_env_steps_trained                  0 │
│ sampler_results/episode_len_mean       1 │
│ sampler_results/episode_reward_mean    0 │
╰──────────────────────────────────────────╯
...

Parameters:

arguments_to_parse – command line arguments (or explicitly set ones)
test_run – we use light parameters for testing

env_creator(env_config: Dict[str, Any]) → Env#

Return a prover with AST2Vec state representation.

Parameters:: env_config – an environment config
Returns:: an environment

get_algorithm_config() → AlgorithmConfig#

Return an algorithm config.

Returns:: algorithm config

Curriculum learning function#

ray_prover.curriculum.curriculum_fn(next_task: str, train_results: Dict[str, Any], task_settable_env: SaturationEnv, env_ctx: EnvContext) → str#

Organise two-task curriculum.

Parameters:

next_task – next task to take if the current one is solved
train_results – training metrics
task_settable_env – the environment
env_ctx – left for compatibility

Returns:

a current task of next task

Custom stopper#

class ray_prover.custom_stopper.CustomStopper(last_task: str)#

Custom stopper for curriculum learning.

stop_all() → bool#

Define for compatibility.

Returns:: always False for compatibility