```python
if not all(isinstance(label, int) for label in train_labels):
    # Convert string labels to integers
    label_to_id = {label: idx for idx, label in enumerate(unique_labels)}
    id_to_label = {idx: label for label, idx in label_to_id.items()}
    train_labels = [label_to_id[label] for label in train_labels]
    val_labels = [label_to_id[label] for label in val_labels]
    test_labels = [label_to_id[label] for label in test_labels]

    # Save label mappings for later use
    label_mappings = {"label_to_id": label_to_id, "id_to_label": id_to_label}
    with open(os.path.join(args.output_dir, "label_mappings.json"), "w") as f:
        json.dump(label_mappings, f)
    logging.info("Converted string labels to integers and saved mappings.")

# Create model
model = XnapASI(
    echo_model_name=echo_model_name,
    fusion_method=fusion_method,
    num_labels=num_labels
)
model.dropout.p = dropout  # Set dropout rate
tokenizer = model.echo_tokenizer

# Create datasets and dataloaders
train_dataset = XnapASIDataset(
    train_texts, train_labels, tokenizer,
    augment=args.augment, aug_prob=args.aug_prob
)
val_dataset = XnapASIDataset(val_texts, val_labels, tokenizer)
test_dataset = XnapASIDataset(test_texts, test_labels, tokenizer)

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)

# Train the model
logging.info("Starting training...")
trained_model = train_xnap_asi(
    model,
    train_dataloader,
    val_dataloader,
    epochs=args.epochs,
    lr=learning_rate,
    device=args.device,
    output_dir=args.output_dir,
    use_amp=args.use_amp,
    save_interval=args.save_interval,
    max_grad_norm=args.max_grad_norm
)

# Evaluate the model on the test set
logging.info("Evaluating on test set...")
test_results = evaluate_xnap_asi(
    trained_model,
    test_dataloader,
    device=args.device,
    output_dir=args.output_dir
)

# Save final results
results_path = os.path.join(args.output_dir, "final_results.json")
with open(results_path, "w") as f:
    json.dump(test_results, f)
logging.info(f"Saved final results to {results_path}")
logging.info("Training and evaluation complete.")
```
Explanation of the `main` Function
- Label Handling:
  - If labels are not integers (e.g., strings), they are converted to integers using a mapping (`label_to_id` and `id_to_label`).
  - The mappings are saved to a JSON file for later use (e.g., during inference); a loading sketch follows this list.
- Model Initialization:
  - The `XnapASI` model is initialized with the specified `echo_model_name`, `fusion_method`, and `num_labels`.
  - The dropout rate is set based on the hyperparameters.
- Dataset and DataLoader Creation:
  - The `XnapASIDataset` class is used to create datasets for training, validation, and testing.
  - Data augmentation is applied to the training set if enabled.
- Training:
  - The `train_xnap_asi` function is called to train the model. It handles the training loop, validation, and early stopping.
- Evaluation:
  - The trained model is evaluated on the test set using the `evaluate_xnap_asi` function.
  - Test results (accuracy, precision, recall, F1 score) are saved to a JSON file.
- Logging and Output:
  - All outputs (models, logs, results) are saved to the specified `output_dir`.
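Since `label_mappings.json` is written with `json.dump`, its integer ids come back as string keys when reloaded, so an inference script needs to convert them. A minimal loading sketch, assuming the script's default output directory (the `predicted_id` value is purely illustrative):

```python
import json
import os

output_dir = "./output"  # illustrative; matches the script's default --output_dir

# Restore the mappings written by the training script.
with open(os.path.join(output_dir, "label_mappings.json")) as f:
    mappings = json.load(f)

# json.dump stores integer keys as strings, so convert them back to ints.
id_to_label = {int(idx): label for idx, label in mappings["id_to_label"].items()}

predicted_id = 2  # e.g., the argmax over the model's output logits
print(id_to_label[predicted_id])
```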
Argument Parsing
To make the script user-friendly, we can add argument parsing using `argparse`:
```python
def parse_args():
    parser = argparse.ArgumentParser(description="Train and evaluate the XnapASI model.")

    # Data arguments
    parser.add_argument("--data_path", type=str, required=True, help="Path to the dataset file.")
    parser.add_argument("--test_size", type=float, default=0.2, help="Proportion of data for testing.")
    parser.add_argument("--val_size", type=float, default=0.1, help="Proportion of data for validation.")

    # Model arguments
    parser.add_argument("--echo_model_name", type=str, default="gpt2", help="Name of the transformer model to use.")
    parser.add_argument("--fusion_method", type=str, default="weighted",
                        choices=["weighted", "attention", "gating", "concat"],
                        help="Method to fuse representations.")
    parser.add_argument("--dropout", type=float, default=0.1, help="Dropout rate.")

    # Training arguments
    parser.add_argument("--epochs", type=int, default=10, help="Number of training epochs.")
    parser.add_argument("--batch_size", type=int, default=16, help="Batch size for training and evaluation.")
    parser.add_argument("--learning_rate", type=float, default=2e-5, help="Learning rate.")
    parser.add_argument("--max_grad_norm", type=float, default=1.0, help="Maximum gradient norm for clipping.")
    parser.add_argument("--use_amp", action="store_true", help="Use automatic mixed precision (AMP) for training.")
    parser.add_argument("--augment", action="store_true", help="Enable data augmentation.")
    parser.add_argument("--aug_prob", type=float, default=0.5, help="Probability of applying augmentation.")

    # Optimization arguments
    parser.add_argument("--optimize", action="store_true", help="Enable hyperparameter optimization with Optuna.")
    parser.add_argument("--optuna_trials", type=int, default=20, help="Number of Optuna trials for hyperparameter optimization.")

    # Output arguments
    parser.add_argument("--output_dir", type=str, default="./output", help="Directory to save outputs.")
    parser.add_argument("--log_dir", type=str, default="./logs", help="Directory to save logs.")

    # Miscellaneous arguments
    parser.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility.")
    parser.add_argument("--device", type=str, default=None, help="Device to use (e.g., 'cuda', 'cpu').")
    parser.add_argument("--save_interval", type=int, default=1, help="Save model every n epochs.")

    return parser.parse_args()
```
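For completeness, here is one plausible way to wire `parse_args` into an entry point, assuming the code shown earlier lives in a `main(args)` function. The `set_seed` helper and the device fallback are assumptions, since that glue is not shown in this section:

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    # Hypothetical helper: seed every RNG the training pipeline touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


if __name__ == "__main__":
    args = parse_args()
    set_seed(args.seed)
    # Resolve the device here, since --device defaults to None.
    if args.device is None:
        args.device = "cuda" if torch.cuda.is_available() else "cpu"
    main(args)  # assumed entry point wrapping the training/evaluation code above
```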
Running the Script
To run the script, use the following command:
```bash
python xnap_asi.py --data_path /path/to/data.csv --output_dir ./output --optimize
```
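The `--optimize` flag enables an Optuna study, whose wiring is not shown in this section. Below is a hedged sketch of what the objective could look like, reusing `train_xnap_asi` and `evaluate_xnap_asi` from above; the search ranges, the omitted keyword arguments, and the `"f1"` result key are all assumptions:

```python
import logging

import optuna


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; the actual ranges are not shown in this section.
    lr = trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    fusion_method = trial.suggest_categorical(
        "fusion_method", ["weighted", "attention", "gating", "concat"]
    )

    model = XnapASI(
        echo_model_name=args.echo_model_name,
        fusion_method=fusion_method,
        num_labels=num_labels,
    )
    model.dropout.p = dropout

    trained = train_xnap_asi(
        model, train_dataloader, val_dataloader,
        epochs=args.epochs, lr=lr, device=args.device,
        output_dir=args.output_dir,
    )
    # Assumes the evaluation dict exposes an "f1" key (F1 is among the saved metrics).
    results = evaluate_xnap_asi(
        trained, val_dataloader, device=args.device, output_dir=args.output_dir
    )
    return results["f1"]


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=args.optuna_trials)
logging.info(f"Best hyperparameters: {study.best_params}")
```

Note that the objective scores trials on the validation set, so the test set stays untouched until the final training run.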
Conclusion
This implementation provides a complete pipeline for training, evaluating, and optimizing the `XnapASI` model. It includes features like data augmentation, hyperparameter optimization, and mixed precision training. Let me know if you need further assistance!