2. Interactive Execution
========================

You can also run Finding eML interactively using a Python shell.

Example run
-----------

The following command starts the Python 3 interactive shell in your terminal. The shell lets you run Python code line by line, which is useful for testing or exploring functions in the package.

.. code-block:: bash

   python3

Once inside Python:

a. *Import Required Modules*

   Imports the standard-library modules and the eML classifier module used in the following steps.

   .. code-block:: python

      import os
      import sys
      import eML.classify as eML

b. *Define Command-Line Arguments*

   The classifier requires specific input arguments such as batch information, file paths, patient ID, and model settings. In an interactive session these are supplied by assigning to ``sys.argv``.

   .. code-block:: python

      sys.argv = [
          'eML',
          '--batch', 'patient_ID',
          '--adata_file', '/mnt/adata_query.h5ad',
          '--proteins_file', '/app/src/data/proteins_to_check.txt',
          '--patient', 'Malmberg',
          '--adversarial_classifier', 'False',
          '--output_dir', '/mnt/output',
          '--ref_model', '/app/src/data/models/totalvi_vae_reference_model_withclassifiers',
          '--ref_adata', '/app/src/data/models/totalvi_vae_reference_model_withclassifiers.h5ad',
          '--classifier_type', 'BBC',
      ]

c. *Parse Arguments and Create Output Directory*

   Parses the arguments defined in step b and creates the output directory if it does not already exist.

   .. code-block:: python

      args = eML.parse_arguments()
      os.makedirs(args.output_dir, exist_ok=True)

d. *Save Arguments for Reproducibility*

   Saves the arguments used for classification to a text file for reference.

   .. code-block:: python

      args_file = os.path.join(args.output_dir, f'{args.patient}_arguments_used.txt')
      with open(args_file, "w") as f:
          for key, value in vars(args).items():
              f.write(f"{key}: {value}\n")

e. *Load Proteins to Check*

   Reads, from the file given by ``--proteins_file``, the list of proteins that need to be removed.

   .. code-block:: python

      proteins_to_check = eML.load_proteins_from_file(args.proteins_file)

f. *Validate Input Files and Load Reference Model*

   Ensures the required input files exist and loads the appropriate reference model.

   .. code-block:: python

      eML.validate_files(args.adata_file, args.RNApath, args.metapath,
                         args.umappath, args.ADTpath, args.protein)
      ref_model, ref_adata = eML.get_model_path(args.ref_model, args.ref_adata)

g. *Load and Preprocess Data*

   Loads the AnnData object and applies the preprocessing steps required for model training.

   .. code-block:: python

      adata, ref, protein_adata = eML.load_data(
          args.adata_file, ref_adata, args.RNApath, args.metapath,
          args.umappath, args.ADTpath, args.protein)
      adata = eML.preprocess_data(
          adata, args.protein, protein_adata, ref, adata.obs, args.batch,
          proteins_to_check, args.protein_suffix, args.output_dir,
          args.patient, args.mouse)

h. *Train TOTALVI Model and Classify Latent Space*

   Trains the TOTALVI model and classifies NK cells based on the learned latent features.

   .. code-block:: python

      vae_q = eML.train_totalvi_model(adata, ref_model, ref, args.adversarial_classifier)
      predictions, probs = eML.classify_latent_space(vae_q, adata, args.classifier_type)

i. *Save Results and Assign Cell Types*

   Saves the classification results and assigns NK cell types based on the model predictions.

   .. code-block:: python

      eML.save_results(adata, predictions, probs, args.output_dir,
                       args.patient, vae_q, args.classifier_type, args.mouse)
      adata = eML.classify_cells(adata, args.classifier_type,
                                 args.output_dir, args.patient)

Output File Structure:
----------------------

.. code-block:: text

   output_dir/
   ├── _arguments_used.txt
   ├── _prepped.h5ad
   ├── _probabilitiesoutput.csv
   ├── _eMLclassified_adata.h5ad
   └── _vae_model_withclassifiers/
       └── model.pt

Visualize output data
---------------------

In the examples below, ``adata_PBMC`` stands for a classified AnnData object whose ``.obs`` contains the ``NK_type`` column.

*Violin plot of a selected gene across NK_type categories:*

.. code-block:: python

   import scanpy as sc
   sc.pl.violin(adata_PBMC, ["FCGR3A"], groupby="NK_type", rotation=45)

.. image:: images/violinFCGR3A_PBMC.png
   :alt: Violin plot of FCGR3A expression by NK_type
   :align: center
   :width: 600px

*Dot plot of selected genes across NK_type categories:*

.. code-block:: python

   genes = ["FCGR3A", "ENTPD1", "KLRC1", "FGFBP2", "GZMK"]

   # Subset the AnnData object to the selected genes; copy so that
   # scaling does not act on a view of adata_PBMC
   adata_valid = adata_PBMC[:, genes].copy()

   # Scale the expression data for the selected genes
   sc.pp.scale(adata_valid, zero_center=True, max_value=None)

   # Generate a dot plot grouped by the 'NK_type' column
   sc.pl.dotplot(adata_valid, var_names=genes, groupby='NK_type',
                 vmax=1.5, use_raw=False, cmap='Reds')

.. image:: images/Dotplot_NK.png
   :alt: Dot plot of selected genes by NK_type
   :align: center
   :width: 500px

For more ways to visualize the output data, see the Scanpy documentation.

*Note:* If your data are already preprocessed, you can start by defining the command-line arguments and loading the reference model, then continue in the interactive session with model training and the steps that follow it.
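
*Sketch: building the argument list from a dictionary.* When you rerun the session with different settings, the long ``sys.argv`` list from step b can be awkward to edit. The following is a minimal sketch of one way to assemble it from a dictionary instead. ``build_argv`` is a hypothetical helper written for this illustration and is not part of eML; the option names and values are copied from step b.

.. code-block:: python

   import sys


   def build_argv(program, options):
       """Flatten an options mapping into an argv-style list of strings.

       Hypothetical helper for illustration only; not part of eML.
       """
       argv = [program]
       for flag, value in options.items():
           # Each option becomes a "--flag" entry followed by its value as a string
           argv.extend([f"--{flag}", str(value)])
       return argv


   # Option names and values taken from step b
   options = {
       "batch": "patient_ID",
       "adata_file": "/mnt/adata_query.h5ad",
       "proteins_file": "/app/src/data/proteins_to_check.txt",
       "patient": "Malmberg",
       "adversarial_classifier": False,
       "output_dir": "/mnt/output",
       "ref_model": "/app/src/data/models/totalvi_vae_reference_model_withclassifiers",
       "ref_adata": "/app/src/data/models/totalvi_vae_reference_model_withclassifiers.h5ad",
       "classifier_type": "BBC",
   }

   argv = build_argv("eML", options)

   # Assign the list to sys.argv before calling eML.parse_arguments() (step c)
   sys.argv = argv

Changing a single setting, such as ``options["patient"]``, and rebuilding the list leaves the remaining steps of the session unchanged.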