DeepCrime - Mutation Testing Tool for Deep Learning Systems


Replication Package for "DeepCrime: Mutation Testing of Deep Learning Systems based on Real Faults" paper

Getting Started

In this section we provide instructions on how to perform a test run of DeepCrime using the MNIST subject as an example. The run generates mutants using the selected mutation operator ('change optimisation function', 'OCH').

There are two possible ways to perform the experiment:

  1. Execute the project through the Google Colab notebook, which provides an easy and self-contained way to perform the experiment, with no need to install the requirements (note that to use this approach, one should have a Gmail account).

  2. Download and extract the artefact on a local machine, install the requirements and run the DeepCrime project:

    2.1. The DeepCrime tool is located in the 'deepcrime' folder of the artefact.

    2.2. Before running the tool, a user should first install the required packages. The requirements for the project are stored in the 'requirements38.txt' (Python 3.8) file in the project root ('deepcrime').

    2.3. To set up and run the tool, the user should execute the following commands:

# Set up the tool
export PYTHONPATH="${absolute_path_to_the_artefact}/deepcrime"
cd {absolute_path_to_the_artefact}/deepcrime
# Execute
python run_deepcrime.py

Once executed, the tool produces a number of output files and the mutation score for the test set that was used in the experiment:

  • Trained models (original and mutants) are saved in the 'trained_models' folder. Storing these models allows subsequent evaluation of new test sets without retraining.
  • Mutated programs (programs with injected faults) are stored in 'mutated_models/mnist'. In this example (mnist_change_optimisation_function_mutated0.py), one can see that the original 'Adadelta' optimiser was replaced with a call to a mutation operator method that returns a new optimiser.
  • The results of the evaluation of the trained models on the test set and the train set are stored in the 'mutated_models/mnist/results_{test_set_type}' sub-folder. As DeepCrime's definition of the mutation score depends on the killability of the operators on train data, DeepCrime is executed twice: once to evaluate the models on train data and once on test data. Four types of result files are generated (a small inspection sketch follows this list):
    1. 'mnist.csv' contains the performance scores obtained from the evaluation of the original model instances (10 in the example) on the corresponding (train or test) dataset.
    2. 'mnist_change_optimisation_function_mutated0_MP_sgd.csv' contains the performance scores obtained from the evaluation of the mutated models on the corresponding (train or test) dataset. DeepCrime produces such a file for each applied configuration of each mutation operator.
    3. For each configuration of an operator, DeepCrime performs a statistical test to decide whether the performance achieved on the original models is significantly different (so the mutation is killed) from the performance observed on the mutated models generated by this mutation configuration. The computed p-value, effect size and killability outcome for the applied mutation operator are written to the 'stats/change_optimisation_function_nosearch.csv' file.
    4. For the evaluated test set, DeepCrime also produces a final output file called 'mnist_ms.csv' that stores the mutation and instability scores for the applied operators, as well as the total mutation score of the test set.
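After a run completes, the produced CSV files can be inspected programmatically. Below is a minimal sketch, assuming the MNIST paths described above and a test-set run ('results_test'); the expected columns are assumptions about the file layout, not a verified part of DeepCrime's interface:

import pandas as pd

# Hypothetical inspection of DeepCrime's output files; paths follow the
# description above, while the exact column names are an assumption.
stats = pd.read_csv('mutated_models/mnist/results_test/stats/'
                    'change_optimisation_function_nosearch.csv')
print(stats)  # expected: p-value, effect size and killability per configuration

ms = pd.read_csv('mutated_models/mnist/results_test/mnist_ms.csv')
print(ms)     # expected: mutation/instability scores and the total mutation score
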

Note: Throughout the artefact, the name 'lenet' is often used to represent files related to the UnityEyes subject. This is because the deep neural network of this case study is based on the LeNet architecture [5].

Detailed Description

On the feasibility of full replication

In the subsection Replication from Scratch we provide all the scripts and data required to perform the full replication. However, below we explain why full replication of our experiments is not feasible within the scope of the artifact evaluation:

  1. DeepCrime should be run on 5 different subjects with 3 different test suites (training data, strong test suite, weak test suite), performing binary or exhaustive searches for the majority of operators. This leads to the training of 13,500 deep learning models. We performed these experiments on a large number of GPU-powered machines, including multiple Amazon EC2 instances. Therefore, it might not be possible to replicate the experimental procedure in a short time without a high number of GPU-powered machines.
  2. To perform the triviality and the redundancy analysis, the killing probability of each input in the training data for each generated mutant should be calculated. Given the overall number of mutants and inputs, this is an expensive task; therefore, it might also be outside the scope of the artifact evaluation.
  3. To replicate the comparison with DeepMutation++, 400 mutant models for each mutation operator applicable to each subject study (with the exception of MovieRecommender) have to be generated and then evaluated on two test suites (strong and weak). Depending on the hardware used, this might also become a costly experiment.

The structure of the replication package

Our replication package contains the following folders:

  1. Folder Data contains the intermediate files obtained during our experiments. It is divided into the following subfolders:

    1.1. deepcrime_output contains the output files produced by DeepCrime for each subject. The main folder contains CSV files which store the accuracy (or other performance) metric's values across 20 runs of the original or the mutated model. The subfolder stats contains the results of the performed searches and the killing information for each mutation configuration generated by DeepCrime.

    1.2. input_dicts contains (for each subject) the dictionary that stores, for each pair of killable mutants, the indices of inputs from the training data for which their confidence intervals do not intersect.

    1.3. inputs_killability contains (for each subject) the killability properties of each test input from the training data for each mutant generated by DeepCrime. The files not ending in '_ki.npy' store, for each of the 20 performed runs, whether the input is correctly predicted by the model specified in the file's name. The files ending in '_ki.npy' store whether the input is correctly predicted by the original model and incorrectly predicted by the mutant specified in the file's name.

    1.4. predictions contains the predictions of the strong and weak test suites for the original models and the DeepMutation++ mutants.
  2. Folder Datasets contains the full datasets used for the training of each subject system, along with the weak test suites generated for them.
  3. Folder DCReplication contains the source code required to replicate the results of our experiments.
  4. Folder deepcrime contains the source code of the DeepCrime tool.
  5. Folder Models is supposed to contain the trained models for each subject. However, due to the very large size (~360 GB) we had to upload the models separately. The links to the Zenodo artefacts with models are listed in the Replication from Scratch subsection.
  6. Folder Subjects contains the source code for our subject systems.
  7. Folder 'MutationOperators' contains spreadsheets that provide insight into the process of the extraction of mutation operators.
  8. Folder 'Results' is the directory to which the output files of the scripts that replicate the results of the paper will be written upon execution.

Fast Replication

Replication of Results

Similarly to the 'Getting Started' example, there are two ways to run the fast replication of the results:

  1. Follow the instructions in the Google Colab notebook.

  2. Execute the replication scripts on the local machine:

    2.1. The scripts to run the replication are stored in the 'replication_scripts' directory of the 'DCReplication' folder of the artefact. Data files necessary to run the scripts can be found in the corresponding sub-folders of the 'Data' folder of the artefact.

    2.2. Before running the scripts, a user should first install the required packages. The requirements are specified in the 'requirements38.txt' file (for Python 3.8) that can be found in 'DCReplication'. The requirements are the same as for the 'Getting Started' (DeepCrime) example, so the packages need to be installed only once.

    2.3. To run the scripts, the user should execute the following commands:

export PYTHONPATH="${absolute_path_to_the_artefact}/DCReplication"
cd {absolute_path_to_the_artefact}/DCReplication/replication_scripts
python {replication_script}.py  # e.g. triviality_analysis.py (see below)

Below we describe in detail the scripts that should be run to replicate the results for Research Questions (RQs) of DeepCrime paper:

Triviality analysis and killability score (RQ1)

To perform the triviality and redundancy analysis, one would need the killing probability of each input in the training data for each mutant. The arrays that store these values are in the inputs_killability folder. To calculate the killability scores, one would need the killed configurations, which are stored in the deepcrime_output folder. Running the following script extracts all the necessary information from these files to reproduce the triviality analysis results. It takes less than a minute to run.

python triviality_analysis.py

The script will generate the triviality_analysis folder in Results along with the following files:

  • table4.csv - a CSV file that contains information for Table 4 of the paper
  • {subject_name}_triviality_score.csv - a CSV file for each subject that reports triviality score for each mutant
  • {subject_name}_trivial_mutants.txt - a TXT file for each subject that lists its trivial mutants, i.e. mutants with triviality score >= 0.9
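For readers who want to see the shape of the computation, here is a minimal sketch under explicit assumptions: the '_ki.npy' array is taken to have shape (n_inputs, n_runs) with 1 where an input kills the mutant in a run, and the triviality score is approximated as the mean per-input killing probability (see the paper for the exact definition; triviality_analysis.py is the authoritative implementation).

import numpy as np

# Sketch only: the file name is hypothetical and the aggregation is an assumption.
ki = np.load('Data/inputs_killability/mnist/mnist_mutant_ki.npy')  # (n_inputs, n_runs)

killing_prob = ki.mean(axis=1)            # per-input killing probability over runs
triviality_score = killing_prob.mean()    # assumed aggregation over inputs
print(f'triviality score: {triviality_score:.3f}, '
      f'trivial: {triviality_score >= 0.9}')
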
Redundancy analysis (RQ2)

To perform the redundancy analysis, one would need to identify the inputs for which the confidence intervals of the killing probability do not intersect. We have determined such inputs and stored their indices for each pair of mutants in pickle files in the input_dicts directory. Running the following script will extract all the necessary information to perform the redundancy analysis on all subjects. It takes around 4 minutes to run.

python redundancy_analysis.py

The script will generate redundancy_analysis folder in Results along with the following files:

  • table5.csv - a CSV file that contains information for Table 5 in the paper
  • {subject_name}_redundant.csv - a CSV file that contains the list of redundant mutants for each subject
  • {subject_name}_non_redundant.csv - a CSV file that contains the list of non-redundant mutants for each subject
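To illustrate the 'non-intersecting confidence intervals' criterion, here is a small sketch; it uses a normal-approximation interval, which is an assumption and may differ from the interval computed in our scripts, and the killability arrays are randomly generated stand-ins:

import numpy as np

def killing_ci(ki_row, z=1.96):
    """Approximate 95% CI for one input's killing probability across runs."""
    p, n = ki_row.mean(), len(ki_row)
    half = z * np.sqrt(p * (1 - p) / n)
    return p - half, p + half

def non_intersecting_inputs(ki_a, ki_b):
    """Indices of inputs whose intervals for mutants A and B do not overlap."""
    indices = []
    for i, (row_a, row_b) in enumerate(zip(ki_a, ki_b)):
        lo_a, hi_a = killing_ci(row_a)
        lo_b, hi_b = killing_ci(row_b)
        if hi_a < lo_b or hi_b < lo_a:
            indices.append(i)
    return indices

# Hypothetical usage with random (n_inputs, n_runs) killability arrays:
ki_a = np.random.randint(0, 2, size=(100, 20))
ki_b = np.random.randint(0, 2, size=(100, 20))
print(len(non_intersecting_inputs(ki_a, ki_b)))
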
Comparison with DeepMutation++ (RQ3)

To reproduce the comparison with DeepMutation++, one would need to calculate the mutation scores for DeepCrime and DeepMutation++ for both strong and weak test suites, and the resulting sensitivities. This is done by comparing the performance of the original and mutated models on the corresponding test sets. We store such data in the deepcrime_output folder for DeepCrime, while for DeepMutation++ we store all the predictions in the predictions folder and the ground truth for each input of the test sets in the corresponding Datasets sub-folders for each subject study. Running the following script will automatically perform the necessary steps in less than a minute.

python mutation_score.py

The script will generate mutation_score folder in 'Results' along with the following files:

  • table6.csv - a CSV file that contains information for Table 6 in the paper
  • {subject_name}_unstable_operators.txt - a TXT file that contains the number and list of unstable DeepCrime operators for each subject
  • {subject_name}_results_{dataset_type}_ts.csv - a CSV file with the final output of DeepCrime for the strong/weak test set. The file contains mutation and instability score for each operator and the overall mutation score
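As a reference for what mutation_score.py computes, here is a minimal sketch of a mutation score under the definition stated earlier (only configurations killable by the training data count towards the score); the configuration identifiers are hypothetical examples:

# Sketch under the stated definition, not DeepCrime's actual code.
def mutation_score(killed_by_test: set, killable_by_train: set) -> float:
    """Fraction of train-killable mutation configurations killed by the test suite."""
    if not killable_by_train:
        return 0.0
    return len(killed_by_test & killable_by_train) / len(killable_by_train)

print(mutation_score({'OCH_sgd'}, {'OCH_sgd', 'OCH_rmsprop'}))  # 0.5
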
Extraction of mutation operators

We started the extraction of mutation operators by analysing the replication packages of three existing studies on DL faults.

The study by Humbatova and Jahangirova et al. [1] (ICSE20) became the principal source for the extraction of our mutation operators. The file 'icse20_issues.xlsx' (see the 'MutationOperators' folder) illustrates the process of the replication package analysis and the extraction performed by two of the DeepCrime authors. The replication package of ICSE20 consists of two parts: issues mined from StackOverflow (SO) and GitHub (GIT), and issues obtained from interviews with DL practitioners.

The sheets in this file and their descriptions:

  1. 'SO_GIT' sheet contains all the DL related entities (faults) obtained from SO and GIT mining. Columns:
  • 'Round' - the labelling round in which the entity was discovered (more details in the ICSE20 paper)
  • 'EntityType' - Type of the entity (for example, 'commit-tensorflow' for a GIT commit or 'so-tensorflow' for an SO post)
  • 'Link' - Link to the GIT commit or SO post
  • 'Taxonomy Tag' - The tag under which the entity appears in the taxonomy
  • 'Taxonomy Cat' - The category under which the entity appears in the taxonomy
  • 'General Note' - Note on the issue given by the evaluator
  • 'Error Message' - Error message that appears as a result of the fault
  • 'Person' - Evaluator's name
  2. 'operators_from_so_github' sheet contains the list of the entities from 'SO_GIT' that were relevant for the operator extraction. Columns:
  • The columns are identical to the ones in the 'SO_GIT' sheet, with the addition of one new column:
  • 'Proposed Mutation Operator' - Description of the proposed mutation operator

3-4. 'int_mo_{evaluator}' sheets contain the analysis of all the taxonomy issues obtained from interviews with practitioners, performed by the corresponding evaluator. Columns:

  • 'ID' - Taxonomy category/sub-category ID to which the issue belongs
  • 'Taxonomy Leaf' - Taxonomy leaf (tag) to which the issue belongs
  • 'Proposed Mutation' - Description of the proposed mutation operator
  • 'Operator Comments' - General comments on the proposed operator
  5. 'int_mo_combined' sheet contains the combined operator extraction from interview tags performed by both evaluators. Columns:
  • The columns in this sheet are the merge of the columns in sheets 3 and 4 by 'ID' and 'Taxonomy Leaf', with the addition of:
  • 'Final' - Proposed operator after the consensus discussion
  6. 'Proposed SO_GIT_INT' sheet contains the combined list of the proposed operators from both sources. Columns:
  • 'Source' - Source of the operator (GIT & SO or Interviews)
  • 'Proposed Mutation Operator' - Description of a proposed mutation operator
  • 'Proposed Mutation Operator Name Cleaned' - Cleaned description of a proposed mutation operator
  7. 'final-noduplicates' sheet contains the final list of the proposed mutation operators from the taxonomy after cleaning out duplicates. Columns:
  • 'Proposed Mutation Operator' - Description of a proposed mutation operator

The file 'fse19_issta18_issues.xlsx' illustrates the replication package analysis and the extraction of MOs for the works by Zhang et al. [2] (ISSTA18) and Islam et al. [3] (FSE19).

  1. 'FSE19' sheet contains the list of GIT and SO issues from the replication package of the FSE19 paper that do not have the effect 'crash'. Columns:
  • 'Post#' - Link to a GIT issue or the number of a SO Post
  • 'Bug Type' - Type of the issue according to the classification adopted in FSE19
  • 'Root Cause' - Root cause of the issue according to the classification adopted in FSE19
  • 'Effect' - Effect of the issue according to the classification adopted in FSE19
  • 'Framework' - Framework that was used to develop the faulty application
  • 'Source' - Source of the issue (GIT or SO)
  2. 'FSE19 without duplicates' sheet contains the list of issues from the 'FSE19' sheet cleaned of duplicates. Columns:
  • The first 6 columns are identical to the ones in the 'FSE19' sheet, with the addition of three new columns:
  • 'General Note' - Note on the issue given by the evaluator
  • 'Error Message' - Error message that appears as a result of the fault
  • 'Person' - Evaluator's name
  3. 'FSE19 final' sheet contains the list of the issues from 'FSE19 without duplicates' that were relevant for the operator extraction, and the operators proposed from them. Columns:
  • 'ALL' - Relevant issues
  • 'Proposed Operator' - Proposed operator
  4. 'ISSTA18 SO' sheet contains the list of SO issues from the replication package of the ISSTA18 paper. Columns:
  • 'Issue' - Link to the StackOverflow post with description of the issue
  • 'General Note' - Note on the issue given by the evaluator
  • 'Error Message' - Error message that appears as a result of the fault
  • 'Person' - Evaluator's name
  5. 'ISSTA18 GitHub' sheet contains the list of GitHub issues from the replication package of the ISSTA18 paper. Columns:
  • The columns are identical to the ones in the 'FSE19' sheet, with the addition of:
  • 'Num' - ID number of the issue in the replication package
  • 'Relevant Issues' - Proposed operator
  6. 'ISSTA18 final' sheet contains the list of the issues from 'ISSTA18 SO' and 'ISSTA18 GitHub' that were relevant for the operator extraction, and the operators proposed from them. Columns:
  • The columns are identical to the ones in the 'FSE19 final' sheet

The file 'mutation_operators.xlsx' contains the detailed list of the proposed operators from all three papers (ICSE20, FSE19, ISSTA18).

  1. 'Info' sheet contains the references to the papers from which the operators were extracted.

  2. 'MO Extraction by paper' sheet contains the list of candidate operators grouped by source. Columns:

  • 'ICSE 2020' - List of the operators proposed from ICSE20 paper ('icse20_issues.xlsx' file, 'final-noduplicates' sheet)
  • 'ISSTA 2018' - List of the operators proposed from ISSTA18 paper ('fse19_issta18_issues.xlsx' file, 'ISSTA18 final' sheet)
  • 'FSE 2019' - List of the operators proposed from FSE19 paper ('fse19_issta18_issues.xlsx' file, 'FSE19 final' sheet)
  3. 'MO Extraction by paper analysed' sheet contains the analysed list of the candidate operators. Columns:
  • The columns are identical to the ones in the 'MO Extraction by paper' sheet. In particular, in green we mark unique operators w.r.t. the first column ('ICSE 2020') that were selected for the final list of proposed operators; in red we mark the operators that were rejected from the final list as either too specific or too general; finally, in brown we mark the operators that are duplicates w.r.t. the first column ('ICSE 2020').
  4. 'Operators List by Implemt.' sheet contains the final list of operators grouped by implementation status and operator category. The first group on this sheet lists the operators that were implemented in the DeepCrime paper and the second group those that were not. Columns:
  • 'Initial Operator Name' - Operator's name
  • 'Source' - Source paper
  • 'Final Operator Name' - Operator's final name (as appears in the DeepCrime paper)
  • 'Operator ID' - Abbreviation (ID) for the operator
  • 'Group' - Group to which the operator belongs
  • 'Parameters' - List of parameters of the operator ('-' if none, 'NA' if operator is not implemented)
  5. 'Operators List by Group' sheet contains the final list of operators grouped by operator group. Columns:
  • The columns are identical to the ones in 'Operators List by Implemt.' sheet
  6. 'Applicability to Subjects' sheet contains information on the applicability of the operators to the subject studies. Columns:
  • 'ID' - Abbreviation (ID) for the operator
  • 'Group' - Group to which the operator belongs
  • 'Operator Name' - Name of the operator
  • '{SubjectName} Applicable' - Shows whether the operator is applicable to the corresponding case study ('Y' if yes, 'N' otherwise)

Replication from Scratch

As noted in the subsection On the feasibility of full replication, the following experiments are very time- and resource-demanding.

On the factor of randomness in our experiments

Due to the stochastic nature of Deep Learning training and the randomness of some of the mutation operators, the results obtained from a full replication of the experiments might be slightly different. However, it is possible to use the already trained models and to generate the intermediate files that we store in the 'Data' folder of the replication package.

Download already trained DeepCrime and DeepMutation++ models

We made all the models generated by DeepCrime and DeepMutation++ as part of our experiments publicly available. Due to the high number of mutants and their large size, the models had to be spread across several Zenodo datasets. Below are the links for each subject system:

Movie Recommender and UnityEyes: https://zenodo.org/record/4737645#.YJG6D-uxW_s

MNIST: https://zenodo.org/record/4737748#.YJG6LuuxW_s https://zenodo.org/record/4737754#.YJG6QOuxW_s

Udacity: https://zenodo.org/record/4737808#.YJHhaOuxXRZ https://zenodo.org/record/4737810#.YJHhOeuxXRZ

Speaker Recognition: https://zenodo.org/record/4737848#.YJHhmeuxXRZ https://zenodo.org/record/4737850#.YJHhguuxXRZ

Run Deepcrime for all subjects

To re-run DeepCrime for all the subjects, the user should first set up the DeepCrime tool on a local machine following the instructions provided in the Getting Started subsection.

Then, the full replication can be started by executing the following script:

python run_deepcrime_full.py

If the user wants to re-use the trained models provided in the previous subsection, they should, before executing the DeepCrime script, download and place all the trained models for all subjects together in the 'deepcrime/trained_models' folder. Otherwise, DeepCrime will train and save all the models to the 'trained_models' folder itself.

Finally, DeepCrime will produce files similar to those in 'Data/deepcrime_output' for each subject and each used test set type ('train', 'test', 'weak') in the 'mutated_models/{subject_name}/results_{dataset_type}' sub-folder of 'deepcrime'.

Note: in order to run DeepCrime for the Udacity subject one would need to install package versions from 'requirements37.txt'.

Analysis of DeepCrime results

To enable the subsequent analysis of the generated data, once the DeepCrime models have been generated (or downloaded), they should be placed in the Models folder of the artefact (in the subfolder corresponding to each subject).

Calculate probability of killing for training data inputs

To calculate the mutant killing probabilities for the training data inputs, run the following script:

python killing_probability.py

It will generate arrays containing the necessary information in the Results/killability_analysis folder.
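The underlying computation can be pictured as follows; this is a sketch that assumes per-run correctness arrays of shape (n_inputs, n_runs), as described for the 'inputs_killability' folder, with hypothetical file names:

import numpy as np

# Hypothetical file names; shapes assumed to be (n_inputs, n_runs).
orig_correct = np.load('mnist_original.npy').astype(bool)
mut_correct = np.load('mnist_mutant.npy').astype(bool)

# An input kills the mutant in a run if the original model predicts it
# correctly while the mutant does not.
ki = orig_correct & ~mut_correct
killing_prob = ki.mean(axis=1)  # per-input killing probability over the runs
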

Generate input dictionaries for redundancy analysis

To identify, for each pair of killable mutants, the inputs from the training data for which the confidence intervals of the probability of killing do not intersect, run the following script:

python generate_input_dicts.py

This will create a file {subject_name}_input_dict.pickle for each subject system in the Results/input_dicts folder.

Generate weak test suites for all subjects

To construct weak test sets for our subjects, one should run the following scripts:

python {subject_name}_weak_ts_construction.py

The generated weak datasets are saved to 'Results/weak_ts'.

Run DeepMutation++ for all subjects

To be able to use DeepMutation++, we updated it to be compatible with Python 3.8 and applicable to Keras Functional models. We also wrote additional scripts to automate the generation of models per subject and to calculate the final mutation score of a test suite.

Of the 8 operators, only 5 are applicable to UnityEyes, 6 to Audio, and all 8 to MNIST and Udacity. We generated 400 mutated models per operator to achieve stable mutation score values. Please note that due to the randomness of the mutation operators, the obtained results might not be exactly the same as those reported in the DeepCrime paper.

As the DeepMutation++ operators are highly stochastic, a lower number of mutant instances may yield significantly different mutation scores across experiments. As the generated models are too heavy to be included in the artefact, we uploaded them separately.

To calculate mutation scores for the subject studies via the DeepMutation++ tool ('DCReplication/deepmutationpp'), one would need to either download or generate all the mutants and then run the evaluation.

Note: If the user has already installed the requirements for the DeepCrime tool or for the replication scripts, no additional packages need to be installed to run DeepMutation++. In order to run DeepMutation++ for the Udacity subject, one would need 'requirements37.txt' (which can be found in the DCReplication subfolder of the artefact).

To set up DeepMutation++ execute the following commands:

export PYTHONPATH="${absolute_path_to_the_artefact}/DCReplication"
cd {absolute_path_to_the_artefact}/DCReplication

If the user chooses to use the downloaded models, they should be placed into 'deepmutationpp/mutated_models_{subject_name}'.

Otherwise, to generate the mutants please run the following script:

python deepmutationpp/cnn_mutation_{subject_name}/src/run_dmpp.py

Generated models are saved in the following folders: "deepmutationpp/mutated_models_{subject_name}"

To evaluate the mutants and to calculate the MS for strong and weak test sets, please run the following script:

python deepmutationpp/cnn_mutation_{subject_name}/src/run_mutanal.py

This script will produce NPY files with the predictions of the original and mutated models for both the strong and weak datasets and save them in 'deepmutationpp/predictions/predictions_{subject_name}'.

The evaluation results are saved to 'deepmutationpp/results_{subject_name}/mut_score_calc_{dataset_type}_check.txt'.

Once the predictions are created, they can be placed into the 'Data/predictions' folder to allow the automatic calculation of the summary table.

To automatically calculate the DeepMutation++ results from the generated prediction files, please run:

python replication_scripts/calculate_deepmutationpp_results.py

This script will produce a CSV file called 'deepmutationpp_results' in the 'Results' folder of the replication package.

Running DeepCrime with a new subject system

To run DeepCrime on a new subject, it is necessary to make a number of modifications to the code of the program:

  1. Enclose the code that contains the training and the evaluation of the DL model under test in a function named 'main' that has one parameter specifying the name under which the trained model should be saved (a complete hypothetical example combining steps 1-5 is shown after this list):
def main(model_name):
  2. The 'main' function should return the results of the model evaluation on the test set:
score = model.evaluate(x_test, y_test, verbose=0)

return score
  3. If a user wishes to save the trained models, they should also modify the code in the following way (optional):
model_location = os.path.join('trained_models', model_name)

if not os.path.exists(model_location):
    ###
    # Model construction and training
    #
    # For example:
    model.compile(...)
    model.fit(x_train, y_train, ...)
    ###
    # Save under the same location that is checked above, so that the
    # model is found on subsequent runs
    model.save(model_location)
    score = model.evaluate(x_test, y_test, verbose=0)

    return score
else:
    model = tf.keras.models.load_model(model_location)
    score = model.evaluate(x_test, y_test, verbose=0)

    return score
  4. To enable the calculation of DeepCrime's mutation score, the user would also need to produce a duplicate of the program file named '{original_program_name}_train' and change the evaluation of the model to the one on the train set:
score = model.evaluate(x_test, y_test, verbose=0)
### Should change to
score = model.evaluate(x_train, y_train, verbose=0)
  5. In case the train data cannot be identified automatically from a 'model.fit' call, or if the train data undergoes major modifications/augmentation after it was loaded, it is necessary to provide the annotation for the data at the lines where it can be mutated (before augmentation):
    x_train: 'x_train' 
    y_train: 'y_train'
  6. The produced program files, along with their dependency files, should be placed into 'deepcrime/test_models'.
  7. Fill in the mutation parameters in deepcrime/utils/properties.py and deepcrime/utils/constants.py.
  8. Finally, the user should make the following changes to run_deepcrime.py:
def run_automate():
    # Specify the subject name
    data['subject_name'] = '{subject_name}'
    # Specify the name of the program PY file
    data['subject_path'] = os.path.join('test_models', '{program_name}.py')
    # Specify the list of Mutation Operators to apply
    data['mutations'] = ['{MO1}', '{MO2}', ...]

    dc_props.write_properties(data)

    # Remove the following block of code,
    #  which is only there for the Getting Started Example (lines 13-19)
    shutil.copyfile(os.path.join('utils', 'properties', 'properties_example.py'),
                    os.path.join('utils', 'properties.py'))
    shutil.copyfile(os.path.join('utils', 'properties', 'constants_example.py'),
                    os.path.join('utils', 'constants.py'))

    importlib.reload(props)
    importlib.reload(const)
    # End block for removal

    run_deepcrime_tool()

    if props.MS == 'DC_MS':
        data['mode'] = 'train'
        # Specify the name of the program file duplicate 
        #  that contains the evaluation on train data
        data['subject_path'] = os.path.join('test_models', 'mnist_conv_train.py')
        dc_props.write_properties(data)
  9. The user should then run run_deepcrime.py.
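Putting steps 1 to 5 together, a hypothetical subject file might look as follows; the dataset, architecture and training settings are placeholders, not part of DeepCrime:

import os
import tensorflow as tf

def main(model_name):
    # Placeholder data loading; the annotations mark the mutable train data (step 5).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    x_train: 'x_train'
    y_train: 'y_train'

    model_location = os.path.join('trained_models', model_name)
    if not os.path.exists(model_location):
        # Placeholder model construction and training (step 3).
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer='adadelta',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=1, verbose=0)
        model.save(model_location)
    else:
        model = tf.keras.models.load_model(model_location)

    # Return the evaluation scores (step 2); the '_train' duplicate of this
    # file would evaluate on (x_train, y_train) instead (step 4).
    return model.evaluate(x_test, y_test, verbose=0)
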

References:

[1] Nargiz Humbatova, Gunel Jahangirova, Gabriele Bavota, Vincenzo Riccio, Andrea Stocco, and Paolo Tonella. 2020. Taxonomy of real faults in deep learning systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE '20). Association for Computing Machinery, New York, NY, USA, 1110–1121, 2020. DOI:https://doi.org/10.1145/3377811.3380395

[2] Yuhao Zhang, Yifan Chen, Shing-Chi Cheung, Yingfei Xiong, and Lu Zhang. 2018. An empirical study on TensorFlow program bugs. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018). Association for Computing Machinery, New York, NY, USA, 129–140, 2018. DOI:https://doi.org/10.1145/3213846.3213866

[3] Md Johirul Islam, Giang Nguyen, Rangeet Pan, and Hridesh Rajan. 2019. A comprehensive study on deep learning bug characteristics. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA, 510–520, 2019. DOI:https://doi.org/10.1145/3338906.3338955

[4] Nargiz Humbatova, Gunel Jahangirova, and Paolo Tonella. 2021. DeepCrime: Mutation Testing of Deep Learning Systems based on Real Faults. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 21).

[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. DOI:https://doi.org/10.1109/5.726791
