Benchmark models against SDG 2
FinnWoelm committed May 10, 2024
1 parent 569453a commit 8bfbbf9
Showing 27 changed files with 204 additions and 50 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -209,16 +209,16 @@ The table below shows the accuracy (in percent) of models tested against this be

<!-- evaluation table begin -->

-| Model | Average | SDG 1 | SDG 3 | SDG 4 | SDG 5 | SDG 6 | SDG 7 | SDG 10 |
-| :--------------------------------------------------------------------------------------------------------------------- | ------: | ----: | ----: | ----: | ----: | ----: | ----: | -----: |
-| [AFD SDG Prospector](https://github.com/SDGClassification/benchmark/tree/main/evaluations/sdg-prospector/) | 91 | 95 | 91 | 95 | 81 | 92 | 95 | 87 |
-| [Aurora SDG](https://github.com/SDGClassification/benchmark/tree/main/evaluations/aurora-sdg/) | 82 | 74 | 78 | 83 | 90 | 85 | 87 | 77 |
-| [Global Goals Directory](https://github.com/SDGClassification/benchmark/tree/main/evaluations/global-goals-directory/) | 84 | 90 | 90 | 78 | 74 | 87 | 91 | 80 |
-| [JRC SDG Mapper](https://github.com/SDGClassification/benchmark/tree/main/evaluations/sdg-mapper/) | 77 | 82 | 84 | 70 | 75 | 73 | 86 | 70 |
-| [Meta Llama 2 70B](https://github.com/SDGClassification/benchmark/tree/main/evaluations/llama-2/) | 89 | 88 | 93 | 90 | 93 | 85 | 92 | 79 |
-| [Meta Llama 3 70B](https://github.com/SDGClassification/benchmark/tree/main/evaluations/llama-3/) | 88 | 77 | 86 | 85 | 91 | 92 | 91 | 92 |
-| [OpenAI GPT-3.5 Turbo](https://github.com/SDGClassification/benchmark/tree/main/evaluations/openai-gpt-3/) | 83 | 65 | 82 | 80 | 87 | 91 | 90 | 87 |
-| [OpenAI GPT-4 Turbo](https://github.com/SDGClassification/benchmark/tree/main/evaluations/openai-gpt-4/) | 87 | 75 | 86 | 84 | 88 | 92 | 92 | 92 |
+| Model | Average | SDG 1 | SDG 2 | SDG 3 | SDG 4 | SDG 5 | SDG 6 | SDG 7 | SDG 10 |
+|:-----------------------------------------------------------------------------------------------------------------------|----------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|---------:|
+| [AFD SDG Prospector](https://github.com/SDGClassification/benchmark/tree/main/evaluations/sdg-prospector/) | 90 | 95 | 87 | 91 | 95 | 81 | 92 | 95 | 87 |
+| [Aurora SDG](https://github.com/SDGClassification/benchmark/tree/main/evaluations/aurora-sdg/) | 82 | 74 | 83 | 78 | 83 | 90 | 85 | 87 | 77 |
+| [Global Goals Directory](https://github.com/SDGClassification/benchmark/tree/main/evaluations/global-goals-directory/) | 84 | 90 | 80 | 90 | 78 | 74 | 87 | 91 | 80 |
+| [JRC SDG Mapper](https://github.com/SDGClassification/benchmark/tree/main/evaluations/sdg-mapper/) | 76 | 82 | 71 | 84 | 70 | 75 | 73 | 86 | 70 |
+| [Meta Llama 2 70B](https://github.com/SDGClassification/benchmark/tree/main/evaluations/llama-2/) | 88 | 88 | 84 | 93 | 90 | 93 | 85 | 92 | 79 |
+| [Meta Llama 3 70B](https://github.com/SDGClassification/benchmark/tree/main/evaluations/llama-3/) | 88 | 77 | 90 | 86 | 85 | 91 | 92 | 91 | 92 |
+| [OpenAI GPT-3.5 Turbo](https://github.com/SDGClassification/benchmark/tree/main/evaluations/openai-gpt-3/) | 84 | 65 | 93 | 82 | 80 | 87 | 91 | 90 | 87 |
+| [OpenAI GPT-4 Turbo](https://github.com/SDGClassification/benchmark/tree/main/evaluations/openai-gpt-4/) | 88 | 75 | 93 | 86 | 84 | 88 | 92 | 92 | 92 |

<!-- evaluation table end -->

5 changes: 3 additions & 2 deletions evaluations/aurora-sdg/README.md
@@ -28,15 +28,16 @@ Learn more: https://aurora-universities.eu/sdg-research/

| SDG | n | Accuracy (%) | Precision (%) | Recall (%) | F1 Score | TP | FP | TN | FN |
|:--------|-----:|---------------:|----------------:|-------------:|-----------:|-----:|-----:|-----:|-----:|
-| Average | 78.6 | 81.9 | 80.0 | 83.0 | 0.80 | 30.9 | 7.7 | 33.7 | 6.3 |
+| Average | 77.4 | 82.0 | 80.8 | 83.5 | 0.81 | 31.9 | 7.5 | 31.8 | 6.2 |
| 1 | 77 | 74.0 | 57.8 | 96.3 | 0.72 | 26 | 19 | 31 | 1 |
+| 2 | 69 | 82.6 | 86.7 | 86.7 | 0.87 | 39 | 6 | 18 | 6 |
| 3 | 76 | 77.6 | 78.9 | 53.6 | 0.64 | 15 | 4 | 44 | 13 |
| 4 | 82 | 82.9 | 82.2 | 86.0 | 0.84 | 37 | 8 | 31 | 6 |
| 5 | 69 | 89.9 | 86.8 | 94.3 | 0.90 | 33 | 5 | 29 | 2 |
| 6 | 85 | 84.7 | 90.7 | 81.2 | 0.86 | 39 | 4 | 33 | 9 |
| 7 | 100 | 87.0 | 93.0 | 80.0 | 0.86 | 40 | 3 | 47 | 10 |
| 10 | 61 | 77.0 | 70.3 | 89.7 | 0.79 | 26 | 11 | 21 | 3 |

-**Benchmarked on**: May 3, 2024
+**Benchmarked on**: May 10, 2024

**Detailed benchmark results**: [results.csv](results.csv)
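
The per-SDG columns in the table above are consistent with the standard confusion-matrix definitions of accuracy, precision, recall, and F1. As a minimal sketch (the helper name `metrics_from_counts` is hypothetical and not taken from the benchmark repository), the newly added SDG 2 row can be reproduced from its TP/FP/TN/FN counts:

```python
def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the table's metrics from confusion-matrix counts.

    Hypothetical helper for illustration; the repository's own code may differ.
    """
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("at least one sample is required")

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    return {
        "n": n,
        "accuracy": round(100 * (tp + tn) / n, 1),   # percent
        "precision": round(100 * precision, 1),      # percent
        "recall": round(100 * recall, 1),            # percent
        "f1": round(f1, 2),                          # fraction, not percent
    }


# Counts from the SDG 2 row of the Aurora SDG table.
sdg2 = metrics_from_counts(tp=39, fp=6, tn=18, fn=6)
print(sdg2)
```

The rounded values agree with the SDG 2 row (n = 69, accuracy 82.6, precision 86.7, recall 86.7, F1 0.87).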
4 changes: 2 additions & 2 deletions evaluations/aurora-sdg/accuracies.csv
@@ -1,2 +1,2 @@
-Average,SDG 1,SDG 3,SDG 4,SDG 5,SDG 6,SDG 7,SDG 10
-81.9,74.0,77.6,82.9,89.9,84.7,87.0,77.0
+Average,SDG 1,SDG 2,SDG 3,SDG 4,SDG 5,SDG 6,SDG 7,SDG 10
+82.0,74.0,82.6,77.6,82.9,89.9,84.7,87.0,77.0
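
The `Average` value in the Aurora SDG `accuracies.csv` rows above appears consistent with an unweighted mean of the per-SDG accuracies. This is inferred from the numbers shown in this commit, not from the repository's code, so treat the sketch below as an assumption about how the column is derived:

```python
from statistics import mean

# Per-SDG accuracies from the updated Aurora SDG accuracies.csv row.
new_row = {
    "SDG 1": 74.0, "SDG 2": 82.6, "SDG 3": 77.6, "SDG 4": 82.9,
    "SDG 5": 89.9, "SDG 6": 84.7, "SDG 7": 87.0, "SDG 10": 77.0,
}

# The previous row is the same data without the SDG 2 column.
old_row = {sdg: acc for sdg, acc in new_row.items() if sdg != "SDG 2"}

# Assumed rule: unweighted (macro) mean, rounded to one decimal place.
avg_old = round(mean(old_row.values()), 1)
avg_new = round(mean(new_row.values()), 1)
print(avg_old, avg_new)
```

Under that assumption the result moves from 81.9 to 82.0 once SDG 2 is included, matching the two `Average` values in the diff.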
3 changes: 2 additions & 1 deletion evaluations/aurora-sdg/stats.csv
@@ -1,6 +1,7 @@
sdg,n,accuracy,precision,recall,f1,tp,fp,tn,fn
-Average,78.6,81.9,80.0,83.0,0.8,30.9,7.7,33.7,6.3
+Average,77.4,82.0,80.8,83.5,0.8,31.9,7.5,31.8,6.2
1,77.0,74.0,57.8,96.3,0.7,26.0,19.0,31.0,1.0
+2,69.0,82.6,86.7,86.7,0.9,39.0,6.0,18.0,6.0
3,76.0,77.6,78.9,53.6,0.6,15.0,4.0,44.0,13.0
4,82.0,82.9,82.2,86.0,0.8,37.0,8.0,31.0,6.0
5,69.0,89.9,86.8,94.3,0.9,33.0,5.0,29.0,2.0
5 changes: 3 additions & 2 deletions evaluations/global-goals-directory/README.md
@@ -12,15 +12,16 @@ Learn more: https://globalgoals.directory/

| SDG | n | Accuracy (%) | Precision (%) | Recall (%) | F1 Score | TP | FP | TN | FN |
|:--------|-----:|---------------:|----------------:|-------------:|-----------:|-----:|-----:|-----:|-----:|
-| Average | 78.6 | 84.2 | 92.0 | 73.9 | 0.81 | 27.4 | 2.3 | 39.1 | 9.7 |
+| Average | 77.4 | 83.6 | 92.3 | 73.8 | 0.81 | 28.1 | 2.2 | 37 | 10 |
| 1 | 77 | 89.6 | 82.8 | 88.9 | 0.86 | 24 | 5 | 45 | 3 |
+| 2 | 69 | 79.7 | 94.3 | 73.3 | 0.83 | 33 | 2 | 22 | 12 |
| 3 | 76 | 89.5 | 91.7 | 78.6 | 0.85 | 22 | 2 | 46 | 6 |
| 4 | 82 | 78.0 | 87.9 | 67.4 | 0.76 | 29 | 4 | 35 | 14 |
| 5 | 69 | 73.9 | 100.0 | 48.6 | 0.65 | 17 | 0 | 34 | 18 |
| 6 | 85 | 87.1 | 100.0 | 77.1 | 0.87 | 37 | 0 | 37 | 11 |
| 7 | 100 | 91.0 | 97.7 | 84.0 | 0.90 | 42 | 1 | 49 | 8 |
| 10 | 61 | 80.3 | 84.0 | 72.4 | 0.78 | 21 | 4 | 28 | 8 |

-**Benchmarked on**: May 3, 2024
+**Benchmarked on**: May 10, 2024

**Detailed benchmark results**: [results.csv](results.csv)
4 changes: 2 additions & 2 deletions evaluations/global-goals-directory/accuracies.csv
@@ -1,2 +1,2 @@
-Average,SDG 1,SDG 3,SDG 4,SDG 5,SDG 6,SDG 7,SDG 10
-84.2,89.6,89.5,78.0,73.9,87.1,91.0,80.3
+Average,SDG 1,SDG 2,SDG 3,SDG 4,SDG 5,SDG 6,SDG 7,SDG 10
+83.6,89.6,79.7,89.5,78.0,73.9,87.1,91.0,80.3