High Loss with Zero WER, How?

I'm training my own model on Arabic.

I'm trying to understand the loss function here: how can it report a high loss with zero WER, when the result exactly matches the source?

WER: 0.000000, CER: 0.000000, loss: 53.280418
 - wav: file:///old/input.wav
 - src: "إِنِّى ظَنَنتُ أَنِّى مُلَٰقٍ حِسَابِيَهْ"
 - res: "إِنِّى ظَنَنتُ أَنِّى مُلَٰقٍ حِسَابِيَهْ"
--------------------------------------------------------------------------------
Best WER: 
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 54.017227
 - wav: file://wav/043040_Maher_AlMuaiqly_64kbps.wav
 - src: "أَفَأَنتَ تُسْمِعُ ٱلصُّمَّ أَوْ تَهْدِى ٱلْعُمْىَ وَمَن كَانَ فِى ضَلَٰلٍ مُّبِينٍ"
 - res: "أَفَأَنتَ تُسْمِعُ ٱلصُّمَّ أَوْ تَهْدِى ٱلْعُمْىَ وَمَن كَانَ فِى ضَلَٰلٍ مُّبِينٍ"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 49.771873
 - wav: file://wav/025030_Maher_AlMuaiqly_64kbps.wav
 - src: "وَقَالَ ٱلرَّسُولُ يَٰرَبِّ إِنَّ قَوْمِى ٱتَّخَذُوا۟ هَٰذَا ٱلْقُرْءَانَ مَهْجُورًا"
 - res: "وَقَالَ ٱلرَّسُولُ يَٰرَبِّ إِنَّ قَوْمِى ٱتَّخَذُوا۟ هَٰذَا ٱلْقُرْءَانَ مَهْجُورًا"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 42.961025
 - wav: file:////wav/019030_Maher_AlMuaiqly_64kbps.wav
 - src: "قَالَ إِنِّى عَبْدُ ٱللَّهِ ءَاتَىٰنِىَ ٱلْكِتَٰبَ وَجَعَلَنِى نَبِيًّا"
 - res: "قَالَ إِنِّى عَبْدُ ٱللَّهِ ءَاتَىٰنِىَ ٱلْكِتَٰبَ وَجَعَلَنِى نَبِيًّا"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 42.902084
 - wav: file://wav/044010_Maher_AlMuaiqly_64kbps.wav
 - src: "فَٱرْتَقِبْ يَوْمَ تَأْتِى ٱلسَّمَآءُ بِدُخَانٍ مُّبِينٍ"
 - res: "فَٱرْتَقِبْ يَوْمَ تَأْتِى ٱلسَّمَآءُ بِدُخَانٍ مُّبِينٍ"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 42.364765
 - wav: file://wav/019020_Maher_AlMuaiqly_64kbps.wav
 - src: "قَالَتْ أَنَّىٰ يَكُونُ لِى غُلَٰمٌ وَلَمْ يَمْسَسْنِى بَشَرٌ وَلَمْ أَكُ بَغِيًّا"
 - res: "قَالَتْ أَنَّىٰ يَكُونُ لِى غُلَٰمٌ وَلَمْ يَمْسَسْنِى بَشَرٌ وَلَمْ أَكُ بَغِيًّا"
--------------------------------------------------------------------------------
Median WER: 
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.288775
 - wav: file://wav/043050_Maher_AlMuaiqly_64kbps.wav
 - src: "فَلَمَّا كَشَفْنَا عَنْهُمُ ٱلْعَذَابَ إِذَا هُمْ يَنكُثُونَ"
 - res: "فَلَمَّا كَشَفْنَا عَنْهُمُ ٱلْعَذَابَ إِذَا هُمْ يَنكُثُونَ"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.288494
 - wav: file://wav/026130_Abu_Bakr_Ash-Shaatree_64kbps.wav
 - src: "وَإِذَا بَطَشْتُم بَطَشْتُمْ جَبَّارِينَ"
 - res: "وَإِذَا بَطَشْتُم بَطَشْتُمْ جَبَّارِينَ"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.272864
 - wav: file://wav/070010_Ghamadi_40kbps.wav
 - src: "وَلَا يَسْـَٔلُ حَمِيمٌ حَمِيمًا"
 - res: "وَلَا يَسْـَٔلُ حَمِيمٌ حَمِيمًا"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.251325
 - wav: file:///""/wav/056090_Ghamadi_40kbps.wav
 - src: "وَأَمَّآ إِن كَانَ مِنْ أَصْحَٰبِ ٱلْيَمِينِ"
 - res: "وَأَمَّآ إِن كَانَ مِنْ أَصْحَٰبِ ٱلْيَمِينِ"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.248876
 - wav: file:///wav/055050_Abdullah_Basfar_64kbps.wav
 - src: "فِيهِمَا عَيْنَانِ تَجْرِيَانِ"
 - res: "فِيهِمَا عَيْنَانِ تَجْرِيَانِ"

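It seems the loss increases as the WER decreases. Every sample above has a WER of 0, yet the "Best WER" samples have losses of roughly 42 to 54 while the "Median WER" samples sit around 6. Any idea why?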

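The loss scales with transcript length, so longer sentences are skewed toward higher values: it is the negative log-probability of the whole transcript, and every additional character adds to it. Most of the high-loss samples in your log are also noticeably longer verses than the median ones. A high loss on a low-WER sample can also indicate that the language model is doing more of the work: the WER is computed on the decoded output, which can draw on the language model, while the loss only reflects how confident the acoustic model is in the correct transcript. In those cases the acoustic model is not actually that sure, even though the final text comes out right.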
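To illustrate both effects, here is a minimal sketch of the CTC forward algorithm in plain Python. It assumes the model is trained with a CTC loss, as is typical for character-level speech models, and it is not the training code used here. The acoustic model is hypothetical (`fake_frames`): it emits two frames per character (the character, then a blank) and puts probability `p` on the intended symbol. Greedy decoding recovers every target exactly, so WER and CER would be zero, yet the loss grows with transcript length and with lower confidence. There is no language model in this sketch; with one, the decoder could still produce the exact transcript where the acoustic model's top choice is wrong, which would push the loss higher still.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the arguments."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs, target, blank=0):
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs: T frames, each a list of per-symbol log-probabilities.
    target: list of label ids (no blanks).
    """
    ext = [blank]  # target interleaved with blanks: _ c1 _ c2 _ ...
    for c in target:
        ext += [c, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            prev = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                prev.append(alpha[s - 1])  # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(alpha[s - 2])  # skip a blank between distinct labels
            new[s] = logsumexp(*prev) + frame[ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def greedy_decode(log_probs, blank=0):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    out, prev = [], None
    for frame in log_probs:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != prev and best != blank:
            out.append(best)
        prev = best
    return out


def fake_frames(target, vocab_size, p, blank=0):
    """Hypothetical acoustic output: a character frame then a blank frame per
    label, with probability p on the intended symbol and the rest spread evenly."""
    rest = math.log((1 - p) / (vocab_size - 1))
    frames = []
    for c in target:
        for sym in (c, blank):
            frame = [rest] * vocab_size
            frame[sym] = math.log(p)
            frames.append(frame)
    return frames


VOCAB = 6  # hypothetical alphabet: blank (0) plus five symbols
cases = [
    ("short", [1, 2, 3], 0.9),
    ("long", [1, 2, 3, 4, 5] * 4, 0.9),
    ("short, unsure", [1, 2, 3], 0.4),
]
for name, target, p in cases:
    frames = fake_frames(target, VOCAB, p)
    assert greedy_decode(frames) == target  # exact match, so WER = CER = 0
    loss = ctc_loss(frames, target)
    print(f"{name:>14}: chars={len(target):2d} loss={loss:.3f} per char={loss / len(target):.3f}")
```

Dividing by transcript length (the "per char" column) is one rough way to compare samples of different lengths. In this toy setup it removes most of the length effect but not the confidence effect, which is the part that points at the acoustic model being unsure.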