ENGtoID
github.com/daverlon/ENGtoID
Seq2Seq · LSTM · PyTorch · NLP · Machine Translation · Data Preprocessing · Tokenization
Overview
ENGtoID is a sequence-to-sequence neural machine translation system that translates English sentences into Bahasa Indonesia using an LSTM encoder-decoder architecture.
[Figure: ENGtoID performance]
The plots show the model's training and validation performance. The dips in accuracy were caused by experiments with teacher forcing.
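Teacher forcing is named above but not shown, so here is a minimal sketch of the idea rather than the project's actual training loop. At each decoding step the next decoder input is either the ground-truth token or the model's own previous prediction, chosen by a ratio. The names `decode_with_teacher_forcing`, `toy_step`, and `teacher_forcing_ratio` are hypothetical, and `toy_step` is a stand-in for a real LSTM decoder step.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical decoder step: (previous token id, state) -> (predicted id, new state).
StepFn = Callable[[int, int], Tuple[int, int]]


def decode_with_teacher_forcing(
    step: StepFn,
    target: List[int],
    sos_id: int,
    teacher_forcing_ratio: float,
    rng: random.Random,
) -> List[int]:
    """Decode one target sequence and return the per-step predictions."""
    predictions = []
    prev_token, state = sos_id, 0
    for gold in target:
        pred, state = step(prev_token, state)
        predictions.append(pred)
        # With probability `teacher_forcing_ratio`, feed the ground-truth
        # token as the next input; otherwise feed back the model's prediction.
        use_gold = rng.random() < teacher_forcing_ratio
        prev_token = gold if use_gold else pred
    return predictions


def toy_step(prev_token: int, state: int) -> Tuple[int, int]:
    """Stand-in for an LSTM decoder step (assumption): predicts prev_token + 1."""
    return prev_token + 1, state + 1


rng = random.Random(0)
target = [5, 9, 2]
forced = decode_with_teacher_forcing(toy_step, target, 0, 1.0, rng)
free_running = decode_with_teacher_forcing(toy_step, target, 0, 0.0, rng)
```

With a ratio of 1.0 every step sees the gold prefix, and with 0.0 the decoder runs on its own outputs, as it must at inference time. Changing the ratio mid-training shifts the input distribution the decoder sees, which is one plausible reason for dips like those in the plots.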
The project was built as an end-to-end ML pipeline over roughly 1M sentence pairs, covering dataset ingestion, cleaning, vocabulary construction, training, evaluation, and error analysis.
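The cleaning and vocabulary-construction stages mentioned above might look roughly like the following sketch. It is an illustration under stated assumptions, not the repository's code: the special tokens, the regex tokenizer, and the `min_freq` cutoff are all assumed, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumed special tokens; the actual project may use different ones.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def tokenize(sentence: str) -> List[str]:
    """Lowercase, then split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def build_vocab(sentences: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {tok: i for i, tok in enumerate([PAD, SOS, EOS, UNK])}
    for tok, n in counts.most_common():
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a sentence to ids, wrapped in <sos>/<eos>; rare words become <unk>."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(sentence)]
    return [vocab[SOS]] + ids + [vocab[EOS]]


vocab = build_vocab(["I eat rice.", "I eat fish."], min_freq=2)
encoded = encode("I eat bread.", vocab)
```

A frequency cutoff such as `min_freq` keeps the vocabulary, and therefore the embedding and output layers, at a manageable size on a corpus of this scale, at the cost of mapping rare words to `<unk>`. In a translation setup, separate vocabularies would typically be built for the English and Indonesian sides.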
Pipeline Design
Model Architecture
Training Strategy
Key Engineering Decisions
Evaluation
Error Analysis
Limitations
Future Improvements