Phoneme-Level Speech API

Speech-to-IPA & Pronunciation Assessment API

Langcraft provides a developer-first API that converts spoken audio into IPA phonemes with millisecond timestamps, pronunciation confidence scores, and mispronunciation detection. It is built for high-volume language learning apps, oral assessments, and real-time speech feedback.

Send WebM, WAV, MP3, or M4A recordings and receive JSON that includes aligned words, canonical phonemes, produced phonemes, similarity scores, and articulation metrics. Use the alignment data to highlight substitutions, insertions, and deletions for each learner.
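This page does not say how Langcraft computes its alignment. As an illustration of how substitutions, insertions, and deletions can be derived from a canonical and a produced phoneme sequence, here is a minimal sketch using a standard minimum-edit-distance (Levenshtein) alignment with unit costs. It assumes both sequences are available as lists of IPA symbols; the function name `align_phonemes` is hypothetical.

```python
from typing import List, Optional, Tuple

# Each aligned pair is (expected, produced); None marks a gap.
Pair = Tuple[Optional[str], Optional[str]]


def align_phonemes(expected: List[str], produced: List[str]) -> List[Tuple[str, Pair]]:
    """Align two phoneme sequences by minimum edit distance (unit costs) and
    label each step as match, substitution, insertion, or deletion."""
    n, m = len(expected), len(produced)
    # dp[i][j] = edit distance between expected[:i] and produced[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if expected[i - 1] == produced[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion: expected phoneme missing
                dp[i][j - 1] + 1,         # insertion: extra produced phoneme
            )
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    ops: List[Tuple[str, Pair]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if expected[i - 1] == produced[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                label = "match" if cost == 0 else "substitution"
                ops.append((label, (expected[i - 1], produced[j - 1])))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("deletion", (expected[i - 1], None)))
            i -= 1
        else:
            ops.append(("insertion", (None, produced[j - 1])))
            j -= 1
    ops.reverse()
    return ops
```

For "think" pronounced as /sɪŋk/, `align_phonemes(["θ", "ɪ", "ŋ", "k"], ["s", "ɪ", "ŋ", "k"])` labels the first step `("substitution", ("θ", "s"))` and the rest as matches. Unit costs keep the sketch simple; a production system might instead weight substitutions by how articulatorily close the two phonemes are.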

Key API Features

  • Accurate Transcription: Speech-to-IPA with per-phoneme start and end timestamps.
  • Scoring: Confidence scores and articulatory similarity metrics for automated pronunciation assessment.
  • Alignment: Forced alignment of audio to reference text to detect mispronunciations and fluency gaps.
  • Format Support: Native support for WebM, WAV, MP3, and M4A uploads.
  • Integration: Simple REST endpoint: POST https://api.langcraft.world/transcribe
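The endpoint URL and the four upload formats come from the list above, but this page does not document the request body, authentication scheme, or response schema. The sketch below builds (without sending) a POST request using only the Python standard library, assuming the audio is sent as the raw request body with a format-specific `Content-Type` and a bearer token. Both of those choices, and the helper name `build_transcribe_request`, are hypothetical; check the API docs for the real contract.

```python
import urllib.request
from pathlib import Path

# Endpoint as listed on this page.
TRANSCRIBE_URL = "https://api.langcraft.world/transcribe"

# ASSUMPTION: MIME types for the four supported upload formats. Whether the
# real API expects a raw body or a multipart form is not stated here.
AUDIO_MIME_TYPES = {
    ".webm": "audio/webm",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
}


def build_transcribe_request(filename: str, audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one audio recording."""
    suffix = Path(filename).suffix.lower()
    if suffix not in AUDIO_MIME_TYPES:
        raise ValueError(f"unsupported audio format: {suffix or filename}")
    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=audio,
        method="POST",
        headers={
            "Content-Type": AUDIO_MIME_TYPES[suffix],
            # ASSUMPTION: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Sending the request is then a matter of `urllib.request.urlopen(req)` and reading the JSON body with `json.load`. Keeping request construction separate from sending makes it easy to inspect or unit-test without network access.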

Enterprise Use Cases

Teams use Langcraft to power placement tests, pronunciation drills, shadowing exercises, and automated speaking assessments. Phoneme-level scores make it easy to highlight specific errors (such as /s/ produced in place of /θ/) and to measure fluency without manual grading.
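One simple way to turn timestamped words into fluency measures is to look for long silences between consecutive words and to compute speaking rate. This is a minimal sketch, not Langcraft's method. It assumes you have already extracted word-level start and end times in milliseconds from the response; the `TimedWord` shape, the helper names, and the 300 ms pause threshold are all hypothetical, illustrative choices.

```python
from typing import List, NamedTuple, Tuple


class TimedWord(NamedTuple):
    # HYPOTHETICAL shape: one aligned word with start/end times in milliseconds.
    word: str
    start_ms: int
    end_ms: int


def find_pauses(words: List[TimedWord], min_gap_ms: int = 300) -> List[Tuple[str, str, int]]:
    """Return (word_before, word_after, gap_ms) for every silence between
    consecutive words that lasts at least min_gap_ms."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt.start_ms - prev.end_ms
        if gap >= min_gap_ms:
            pauses.append((prev.word, nxt.word, gap))
    return pauses


def speech_rate_wpm(words: List[TimedWord]) -> float:
    """Words per minute from the first word's onset to the last word's offset."""
    if not words:
        return 0.0
    duration_ms = words[-1].end_ms - words[0].start_ms
    return 0.0 if duration_ms <= 0 else len(words) * 60_000 / duration_ms
```

For example, with "I" at 0–150 ms, "think" at 200–520 ms, and "so" at 1400–1600 ms, `find_pauses` reports `("think", "so", 880)` and `speech_rate_wpm` returns 112.5.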

  • EdTech: Language learning apps that need instant feedback after each speaking turn.
  • Assessment: Exam platforms that score oral proficiency at scale, with results that correlate highly with human raters.
  • Speech Therapy: Clinical tools that visualize articulation gaps over time.
  • Training: Customer success teams measuring agent clarity and script adherence.
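This page does not give a correlation figure for the assessment use case. If you want to check machine-human agreement on your own data, Pearson's correlation coefficient between automated scores and human ratings of the same responses is one common measure. The sketch below computes it with the standard library only; the function name `pearson_r` is hypothetical and the scores you pass in are your own.

```python
import math
from typing import Sequence


def pearson_r(machine: Sequence[float], human: Sequence[float]) -> float:
    """Pearson correlation between paired automated and human scores."""
    if len(machine) != len(human) or len(machine) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    mean_m = sum(machine) / len(machine)
    mean_h = sum(human) / len(human)
    # Covariance and standard deviations (unnormalized; the n terms cancel).
    cov = sum((m - mean_m) * (h - mean_h) for m, h in zip(machine, human))
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in machine))
    sd_h = math.sqrt(sum((h - mean_h) ** 2 for h in human))
    if sd_m == 0 or sd_h == 0:
        raise ValueError("correlation is undefined when one score list is constant")
    return cov / (sd_m * sd_h)
```

Pearson's r captures linear agreement only; if the two scales differ in shape, a rank correlation such as Spearman's may be a better fit.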

Get Started

Review the API documentation, try the live demo, or request an API key to integrate phoneme-level speech analysis into your product.
