• Sana Voice Overview
  • Scoring A Phrase
  • Object Model
  • API Semantics
    Sana Voice Overview

    Learning to speak new languages can be difficult. Sana Voice empowers learners to perfect their pronunciation and sound like a native with state-of-the-art speech recognition technology. Sana Voice effectively models pronunciation independent of your native language and provides instant personal feedback.

    The Sana Voice API can be used to score a word, sentence, or phrase. The response contains an overall score as well as scores for each word at both the phoneme and letter level.

    [Image: Sana Voice example]

    Scoring A Phrase

    Scores a word, sentence, or phrase. The response contains an overall score as well as scores for each word at both the phoneme and letter level.

    POST https://voice.sanalabs.com/api/v0.1/score

    Body Parameters

    Key            Mandatory  Type    Description
    dialect        Yes        string  The dialect to use for scoring. Currently, only en-us is supported.
    user_id        Yes        string  ID of the end user who receives the pronunciation feedback. This should be anonymized.
    target_phrase  Yes        string  A word, phrase, or sentence to score.
    audio          Yes        binary  A .wav file containing the user audio to be scored.
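
    The table asks for an anonymized user_id but does not say how to produce one. One possible approach, shown here as a sketch only, is a keyed hash of your internal identifier. The function name, the choice of HMAC-SHA256, and the placeholder key are assumptions for illustration and are not part of the Sana Voice API.

```python
import hashlib
import hmac


def anonymize_user_id(raw_user_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonymous ID derived from the raw user ID.

    HMAC-SHA256 with a private key yields the same output for the same user
    on every call, without exposing the original identifier.
    """
    digest = hmac.new(secret_key, raw_user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    # Placeholder key (assumption): load a real key from secure storage.
    print(anonymize_user_id("alice@example.com", b"replace-with-a-private-key"))
```

    Because the mapping is deterministic, scores for the same learner can still be grouped under one user_id across requests.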

    Response Format

    On success, the HTTP status code in the response header is 200 OK and the response body contains a JSON object with the keys listed below. On error, the header status code is an error code and the response body contains a list of Error Response objects.

    Key            Type                         Description
    overall_score  int                          A value from 0 to 100. The score for the phrase as a whole.
    target_phrase  string                       The phrase that was scored (present in the example response below).
    word_scores    array of Word Score objects  Scores for each word, including its phonemes and letters.

    Example

    Request

    
    curl "https://voice.sanalabs.com/api/v0.1/score" \
    -H "Content-Type: multipart/form-data" \
    -H "X-API-KEY: $API_KEY" \
    -X "POST" \
    -F dialect=en-us \
    -F user_id=123456 \
    -F "target_phrase=she looks" \
    -F audio=@audio_file.wav
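
    For callers who prefer not to shell out to curl, the same request can be sketched in Python using only the standard library. The helper names build_multipart and build_score_request are hypothetical, and the audio/wav part type is an assumption; the URL, field names, and X-API-KEY header come from the curl request above.

```python
import urllib.request
import uuid

SCORE_URL = "https://voice.sanalabs.com/api/v0.1/score"


def build_multipart(fields: dict, audio_bytes: bytes, filename: str = "audio_file.wav"):
    """Encode text fields plus one audio file as multipart/form-data.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_score_request(api_key, dialect, user_id, target_phrase, audio_bytes):
    """Build (but do not send) a POST request for the score endpoint."""
    fields = {"dialect": dialect, "user_id": user_id, "target_phrase": target_phrase}
    body, content_type = build_multipart(fields, audio_bytes)
    return urllib.request.Request(
        SCORE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "X-API-KEY": api_key},
    )
```

    Passing the returned object to urllib.request.urlopen sends the request. Building it separately keeps the encoding step easy to inspect without network access.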
    
    

    Response

    {
        "overall_score": 78,
        "target_phrase": "she looks",
        "word_scores": [{
            "word": "she",
            "score": 85,
            "phoneme_scores": [{
                "phoneme": "sh",
                "score": 93
            }, {
                "phoneme": "i",
                "score": 85
            }],
            "letter_scores": [{
                "letter": "s",
                "score": 95
            }, {
                "letter": "h",
                "score": 95
            }, {
                "letter": "e",
                "score": 85
            }]
        }, {
            "word": "looks",
            "score": 70,
            "phoneme_scores": [{
                "phoneme": "l",
                "score": 93
            }, {
                "phoneme": "oo",
                "score": 33
            }, {
                "phoneme": "k",
                "score": 75
            }, {
                "phoneme": "s",
                "score": 94
            }],
            "letter_scores": [{
                "letter": "l",
                "score": 93
            }, {
                "letter": "o",
                "score": 85
            }, {
                "letter": "o",
                "score": 10
            }, {
                "letter": "k",
                "score": 75
            }, {
                "letter": "s",
                "score": 94
            }]
        }]
    }
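
    A typical use of this response is to surface the sounds a learner struggled with. The sketch below works on a trimmed copy of the example response (letter scores omitted) and lists phonemes under a cutoff. The function name weakest_phonemes and the default threshold of 60 are assumptions chosen for illustration, not values defined by the API.

```python
import json

# Trimmed copy of the example response, with letter scores omitted.
EXAMPLE_RESPONSE = json.loads("""
{
  "overall_score": 78,
  "target_phrase": "she looks",
  "word_scores": [
    {"word": "she", "score": 85,
     "phoneme_scores": [{"phoneme": "sh", "score": 93},
                        {"phoneme": "i", "score": 85}],
     "letter_scores": []},
    {"word": "looks", "score": 70,
     "phoneme_scores": [{"phoneme": "l", "score": 93},
                        {"phoneme": "oo", "score": 33},
                        {"phoneme": "k", "score": 75},
                        {"phoneme": "s", "score": 94}],
     "letter_scores": []}
  ]
}
""")


def weakest_phonemes(response: dict, threshold: int = 60) -> list:
    """Return (word, phoneme, score) tuples scoring below threshold, lowest first."""
    weak = [
        (word["word"], phoneme["phoneme"], phoneme["score"])
        for word in response["word_scores"]
        for phoneme in word["phoneme_scores"]
        if phoneme["score"] < threshold
    ]
    return sorted(weak, key=lambda item: item[2])


if __name__ == "__main__":
    print(weakest_phonemes(EXAMPLE_RESPONSE))  # [('looks', 'oo', 33)]
```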
    

    Object Model

    This section describes the objects that are used throughout the different endpoints.

    Word Score

    Key             Type                            Description
    word            string                          The word that was scored.
    score           int                             A value from 0 to 100 indicating how well the learner pronounced this word.
    phoneme_scores  array of Phoneme Score objects  A score for each phoneme in the word.
    letter_scores   array of Letter Score objects   A score for each letter in the word.

    Phoneme Score

    Key      Type    Description
    phoneme  string  A distinct unit of sound within the word.
    score    int     A value from 0 to 100.

    Letter Score

    Key     Type    Description
    letter  string  A letter within the word.
    score   int     A value from 0 to 100.
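
    Client code may find it convenient to mirror these objects as typed records. The following is a minimal sketch using Python dataclasses. The class names and the from_dict helper are assumptions for illustration, while the field names follow the tables above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeScore:
    phoneme: str
    score: int


@dataclass
class LetterScore:
    letter: str
    score: int


@dataclass
class WordScore:
    word: str
    score: int
    phoneme_scores: List[PhonemeScore]
    letter_scores: List[LetterScore]

    @classmethod
    def from_dict(cls, data: dict) -> "WordScore":
        """Build a WordScore from one entry of the word_scores array."""
        return cls(
            word=data["word"],
            score=data["score"],
            phoneme_scores=[PhonemeScore(**item) for item in data["phoneme_scores"]],
            letter_scores=[LetterScore(**item) for item in data["letter_scores"]],
        )
```

    A missing key raises KeyError here, and an unexpected key inside a nested phoneme or letter object raises TypeError, which makes schema drift visible early in a client.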

    API Semantics

    This section explains the semantics of our REST API. It includes common information that applies to all endpoints.

    API Endpoints

    The base URL for all our endpoints is https://voice.sanalabs.com. Please note that non-secure access to the API is not available. All HTTP requests will be redirected to HTTPS automatically.

    Authentication

    A valid API key is needed to access the Sana Voice API. Contact Sana Labs to get your own API key. Your API keys carry the privileges to access the Sana Voice API, so be sure to keep them secret. Do not share your API keys in publicly accessible places such as GitHub or client-side code.

    The Sana Voice API expects the API key to be included in all API requests to the server in a header that looks like the following:

    X-API-KEY: $API_KEY

    If the key is omitted or is wrong, you will get a 401 Unauthorized response to your request.

    To authorize, pass the X-API-KEY header:

    curl \
      -H "X-API-KEY: $API_KEY" \
      https://voice.sanalabs.com/api/v0.1/score
    

    Make sure to replace $API_KEY with your API key.
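
    To keep the key out of source code, a client can read it from the environment at runtime. This is a sketch only: the environment variable name SANA_VOICE_API_KEY and the helper auth_headers are assumptions, and only the X-API-KEY header name comes from this documentation.

```python
import os


def auth_headers(env_var: str = "SANA_VOICE_API_KEY") -> dict:
    """Return the authentication header, reading the key from the environment."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early instead of sending a request that would return 401.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"X-API-KEY": api_key}
```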

    Rate Limits

    There is currently no hard rate limit beyond which Sana will drop your data. However, if you need to make requests at a rate exceeding 200 req/s, please contact Sana Labs first.
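
    A client that might approach the 200 req/s figure can pace itself before sending. The class below is a minimal single-threaded sketch of such pacing. The Throttle name and its injectable clock and sleep parameters are assumptions for illustration, and nothing in the API requires this.

```python
import time


class Throttle:
    """Space calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        # Reserve the following slot relative to when this call actually starts.
        self.next_allowed = max(now, self.next_allowed) + self.min_interval
        return delay
```

    Calling wait() on a shared Throttle(200) before each request keeps a single-threaded client at or below the stated rate. A multi-threaded client would also need a lock around wait().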

    Errors

    Every request results in either success or an error. The API returns 200 or 201 for successful requests, and a relevant HTTP status code together with an Error Response object in case of an error. See the Error Status Codes section for the HTTP status codes the Sana Voice API returns.

    Error Status Codes

    The Sana Voice API uses the following error codes:

    Error Code  Error Text             Error Description
    400         Bad Request            Your request is invalid.
    401         Unauthorized           No API key was provided, or your API key is wrong.
    402         Payment Required       Your API key has expired.
    404         Not Found              The specified resource could not be found.
    405         Method Not Allowed     You tried to access a resource with an invalid method.
    429         Too Many Requests      You have exceeded your rate limit.
    500         Internal Server Error  There was a problem on the server side. Please try again later.
    503         Service Unavailable    The API is temporarily offline for maintenance. Please try again later.
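
    The table suggests two kinds of failures: those worth retrying later (500 and 503 both say to try again) and those that need a corrected request or key. The sketch below encodes that split with an exponential backoff schedule. Treating 429 as retryable, along with the delay values and the function names, is an assumption rather than documented behavior.

```python
# Status codes treated as transient. Including 429 is an assumption.
RETRYABLE_STATUS = {429, 500, 503}


def should_retry(status_code: int) -> bool:
    """Return True if a request that failed with this status may be retried."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    for status in (400, 401, 429, 503):
        print(status, should_retry(status))
    print([backoff_delay(n) for n in range(4)])  # [0.5, 1.0, 2.0, 4.0]
```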