Emacsen's Blog

Bridging Jitsi Meeting with Twilio Using Python

Jitsi Meet is a Free Software audio and video conferencing platform that allows for people around the world to participate in a video conference without proprietary software like Zoom or Google Meet.

Jitsi has an add-on program called Jigasi that allows for call-ins (and even call-outs). Unfortunately, while Jitsi Meet is well documented, Jigasi has less documentation. In this guide, I will demonstrate how to set up Jitsi Meet and Jigasi using the Twilio phone platform.

This post will try to cover the basics of the various components, but I am not an expert on any of them. I just managed to get everything working after a lot of trial and error.

Connecting to the Phone Network

Jitsi is great for computer-based meetings. It even has iOS and Android apps, but occasionally we need to support phone dial-in attendees. Jitsi uses a media transport called WebRTC, while VoIP software most commonly uses a protocol called SIP.

This means we need to bridge not only the technical protocols but also the very different ways that these two protocols see the world.

Traditionally, making a voice-enabled application would involve setting up a PBX. PBX stands for Private Branch Exchange, which is another way to say that a PBX system works like a small phone company.

In the past, PBX systems were proprietary and expensive, but Asterisk changed all that. Asterisk and other SIP FLOSS servers can run on relatively small installations, but still require a good deal of specialized knowledge to use. In addition you will still need a "trunk provider" to connect your installation to the phone network.

Twilio is a phone provider that makes it easy for programmers to build phone applications by simply putting up a web server. It requires no proprietary software on the client end, and it offers easy sign-up and competitive prices.

The largest downside of Twilio is that because of its specialized API, there is a bit of vendor lock-in, unlike using a plain SIP trunk provider and connecting it to a program like Asterisk or FreeSWITCH. On the upside, the Twilio API is very simple and its tools make debugging applications a breeze.

Since we only want one or two numbers and an easy installation, we're going to go ahead and use Twilio for this application.

Another Web Server?

Twilio has an event-driven API. When a telephony event occurs, Twilio triggers an event on its end. One option for handling events is to have Twilio hit a specified HTTP endpoint. We can run our own web server and direct Twilio on what to do next.

For this particular application, I'm going to use the popular Python Flask web framework because it's easy and because Twilio offers an SDK that makes using it very easy, but you could use any web server you like.

Installing Jitsi Meet

If you already have Jitsi Meet installed, feel free to skip this section.

Jitsi Meet itself is fairly well documented. To make deployment easier, I've been using the official Jitsi Meet Docker image. The installation manual for the Docker install is available here.

While not strictly necessary, since you will need to run additional services anyway, I'm using an SSL reverse proxy that integrates Let's Encrypt called docker-compose-letsencrypt-nginx-proxy-companion.

If you want to do the same, you will need to set your LETSENCRYPT_DOMAIN and LETSENCRYPT_EMAIL in your Jitsi .env, but don't set ENABLE_LETSENCRYPT. In addition, you will need to set DISABLE_HTTPS.

It should be mentioned that SSL is mandatory for WebRTC on the browser level, so using some SSL configuration is necessary, whether it's through a proxy or Jitsi itself.

You'll also need to change your docker-compose.yml file. Add VIRTUAL_HOST=${LETSENCRYPT_DOMAIN} and LETSENCRYPT_HOST=${LETSENCRYPT_DOMAIN} to the environment section of your web service. You'll also need to add the proxy network (which defaults to webproxy) to web's networks. Just add webproxy: there, and in the networks section add:

    webproxy:
      external: 
        name: webproxy

If you're already familiar with this proxy companion, or jwilder/nginx-proxy then this will be familiar to you.

Once that's working to your satisfaction, let's move onto the next step.

Dial-in Number

The next step is to sign up for Twilio and get a phone number. This is the number that people will use when they dial into the phone conference.

Because this phone connection connects to the standard phone system, you will need to pay for this, but the prices are relatively low. In my experience, my Twilio costs were about $3 a month for light/moderate usage.

As an aside, it should be mentioned that Twilio also offers their own WebRTC-based videoconferencing system. If we only cared about pricing, then it would be a safe win to use their system, but we are using Jitsi because we also care about software freedom.

Twilio's SIP

In addition to the number, you will also need to set up a SIP domain. Twilio offers a number of SIP offerings and navigating the system is a little confusing. I found this article on SIP phones from Twilio very informative.

You will need a SIP domain that represents your organization, but since you can also have multiple SIP domains, that is up to you. Similarly, you can choose a username independently of anything else, though this blog post from Twilio suggests using an E.164 format phone number for the username.

You'll also need to set parameters around network address based logins and other settings. The Twilio documentation mentions being able to create this configuration through a RESTful Interface, but since this is a one-off, I think using the GUI is easiest.

At this point I'll assume both your Twilio and SIP configuration are working and you're able to register.

Web Server Setup and CORS

Setting up a web server is outside the scope of this tutorial. I'm going to assume you've already set up a web service before and/or have an understanding of HTTP methods (GET, POST, etc.) as well as the basics of XML and JSON encoding.

You'll need to stand up a web server somewhere. Since you already have a Jitsi instance, standing up another service should be fairly easy for you. If you want to use Docker, look at the Dockerfile included in the project for an example of setting up a Docker instance.

You may also find ngrok to be a nice tool to use during testing, but this is optional.

For Twilio's purposes, we just need a web server running somewhere it can talk to, but we'll also be setting up a couple of HTTP endpoints for Jitsi Meet clients, and because of that, we'll need to set up Cross-Origin Resource Sharing, or CORS, on the web server. In my example we're going to configure the server to return the header:

Access-Control-Allow-Origin: *

You may want to restrict the response to only your Jitsi domain, but I haven't bothered in my example.

You will also need to configure SSL access to your web server, not for Twilio, but for later on when we configure Jitsi Meet. You can certainly wait on this step until you're ready to test the Jitsi Meet components, but it's good to know that you'll need this eventually.
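
If you do want to lock this down, the decision itself is small. Here is a framework-independent sketch of it. The helper name cors_headers and the origin https://meet.example.com are placeholders I made up for illustration; they are not part of the project.

```python
# Hypothetical helper: decide which CORS headers (if any) to send back.
ALLOWED_ORIGINS = frozenset({"https://meet.example.com"})  # assumed Jitsi Meet URL


def cors_headers(request_origin, allowed=ALLOWED_ORIGINS):
    """Return the CORS headers for a request coming from request_origin."""
    if request_origin in allowed:
        # Echo the origin back rather than "*", and tell caches that the
        # response differs depending on who asked.
        return {"Access-Control-Allow-Origin": request_origin,
                "Vary": "Origin"}
    # Unknown or missing origin: send no CORS header, so the browser
    # refuses to hand the response to the calling page.
    return {}
```

In Flask, the returned dictionary could be merged into the response headers from an after_request handler.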

For the remainder of this tutorial, I'm going to assume your web server is up and running and at https://example.com.

Configuring Twilio to Answer a Call

/In this section we'll talk about Twilio and using its event-driven system to answer calls. If you're already familiar with Twilio, you can safely skip this section./

In the old days, setting up a PBX was expensive and required both proprietary hardware and software. That all changed when Asterisk came out. With Asterisk, you could run your own PBX, either physically or virtually, with Free Software. You just had to learn a little telephony nomenclature and you could set up your own virtual PBX in an afternoon. Today, several options exist for running your own PBX, including FreeSWITCH, OpenSIPS and FreePBX (which sits on top of Asterisk). All of these systems are wonderful, but even in the best case, require some understanding of telephony and creation of a dialplan. A dialplan is nothing more than a set of instructions that are carried out when a telephony event occurs. For example your office might want the CEO's phone to ring in their office, but also ring their administrative assistant, or you might want everyone in the office to have their own extension, but only certain extensions have direct dial in numbers that people can call externally.

Twilio abstracts the dialplan idea into a series of events that you configure it to respond to. You can choose how to respond to those events, but in our case, we will use webhooks, which are nothing more than simple HTTP endpoints.

Our first example will be to configure our phone number to say a greeting and then hang up. We could easily configure this with a static file, even on the Twilio website, but by testing it on our web server, we're also ensuring that our web server is configured properly.

Twilio provides an SDK that abstracts its domain-specific XML, TwiML, and makes it easy to use. You don't need the SDK, of course. You could do it all yourself manually.

I'm going to name the HTTP endpoint "answer", since that is the event that we'll respond to. I'll also be setting up some basic Flask things. If you've worked with Flask before or most web frameworks, nothing here will be especially new.

    from flask import Flask
    from twilio.twiml.voice_response import VoiceResponse

    app = Flask(__name__)

    # Twilio POSTs to webhooks by default; GET lets us test in a browser
    @app.route("/answer", methods=["GET", "POST"])
    def answer():
        """Respond to incoming phone calls with a greeting"""
        resp = VoiceResponse()
        resp.say("Hello and welcome to the conferencing system")
        return str(resp)

    if __name__ == "__main__":
        app.run(host='0.0.0.0', debug=True)

A majority of that program is just setting up the web server, but we can see just how easy it is to set up.

If you look at the result of hitting that endpoint, you will see something that looks like

    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Say>Hello and welcome to the conferencing system</Say>
    </Response>

I've formatted the output, but you can see that the result is a small XML document. We could just store that as a static file, but we're going to need to make our site more interactive later.
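
To show that the SDK really is optional, here is a minimal sketch that builds the same document with nothing but Python's standard library. The function name greeting_twiml is my own invention for this example.

```python
# Build the greeting TwiML by hand, without the Twilio SDK.
import xml.etree.ElementTree as ET


def greeting_twiml(message):
    """Return a TwiML document that speaks message and then ends."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = message  # ElementTree escapes &, < and > when serializing
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


print(greeting_twiml("Hello and welcome to the conferencing system"))
```

The output is the same XML as above, just on a single line. For anything beyond a greeting I'd still reach for the SDK, since it knows which verbs and attributes are valid.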

Checking our SIP Configuration

If you've already configured and tested your SIP endpoint, this step is unnecessary.

With our telephone number and web server configured, let's turn our attention back to our SIP configuration. If you haven't done that already, go to Programmable SIP Domains and add a new domain for yourself.

Then go ahead and add a user for that domain. As mentioned earlier, one practice is to name the user the same as the phone number, but that's entirely optional.

What's not optional is making the SIP domain, user for that domain and also setting the IP address ranges that will connect to the endpoint. This will be your Jigasi server's IP, but I also recommend testing the SIP endpoint with a SIP softphone such as Linphone or Zoiper, so you'll want to add the IP address of the computer you'll be testing it from as well.

If you haven't used Twilio's SIP before, one small gotcha that I encountered is that the SIP domain is not always the same as the server, so I had to add us1 to the sip domain, such as myuser@mydomain.sip.us1.twilio.com.

Just be sure that your SIP phone can connect to the endpoint successfully. We'll be configuring our number to ring the SIP phone next, so it's a good time to ensure that this part is working before we move on.

Configuring our number to call our SIP Endpoint

We have our web server and our SIP endpoint both working, so now it's time to connect them together.

Since we're now dealing with a bunch of configuration, I'm going to use dotenv to make it easy for me to store configuration separately from the application. In production, I'm using Docker, so I'll be storing my configuration there instead, but this is a nice bridge between the two. We'll then use environ to retrieve our configuration.

Let's store our SIP user with domain as SIP_URI.

Then when someone calls our number, we'll have it call the SIP endpoint. When that happens, your softphone should ring and you'll be able to talk to yourself.

    from flask import Flask
    from twilio.twiml.voice_response import VoiceResponse, Dial
    from dotenv import load_dotenv
    from os import environ

    load_dotenv()
    SIP_URI = environ['SIP_URI']

    app = Flask(__name__)

    # Twilio POSTs to webhooks by default; GET lets us test in a browser
    @app.route("/answer", methods=["GET", "POST"])
    def answer():
        """Call the SIP endpoint"""
        resp = VoiceResponse()
        dial = Dial()
        dial.sip(f"sip:{SIP_URI}")
        resp.append(dial)
        return str(resp)

    if __name__ == "__main__":
        app.run(host='0.0.0.0', debug=True)

Not a lot of change here, but now when we call our phone number, it should call our SIP user, which is connected to the softphone.

If this all works, you're cooking with gas and it's time to move on to configuring Jigasi itself!

Configuring Jigasi

In this example, I'm going to be using the Docker installation of Jitsi. In this configuration, a lot of the details have been abstracted away and only need to be set inside the .env file your Jitsi installation uses.

If you're not using the Docker installation, you'll need to make the changes in the config files themselves.

Here's the relevant part of the Jitsi .env

    #
    # Basic Jigasi configuration options (needed for SIP gateway support)
    #

    # SIP URI for incoming / outgoing calls
    JIGASI_SIP_URI=SIP_USER

    # Password for the specified SIP account as a clear text
    JIGASI_SIP_PASSWORD=MY_SIP_PASSWORD

    # SIP server (use the SIP account domain if in doubt)
    JIGASI_SIP_SERVER=MYSIP_DOMAIN

    # SIP server port
    JIGASI_SIP_PORT=5060

    # SIP server transport
    JIGASI_SIP_TRANSPORT=UDP

JIGASI_SIP_URI should be the same as the SIP_URI we set for our Flask application, JIGASI_SIP_PASSWORD is the password, and JIGASI_SIP_SERVER should be the SIP Domain, including the us1 part.

Once you do this, you'll need to recreate the Jitsi and Jigasi config files. If you're using the Docker images, the .env file specifies a CONFIG variable which stores the location of the configuration directory.

You'll need to erase that directory and recreate it with:

    mkdir -p CONFIG_DIR/{web/letsencrypt,transcripts,prosody/config,prosody/prosody-plugins-custom,jicofo,jvb,jigasi,jibri}

Substituting CONFIG_DIR with the location in CONFIG.

Also you'll need to be sure that from now on, you reference both the docker-compose.yml as well as the jigasi.yml files, such as:

    docker-compose -f docker-compose.yml -f jigasi.yml up -d

Once you make these changes and restart the services, Jigasi should register as a SIP endpoint (just like the softphone) and be able to receive calls. The problem is that it doesn't know which conference to send the calls to by default.

We can give Jigasi a default conference room by setting org.jitsi.jigasi.DEFAULT_JVB_ROOM_NAME in its sip-communicator.properties file (under CONFIG/jigasi in the Docker setup), but I think a better way is to modify our Python script to specify the room there.

Technically, what we need to do is specify the room name inside a SIP header when we make the SIP INVITE. That header is X-Room-Name by default.

Twilio lets us set SIP headers on the URI, so all we need to do is specify X-Room-Name on the dial.sip line like so:

    dial.sip(f"sip:{SIP_URI}?X-Room-Name=MyDefaultRoomHere")

Now a call to our number will be directed to the MyDefaultRoomHere room!
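
Once the room name stops being a hard-coded string, it is worth building that URI in one place. The sketch below is only an illustration: the helper name sip_uri_for_room is made up, and percent-encoding the value is my assumption about what keeps characters like & or spaces from being misread as part of the URI. I leave @ untouched so that full conference names look the same as the plain strings used elsewhere in this post.

```python
from urllib.parse import quote


def sip_uri_for_room(sip_uri, room_name, header="X-Room-Name"):
    """Build the URI for dial.sip() with the room passed as a SIP header."""
    # quote() percent-encodes spaces, "&", "=" and similar characters that
    # would otherwise be read as separators in the URI's header list.
    return f"sip:{sip_uri}?{header}={quote(room_name, safe='@')}"


print(sip_uri_for_room("myuser@mydomain.sip.us1.twilio.com", "MyDefaultRoomHere"))
```

For a plain alphanumeric room name the result is identical to the hand-written string above.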

Technically we could stop here. If we always know that we want calls to come into this one room, we don't need to take any further action.

But we probably want features like PIN numbers and other things, so let's go ahead and add that!

Mapping PINs to Rooms

Jitsi Meet has the concept of rooms. Rooms have a unique identifier which we can think of as an access token into the room. We need to map those room names to digits that we can easily type into the phone.

Then when a caller calls in, we need to ask them for a PIN and then map that back to a room name, which we then use to tell our Python program where to send them.

This is a bit of a chicken and egg problem, because we need both parts to fully test this, but I'm going to implement the PIN<->Room Name mapping first.

We technically could do this entirely in memory, but then if we shut the program down, we'd lose all the previous mappings, so we need to serialize this to disk. We could use a full-fledged database, but my Jitsi instance only gets a few visitors a day and generates maybe one or two new rooms a week, so that seems like overkill. Instead I'm opting for a very simple solution in the form of the Python library tinydb. It works like a dictionary but loads the data each time it's called, which means that while thread safety is not guaranteed, it's safe enough for this use.

Jitsi Meet makes the calls on the client side, from the web interface, and this is why we must address the Cross-Origin Resource Sharing issue. Since we're not dealing with any large resource generally, we'll just put a blanket policy allowing anyone. In production you may wish to set this to your Jitsi Meet URL.

The official Jitsi Meet instance has a service for mapping a conference to a PIN at https://jitsi-api.jitsi.net/conferenceMapper. This URL takes one of two parameters through a GET request, either conference or id. The conference is the full conference name, that is, the room name @ the instance. The id is what I'm calling the PIN. The result is a JSON document.

Some tutorials suggest using an auto-incrementing ID, but I think this is a mistake because even though it doesn't tell you what room you'll get, it does make it likely that someone could guess the next room PIN, so instead I'll be using a random number.

    from flask import Flask, jsonify, request
    from flask_cors import CORS
    from tinydb import TinyDB, Query
    from secrets import randbelow
    ...

    PIN_DIGITS = 6
    DB_FILE = environ.get("DB")
    db = TinyDB(DB_FILE)
    ...

    app = Flask(__name__)
    cors = CORS(app)
    ...

    @app.route('/conferenceMapper')
    def conference_mapper():
        pin, conference = request.args.get('id'), request.args.get('conference')
        if not (pin or conference):
            return jsonify({"message": "No conference or id provided",
                            "conference": False,
                            "id": False})
        elif pin:
            # PINs are stored as integers, so convert before searching
            pin = int(pin)
            result = db.search(Query().id == pin)
            if result:
                conference = result[0]['conference']
                return jsonify({
                    "message": "Successfully retrieved conference mapping",
                    "id": pin,
                    "conference": conference})
            else:
                return jsonify({
                    "message": "No conference mapping was found",
                    "id": pin,
                    "conference": False})
        else:
            # The conference has been specified. Reuse its PIN if it has one
            result = db.search(Query().conference == conference)
            if result:
                return jsonify({
                    "message": "Successfully retrieved conference mapping",
                    "id": result[0]['id'],
                    "conference": conference})
            # Otherwise make a new PIN, retrying on the rare collision
            max_int = pow(10, PIN_DIGITS)
            while True:
                pin = randbelow(max_int)
                result = db.search(Query().id == pin)
                if not result:
                    db.insert({"id": pin, "conference": conference})
                    return jsonify({
                        "message": "Successfully retrieved conference mapping",
                        "id": pin,
                        "conference": conference})

That will give us back what Jitsi Meet expects.

If you're wondering what limebrass is, it's the name I gave to my conferencing system. It doesn't mean anything other than it's a unique name.
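
One refinement worth considering: randbelow(10 ** 6) can return short numbers such as 42, which look odd next to a prompt for a six-digit PIN. Below is a sketch of generating fixed-width PINs instead. The function new_pin and the in-memory set taken are made-up stand-ins for the TinyDB lookup, so treat this as an illustration rather than a drop-in replacement.

```python
from secrets import randbelow

PIN_DIGITS = 6


def new_pin(taken, digits=PIN_DIGITS):
    """Return a random PIN with exactly `digits` digits that is not in taken."""
    lowest = 10 ** (digits - 1)  # smallest number with that many digits
    while True:
        # randbelow(9 * lowest) is in [0, 9 * lowest - 1], so the sum
        # always has exactly `digits` digits and no leading zero.
        pin = lowest + randbelow(9 * lowest)
        if pin not in taken:
            return pin


used = {123456}
pin = new_pin(used)
```

With only a handful of rooms the retry loop almost never runs more than once. It would only become a concern if a large share of the PIN space were in use.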

Now we must tell Jitsi Meet to use this new mapping. That's done by editing the CONFIG/web/config.js file and adding dialInConfCodeUrl to the large JavaScript object, before the makeJsonParserHappy entry, such as:

    dialInConfCodeUrl: 'https://example.com/conferenceMapper',

Now that this is done, we need to turn our attention back to Twilio for a moment and how we will connect the PIN we've just made to the phone system.

Luckily for us, Twilio makes this very easy with a Gather directive that can be used to collect digits. Our process will be to ask the caller to enter in their PIN, then if the conference exists, they'll be connected into it. If not then they'll be given another chance to enter their PIN. And if they can't do it three times, they'll be asked to call back.

Twilio's Gather directive works a bit like an HTML form in that it has an action parameter that it POSTs the result to.
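
To make the form analogy concrete, here is a sketch of pulling the digits out of such a POST body by hand. Twilio sends the collected keys in a form field called Digits. The sample body and the helper name digits_from_body are invented for illustration, and a real request carries more fields than shown.

```python
from urllib.parse import parse_qs


def digits_from_body(body):
    """Pull the keyed-in digits out of a form-encoded webhook body."""
    fields = parse_qs(body)
    # parse_qs returns a list per field; Digits is missing if the caller
    # pressed nothing, so fall back to an empty string.
    return fields.get("Digits", [""])[0]


# Made-up sample body, trimmed down to a few fields
sample = "CallSid=CA0000&From=%2B15555550100&Digits=123456"
print(digits_from_body(sample))
```

In Flask none of this is needed, because request.form already exposes the parsed fields.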

If we didn't care about letting someone try to enter their pin a second or third time, we could use one single endpoint for both the answer and the gather, but since we do want to allow this, we'll need two endpoints.

Our first step will be to change our /answer code to announce the conferencing system, then redirect the caller to the gather request.

    @app.route("/answer", methods=["GET", "POST"])
    def answer():
        """Announce the conferencing system"""
        resp = VoiceResponse()
        resp.say("Welcome to the conferencing system!")
        # /gather prompts on GET and processes the digits on POST
        resp.redirect("/gather?tries=0", method="GET")
        return str(resp)

You may have noticed that I added a query parameter tries to the URL. That's so we can count the number of tries that have been attempted and hang up when it's been too many.

Now let's work on the gather code.

    from twilio.twiml.voice_response import VoiceResponse, Dial, Gather
    ...

    @app.route("/gather", methods=["GET", "POST"])
    def gather():
        """Gather the PIN number"""
        resp = VoiceResponse()
        if request.method == "GET":
            tries = int(request.args.get("tries", 0))
            # Twilio POSTs the collected digits back to the action URL
            gather = Gather(num_digits=PIN_DIGITS, action=f"/gather?tries={tries}")
            gather.say("Please enter your conference number, followed by the pound sign.")
            resp.append(gather)
            # If no response, end the call
            resp.say("I didn't get a conference pin. Please call back once you have it!")
            return str(resp)
        else:
            # This is the POST method, and should only be called once a
            # gather is made
            tries = int(request.args.get("tries", 1))
            digits = request.form.get("Digits", "")
            pin = int(digits) if digits.isdigit() else 0
            if not pin:
                resp.say("I didn't get a conference pin. Please call back once you have it!")
                return str(resp)
            # Look up the PIN
            result = db.search(Query().id == pin)
            if not result:
                tries += 1
                if tries >= 3:
                    resp.say("Too many incorrect pin attempts. Please call back once you have it!")
                    resp.hangup()
                else:
                    # Ask again, carrying the attempt count along
                    resp.redirect(f"/gather?tries={tries}", method="GET")
                return str(resp)
            # Success! Redirect the caller to the correct conference!
            conference = result[0]["conference"]
            dial = Dial()
            dial.sip(f"sip:{SIP_URI}?X-Room-Name={conference}")
            resp.append(dial)
            return str(resp)

Phew! Our little Python program is getting bigger, but it's all relatively straightforward code.

You may notice that I'm playing a little fast and loose with error handling here. That's because this application will only be interacted with by other known applications. If an exception occurs, it's due to a bug somewhere, rather than us wanting to try to correct for it. This is also why I don't feel very strongly about disabling the Debug mode, though if I ran this for any significant installations, I would turn it off.

At this point, a user who knows a conference pin can dial in. But how will they know the number to dial into? That's the next section!

Setting Call-In Number

Now that we have the pin sorted out, let's make it easy for someone to find the call-in number(s). Jitsi Meet makes it easy to find out by having a configurable URL that returns a JSON document with a list of phone numbers. This could be a static file, but let's just include it in our web application.

    @app.route("/dialInNumbers")
    def dial_in_numbers():
        """Return our available phone numbers"""
        return jsonify({
            "message": "Phone numbers available.",
            "numbers": PHONE_NUMBERS,
            "numbersEnabled": True})

In this code, we use our environment to set the phone numbers we want to use. The format used is a JSON object. Showing an example is probably easier than explaining it:

    {"US":
      ["+1.555.555.1212"]}

You can see we have a mapping of country codes and a list of numbers. The formatting of the numbers is entirely up to you.
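
Since the numbers come from the environment, they arrive as a string and need parsing. Here is one way that might look. The variable name PHONE_NUMBERS for the environment entry and the helper load_phone_numbers are assumptions for this sketch, and the fallback value is just the example above.

```python
import json
from os import environ


def load_phone_numbers(raw):
    """Parse and sanity-check the country -> list-of-numbers mapping."""
    numbers = json.loads(raw)
    if not isinstance(numbers, dict):
        raise ValueError("expected an object mapping countries to lists")
    for country, entries in numbers.items():
        if not isinstance(entries, list):
            raise ValueError(f"{country}: expected a list of numbers")
    return numbers


# Assumed environment variable name; falls back to the example mapping
PHONE_NUMBERS = load_phone_numbers(
    environ.get("PHONE_NUMBERS", '{"US": ["+1.555.555.1212"]}'))
```

Failing loudly at startup is deliberate here: a typo in the configuration is much easier to spot in the logs than in an empty dial-in dialog.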

Setting Call-Out

At this point we can do everything a standard call-in phone conference can do, but we can also optionally allow for call-outs, which is to say that we can initiate a phone call from inside a conference.

This can be useful if you need to contact someone directly and don't want to go through the dance of having them call in. But because this can also be used to initiate calls, it's advised that this only be enabled on Jitsi installations that have authentication turned on!

With that warning out of the way, let's make a new endpoint!

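    @app.route('/callOut', methods=["GET", "POST"])
    def call_out():
        """Make an outgoing call"""
        # request.values covers both the query string and the POSTed form
        caller_id = request.values['callerId']
        to = request.values['To']
        # "sip:+15555551212@domain" -> "+15555551212"
        to_formatted = to.split('@')[0].split(':')[1]
        resp = VoiceResponse()
        resp.dial(to_formatted, caller_id=caller_id, answer_on_bridge=True)
        return str(resp)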

As you can see, it takes in two arguments, To and callerId. To contains a full SIP address, so what we need to do is strip it down so it looks like a phone number in E.164 format, i.e. a + symbol, then the country code and phone number.

Twilio's policy is that the callerId must be a number we have associated with our account, either one that we bought from them or one that we have verified. We'll supply that manually, but we could also be clever here and look at other factors in deciding which caller ID to supply. For example, we might have numbers in different countries and want to use the appropriate number for the country we're dialing out to. As long as the number is either through Twilio or verified with them, we can do that. In this case, though, I've simply supplied the callerId as an argument to the script in Twilio's SIP domain configuration.

The final bit of setup, then, is to go to your SIP domains (i.e. https://www.twilio.com/console/voice/sip/endpoints?), click on your SIP domain, then put the URL (i.e. https://example.com/callOut?callerId=%2B15555551212) in the "A CALL COMES IN" field. The + is written as %2B because a bare + in a query string is decoded as a space.
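
If you'd like the stripping step to fail loudly on anything that isn't a phone number, it could be pulled into a helper along these lines. The name sip_to_e164 and the exact pattern are my own choices for this sketch. The pattern follows the usual reading of E.164: a +, then up to fifteen digits, the first of which is not zero.

```python
import re

# "+" followed by 2 to 15 digits, not starting with 0
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def sip_to_e164(to):
    """Extract the user part of a SIP address and check it looks like E.164."""
    # "sip:+15555551212@mydomain.sip.us1.twilio.com" -> "+15555551212"
    user = to.split("@")[0].split(":", 1)[-1]
    if not E164.match(user):
        raise ValueError(f"not an E.164 number: {user!r}")
    return user


print(sip_to_e164("sip:+15555551212@mydomain.sip.us1.twilio.com"))
```

Rejecting non-numeric targets also stops someone in a conference from asking the endpoint to dial arbitrary SIP users.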
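
Rather than encoding the caller ID by hand, the webhook URL can be generated. This is a minimal sketch using only the standard library; the helper name call_out_url is made up for the example.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def call_out_url(base, caller_id):
    """Build the webhook URL with the caller ID safely encoded."""
    # urlencode turns "+" into "%2B"; a bare "+" would be read back as a space
    return f"{base}?{urlencode({'callerId': caller_id})}"


url = call_out_url("https://example.com/callOut", "+15555551212")
print(url)

# Round trip: decoding the query string gives the original number back
print(parse_qs(urlsplit(url).query)["callerId"][0])
```

Paste the printed URL into the "A CALL COMES IN" field and the endpoint will see the number with its leading + intact.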

That may seem a little confusing at first, but we need to think about it from the perspective of the SIP endpoint. It is what's getting a call, which is why it's considered "inbound" for it.

Final thoughts

And that's it! A fully functional system for both calling in and calling out with Twilio and Jitsi! All we had to do was write a tiny amount of glue code and voilà, we have a powerful connection between our phone system and a conference system! If you don't use it much (like me) then the price for this is going to be fairly low, and we didn't have to set up a PBX server like Asterisk, just a little web server!

There's certainly a lot more to do here if you want to turn this little toy into a "real application". You'll want to change the voices in your call-in to something pleasant, you'll want to set up real logging, and probably a more substantial database than what we have, but this should be a good launching point for a beginner.

Enjoy!
