Emacsen's Blog

ReviewPub Part 1: Introduction and Models

This is the first part of a multi-part series on the development of ReviewPub, a site for federated online reviews. The code will eventually be released in an online git repository, but more important than the code itself is the process that we'll go through together to develop it.

The idea of this project came out of talking with my friend and co-host of the podcast Libre Lounge, Chris Webber. Chris is the co-chair of the W3C Social Committee, the group that developed the ActivityPub standard. ActivityPub, for those who aren't familiar, is the protocol that runs the modern Fediverse, a collection of social applications that federate and inter-communicate.

From working with Chris on the show, I had the idea that I should develop a new ActivityPub application that could illustrate what developing an ActivityPub application looked like. It would start simple and then could be built on so we could explore the topic in depth and even push the envelope of ActivityPub itself.

To that end, I decided to set out some guidelines for this software. I wanted it to be a traditional web application built with traditional web application tools such as Python and Django. In addition, I would use a traditional SQL database. The code would be easy to understand and serve primarily as a teaching tool rather than as a production service.

The ideal audience for these articles is developers who have built web applications before and who can therefore follow the steps involved. There's no need to have used Python or Django to understand the concepts presented, but we will use many Python and Django features, so a background in them is helpful.

The Subject-Verb-Object Message Structure

In ActivityPub, every message contains a subject, a verb and an object, along with some additional metadata about each.
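
To make that concrete, here is a rough sketch of what such a message looks like on the wire, written as a plain Python dict. The URLs and the review text are placeholders I made up for illustration; only the @context and the Public address are real Activity Streams values.

```python
import json

# A "Create" activity: the actor (subject) creates (verb) a Note (object).
# Every reviews.example URL below is a made-up placeholder.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "id": "https://reviews.example/activities/1",
    "actor": "https://reviews.example/actors/alice",
    "object": {
        "type": "Note",
        "id": "https://reviews.example/notes/1",
        "attributedTo": "https://reviews.example/actors/alice",
        "content": "Great coffee, terrible parking.",
    },
    "to": ["https://www.w3.org/ns/activitystreams#Public"],
}


def describe(message):
    """Return the (subject, verb, object type) triple of an activity."""
    return message["actor"], message["type"], message["object"]["type"]


print(json.dumps(activity, indent=2))
print(describe(activity))
```

The rest of this article is about storing each of those three parts in the database.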

Setup

As this is Python/Django, we need to do some basic setup. We'll create a models.py file inside our reviewpub/reviewpub directory, and then do some ritual to import modules we'll want to use.

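from django.db import models
from django.conf import settings
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.core.exceptions import ValidationError
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.utils.translation import gettext_lazy as _
import jsonfield

User = settings.AUTH_USER_MODEL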

If you've developed with Django before, this should be very familiar to you. I am using a non-standard module called jsonfield. It's used to serialize JSON to a database. In a more real system, I'd use Postgres's native JSON type, which actually stores JSON as a tree structure and is indexable, but for SQLite jsonfield will do fine.

The other thing that may be unusual is the way I'm setting User. That's because there are times when someone may want to modify the AUTH_USER_MODEL. This way they can do so, but it still looks like the regular User throughout the code.

Actors

The subject of a message is an Actor, which is roughly analogous to what would be considered an account in other systems. Let's begin to model this in our Django application. Inside reviewpub/reviewpub/models.py let's create an Actor model.

class Actor(models.Model):
    ap_id = models.URLField(unique=True)
    inbox = models.URLField()
    outbox = models.URLField()

This minimal example is all that would be required for us to have an Actor model. Every actor must have an ID page, an inbox and an outbox. Technically all the fields could use any URI (not just URLs) but since we're only designing the application to support HTTP, this should be fine.
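
As a quick illustration of those three required fields, here is a small sketch in plain Python (no Django needed) that checks an actor document for them. The helper name and the social.example URLs are my own inventions for the example.

```python
# The three properties every actor document must carry.
REQUIRED_ACTOR_FIELDS = ("id", "inbox", "outbox")


def missing_actor_fields(document):
    """Return the required fields that are absent or empty in an actor document."""
    return [name for name in REQUIRED_ACTOR_FIELDS if not document.get(name)]


# A made-up remote actor, roughly as another server might describe it.
remote_actor = {
    "type": "Person",
    "id": "https://social.example/users/bob",
    "inbox": "https://social.example/users/bob/inbox",
    "outbox": "https://social.example/users/bob/outbox",
}

print(missing_actor_fields(remote_actor))
print(missing_actor_fields({"id": "https://social.example/users/eve"}))
```

The first call reports nothing missing, while the second reports the inbox and outbox.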

At the same time, we may want to store some additional fields about a user, such as their preferred display name.

At that point, the Actor model starts to also sound a bit like a profile that would apply equally to local and remote accounts, so we should extend it to handle both:

class Actor(models.Model):
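    # Nullable so that several local actors can exist before their URLs are set.
    ap_id = models.URLField(unique=True, null=True, blank=True)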
    inbox = models.URLField()
    outbox = models.URLField()
    displayname = models.CharField(max_length=200)
    user = models.OneToOneField(User, null=True,
        blank=True, on_delete=models.CASCADE)

There are a few things to note here. The first is that we're making a one-to-one association between users and actors, but we're allowing the field to be null. A null field would indicate that this actor is remote.

We're also using the field ap_id (ActivityPub ID) rather than id because while id would work for remote actors, we don't necessarily want to compute our local actor's URLs.

To drive home this point about local and remote actors being the same, we can add a couple of new methods, namely a property that tells us whether an actor is remote, and one that gives us the __str__ of an actor.

@property
def remote(self):
    # An actor with no linked local user is remote.
    return self.user is None

def __str__(self):
    if self.remote:
        return f"Remote Actor {self.ap_id}"
    else:
        return f"Local Actor {self.user.username}"

Since we will want any new User to also get an Actor at create time, let's go ahead and make a signal for that.

@receiver(post_save, sender=settings.AUTH_USER_MODEL)
def create_or_update_actor(sender, instance, created, **kwargs):
    # Give every newly created user a matching local Actor.
    if created:
        Actor.objects.create(user=instance)
    instance.actor.save()

Notice that the signal is on the AUTH_USER_MODEL rather than our Actor. It ensures that whenever a user is created, a corresponding Actor is generated, though not necessarily the reverse.

Let's add one last method to our Actor for now, which is to ensure that if the Actor is remote that all its necessary fields are filled in, including the ap_id, inbox, and outbox.

def clean(self):
    if self.remote:
        errors = {}
        if not self.ap_id:
            errors['ap_id'] = ValidationError(_('Remote actors must have an ActivityPub ID'))
        if not self.inbox:
            errors['inbox'] = ValidationError(_('Remote actors must have an inbox'))
        if not self.outbox:
            errors['outbox'] = ValidationError(_('Remote actors must have an outbox'))
        if errors:
            raise ValidationError(errors)

Wrapping up actors, we've got a good deal of code, but most of it is just validation and other secondary methods to ensure things are done properly.

Objects

The next model we'll tackle in our Subject-Verb-Object structure will be objects.

ActivityPub is a bit different from other standards in two interesting ways. Firstly, it's extremely extensible in terms of the types of objects and activities it supports. All you need to do is define your own vocabulary and point to it.

Secondly, it also doesn't strictly require that developers support all the types of objects and activities in the ActivityPub standard itself. This is somewhat unique because many standards require that you implement the entire thing to be compliant. ActivityPub doesn't do that, and in fact it would be a huge undertaking to implement all of ActivityPub/Activity Streams in the traditional MVC way.

So instead of thinking about ActivityPub, let's focus on our application and what it needs. In ReviewPub, people will be writing reviews, so for now let's represent that as a simple block of text, like Mastodon or write.as would have.

This resembles the Activity Streams Note type, so let's implement that:

class Note(models.Model):
    actor = models.ForeignKey(Actor, on_delete=models.CASCADE)
    ap_id = models.URLField(unique=True, null=True, blank=True)
    body = models.TextField()

Here we have an Actor, who owns the object, the ActivityPub ID of the object, and the body of text that is our note.

It would be entirely fine if we used this model as is in our application, but I would argue that doing so would not fit the ActivityPub way of doing things well, because while it is fine for representing data we generate internally, we may receive objects from other servers that aren't notes.

For objects that our application isn't familiar with, we have several choices. One thing we could do is drop those objects on the floor, but that feels inherently wrong.

The nice thing about using ActivityPub (and Activity Streams) is that often a client can render the data from an object even if it doesn't fully understand it. It's like receiving a note from someone who specializes in a highly technical field, like a doctor or lawyer: they may use jargon that is unfamiliar to you, but you still get the gist of what they're saying. ActivityPub lets clients render what they can understand, even if they don't fully comprehend the message.
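
Here is a loose sketch of that "render what you understand" idea. It assumes the object's type is a single string and only looks at name, summary and content, which are common Activity Streams properties. The function name and the Recipe example are hypothetical.

```python
def render_summary(obj):
    """Render the parts of an Activity Streams object that we recognize."""
    label = obj.get("type", "Object")
    # Fall through the common text-bearing properties in order of preference.
    text = obj.get("name") or obj.get("summary") or obj.get("content")
    if text is None:
        return f"[{label}] (nothing displayable)"
    return f"[{label}] {text}"


# A type our application has never heard of still renders something useful.
unknown = {"type": "Recipe", "name": "Pancakes", "ingredients": ["flour", "eggs"]}
print(render_summary(unknown))
```

We don't know what a Recipe is or what to do with its ingredients, but we can still show its name.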

Because of this functionality, we should probably store these unknown objects in the database and send them over to the client so it can try to render them.

Let's imagine what that might look like:

class AP_Object(models.Model):
    actor = models.ForeignKey(Actor, on_delete=models.CASCADE)
    ap_id = models.URLField(unique=True, null=True, blank=True)
    raw = jsonfield.JSONField()

We use the name AP_Object rather than Object because Python's built-in base type is called object, and while Object with a capital O would technically be safe, we want to avoid reusing names that sit that close to the built-ins.

This ActivityPub Object is very generic, and simply stores whatever it receives as JSON in the database without trying to process it at all. In this example I'm using the jsonfield package, but in a real system, both Postgres and MySQL offer native JSON types.

Storing the data in this raw form is good for incoming data that will come to our inbox, but if we kept this representation for our own generated data, it would mean we'd be rendering our reviews and then storing the results in the database. This is icky as it breaks the MVC/MTV separation of model and representation that we expect in a modern web application.

Ultimately, as a developer you will need to decide how you want to solve this, but luckily for me, Django will take care of some of these abstractions for me.

class AP_Object(models.Model):
    actor = models.ForeignKey(Actor, on_delete=models.CASCADE)
    ap_id = models.URLField(unique=True, null=True, blank=True)
    raw = jsonfield.JSONField(null=True, blank=True)

    class Meta:
        abstract = True

class Note(AP_Object):
    ap_type = 'Note'

    body = models.TextField()

Now I can have my cake and eat it too. I've even thrown a static class attribute into Note to make it easier for myself later by letting me know what Activity Streams type I will want to render this model as.

There are some things that are conspicuously missing from this model, like created_at and other standard fields. That's because those are really better represented on an Activity, the verb, rather than on the object itself. So let's go ahead and add an Activity to our abstract AP_Object.
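
To show where that class attribute could pay off, here is a rough sketch of rendering a note as Activity Streams. PlainNote is a stand-in I've made up so the example runs without Django, and to_activitystreams is a hypothetical helper, not something we've written yet.

```python
class PlainNote:
    """Stand-in for the Note model: a plain class, so no Django is required."""

    ap_type = "Note"  # the Activity Streams type this class renders as

    def __init__(self, ap_id, actor_id, body):
        self.ap_id = ap_id
        self.actor_id = actor_id
        self.body = body


def to_activitystreams(obj):
    """Build an Activity Streams dict, reading the type from the class attribute."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": obj.ap_type,
        "id": obj.ap_id,
        "attributedTo": obj.actor_id,
        "content": obj.body,
    }


# Placeholder URLs, for illustration only.
note = PlainNote(
    "https://reviews.example/notes/1",
    "https://reviews.example/actors/alice",
    "Great coffee, terrible parking.",
)
print(to_activitystreams(note))
```

Because the serializer only reads ap_type, any future model that sets it can reuse the same function.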

class AP_Object(models.Model):
    actor = models.ForeignKey(Actor, on_delete=models.CASCADE)
    ap_id = models.URLField(unique=True, null=True, blank=True)
    raw = jsonfield.JSONField(null=True, blank=True)
    activity = models.ForeignKey('Activity', on_delete=models.CASCADE)

    class Meta:
        abstract = True

That seems to cover all our bases.

Now we only have one last thing to do, which is to create a model for unknown objects. This is because Django doesn't let us instantiate abstract base classes directly. So let's make an inherited class just for that:

class Generic_AP_Object(AP_Object):
    ap_id = models.URLField(unique=True)

Now we can store notes as Note and anything else we receive as a Generic_AP_Object. You'll notice we overrode the ap_id field. That's because generic objects will always have come from an external server and therefore should always have an ActivityPub ID, so in our subclass, we don't allow it to be null.
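
One way to picture that split is a small dispatcher that decides which model an incoming object belongs in. This is only a sketch in plain Python: classify_incoming is a hypothetical helper, and it returns the model name and field values rather than touching the database.

```python
def classify_incoming(document):
    """Decide which model an incoming object should be stored as.

    Returns (model_name, fields); the names mirror the models in this article.
    """
    if not document.get("id"):
        # Anything arriving from another server should carry an ActivityPub ID.
        raise ValueError("incoming objects must carry an ActivityPub ID")
    if document.get("type") == "Note" and "content" in document:
        fields = {"ap_id": document["id"], "body": document["content"], "raw": document}
        return "Note", fields
    # Everything else is kept untouched so a client can try to render it later.
    return "Generic_AP_Object", {"ap_id": document["id"], "raw": document}


# Placeholder documents, for illustration only.
print(classify_incoming({"id": "https://social.example/n/1", "type": "Note", "content": "hi"})[0])
print(classify_incoming({"id": "https://social.example/v/1", "type": "Video"})[0])
```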

We should also go ahead and add str methods to both Note and Generic_AP_Object.

For Note, we'll just use the integer ID of the object for now.

def __str__(self):
    return f"Note {self.id}"

And for generic objects, we'll use the ActivityPub ID.

def __str__(self):
    return f"Object {self.ap_id}"

Activities

Now that actors and objects are sorted, let's turn our attention to activities, which we said were like verbs.

Like objects, we have some choices to make in thinking about them. On one hand, there are a limited number of activities that we want to support right now, but on the other hand, we should try to make our software as generic as possible. Let's see if we can accomplish both by supporting the most common activity types first and then falling back to storing an activity raw in the database if we don't recognize its type.

For now, let's just support three types of activities, create, update and delete.

CREATE = 0
UPDATE = 1
DELETE = 2

ACTIVITY_TYPES = (
    (CREATE, 'Create'),
    (UPDATE, 'Update'),
    (DELETE, 'Delete'))

class Activity(models.Model):
    actor = models.ForeignKey(Actor, on_delete=models.CASCADE)
    type = models.PositiveIntegerField(choices=ACTIVITY_TYPES, null=True, blank=True)
    raw = jsonfield.JSONField()
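
The fallback I have in mind can be sketched in a few lines of plain Python. The constants are repeated so the snippet runs on its own, and activity_fields is a hypothetical helper: a type we recognize is stored as its integer, anything else leaves type empty and keeps only the raw JSON.

```python
CREATE, UPDATE, DELETE = 0, 1, 2
ACTIVITY_TYPES = ((CREATE, 'Create'), (UPDATE, 'Update'), (DELETE, 'Delete'))

# Reverse lookup: Activity Streams type name -> integer stored in the database.
TYPE_BY_NAME = {name: value for value, name in ACTIVITY_TYPES}


def activity_fields(document):
    """Return the `type` and `raw` values to store for an incoming activity."""
    # Unknown types map to None, which the nullable `type` field allows.
    return {"type": TYPE_BY_NAME.get(document.get("type")), "raw": document}


print(activity_fields({"type": "Update"})["type"])
print(activity_fields({"type": "Like"})["type"])
```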

So far, so good, except we're not linking to the object...

This is where Django is going to come in and save our proverbial behinds with Generic Foreign Keys.

Generic Foreign Keys let me not worry about the specific table I'm linking to. Right now we only support Note and Generic_AP_Object but in the future we may have more.

Without Generic Foreign Keys, we would need to create an Activity type for each type of object we wanted to handle, or our Object type would need to be extremely generic.

As an aside, what may be becoming obvious is that these types of applications seem like good candidates for document stores, rather than traditional databases. A document store is more flexible for just this type of situation. But we're able to do much the same with our raw fields and at the same time not have to give up on the MVC/MTV framework of development and all the benefits it brings.

With that over, let's add the generic foreign key.

class Activity(models.Model):
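    actor = models.ForeignKey(Actor, on_delete=models.CASCADE)
    # ActivityPub ID; local activities may not have one yet.
    ap_id = models.URLField(unique=True, null=True, blank=True)
    type = models.PositiveIntegerField(choices=ACTIVITY_TYPES, null=True, blank=True)
    raw = jsonfield.JSONField()
    object_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    # Name both fields, since object_type is not the default 'content_type'.
    object = GenericForeignKey('object_type', 'object_id')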
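
If generic foreign keys are new to you, the underlying idea is just storing a pair: which table, and which row in it. The toy below illustrates that idea with in-memory dicts. It is not how Django implements it, and the table contents are made up.

```python
# Hypothetical in-memory "tables", each keyed by primary key.
TABLES = {
    "note": {1: {"body": "Loved it"}},
    "generic_ap_object": {1: {"raw": {"type": "Video"}}},
}


def resolve(object_type, object_id):
    """Follow a (type, id) pair to the row it points at, or None if it is missing."""
    return TABLES.get(object_type, {}).get(object_id)


# The same id means different things depending on the type half of the pair.
print(resolve("note", 1))
print(resolve("generic_ap_object", 1))
```

In our model, object_type plays the role of the table name and object_id the row, with Django doing the lookup when we access object.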

Now that this is done, we need to turn our attention to audiences.

ActivityPub messages are directed at audiences. Like email, they have To and Cc attributes, as well as the option to be public or not.

The way we express that something is public is by adding https://www.w3.org/ns/activitystreams#Public to To.

This is somewhat hard to remember so for our purpose, we'll set a binary flag.

public = models.BooleanField(default=False)

And now we'll turn our attention to the primary and secondary audiences. Since those are always actors, we can use a simple many to many relationship, which Django offers us out of the box.

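# Distinct related_name values keep the reverse accessors on Actor from clashing.
to = models.ManyToManyField(Actor, related_name='activities_to', blank=True)
cc = models.ManyToManyField(Actor, related_name='activities_cc', blank=True)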
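
Since the boolean flag is our own shorthand, at some point it has to be translated to and from the real addressing. Here is a minimal sketch of that translation in plain Python, assuming audiences are given as lists of actor URLs. Both function names are hypothetical.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def build_addressing(public, to_ids, cc_ids):
    """Turn our flag and audience lists into outgoing to/cc values."""
    to = list(to_ids)
    if public and PUBLIC not in to:
        to.insert(0, PUBLIC)
    return {"to": to, "cc": list(cc_ids)}


def split_addressing(document):
    """Inverse: recover (public, to, cc) from a received message."""
    raw_to = document.get("to", [])
    raw_cc = document.get("cc", [])
    public = PUBLIC in raw_to or PUBLIC in raw_cc
    to = [uri for uri in raw_to if uri != PUBLIC]
    cc = [uri for uri in raw_cc if uri != PUBLIC]
    return public, to, cc


# Placeholder actor URLs, for illustration only.
outgoing = build_addressing(True, ["https://social.example/users/bob"], [])
print(outgoing)
print(split_addressing(outgoing))
```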

Now that this is done, I'm going to rename our actor field to from_actor to better match the ActivityPub representation. (Plain from would be nicer, but it's a reserved word in Python, so self.from is a syntax error.)

It's getting time to wrap up our models, but before we do, we should add validation to our activity.

If the activity is remote, it should have come with an ActivityPub ID. And the easiest way to know if the activity was remote or not is if it came from a remote actor. This assumes the Activity model carries an ap_id field, just as our objects do.

def clean(self):
    if self.from_actor.remote and not self.ap_id:
        raise ValidationError(_(
            "Remote activities must have an ActivityPub ID"))

And finally, before we wrap up, we'll add the __str__ method.

def __str__(self):
    if self.from_actor.remote:
        return f"Remote Activity {self.ap_id}"
    else:
        return f"Local Activity {self.id}"

Review

Since we've been walking through the code step-by-step, it may appear to be a lot of lines, but actually it hasn't been all that much. We've just written code to serialize Actors, Activities and Objects, trying to strike a balance between traditional Django models and ActivityPub's flexible vocabulary.

Next Up

Next let's attach these models to the Django Admin console so we can make some actors, then we'll move on to making some views!

NOTE TO SELF:

The one thing I like least about this code is my own naming convention. Ideally everything would be prefixed with AP_ or nothing would be. In addition, I've considered replacing ap_id with uri to make it clearer even though it deviates from ActivityPub's naming conventions.
