Content Modeling 101

Content modeling is the underrated practice of figuring out how content is structured. It's vital to get this right when building software centered on client content. This guide is probably most useful for anyone working in the CMS space, including developers, designers, and the users who will be editing content.

What is Content Modeling?

Content, like the stuff that fills a website, naturally has structure to it. It's hierarchical. It's interconnected. In some cases, it's actually quite messy. It's full of edge cases and irregular patterns. Content modeling is an informal way of locking down these structures into something everyone can agree on. It involves breaking down content into the smallest unit parts and bringing them together by their relationships.

The goal is to discover the structures within the content of the application.

A content model is likely going to be unique for each specific application. You're primarily concerned with populating the software with content to meet the requirements of the project and not necessarily to model your client's entire content ecosystem. That also means you may even be creating content structures that are new for your client because now they'll have this new software you're building that can better capture their true demand for content.

How is it Different from Data Modeling?

While the scope of the content model is limited by the needs of the current project, there shouldn't be any technical limitations imposed on the content model. It's meant to capture the essence of how content is structured and not to fit the out-of-the-box patterns provided by an ORM or the relationship constraints of a database. A content model should be accessible to anyone and written in plain language. A data model, on the other hand, goes deeper and captures the actual implementation of how content will be stored and accessed through an application. It may impose certain restrictions on the content structures for improving query performance. These added structural changes aren't actually a real reflection of the content. It's helpful to think of the content as separate from the technical details, but it's still good to keep in mind that some content structures are more difficult to implement than others.

Why do We Need a Content Model?

Content is at the heart of the business logic for content-rich applications. Business logic is the most volatile part of any software: it gets changed and rewritten over and over until the stakeholders are satisfied. One of the best ways to avoid redundant development effort is to properly capture the client's business needs before writing the actual code. Content modeling provides that, at least for content-rich software.

Whether or not you formally build out a content model, it's already there, inherent in the domain-specific requirements of the project. That's the nature of working with content. It would be an oversight not to consider the inner structures of the content apart from how it fits into your code.

Whatever comes out of the content modeling process can become a reference for future development. It can help guide conversations about the business logic of an application with non-technical stakeholders. It also becomes an anchor for how you talk about the software or even how you name your database tables and the data structures in code. As content requirements change, it's the foundation of how you might articulate what exactly about the content is changing.

Who Needs to be Involved?

You can look at the content model as the output of a requirements-gathering exercise: clients describe their needs, designers propose their solutions, developers lay out the technical capabilities, and project managers capture the agreed-upon requirements. In short, everyone needs to be involved in order to successfully lock down the content model.

If anyone with significant impact on the project is left out, they might miss the rationale behind organizing content in certain ways. It could also hurt the project if certain invalid assumptions about content aren't aired out ahead of time. On the one hand, you're going to want to capture the model within a document, even if it's just a snapshot of a whiteboard. That does allow you to share the content model with anyone unable to be directly involved in the modeling process. On the other hand, the modeling process is sort of a contractual moment where all parties are literally coming to terms about what the content is going to be in the whole application. Giving input during this process is much more valuable than merely sending around the output, the content model, to everyone else.

The Process

This is an example of the steps you might take to come up with a content model. Your needs as a company might be different. You should re-work this if you need to, but it's helpful to use the same process each time so everyone gets used to it.

When is the Right Time to Build a Content Model?

You want to go through this process prior to development, and even earlier if possible. Designers are going to need this content information just as much as developers. You also don't want to start too early, otherwise you run the risk of building out an incomplete model before anyone has really had time to consider what the full realization of the application might be.

You're also going to want to try to get the whole content model done in one or two sessions. It's a time-consuming process, but you don't want to leave too much time between sessions. You might be able to get away with splitting a content model into different phases, but that only works if you're splitting the application development into phases too.

1. Identifying Entities

Lay out whatever early concepts you may already have for a project. These can be visual aids like wireframes or just the names of parts of the application written on the drawing board.

As a team, start listing the names of the smallest content elements in the application, the entities. Entities should be simple, like "book", "chapter", or "author". You can think of these as the "nouns" of the project. Don't worry too much about the relationships between the entities right now; just list out everything you can think of. If you must, you can place semi-related entities near each other, but that's not important at this stage.
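If someone on the team wants to tidy up the whiteboard list afterwards, a small script can help. This is only a sketch and not part of the modeling process itself: the function name `collect_entities` and the sample suggestions are made up for illustration. It trims and lowercases each name and drops the duplicates that show up when two people call out the same entity.

```python
def collect_entities(raw_names):
    """Return unique entity names in the order they were first suggested."""
    seen = set()
    entities = []
    for name in raw_names:
        cleaned = name.strip().lower()
        # Skip blanks and names that were already written down.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            entities.append(cleaned)
    return entities


# Hypothetical names called out during a session.
suggestions = ["Book", "chapter", " Author ", "book", ""]
entities = collect_entities(suggestions)
print(entities)
```

Notice that the list carries no relationships at all. That matches this step: just the nouns.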

2. Drawing Relationships Between Entities

With all of the entities laid out, shift gears and start drawing lines between the entities. Maybe just start with the lines, but for each line, you're also going to need to describe how that relationship works. Be as verbose as possible here. No need to use jargon or shorthand to describe these relationships.

  • Is A related to just one B or can there be many Bs related to it? For example, you might ask if a book can have multiple authors.
  • Is the relationship bidirectional? For example, your application may allow specifying books similar to each book, but does that mean if you make book A similar to book B that B's own set of similar books will also contain book A?
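To make those two questions concrete, here is a minimal sketch of how the answers might eventually show up in code. Everything in it is an assumption for illustration: the `Book` class, its `authors` and `similar` fields, and the `mark_similar` helper are hypothetical names, and your own answers may differ. It assumes a book can have many authors and lets you choose whether "similar" works in both directions.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    # Assumption: one book can have many authors, so this is a list.
    authors: list = field(default_factory=list)
    # Titles of the books marked as similar to this one.
    similar: set = field(default_factory=set)


def mark_similar(a, b, bidirectional=True):
    """Record that book b is similar to book a.

    When bidirectional is True, book a is also added to b's similar set.
    """
    a.similar.add(b.title)
    if bidirectional:
        b.similar.add(a.title)


book_a = Book("Book A", authors=["Author One", "Author Two"])
book_b = Book("Book B")
book_c = Book("Book C")

mark_similar(book_a, book_b)                       # both directions
mark_similar(book_a, book_c, bidirectional=False)  # one direction only

print(sorted(book_a.similar))
print(sorted(book_b.similar))
print(sorted(book_c.similar))
```

The code is short, but the `bidirectional` flag is exactly the kind of decision that has to be settled in plain language at the whiteboard first.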

3. Adding Fields to Entities

This is where the bulk of the detail comes into the content model. Describe all the different types of media and text associated with each individual entity. Give these fields relevant names. If there are constraints, like a field being required or a limit on how long the text can be, note these details as well.

Sometimes you may discover that what you thought was just a field is actually a relationship to another entity. It's totally okay to draw that relationship line now.
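Field notes like "required" or "can't be too long" translate fairly directly into something a developer can check later. The sketch below is one hedged way to write them down, not a prescribed format: `FieldSpec`, `validate`, the field names, and the length limits are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    name: str
    kind: str                         # e.g. "text" or "image"
    required: bool = False
    max_length: Optional[int] = None  # only meaningful for text


def validate(entry, specs):
    """Return a list of plain-language problems found in one entry."""
    problems = []
    for spec in specs:
        value = entry.get(spec.name)
        if value is None or value == "":
            if spec.required:
                problems.append(f"{spec.name} is required")
            continue
        too_long = (
            spec.max_length is not None
            and isinstance(value, str)
            and len(value) > spec.max_length
        )
        if too_long:
            problems.append(
                f"{spec.name} is longer than {spec.max_length} characters"
            )
    return problems


# Hypothetical fields for a "book" entity.
BOOK_FIELDS = [
    FieldSpec("title", "text", required=True, max_length=120),
    FieldSpec("summary", "text", max_length=500),
    FieldSpec("cover", "image"),
]

problems = validate({"title": "", "summary": "x" * 501}, BOOK_FIELDS)
print(problems)
```

If a field such as "author" turns out to need its own fields, that is the signal described above: it is really a relationship to another entity and belongs on the whiteboard as a line, not in this list.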

4. Gathering Behavioral Needs

There may be other information to gather here as well. For example, you may have content that needs to be scheduled at different times. Which pieces of content are subject to scheduling is certainly part of the content model and should be captured.
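As an illustration of what a scheduling need might look like once it is captured, here is a minimal sketch. It assumes a piece of content has an optional start time and an optional end time; the names `Schedule`, `publish_at`, `unpublish_at`, and `is_live` are hypothetical, and a real project may need more than this (time zones, recurring schedules, and so on).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Schedule:
    publish_at: Optional[datetime] = None    # None means "live right away"
    unpublish_at: Optional[datetime] = None  # None means "never expires"


def is_live(schedule, now):
    """Return True if content with this schedule should be visible at `now`."""
    if schedule.publish_at is not None and now < schedule.publish_at:
        return False
    if schedule.unpublish_at is not None and now >= schedule.unpublish_at:
        return False
    return True


# Hypothetical promotion that runs for ten days.
promo = Schedule(datetime(2024, 1, 10), datetime(2024, 1, 20))

print(is_live(promo, datetime(2024, 1, 5)))    # before the window
print(is_live(promo, datetime(2024, 1, 15)))   # inside the window
print(is_live(promo, datetime(2024, 1, 25)))   # after the window
print(is_live(Schedule(), datetime(2024, 1, 25)))  # no schedule at all
```

The part that belongs in the content model is the list of entities that get a schedule, not the code itself.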

5. Locking It In

As this process was an exercise in settling on the language and structures to use for the project going forward, it's good to stress that this content model, like it or not, is going to become a permanent fixture in the project. If there is a name for an entity that really just does not sit well with someone, now is the time to bring it up. It could also indicate that something deeper isn't correct with the content model.

Mistakes to Avoid

While content modeling ought to be an accessible exercise to anyone, there are some ways to get it wrong.

Avoid Structural Terms

The goal is to describe the content on its own. Terms like "layout" or "theme" don't really mean a lot outside of the context of the design of the application. Should the design change or the content need to live in other applications, you would struggle to carry over these design-specific content terms into new places.

Content is portable. That's just how we think about it. If you see one application with content like books and authors, you naturally expect that you could build a second application re-using that same set of content.

Don't Gloss Over the Relationships

One of the hardest things to get right is defining the relationships between pieces of content. What's worse is that once the software is developed and content starts getting added to the application, it becomes very difficult to re-work those relationships. Poorly understood relationships could result in costly data migrations down the road.

Exceptions Happen

It's inevitable that there will be things in the content model that just don't fit everyone's ideal. It's a process that demands compromise. There will always be exceptional cases built into the fabric of the application and that's part of what makes building the application so attractive in the first place. It's custom-made to fit these weird project requirements. That's the point of it. A content model is never going to feel like you can tie a bow around it and call it done. It's going to evolve over time and it might even get messy. However, having the content modeling process to work this stuff out is a thousand times better than not being deliberate about it.

It Takes Practice

This stuff is truly difficult to get right. It's a process. Not everyone is prepared to think of the content in terms of entities, fields, and relationships. Not everyone is prepared to leave behind the visual language of design: headers, layouts, templates. That's also sort of why this process is so important. It's a chance to get everyone on the same page.

After building out content models for a few projects, you will see the effort pay off when you aren't losing time to re-working the inner structures in the code. You'll also communicate better because all parties are using the same terminology in meetings and support tickets. As you start to appreciate these benefits, the content modeling process will make more sense and you'll get better at it as a team.