March 26, 2023

How to add additional data to your source data

Source data is the original data your application receives, for example data fetched from an external server, loaded from a database, or read from a file.

At some point, you will need to derive more data from that source data. For example, let’s say I have an author object that contains, among other things, a list of all the posts the author created; some are published and some are still drafts.

Let’s say that I want to display the number of published posts and the most popular post the author has published.

The simplest way I can implement this is by creating two functions that take the author object and return the needed result.

const author = {
  //...
  posts: [
    { title: 'Post title 1', numberOfViews: 100, status: 'published' },
    { title: 'Post title 2', numberOfViews: 0, status: 'draft' },
    { title: 'Post title 3', numberOfViews: 250, status: 'published' }
  ]
}

function getPostsByStatus(author, status) {
  return author.posts.filter((post) => post.status === status)
}

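function getMostPopularPost(author) {
  // filter() returns a new array, so sorting it does not reorder author.posts
  const published = getPostsByStatus(author, 'published')
  // Sort by views, highest first; fall back to null when nothing is published
  return published.sort((a, b) => b.numberOfViews - a.numberOfViews)[0] ?? null
}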

// Usage
const numberOfPublishedPosts = getPostsByStatus(author, 'published').length
const mostPopularPost = getMostPopularPost(author)

You might have noticed that I created getPostsByStatus instead of getNumberOfPublishedPosts. I did this because getPostsByStatus is used in other places, like getMostPopularPost, and getting the number is just a matter of reading .length from its result.

Another reason is that I want to show you a case where you might end up with duplicate logic when deriving the same data. Here it’s passing 'published' for the status parameter and then reading .length from the result, but it might be more complex in other examples.

What’s wrong with the above code?

Actually, there’s nothing wrong with the above code, but one thing I don’t like is that I need to repeat the same derivation code every time I need this additional data about the author object. I have to look up the functions and remember how to combine them. In this case, to count published posts I need to know to call getPostsByStatus, pass the status as the second argument, and then read .length from the result.

This is a very simple example, and yet I think I should be able to access this data directly on the author object, as in author.numberOfPublishedPosts and author.mostPopularPost. Having this will make the usage of that object more consistent across my whole codebase.

How to enrich your source data to have additional derived data

To do this, I need to create a function that takes the source data (the author object) and returns a new object with the additional fields.

function enrichAuthor(originalAuthor) {
  const author = structuredClone(originalAuthor)
  author.numberOfPublishedPosts = getPostsByStatus(author, 'published').length
  author.mostPopularPost = getMostPopularPost(author)
  return author
}

// Usage
const author = enrichAuthor(originalAuthor)
const numberOfPublishedPosts = author.numberOfPublishedPosts
const mostPopularPost = author.mostPopularPost

Notice how I cloned the author object using structuredClone before adding the additional fields to it. This ensures that I don’t unintentionally mutate the original object.
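To see the effect of the clone, here is a minimal sketch that enriches a small author object and then checks that the original was left alone. The sample data (including the name field) is made up for illustration, and enrichAuthor is trimmed to a single derived field to keep the snippet short. It assumes a runtime where structuredClone is available as a global.

```javascript
// Helper repeated from earlier so the snippet runs on its own.
function getPostsByStatus(author, status) {
  return author.posts.filter((post) => post.status === status)
}

// Trimmed-down enrichAuthor: only one derived field, for brevity.
function enrichAuthor(originalAuthor) {
  const author = structuredClone(originalAuthor)
  author.numberOfPublishedPosts = getPostsByStatus(author, 'published').length
  return author
}

// Hypothetical sample data.
const originalAuthor = {
  name: 'Jane',
  posts: [
    { title: 'Post title 1', numberOfViews: 100, status: 'published' },
    { title: 'Post title 2', numberOfViews: 0, status: 'draft' }
  ]
}

const enriched = enrichAuthor(originalAuthor)

console.log(enriched.numberOfPublishedPosts) // 1
// The derived field exists only on the copy.
console.log('numberOfPublishedPosts' in originalAuthor) // false
// The clone is deep: even the posts array is a separate array.
console.log(enriched.posts === originalAuthor.posts) // false
```

Because the clone is deep, nested values such as the posts array are copied too, so later changes to the enriched object's posts won't leak back into the source data.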

Another great benefit is that I can look at the body of the enrichAuthor function and see all the additional fields in one place, instead of having them scattered throughout the codebase.

An alternative

Sometimes you want the additional fields to reflect the new changes on the original author object. For example, if a new published post was added, author.numberOfPublishedPosts should show that.
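To make that scenario concrete, here is a small sketch of what happens with the enriched copy when the source changes afterwards. The sample posts are made up for illustration, and enrichAuthor is again trimmed to one derived field.

```javascript
// Helper repeated from earlier so the snippet runs on its own.
function getPostsByStatus(author, status) {
  return author.posts.filter((post) => post.status === status)
}

// Trimmed-down enrichAuthor: computes the derived field once, at call time.
function enrichAuthor(originalAuthor) {
  const author = structuredClone(originalAuthor)
  author.numberOfPublishedPosts = getPostsByStatus(author, 'published').length
  return author
}

// Hypothetical sample data.
const originalAuthor = {
  posts: [{ title: 'Post title 1', numberOfViews: 100, status: 'published' }]
}

const author = enrichAuthor(originalAuthor)

// A new published post is added to the source data after enrichment.
originalAuthor.posts.push({
  title: 'Post title 2',
  numberOfViews: 10,
  status: 'published'
})

const freshCount = getPostsByStatus(originalAuthor, 'published').length
console.log(freshCount) // 2
// The enriched copy is a snapshot, so its derived field still holds the old value.
console.log(author.numberOfPublishedPosts) // 1
```

The enriched object only knows about the data as it was when enrichAuthor ran; to pick up the new post you would have to call enrichAuthor again.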

Unfortunately, the above solution doesn’t support that, but there’s an alternative: encapsulate the record in a class.

class Author {
  #posts
  constructor(originalAuthor) {
    this.#posts = originalAuthor.posts
  }

  getPostsByStatus(status) {
    return this.#posts.filter((post) => post.status === status)
  }

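  get mostPopularPost() {
    // filter() returns a new array, so sorting it does not reorder #posts
    const published = this.getPostsByStatus('published')
    // Sort by views, highest first; fall back to null when nothing is published
    return published.sort((a, b) => b.numberOfViews - a.numberOfViews)[0] ?? null
  }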

  get numberOfPublishedPosts() {
    return this.getPostsByStatus('published').length
  }
}

// Usage
const author = new Author(originalAuthor)
const numberOfPublishedPosts = author.numberOfPublishedPosts
const mostPopularPost = author.mostPopularPost

Now if you read numberOfPublishedPosts after originalAuthor’s posts change, you will see the up-to-date value, because the getter recomputes it from the posts array every time it is accessed.
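Here is a short sketch of that behavior, using a trimmed version of the Author class with only the count getter and some made-up sample posts. It also shows one caveat worth keeping in mind: the class holds a reference to the posts array it was given, so it sees in-place changes such as push, but not a brand-new array assigned to originalAuthor.posts later.

```javascript
// Trimmed-down Author class: only the pieces needed for this demo.
class Author {
  #posts
  constructor(originalAuthor) {
    // Keeps a reference to the same array, not a copy.
    this.#posts = originalAuthor.posts
  }

  getPostsByStatus(status) {
    return this.#posts.filter((post) => post.status === status)
  }

  get numberOfPublishedPosts() {
    return this.getPostsByStatus('published').length
  }
}

// Hypothetical sample data.
const originalAuthor = {
  posts: [{ title: 'Post title 1', numberOfViews: 100, status: 'published' }]
}

const author = new Author(originalAuthor)
const before = author.numberOfPublishedPosts
console.log(before) // 1

// In-place change to the same array: the getter picks it up.
originalAuthor.posts.push({
  title: 'Post title 2',
  numberOfViews: 10,
  status: 'published'
})
const afterPush = author.numberOfPublishedPosts
console.log(afterPush) // 2

// Replacing the array on the source object: the class still points at the old one.
originalAuthor.posts = []
const afterReassign = author.numberOfPublishedPosts
console.log(afterReassign) // 2
```

If the source data in your application is replaced rather than updated in place, one option is to create a new Author instance whenever a new source object arrives.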

Which one to choose?

It depends on your programming style. Some prefer encapsulating records in a class and providing all the related functions on it. Others prefer a more functional style, which is the first approach.

However, if you expect the source data to change, then going with the record encapsulation is better.
