Building my own Analytics Tracker

Building my own analytics tracker to personalize my APIs. A Barebone Analytics script to track impressions and interactions.


Recently I ran into a problem with one of my APIs that I couldn't find the right tool for. The API provides personalized product data to customers, who display the results on their storefronts. My problem is that I rely on my customers' data to provide personalized results, and each customer tracks different data about their users.

I specifically needed to know which of the products provided by my API were viewed (impressions) and which ones were clicked (interactions). Some of my customers track this data in Google Analytics, others in Matomo, and others don't track it at all. So how could I get to it without building a bunch of integrations? I could ask my customers to integrate one specific analytics solution, but this would impact their storefronts' performance & would need to be customized for each customer.

For better or worse, I set out to build my own custom analytics tool.

A tracker is just a JavaScript function

Analytics trackers are built as script files that need to be added to your site's header. These kinds of files are also called embeddables, since you embed them into your site. I hadn't built a script like this before, so I checked out some existing tools on GitHub. I use Plausible Analytics for some of my own projects and their tracker code is open source, so I took a look. Turns out, analytics tools are fairly lightweight scripts. They have to be, since they need to be loaded on almost every page of a site to provide insights to the maintainer.

Plausible's tracker is fewer than 400 lines of TypeScript, more than half of which is comments and TypeScript types. When the script is initialized, it gets configuration options from the current URL, browser info & script attributes. This way it knows which page to track views for, what screen size the user's device has, etc.

The script defines a root function to track 'events'. Everything in the context of analytics is an event, but they can be boiled down to page views and button clicks. Page views are tracked when a page is first returned from the server & the script is loaded with it, and also when client-side navigation happens.

Click event tracking is trickier. To know that an event has occurred in the DOM (the HTML representation of what you see in the browser), you need to add 'listeners' to the HTML elements that you want to track. This means the script has to go through the page & add listeners to each <a> tag (internal or external links) and any other tag where interaction should be tracked.
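
As a rough illustration of the listener approach (this is not Plausible's actual code), a minimal sketch could look like the following. The names attachClickListeners and onClick are made up for this example, and root is assumed to be document or any element that supports querySelectorAll.

```javascript
// Hypothetical helper: attach one click listener to every <a> tag under `root`.
// `root` is assumed to be `document` (or any element exposing querySelectorAll).
function attachClickListeners(root, onClick) {
    const links = Array.from(root.querySelectorAll('a'));
    links.forEach(link => {
        // Report the link target whenever this particular link is clicked.
        link.addEventListener('click', () => onClick(link.href));
    });
    return links.length; // number of links now being watched
}

// In a browser this might be called as:
// attachClickListeners(document, href => console.log('clicked', href));
```

Links added to the page after this runs would not be covered, which is the gap a DOM observer is meant to close.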

Tracking is just your customers' browsers making API calls to your server

When an event is triggered, the tracking script sends an API call to a designated endpoint. This is a bog standard POST request with the necessary data embedded in the body. Since there might be a lot of these events tracked, it's important to not overwhelm the receiving server. Plausible can be self-hosted, so the receiving server is only dealing with the load of the customer's sites. However, if we were to build an analytics solution that is multi-tenant (hosting multiple customers), we'd want a message queue. This could first receive a massive amount of API calls at once, then send them to our server in batches.
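
To make the batching idea concrete, here is a minimal in-memory sketch. It is not a real message queue: createBatcher, maxSize and flush are names invented for this example, and a production queue would also persist events and retry failed deliveries.

```javascript
// Hypothetical in-memory batcher: collects events and hands them to `flush`
// in groups of at most `maxSize`. Nothing is persisted or retried here.
function createBatcher(maxSize, flush) {
    let pending = [];

    // Send whatever is currently buffered, if anything.
    function flushNow() {
        if (pending.length === 0) return;
        const batch = pending;
        pending = [];
        flush(batch);
    }

    // Buffer one event and flush automatically once the batch is full.
    function add(event) {
        pending.push(event);
        if (pending.length >= maxSize) flushNow();
    }

    return { add, flushNow, size: () => pending.length };
}
```

With a batch size of 2, five incoming events would reach the server as two full batches, with one event left waiting until the next flush.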

Building my own analytics tracker

So why build my own tracker? Basically, I want to avoid any customization & want my data sent directly to a message queue. I want to tell my customers to adjust their front-end and tag each product on their storefront with a custom attribute:

barebone-analytics-data-product-id

This attribute is what my script will be looking for. If it is present, the script will track the element as viewed (impressions on first load) and watch it for clicks (interactions). If an element with this attribute appears later, for example when navigation happens or a search shows new results, it will also be tracked as viewed (impressions on client-side changes). This is all I want the script to do for a start.

Using Plausible's tracker as a template, I'll fit the script into a single file. Let's call it bareboneanalytics-tracker.js. In this file, we'll do a few things:

  • Get the attributes passed with the script as URL parameters. These will be our way to specify a customer & environment.
  • Set up the impression and interaction tracking functions. These will each send API calls with a different event type.
  • Set up a function to check for the previously mentioned barebone-analytics-data-product-id attribute.
  • Set up an observer to listen to changes in the DOM and detect if new elements with the barebone-analytics-data-product-id attribute were added.
  • Add a function to clean up listeners when elements are removed.

Getting config settings

function getParamsFromScript() {
    const scripts = document.getElementsByTagName('script');
    for (let script of scripts) {
        if (script.src.includes('bareboneanalytics-tracker.js')) {
            const urlParams = new URLSearchParams(script.src.split('?')[1]);
            const params = {
                clientId: urlParams.get('client_id'),
                environment: urlParams.get('environment') || 'production'
            };
            console.log('Barebone Analytics: Parameters:', params);
            return params;
        }
    }
    console.warn('Barebone Analytics: Script "bareboneanalytics-tracker.js" not found.');
    return { clientId: null, environment: 'production' };
}

const { clientId, environment } = getParamsFromScript();

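if (!clientId) {
    console.warn('Barebone Analytics: Tracking script disabled: client_id is missing.');
    // A bare `return` is only valid inside a function, so this assumes the whole
    // script is wrapped in one (for example an IIFE).
    return;
}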

This will grab the client_id and environment URL parameters from my script's src. If the script is loaded from https://cdn.mydomain.com/bareboneanalytics-tracker.js, these parameters need to be appended to it. This would be our full URL:

https://cdn.mydomain.com/bareboneanalytics-tracker.js?client_id=client1&environment=uat
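
To see the parsing in isolation, here is a small sketch that can be run outside the browser. parseTrackerParams is a hypothetical helper, not part of the tracker, and it uses the URL class instead of splitting on '?'. It assumes the src is an absolute URL, which is what script.src returns in browsers.

```javascript
// Hypothetical helper: read client_id and environment from a script URL.
// Assumes `src` is an absolute URL (script.src is always absolute in browsers).
function parseTrackerParams(src) {
    const query = new URL(src).searchParams;
    return {
        clientId: query.get('client_id'), // null when the parameter is missing
        environment: query.get('environment') || 'production'
    };
}

const example = parseTrackerParams(
    'https://cdn.mydomain.com/bareboneanalytics-tracker.js?client_id=client1&environment=uat'
);
console.log(example); // { clientId: 'client1', environment: 'uat' }
```

In a browser, document.currentScript.src could be passed in instead of looping over all script tags, provided the tracker is loaded as a classic script and the call happens while the script first executes.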

After this, we can define the basic tracking functions. These are just POST API calls:

const logUrl = 'https://your-server/api/track';

function trackEvent(eventType, data) {
    console.log(`Barebone Analytics: Logging ${eventType}:`, data);
    const bodyData = {
        clientId: clientId,
        eventType: eventType,
        environment: environment,
        timestamp: new Date().toISOString()
    };
    
    if (eventType === 'impression') {
        bodyData.productIds = data;
    } else if (eventType === 'click') {
        bodyData.productId = data;
    }

    fetch(logUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(bodyData)
    })
    .catch(error => console.error(`Barebone Analytics: Error logging ${eventType}:`, error));
}

function trackImpressions(productIds) {
    trackEvent('impression', productIds);
}

function trackClicks(productId) {
    trackEvent('click', productId);
}
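
One thing worth keeping in mind: a click on a product usually navigates to another page, and the browser may cancel a request that is still in flight when the page unloads. The fetch API has a keepalive option for this situation. The sketch below is a possible variant of the POST call, not what the tracker above does. sendEvent and fetchImpl are names invented for this example.

```javascript
// Hypothetical variant of the POST call. `fetchImpl` defaults to the global fetch
// and is only a parameter so the function can be exercised without a network.
function sendEvent(url, body, fetchImpl = fetch) {
    return fetchImpl(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
        keepalive: true // ask the browser to finish the request even if the page unloads
    });
}
```

Browsers cap the size of keepalive request bodies, so this only suits small payloads like the ones sent here.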

Next, we want a function that handles clicks. It will be added as a listener to the elements we identify as products. When one of them is clicked, it reads the product id from the element and tracks the click.

function handleClick(event) {
    const productId = event.currentTarget.getAttribute('barebone-analytics-data-product-id');
    trackClicks(productId);
}

Now onto observing changes. JavaScript has a built-in MutationObserver class, which can be used to track changes in the DOM, meaning elements being added or removed. This is important since a lot of sites now load an initial state from the server side to appease the SEO Gods, but they need fluid and fast client-side interaction as well. When you think of client-side interaction, think of searching on an ecommerce storefront or filtering the search results. Both of these are (ideally) client-side interactions, which modify the DOM to add/remove elements based on the search results received from API calls.

When a change to the DOM happens, the MutationObserver's callback will be called and we'll get a list of mutations. We can check if nodes were added, since we're interested in tracking only first impressions.

If the node has the barebone-analytics-data-product-id attribute we're looking for, we ask the observer passed to the function to observe it. This is because we don't immediately want to track an impression, only when the element is in the user's viewport. We will set up the observer for this later. We also add an event listener, to listen for click events. This is important to track interactions.

For removed nodes, we remove the listeners we attached earlier, so the script doesn't keep a ballooning list of tracked elements that are no longer in the DOM.

Finally, we tell the mutationObserver that we want to observe the whole document body for changes.

function observeNewElements(observer) {
    const mutationObserver = new MutationObserver(mutations => {
        mutations.forEach(mutation => {
            mutation.addedNodes.forEach(node => {
                if (node.nodeType === Node.ELEMENT_NODE) {
                    const productElements = node.querySelectorAll('[barebone-analytics-data-product-id]');
                    if (node.hasAttribute('barebone-analytics-data-product-id')) {
                        observer.observe(node);
                        node.addEventListener('click', handleClick);
                        trackedElements.add({ element: node, event: handleClick });
                    }
                    productElements.forEach(product => {
                        observer.observe(product);
                        product.addEventListener('click', handleClick);
                        trackedElements.add({ element: product, event: handleClick });
                    });
                }
            });

            mutation.removedNodes.forEach(node => {
                if (node.nodeType === Node.ELEMENT_NODE) {
                    // Collect tagged descendants plus the removed node itself (if tagged).
                    const removed = [...node.querySelectorAll('[barebone-analytics-data-product-id]')];
                    if (node.hasAttribute('barebone-analytics-data-product-id')) {
                        removed.push(node);
                    }
                    removed.forEach(product => {
                        product.removeEventListener('click', handleClick);
                        // Also stop watching the element for viewport changes.
                        observer.unobserve(product);
                        // Set.delete() compares object references, so a fresh object literal
                        // never matches. Look up the stored entry instead.
                        for (const tracked of trackedElements) {
                            if (tracked.element === product) {
                                trackedElements.delete(tracked);
                            }
                        }
                    });
                }
            });
        });
    });

    mutationObserver.observe(document.body, {
        childList: true,
        subtree: true
    });

    observers.push(mutationObserver);
}

To tie it all together, we need to initialize tracking when the DOM is loaded. This function does a few things:

  1. Check the current DOM to see which products are in the server-side HTML & record them in the products variable.
  2. Define an observer. This keeps track of which products are visible in the user's viewport. Using the built-in IntersectionObserver, the script gets a callback whenever an observed element crosses the visibility threshold (for example while scrolling), along with a list of the elements whose visibility changed (entries). If an entry is in the viewport (entry.isIntersecting), we read its barebone-analytics-data-product-id attribute. We only want to log the first impression for each product id. To track this, we create a Set named impressionsLogged. A JavaScript Set holds each value at most once. In this case, it's a Set of strings representing the unique product ids we've already tracked. We also collect visibleProducts, which we use to send impressions. This list starts out empty on every callback.
  3. Use the observer to record the first impressions. Every product found in step 1 is observed, and the observer sends an impression for each one once it is visible within the user's viewport.
  4. Finally, keep track of our observers & call the observeNewElements function defined earlier. This picks up products added to the DOM later, on top of the ones already rendered on the server side, and hands them to the same observer.

function initTracking() {
    console.log('Barebone Analytics: Initializing tracking...');
    const products = document.querySelectorAll('[barebone-analytics-data-product-id]');
    const impressionsLogged = new Set();

    console.log('Barebone Analytics: Found products:', products);

    const observer = new IntersectionObserver((entries) => {
        const visibleProducts = [];

        entries.forEach(entry => {
            if (entry.isIntersecting) {
                const productId = entry.target.getAttribute('barebone-analytics-data-product-id');
                console.log('Barebone Analytics: Product visible:', productId);
                if (!impressionsLogged.has(productId)) {
                    visibleProducts.push(productId);
                    impressionsLogged.add(productId);
                }
            }
        });

        if (visibleProducts.length > 0) {
            console.log('Barebone Analytics: Visible products:', visibleProducts);
            trackImpressions(visibleProducts);
        }
    }, {
        root: null,
        rootMargin: '0px',
        threshold: 0.5  // Trigger when 50% of the product is visible
    });

    products.forEach(product => {
        observer.observe(product);
        product.addEventListener('click', handleClick);
        trackedElements.add({ element: product, event: handleClick });
    });

    observers.push(observer);
    observeNewElements(observer);
}
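
The first-impression logic is easier to reason about when pulled out of the browser. The sketch below isolates it as a pure function. collectNewImpressions is a hypothetical name, and the fake entries carry a productId field directly, whereas the real callback reads it from entry.target.getAttribute(...).

```javascript
// Hypothetical extraction of the callback's dedupe logic into a pure function.
// Each entry only needs `isIntersecting` and `productId` for this illustration.
function collectNewImpressions(entries, impressionsLogged) {
    const newlyVisible = [];
    for (const entry of entries) {
        // Only count products that are visible and have not been logged before.
        if (entry.isIntersecting && !impressionsLogged.has(entry.productId)) {
            impressionsLogged.add(entry.productId);
            newlyVisible.push(entry.productId);
        }
    }
    return newlyVisible;
}

const logged = new Set();
const first = collectNewImpressions(
    [{ isIntersecting: true, productId: '1' }, { isIntersecting: false, productId: '2' }],
    logged
);
const second = collectNewImpressions(
    [{ isIntersecting: true, productId: '1' }, { isIntersecting: true, productId: '2' }],
    logged
);
console.log(first, second); // [ '1' ] [ '2' ]
```

Product 1 is reported only once even though it is visible in both calls, because the Set remembers it between callbacks.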

There are a few variables here & there that I didn't detail, such as trackedElements and observers. Check out the full script here:

GitHub - bzatrok/bareboneanalytics-tracker: A barebones tracking script to observe clicks and impressions of DOM elements.
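
As a sketch only (the repository is the source of truth), the pieces not shown above could look something like this. It assumes trackedElements is a Set of { element, event } entries and observers is a plain array, which is how the snippets above use them. cleanupTracking is a hypothetical name, and wiring it to beforeunload is just one possible choice.

```javascript
// Assumed shapes, inferred from the snippets above rather than copied from the repository:
const trackedElements = new Set(); // entries look like { element, event }
const observers = [];              // IntersectionObserver and MutationObserver instances

// Hypothetical cleanup: detach every click listener and stop every observer.
function cleanupTracking() {
    trackedElements.forEach(({ element, event }) => {
        element.removeEventListener('click', event);
    });
    trackedElements.clear();
    observers.forEach(observer => observer.disconnect());
    observers.length = 0;
}

// Only wire things up when running in a browser.
if (typeof document !== 'undefined') {
    // initTracking is the function defined earlier in the article.
    if (document.readyState === 'loading') {
        document.addEventListener('DOMContentLoaded', () => initTracking());
    } else {
        initTracking(); // the DOM was already parsed when the script ran
    }
    window.addEventListener('beforeunload', cleanupTracking);
}
```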

You can self-host this script on your own server & ideally point a CDN to it. After specifying a log url on a server of your choice, you need to make sure to handle the POST requests sent to it. These will contain the following payload:

{
    clientId: '7b14efbf-fd05-4f0f-aebe-4bc25069ac27',
    eventType: 'impression',
    environment: 'uat',
    timestamp: '2024-07-22T17:26:48.000Z',
    productIds: ['1','2','3']
}

I'll skip handling the data for now. We can cover that in another article. I hope following along was interesting or helpful! If you like the stuff I write about, buy me a beer!

Does your company want to use analytics in their applications? Get in touch via my website: https://zatrok.com/