The Google Analytics 4 Session: How it’s calculated

Google Analytics 4 (GA4) sessions are calculated using the user_pseudo_id and the session_id.

How are Google Analytics 4 sessions calculated?

A common question in marketing is, “How many people visited my website last month?” The answer should be relatively simple. Most marketing analysts would reach for the metric they have always used, and stakeholders wouldn’t expect anything else: the Google Analytics 4 session.

If we break down the question, we are asking how many “people” visited our website, and a session is not a person. That ambiguity leads to debate and persistent uncertainty when attributing marketing spend to conversions.

What is a GA4 session?

In technical terms, the session is a count of the unique values of the concatenation of the user_pseudo_id and the session_id. What does that mean? In the simplest terms, it’s a count of the number of times the website was accessed.

The user_pseudo_id is the identifier generated when the user enters the website for the first time. It is stored in a cookie, which GA4 reads to recognize a returning visitor. It is assigned at the user level, so if the cookies aren’t cleared and the person visits again on the same browser, they will be classified as the same user.

The session_id is the timestamp of the session_start event, which fires when a visit begins after a period of inactivity of 30 minutes or more (the default timeout, generally speaking).
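To make the inactivity rule concrete, here is a minimal Python sketch of how one visitor's event timestamps could be split into sessions. It is an illustration of the rule described above, not GA4's actual implementation: the function name assign_session_ids, the constant name and the sample timestamps are all made up for this example.

```python
# Illustrative only: a simplified take on the 30-minute inactivity rule.
SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed default timeout


def assign_session_ids(event_timestamps):
    """Pair each event timestamp (Unix seconds) with the start time of its session."""
    result = []
    session_start = None
    previous = None
    for ts in sorted(event_timestamps):
        # A new session begins on the first event, or after 30+ minutes of inactivity.
        if previous is None or ts - previous >= SESSION_TIMEOUT_SECONDS:
            session_start = ts
        result.append((ts, session_start))
        previous = ts
    return result


# Hypothetical events: two within 10 minutes, then one after a 40-minute gap.
events = [1_700_000_000, 1_700_000_600, 1_700_003_000]
for ts, session_id in assign_session_ids(events):
    print(ts, session_id)
```

The first two events share a session_id and the third starts a new one, because 40 minutes of inactivity passed before it.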

Each distinct combination of the two counts as one session.
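As a rough sketch of that calculation, the Python below counts the unique concatenations of user_pseudo_id and session_id. The rows are invented and heavily simplified (a real GA4 export has many more fields), and count_sessions is a hypothetical helper name.

```python
# Hypothetical, simplified event rows: one dict per event.
events = [
    {"user_pseudo_id": "A", "session_id": 1700000000},  # user A, first session
    {"user_pseudo_id": "A", "session_id": 1700000000},  # second event, same session
    {"user_pseudo_id": "B", "session_id": 1700000000},  # user B started at the same second
    {"user_pseudo_id": "A", "session_id": 1700090000},  # user A returns later
]


def count_sessions(rows):
    """Count unique user_pseudo_id + session_id concatenations."""
    keys = {row["user_pseudo_id"] + "-" + str(row["session_id"]) for row in rows}
    return len(keys)


print(count_sessions(events))                    # 3
print(len({e["session_id"] for e in events}))    # 2 (session_id alone undercounts)
```

Counting session_id on its own misses a session here, because users A and B happened to start at the same second.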

Traffic volume can be defined by User ID, User Pseudo ID or Session ID

Google Analytics calculates traffic volume using the User ID, the user_pseudo_id or the session ID, depending on the reporting identity of the GA4 property.
The three identifiers Google Analytics 4 uses to define traffic.

In this example taken from the Google Analytics sample data, the user_id is our unique identifier we have given to a user. In a real website, this would be the unique ID for signing up for emails, creating an account, or any other ways of collecting first-party data.

We concatenate user_pseudo_id and session_id to get a unique value for each session. The session ID alone is not enough: because it is a timestamp, two users who start a session at the same moment would share it. The result is what I would call the “traffic volume” of a website in any given period, which is useful for two things:

  • Real-time/short-term tracking – How many visitors do we have right now?
  • Measuring behaviour – How many sessions does a user take to convert?

By pivoting the table, we can see the number of sessions per user_pseudo_id. This simple analysis gives us insight into user behaviour, for example, how many touchpoints it takes to make a purchase.

Because the session ID sits one level below the user_pseudo_id, there is a one-to-many relationship: one user_pseudo_id can have multiple session IDs.
A user_pseudo_id could have many sessions mapped to it.
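The pivot described above can be sketched in a few lines of Python. The (user_pseudo_id, session_id) pairs are invented for illustration, and sessions_per_user is a hypothetical helper, not something GA4 provides.

```python
from collections import defaultdict

# Hypothetical (user_pseudo_id, session_id) pairs, one per event.
rows = [
    ("A", 1700000000),
    ("A", 1700000000),  # another event in the same session
    ("A", 1700090000),
    ("A", 1700180000),
    ("B", 1700000000),
]


def sessions_per_user(pairs):
    """Return {user_pseudo_id: number of distinct session_ids}."""
    sessions = defaultdict(set)
    for user_pseudo_id, session_id in pairs:
        sessions[user_pseudo_id].add(session_id)
    return {user: len(ids) for user, ids in sessions.items()}


print(sessions_per_user(rows))  # {'A': 3, 'B': 1}
```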

So What?

In terms of useful actions that can be taken based on this insight, we can identify the channels that attract the highest number of converting users to our website, using first-click attribution to identify the first channel that drives a sequence of touchpoints leading to conversion. We can also recreate custom attribution models by exploring the contribution of each of these sessions (and their traffic sources) to a conversion, since we can tie them to a sequence of interactions.
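As a minimal sketch of first-click attribution at this level, the Python below credits each converting user_pseudo_id to the channel of their earliest session. The field names channel and converted, the sample sessions and the function first_click_channels are assumptions made for the example. A real analysis would take them from the traffic source and conversion events in your own data.

```python
from collections import Counter

# Hypothetical session-level rows; session_id doubles as the session start time.
sessions = [
    {"user_pseudo_id": "A", "session_id": 1000, "channel": "organic", "converted": False},
    {"user_pseudo_id": "A", "session_id": 2000, "channel": "email", "converted": False},
    {"user_pseudo_id": "A", "session_id": 3000, "channel": "paid", "converted": True},
    {"user_pseudo_id": "B", "session_id": 1500, "channel": "paid", "converted": True},
    {"user_pseudo_id": "C", "session_id": 1200, "channel": "organic", "converted": False},
]


def first_click_channels(rows):
    """Count converting users by the channel of their earliest session."""
    first_session = {}
    converters = set()
    for row in rows:
        user = row["user_pseudo_id"]
        # Keep the earliest session seen so far for each user.
        if user not in first_session or row["session_id"] < first_session[user]["session_id"]:
            first_session[user] = row
        if row["converted"]:
            converters.add(user)
    return Counter(first_session[user]["channel"] for user in converters)


print(dict(first_click_channels(sessions)))
```

User A converts in a paid session but is credited to organic, the channel that started the journey. User C never converts and is not counted.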

Why don’t we use “Users” more as marketers?

Why are we overcomplicating this process? By counting users (the user_pseudo_id), we collect a figure that is closer to the number of individuals interacting with our website. This is simply the number of unique cookies that GA4 has either assigned or read.

“Users” are just as easily accessible as GA4 sessions in the interface, and were accessible in GA3 too. One answer to the above question could be that sessions simply look more favourable for growth. Because one user can have many sessions, the sessions metric is at least as high as users, so presenting the larger figure to senior stakeholders gives the impression of more efficient spending and, therefore, better performance.

Last-click attribution forces us to use GA4 sessions as a primary metric

Another reason is the use of last-click attribution, which favours the high-spending, closing channels such as pay-per-click or display. This suits the managers of such accounts, who can report a high Return on Ad Spend (ROAS), and it suits Google, which happily sells clicks to marketers who believe that a certain percentage of those clicks guarantees a conversion.

By using the last-click model, we voluntarily limit ourselves to a tiny snapshot in time in which we have identified a user. By tracking only session-level metrics, marketers are left feeding off scraps of partial information, which are at best an index of performance (not representative of actual performance) and at worst simply misleading.

So we have identified counting users, in combination with first-click attribution, as a partial solution. However, it is by no means the solution we need. It is still limited by the fragility of cookies, and there are far better ways of determining the true number of people who visited the website.

Sessions don’t represent the truth

The chart below shows the user_pseudo_ids grouped by User ID with their respective session counts, demonstrating the potentially misleading nature of both counting methods.

User ID is the highest-level identifier, and each user ID could have many user_pseudo_ids and session IDs mapped under it.

What does this mean?

In this example,

  • The user ID 5843 contains 5 distinct user_pseudo_ids, what we would call users in GA4, and has 8 individual sessions attributed to it.
  • User 5843 visited 4 times on one device where their cookies were not deleted, but then visited on 4 other occasions, each with separate cookie information, meaning the tracked journey would have been broken if we hadn’t collected their user ID.
  • User 9312 visited 7 times, and each time the cookie tracking failed, which is very common in the modern privacy environment.

This has implications not only for the attribution of marketing conversions but also for simply understanding our question: how many people visited our website?

  • According to our user IDs in this example, we have 2.
  • According to the basic user level cookie tracking, we have 12.
  • According to the session counts, we have 15.
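The three counts above can be reproduced with a small Python sketch. The rows are reconstructed from the description of the example, so the cookie names and timestamps are invented. Only the shape of the data (which user ID owns how many cookies and sessions) follows the text.

```python
# Each row is (user_id, user_pseudo_id, session_id); cookie names and timestamps are made up.
rows = []
# User 5843: four sessions on one cookie, then four more visits, each on a fresh cookie.
rows += [("5843", "cookie-5843-1", 1000 + i) for i in range(4)]
rows += [("5843", f"cookie-5843-{n}", 2000 + n) for n in range(2, 6)]
# User 9312: seven visits, each on a fresh cookie.
rows += [("9312", f"cookie-9312-{n}", 3000 + n) for n in range(1, 8)]

people = len({user_id for user_id, _, _ in rows})           # by user ID
users = len({pseudo for _, pseudo, _ in rows})              # by user_pseudo_id (cookie)
sessions = len({(pseudo, sid) for _, pseudo, sid in rows})  # by user_pseudo_id + session_id

print(people, users, sessions)  # 2 12 15
```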

Which option do you think is the correct approach?

The collection of the user ID or any unique identifier is the fundamental principle of a first-party data strategy, which should be embedded into all businesses’ marketing operations to gain a true understanding of performance, rather than chasing the standard Google Analytics 4 session as our holy grail of metrics. It is definitely not!

The good thing is, it is in our power to enable the next generation of marketing analytics.

