Visiting a website triggers a knee-jerk reaction: do I trust this page enough to stay? Here we present a prototype that predicts the trustworthiness of a website, and explain why we consider such ideas important.
It is milliseconds that decide.
Imagine you just followed a link and landed on a webpage. Within milliseconds you will decide whether this is a good website. Whether it will contain the information you are looking for. The product you want. Whether it is accessible, quick, well-designed. Whether it is trustworthy. Whether you will stay on that website or simply go back and choose something else.
Milliseconds decide over billions of euros in the online world. But very few initiatives truly venture into a deep psychological understanding of design and website user experience. Into the world of user perception.
But we do.
The innate, emotional, knee-jerk reaction of humans to stimuli is a huge factor in decision-making. Whether you believe a statement, like a person or find something visually appealing - most of these judgements are made within seconds.
So why shouldn’t it also be part of the many factors that make up a “good” website?
To me, there is no good reason why we shouldn't consider human perception as a factor of conversion and success. It is at the core of asking "why" when discussing the ROI of money spent, visitor satisfaction, brand identification and much more. Human perception of websites might be one of the biggest contributors to a website's success (admittedly, we have no source for that claim).
So how can we measure and quantify human perception? Well, first we have to decide on a specific emotional result of visual perception. That could be trustworthiness, accessibility, legal aspects, visual appeal or simply the authority of a website. The perception dimension matters.
Knowing what we want to understand better, we need to observe the human response to that specific emotion before we can try to replicate it - which is much harder and more time-consuming than purely descriptive reporting. So how can we do that?
At the Ryte Data Lab we work to push the boundaries of currently established solutions. One of those initiatives is the rapid generation of new prototypes for market validation. The trustworthiness classification of websites is one of those prototypes.
Our goal with this project is to predict whether a human will perceive a website as trustworthy. To achieve this, we had to iterate through a number of steps.
Let’s dive into some details of those steps.
For classification tasks in machine learning, we usually need a labelled training set. Labels are annotations of the data set - in our case, whether a human perceives a website as trustworthy or not.
To get those labels at scale, we started by taking screenshots of thousands of websites. Those screenshots were our training set, which we sent out for labelling. To get a broad estimate of whether a website was trustworthy, we were not satisfied with one rating per website. So we established a rating process in which multiple people rate each website, letting us obtain not only the average rating but also the variance (i.e. how undecided people were about a website).
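As a minimal sketch of that aggregation step (the site names and rating values below are invented for illustration, not from our actual data set), the per-site mean and variance can be computed like this:

```python
from statistics import mean, variance

# Hypothetical raw ratings: several human raters per screenshot, scale 1-5.
ratings = {
    "site-a.example": [5, 4, 5, 4],   # raters largely agree: trustworthy
    "site-b.example": [1, 5, 2, 4],   # raters are split: high variance
}

def aggregate(ratings_by_site):
    """Return (mean, variance) per site - the two signals described above."""
    return {
        site: (mean(scores), variance(scores))
        for site, scores in ratings_by_site.items()
    }

summary = aggregate(ratings)
print(summary["site-a.example"])  # high mean, low variance
print(summary["site-b.example"])  # middling mean, high variance
```

The variance is what tells us how "undecided" raters were: both sites could end up with a similar average while telling very different stories.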
Ultimately, we ended up with tens of thousands of trustworthiness ratings of websites on a scale from 1 to 5. This decent-sized data set was the basis for the next step - analysis and the use of artificial intelligence. But don't worry, all of this data gathering was (nearly) fully automated, so no interns were harmed in the process.
Producing data is one part, but utilising it is the more important one. So we applied the magic of data science for our next steps. First, we gathered, filtered, interpreted and prepared the data. Several important decisions concerned how we interpret "trustworthiness": is a medium rating something we should rely on? What about outliers in either direction? What about cases with a high variance across ratings?
These and many more decisions had to be made to balance the quality of the data on the one hand against the quantity of data on the other. Once the data was prepared, we used our favourite cloud service provider to implement and train a neural network.
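One way those decisions could translate into a labelling rule is sketched below. The thresholds are illustrative assumptions, not Ryte's actual values; the point is that middling means and high-variance cases get dropped, trading data quantity for label quality:

```python
from statistics import mean, variance

def to_label(scores, low=2.5, high=3.5, max_var=1.5):
    """Map one site's ratings (1-5) to a binary label, or None if ambiguous.

    All thresholds here are hypothetical, chosen only for illustration.
    """
    m, v = mean(scores), variance(scores)
    if v > max_var:              # raters disagreed too much: drop the site
        return None
    if m >= high:
        return "trustworthy"
    if m <= low:
        return "not_trustworthy"
    return None                  # middling mean: too ambiguous to keep

print(to_label([5, 4, 5, 4]))    # -> trustworthy
print(to_label([1, 2, 1, 2]))    # -> not_trustworthy
print(to_label([1, 5, 2, 4]))    # -> None (high variance)
```

Raising `max_var` or narrowing the `low`/`high` band keeps more data at the cost of noisier labels - exactly the quality-versus-quantity trade-off described above.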
A neural network for image classification, which is the task in our case ("is this website trustworthy or not?"), has the advantage of understanding "components" of the input. Through its successive layers it can both generalise and specialise in interpreting the input.
In practice that means, for example, that a very bright colour might be an indicator of high trustworthiness; or it might be text in specific areas, pictures of people, or many other things. The network builds an "understanding" of what leads to a high trustworthiness rating and trains itself to map the input as closely as possible to the correct rating. Once training is finished, we have a new tool at our disposal: trustworthiness prediction!
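To make the "components" idea concrete, here is a toy glimpse of what a single convolutional filter does: it slides over the image and responds strongly wherever its pattern matches. Real networks learn thousands of such filters from data; this hand-made "brightness detector" on a tiny fake screenshot is only an illustration, not our actual model.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (no padding) on nested lists of floats."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            ))
        out.append(row)
    return out

# A 4x4 "screenshot": bright (1.0) top half, dark (0.0) bottom half.
img = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
brightness = [[0.25, 0.25], [0.25, 0.25]]  # averages each 2x2 patch

response = conv2d(img, brightness)
print(response[0][0], response[2][0])  # strong at the top, zero at the bottom
```

A classifier then combines many such filter responses across layers, which is how "bright background" or "text in specific areas" can end up influencing the final trustworthiness score.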
But who would we be if we didn't reveal some juicy results instead of just showing the process? The first question is usually: how accurate is the model's prediction? With a very first prototype, we already achieve an accuracy of about 80%. For comparison, random guessing on a balanced binary task would land at around 50%, roughly 30 percentage points worse than our prediction - and this is without any noteworthy optimisation!
Yet what is more interesting to most readers, I guess, are some real examples. Let's start with some sites classified as very trustworthy:
On to some examples from the other side of the spectrum, namely sites classified as "not trustworthy":
Taken together, these results are already very reasonable at this early stage. Evaluating the classification qualitatively, we can see why some sites were sorted as trustworthy and others were not.
But the fun begins when digging deeper and constructing artificial test cases. Did you know that a white background is considered trustworthy, while dark colours are not? Or that leopard print - funnily enough - is well-received by the model? Images and varied colours are perceived as trustworthy; little text and sparse content, not so much. These edge cases let us peek into what the model has actually learned.
In short, we have constructed a first prototype that maps the innate human perception of trustworthiness to a mathematical model. Replicating human behaviour is a very delicate endeavour, but we are happy with this first, tiny proof of concept - and we are of course already working on the next iterations.
To round this article off, let's take a minute to discuss why we at Ryte think that endeavours such as a trustworthiness classification for websites are worth our time and energy. Not only does it make for a fancy use case of artificial intelligence, but we are convinced that the whole digital world will move ever deeper into the recording, mapping and simulation of real human behaviour. Success in the digital world will be based on understanding and utilising real human behaviour.
Let's assume we continue this prototype and identify patterns of website design that make users more likely to stay on a site or - more importantly - convert, because they trust and value the site. The better we understand real human reactions and behaviour, the better equipped we are to improve the user experience. And a better experience means more success.
I started off with the ludicrous assumption that milliseconds are worth billions. As humanity moves towards an ever more digital world, this statement only gains in truth. Each millisecond will decide whether the website user experience is a positive one. And the better a company is equipped to improve that experience, the higher its chances of satisfying users' needs, saving them time and money, and making them feel well cared for. If users feel appreciated, understood and safe, they will convert and return for additional purchases or services. Because they build up trust - the trust humans extend to digital assets.
In the end, it really is milliseconds that decide over billions. It is milliseconds that will shape the future of success. And we are working hard to give you those seconds.
Interested in being the first to try out new features?
Published on 11/02/2021 by Kilian Semmelmann.
Who writes here
Kilian joined Ryte as Head of Data Science in 2021. He is responsible for growing the organisation into a data-driven company and for providing our customers with innovative approaches to extracting value from data.