A quick tour of the YouTube algorithm: two days ago, a dozen researchers tried our tool.
Videos with political content are not treated the same as videos with non-political content
Twelve students played the same video under two conditions: in the browser they use daily (logged in to Google) and in a freshly installed one. Below, these are referred to as the YouTube account and the clean browser, respectively. With the YouTube account, we expected to see more precise and targeted personalization. Our aim was to compare the related videos and see what the differences were.
A non-political video watched in a clean browser and with our YouTube account
While you watch a non-political video, YouTube labels some of its suggestions as “Recommended for you”.
If we switch from the “clean” browser to our personal, logged-in browser, which gives YouTube more data to inform its decisions, the percentage of “for you” recommendations increases. With the clean browser, only a few related videos are marked “for you” (7%). The percentage rises when the same video is watched in a personal browser. The test video was called “Cutest Cat Combination”.
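The percentage quoted above can be computed directly from the collected suggestions. The following is a minimal sketch, assuming each observed suggestion is stored as a dictionary with a hypothetical `for_you` flag; the field names and the sample data are illustrative and are not the tool's actual format.

```python
def for_you_share(suggestions):
    """Return the percentage of suggestions labelled "Recommended for you"."""
    if not suggestions:
        return 0.0
    labelled = sum(1 for s in suggestions if s["for_you"])
    return 100.0 * labelled / len(suggestions)


# Illustrative data only: 20 suggestions, 2 of them flagged as "for you".
sample = [{"title": f"video {i}", "for_you": i < 2} for i in range(20)]
print(f"{for_you_share(sample):.0f}% of the suggestions are 'for you'")
```

Running the same function on the clean-browser and logged-in observations gives two numbers that can be compared side by side.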
Expected result – The years of behavioral surveillance accumulated by browsers logged in to YouTube carry over into the suggestions. As expected, we see more yellow dots and greater diversity.
Unexpected insight – Even with clean browsers, we still see four personalized suggestions. This happened for students whose system language is not English. The browser communicates that language to YouTube, and Google uses one of the 20 suggestions to offer content in the user’s native language. Suggestions are therefore personalized even when there are no cookies.
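One plausible channel for this is the standard `Accept-Language` HTTP header, which browsers send with every request regardless of cookies. The sketch below only builds a request object (nothing is sent over the network) to show that a cookie-less request still declares a language. That YouTube relies on this particular header is our assumption; the language value is illustrative.

```python
from urllib.request import Request


def build_clean_request(url, language):
    """Build a cookie-less request that still declares the user's language."""
    # Assumption: the language reaches the server through the standard
    # Accept-Language header, which browsers set from the system settings.
    return Request(url, headers={"Accept-Language": language})


req = build_clean_request("https://www.youtube.com/", "it-IT,it;q=0.9,en;q=0.5")

# No cookie is attached, yet the language preference is exposed.
# Note: urllib stores header names as "Accept-language" internally.
print("Cookie header present:", req.has_header("Cookie"))
print("Declared language:", req.get_header("Accept-language"))
```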
A political video watched in a clean browser and with our YouTube account
It is visually obvious that videos with political content trigger different rules. There are no “for you” recommendations in the clean browser when you watch a video about politics.
The next part of the experiment was watching a video about politics. The differences between the personal and clean settings were even more apparent. In the clean browser, we did not get any “for you” videos: YouTube appears unwilling to make suggestions on this topic (perhaps to avoid making mistakes). With the personal YouTube account, almost half of the videos shown are “for you”. One explanation is that the platform personalizes related videos on politically polarized (and other) topics only if it has certain data. The video used was “Philip Hammond — no deal Brexit”.
Weird insight – YouTube has a good idea of what is political. However, we are still curious whether this topic classification works in other languages.
The official APIs are not reliable for algorithm analysis
We retrieved the related videos through YouTube's official API, then compared them with what actually appears on YouTube.
The related videos returned by the API are shown as red circles. The videos actually recommended to viewers are shown in yellow or green.
This analysis mainly confirms the importance of user-centric observation for algorithm analysis. The YouTube API returns its own list of related videos, but empirical measurements reveal that 14 of them (the overlapping area) were actually recommended to viewers. The green and yellow circles represent personalized suggestions, which a researcher working from the API alone could not guess; they can only be found by observing what is shown to the people exposed to them. This test was performed using “Gangnam-style – PSY”.
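The comparison behind the red, yellow and green circles is a set comparison. The sketch below assumes two lists of video identifiers: one obtained from the API and one collected from real viewers. The function name and the sample identifiers are hypothetical.

```python
def compare_api_with_observed(api_related, observed):
    """Split video IDs into API-only, observed-only and overlapping sets."""
    api_set, observed_set = set(api_related), set(observed)
    return {
        "overlap": api_set & observed_set,        # in both sources
        "api_only": api_set - observed_set,       # never shown to viewers
        "observed_only": observed_set - api_set,  # invisible to the API
    }


# Illustrative identifiers, not real measurements.
api_related = ["a", "b", "c", "d"]
observed = ["c", "d", "e"]

result = compare_api_with_observed(api_related, observed)
for name, videos in result.items():
    print(name, sorted(videos))
```

The `observed_only` set is the part a researcher relying on the API alone would never see.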
Test result – YouTube should not be trusted. No matter how much transparency APIs for researchers promise, they are not trustworthy. GDPR-compliant data handling and passive scraping allow you to examine how algorithms treat you.
Every second and every click counts as a data point
First image. After watching Fox and CNN for 20 seconds, there are only three suggested videos specific to CNN watchers, and no recommendations specific to the other group (blue spots). We can also see a large number of videos suggested to both groups (grey areas), as well as some suggestions specific to a single person (light blue spots).
Second image. We extended the watch time to 2 minutes. In this second image we can see eleven suggestions shared by at least two CNN watchers, and six for Fox watchers.
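The counts in the two images can be reproduced with a small grouping step. This is a sketch under stated assumptions: each observation is a hypothetical `(group, user, video)` tuple, and a suggestion counts as group-specific when at least two users of one group received it and nobody in the other group did. The sample data is illustrative.

```python
from collections import defaultdict


def group_specific(observations, min_users=2):
    """Map each group to the videos seen by at least `min_users` of its
    members and by nobody outside that group."""
    viewers = defaultdict(lambda: defaultdict(set))  # video -> group -> users
    for group, user, video in observations:
        viewers[video][group].add(user)

    specific = defaultdict(set)
    for video, by_group in viewers.items():
        if len(by_group) != 1:
            continue  # suggested to more than one group: a shared video
        group, users = next(iter(by_group.items()))
        if len(users) >= min_users:
            specific[group].add(video)
    return dict(specific)


observations = [
    ("cnn", "u1", "v1"), ("cnn", "u2", "v1"),  # specific to CNN watchers
    ("fox", "u3", "v2"), ("fox", "u4", "v2"),  # specific to Fox watchers
    ("cnn", "u1", "v3"), ("fox", "u3", "v3"),  # shared by both groups
    ("cnn", "u2", "v4"),                       # specific to one person
]

for group, videos in sorted(group_specific(observations).items()):
    print(group, sorted(videos))
```

Running it once on the 20-second data and once on the 2-minute data would show how the group-specific sets grow with watch time.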
YouTube's developers have made it clear that watch time per play is the primary factor in determining the suggestion list. Indeed, it is one of the best ways to measure how much a video was appreciated and whether a suggestion was useful. For future studies, it is crucial to know how YouTube collects this data. Other types of interaction on YouTube might also count; we need more data to confirm this.
“How algorithms control your lives”
The secret algorithm that generates related videos maximizes engagement.
Although algorithmic abuse has been a well-known problem for years, surveillance capitalism still appears to go unaddressed. Tracking Exposed is an actionable plan that builds on our experience of black-box testing Facebook's algorithms.
YouTube is our newly supported platform
Our umbrella project
Note: Every person is unique, and personalization can be very different for each of them.
Our society should be open to observing and considering how algorithms can impact our lives. Do not depend on experts to tell you how something affects you.
Why are we different?
Our browser extension will passively collect information about your customized experience. This evidence is yours to use and you can decide what you wish to share.
Install the youtube.tracking.exposed browser extension
However, there are legitimate privacy concerns.
- You are in complete control of all your data. (TODO – We have yet to implement the interface, so you cannot completely erase your data at this time.)
- The code is fully auditable and free of charge.
- We are publicly funded, and exploiting the experiences of others is not part of our sustainability plan.
- Users are anonymous: you access your data with a cryptographic secret key generated in your browser.
- We are GDPR-compliant (TODO: we still lack a proper description of how and why we process data).
- The purpose of this tool is to help individuals and groups see the impact personalization algorithms have on their lives.
- You can also use your own profiles with the tool. TODO: we are still documenting how to refresh the cryptographic token and how to use a “clean” browser.
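The idea of anonymous access through a locally generated secret can be illustrated as follows. This is a minimal sketch of one possible scheme, not the extension's actual implementation, which may work differently (for example with public-key signatures). All function names and the in-memory `store` are hypothetical.

```python
import hashlib
import secrets

store = {}  # hypothetical server-side storage: public ID -> evidence list


def generate_secret_key():
    """Create a random secret locally; it never needs a name or an email."""
    return secrets.token_bytes(32)


def public_id(secret_key):
    """Derive the identifier the server stores, without revealing the secret."""
    return hashlib.sha256(secret_key).hexdigest()


def submit(secret_key, evidence):
    """Attach a piece of evidence to the anonymous identifier."""
    store.setdefault(public_id(secret_key), []).append(evidence)


def fetch(secret_key):
    """Only someone holding the secret can locate the matching data."""
    return store.get(public_id(secret_key), [])


my_key = generate_secret_key()
submit(my_key, {"video": "example", "suggestions": 20})
print(len(fetch(my_key)), "record(s) found for the key holder")
print(len(fetch(generate_secret_key())), "record(s) found for a stranger")
```

In this design the server holds only a hash of the secret, so the stored data is tied to whoever holds the key rather than to an identity.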
Check out our experiment and share your results with others!
Methodology: a freshly installed Brave browser, with no cookies and no YouTube login
10 students each open the same video at the same time. Their 20 suggested videos are then compared.
The suggested videos are represented by violet bubbles in the middle. The suggestions, although few in number, have many links because most of them are identical across all observers. This is the starting point: how Google treats users when it knows nothing about their profile. Naturally, it still knows something about the user (the location of the connection, the operating system, and the default language). We tried to minimize these differences to create something comparable to an unpersonalized stage of the algorithm.
Verified expectation: a large share of the suggestions is common to all users, because Google has fewer data points with which to personalize them.
It is visually clear how profile-related data points lead to personalized recommendations.
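How widely each suggestion is shared can be counted with a few lines. The sketch assumes a hypothetical mapping from each observer to the list of video identifiers they received; the data is illustrative.

```python
from collections import Counter


def sharing_profile(suggestions_by_user):
    """Count, for each video, how many observers received it as a suggestion."""
    counts = Counter()
    for videos in suggestions_by_user.values():
        counts.update(set(videos))  # count each video once per observer
    return counts


def shared_by_all(suggestions_by_user):
    """Return the videos suggested to every single observer."""
    total = len(suggestions_by_user)
    counts = sharing_profile(suggestions_by_user)
    return {video for video, n in counts.items() if n == total}


suggestions_by_user = {
    "student_1": ["a", "b", "c"],
    "student_2": ["a", "b", "d"],
    "student_3": ["a", "b", "c"],
}

print("Suggested to everyone:", sorted(shared_by_all(suggestions_by_user)))
print("Per-video counts:", dict(sharing_profile(suggestions_by_user)))
```

Videos with a count equal to the number of observers correspond to the central bubbles; videos with a count of one are unique to a single person.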
It is much harder to find videos that aren’t in the center. These videos are usually unique to each person and are indicated by the green dots.
The consensus is that the vast majority of the suggested videos are shared among users, since Google has fewer data points with which to personalize these suggestions.
This is how our experiments started.
The experiments took only a few hours to complete, and similar tests can be replicated by any group: your classmates, colleagues, friends, or family. The research question: of the numerous data points that Google observes, which ones make the pattern shift to the personalized scenario?