Cookies, Local Storage, Session Storage, and Sessions are a favorite interview topic. It’s not possible to cover them all in one article, so this one is just about cookies. I will post a follow-up article on the other topics.
This is not just an article about front-end programming; it is about understanding how the web works.
Many developers have shallow knowledge of cookies in general and, more importantly, of third-party cookies, which are responsible for all those ads that appear on sites.
Before anything else, the one thing you need to know to understand this is that “HTTP, the protocol we use to browse the internet, is stateless”. This means that every request is handled independently, without any knowledge of the requests that came before it. To put it another way: once the server has answered your request, the protocol itself keeps no memory of that exchange, so the next request from your browser arrives as if it came from a stranger, even if it travels over the same network connection.
For example, suppose you log into Facebook.
When you enter your username and password and press the login button, your browser sends your credentials to the server, which verifies them and sends back the requested page.
Now, without some extra help, when you make another request to that same server, it has no idea who you are and would ask you to log in again, because HTTP is stateless. As I said earlier, it doesn’t remember the last transaction, so it doesn’t know who you are. Can you imagine how painful it would be to log in every time you click on something on Facebook?
Here come COOKIES to the rescue!!
So what are cookies?
A cookie is just a small piece of data (a name and a value) containing helpful information about you and your preferences on a website, which your browser stores for that site. When you click the login button and send your credentials, the server not only responds with the requested content but also sends a cookie to your browser. The browser stores the cookie on your computer and sends it to the server with every request you make to that website. Cookies are not only for logins. Let’s illustrate this with another example: you select Spanish as your language on a website. The website can then save your language preference in a cookie stored by your browser.
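To make this round trip concrete, here is a minimal sketch using Python’s standard `http.cookies` module. The cookie name `session_id` and the value `abc123` are hypothetical; real sites pick their own. The first function builds the `Set-Cookie` header a server might send after a successful login, and the second parses the `Cookie` header the browser sends back on later requests.

```python
from http.cookies import SimpleCookie
from typing import Optional


def build_set_cookie(name: str, value: str) -> str:
    """Value for the Set-Cookie header a server could send after a login."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"       # send the cookie back for every page on the site
    cookie[name]["httponly"] = True  # keep it away from the page's JavaScript
    return cookie[name].OutputString()


def read_cookie(cookie_header: str, name: str) -> Optional[str]:
    """Parse the Cookie header the browser sends back and return one value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    # 1. Login response: the server hands the browser a cookie.
    print("Set-Cookie:", build_set_cookie("session_id", "abc123"))
    # 2. Every later request: the browser returns it in the Cookie header.
    print(read_cookie("session_id=abc123; theme=dark", "session_id"))
```

Note that the browser sends back only `name=value` pairs; attributes such as `Path` and `HttpOnly` are instructions to the browser and never travel back to the server.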
Each time you revisit that website, your browser sends that cookie along with the request. Remember that on the web, every request to a server is independent. When you revisit the website, it reads the cookie and sends the requested page according to your preference. In other words, the website remembers your language and lets you view it in Spanish without making you select Spanish again.
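The language example can be sketched the same way. This is a hypothetical site that supports only English and Spanish, using a made-up cookie named `lang`: when you pick a language, the server sends a long-lived cookie, and on every later visit it reads that cookie to decide which page to render.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical greetings for the two languages this example site supports.
GREETINGS = {"en": "Welcome!", "es": "¡Bienvenido!"}


def remember_language(lang: str) -> str:
    """Set-Cookie value sent when the user picks a language."""
    cookie = SimpleCookie()
    cookie["lang"] = lang
    cookie["lang"]["max-age"] = 60 * 60 * 24 * 365  # keep it for about a year
    return cookie["lang"].OutputString()


def render_home(cookie_header: Optional[str]) -> str:
    """Pick the page language from the 'lang' cookie, defaulting to English."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    lang = cookie["lang"].value if "lang" in cookie else "en"
    return GREETINGS.get(lang, GREETINGS["en"])


if __name__ == "__main__":
    print(render_home(None))         # first visit: no cookie yet
    print(remember_language("es"))   # you choose Spanish
    print(render_home("lang=es"))    # revisit: the browser sends the cookie back
```

The server never has to ask again; the preference rides along on every request until the cookie’s `Max-Age` runs out.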
Cookies are not limited to this; a cookie can store diverse information, like the number of times you have visited the website, your preferred layout, the items in your cart, or the links you clicked on the website.
What is saved in the cookie is up to the creator of the website you are visiting.
Another thing you should know for sure is that there are limits on who can read a cookie.
A cookie set by one website can’t be read by another website. For example, the language preference you chose on the website earlier cannot be read by another website that you open in your browser. Only the website that saved the information in the cookie can read or access it. Early on, cookies became a popular place to store a wide variety of data, because they helped developers show a website that better suited each user’s needs.
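Here is a rough sketch of how a browser decides which cookies to attach to a request. It is a simplified take on the domain-matching rule from RFC 6265, assuming hypothetical domains under `.example`; it ignores paths, expiry, the `Secure` flag, `SameSite`, and the Public Suffix List, all of which real browsers also check.

```python
from dataclasses import dataclass


@dataclass
class StoredCookie:
    name: str
    value: str
    domain: str      # domain the cookie belongs to, e.g. "shop.example"
    host_only: bool  # True if the server did not send a Domain attribute


def domain_matches(host: str, cookie: StoredCookie) -> bool:
    """Simplified RFC 6265 domain-match: exact host, or a subdomain if allowed."""
    host, domain = host.lower(), cookie.domain.lower()
    if host == domain:
        return True
    # Only cookies set with an explicit Domain attribute reach subdomains.
    return not cookie.host_only and host.endswith("." + domain)


def cookies_for(host: str, jar: list[StoredCookie]) -> str:
    """Build the Cookie header a browser would attach to a request for `host`."""
    return "; ".join(f"{c.name}={c.value}" for c in jar if domain_matches(host, c))


if __name__ == "__main__":
    jar = [
        StoredCookie("lang", "es", "languagesite.example", host_only=True),
        StoredCookie("visits", "3", "othersite.example", host_only=True),
    ]
    print(cookies_for("languagesite.example", jar))  # only the lang cookie
    print(cookies_for("othersite.example", jar))     # only the visits cookie
```

Because the jar is filtered by the host being requested, one site’s cookies simply never appear in requests to another site.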
As more information went into cookies, their size became a big issue, so developers came up with a simple solution: store only an ID (identifier) in the cookie and keep all the other information on the server. This way, a website can store practically unlimited data on its server, and the cookie simply serves as an identifier for your computer. The website treats your computer as a tag and looks up your data on its server rather than on your computer. So a cookie will typically contain the name of the domain the cookie came from, the “lifetime” of the cookie, and a value, usually a randomly generated unique number.
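A minimal sketch of this ID-on-the-server idea, assuming a hypothetical in-memory dictionary `SESSIONS` in place of a real database, with made-up fields. `secrets.token_hex` from Python’s standard library produces the random, hard-to-guess value that goes into the cookie; everything else stays on the server.

```python
import secrets
from typing import Optional

# Hypothetical in-memory store; a real site would use a database or cache.
SESSIONS: dict[str, dict] = {}


def create_session(user: str) -> str:
    """Store the user's data on the server and return only a random ID."""
    session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
    SESSIONS[session_id] = {"user": user, "lang": "es", "cart": []}
    return session_id  # this ID is the only thing that goes into the cookie


def lookup(session_id: Optional[str]) -> Optional[dict]:
    """On each request, use the ID from the cookie to find the user's data."""
    if session_id is None:
        return None
    return SESSIONS.get(session_id)


if __name__ == "__main__":
    sid = create_session("alice")
    print(f"Set-Cookie: session_id={sid}")
    print(lookup(sid))               # the site recognizes you
    print(lookup("made-up-value"))   # an unknown ID finds nothing
```

The cookie stays tiny no matter how much the server remembers, and the same lookup on every click is how a site confirms that you are still logged in.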
This was the breakthrough that led to the third-party cookie.
A summary of what I have said so far:
A cookie can have a simple function, like remembering your login for a specific website so that you can leave and come back without logging in again. A common use of cookies is to store a session ID when you log in to a site. The site puts the cookie on your computer when you first visit it. Then, with every click you make on the site, the cookie with the session ID is sent back to the site from your computer, and the site uses it to confirm that you are ‘logged in’.
If you have read this far, you already know that cookies are meant to improve your experience online. Basically, there are two types of cookies: first-party and third-party (there is also second-party, but it is rarely talked about). At the basic technical level, first-party and third-party cookies are the same: they can do the same things, carry the same kind of information, and are intended to perform some kind of function. Where they differ is in how they are used.
A third-party cookie is one created by a domain other than the one you are visiting, usually placed by an advertiser for advertising purposes so that they can retarget you based on your online behavior, i.e. those irritating ads that follow you all around the internet. A third party is one that contributes some content to the web page, like an image, that is not hosted on the website you are visiting.
There are many restrictions on the use of cookies. Browsers limit how many cookies they keep (older specifications only required about 300 in total; the current standard, RFC 6265, asks for at least 3000 in total and 50 per domain), and a single cookie cannot contain much data, generally about 4096 bytes. The biggest restriction is that cookies set by one website cannot be accessed by another website, i.e. they are scoped to a domain name.
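The 4096-byte limit is also why storing only an ID pays off. The helper below is a hypothetical name, not a browser API: it roughly checks whether a `name=value` pair fits by counting its UTF-8 bytes. Browsers differ in exactly what they count toward the limit (some include attributes), so treat it as an approximation.

```python
MAX_COOKIE_BYTES = 4096  # commonly cited per-cookie ceiling


def fits_in_cookie(name: str, value: str) -> bool:
    """Rough check: does name=value stay within about 4096 bytes?"""
    return len(f"{name}={value}".encode("utf-8")) <= MAX_COOKIE_BYTES


if __name__ == "__main__":
    print(fits_in_cookie("session_id", "a" * 32))  # a short random ID fits easily
    print(fits_in_cookie("cart", "x" * 5000))      # a big serialized cart does not
```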
If cookies set by one website cannot be accessed by another website, how can Facebook keep track of which sites we visit?
I have seen many developers and interview candidates who have no clue about this:
Remember that when you visit a website, you don’t get all of its content at once. Your browser makes a separate request for each image, script, or anything else embedded in the page. So every request that goes to a third-party domain can carry the cookies that third party set earlier, and the response can set new ones.
When you shop on Amazon.com, Amazon pages refer to DoubleClick.net, which is a third party. So when you load a page from Amazon.com, your browser sees the reference on that page to something on another domain and sends a request there. You get back the ads and cookies from DoubleClick.net. Now, let’s say you are searching for information on a certain medical condition and you end up visiting a site called diseaseCHECK.com (I am just making up this name), which also uses DoubleClick.net. You get a web page from diseaseCHECK, and your browser again requests the ad content from DoubleClick, sending back the same DoubleClick cookie it received while you were shopping on Amazon earlier. If that cookie uniquely identifies you, DoubleClick now knows about both your medical condition and your shopping habits.
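The following toy simulation shows how one ad domain links two unrelated sites. The classes `AdServer` and `Browser` and the page URLs are hypothetical stand-ins (the ad server plays the role of DoubleClick). It deliberately ignores modern browser protections: today such a cookie would need to be set with `SameSite=None; Secure` to be sent on these cross-site requests, and some browsers block or partition third-party cookies by default.

```python
import secrets
from collections import defaultdict
from typing import Optional


class AdServer:
    """Hypothetical third-party ad server embedded on many unrelated sites."""

    def __init__(self) -> None:
        # visitor ID (the value of our cookie) -> pages where we saw that browser
        self.history: dict[str, list[str]] = defaultdict(list)

    def serve_ad(self, cookie: Optional[str], referer: str) -> str:
        # A browser we have never seen gets a fresh random ID.
        visitor_id = cookie if cookie is not None else secrets.token_hex(8)
        # The Referer header reveals which page embedded the ad.
        self.history[visitor_id].append(referer)
        return visitor_id  # returned to the browser in a Set-Cookie header


class Browser:
    """Toy browser: one cookie per domain, no SameSite or partitioning rules."""

    def __init__(self) -> None:
        self.jar: dict[str, str] = {}

    def load_page(self, page_url: str, ad_domain: str, ad_server: AdServer) -> None:
        # The page embeds an ad, so the browser makes a second request to the
        # ad domain and attaches whatever cookie that domain set earlier.
        cookie = self.jar.get(ad_domain)
        self.jar[ad_domain] = ad_server.serve_ad(cookie, page_url)


if __name__ == "__main__":
    ads = AdServer()
    you = Browser()
    you.load_page("https://www.amazon.com/some-product", "doubleclick.net", ads)
    you.load_page("https://diseasecheck.com/some-condition", "doubleclick.net", ads)
    # One visitor ID, two unrelated sites linked together.
    print(dict(ads.history))
```

Neither Amazon nor diseaseCHECK shares anything with the other; the link comes entirely from the ad domain’s own cookie being sent from both pages.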
Third-party cookies can link all that web browsing activity together.
DoubleClick is owned by Google. Sites all over the net, like YouTube, Overstock.com, etc., use DoubleClick (I am just saying :P).
I am going to give you another example of a third-party cookie, but this time one that is even more common than DoubleClick.
Suppose you are logged into Facebook. As you now know, your browser will get cookies from Facebook, which are stored on your computer. The cookie is bound to Facebook’s domain (facebook.com), which means only facebook.com can read what is in that cookie.
Let’s say you browse away and land on a blog. The blog cannot read the Facebook cookie because it is outside the blog’s scope, and Facebook can’t know that you are on this blog either. But let’s say the owner of this blog has put a Facebook Like button on it. To render the Like button and make it work, your browser has to download some code from Facebook’s servers. When your browser makes that request to Facebook, it sends along the cookie that Facebook set earlier on your computer. Now Facebook knows who you are and that you visited this blog.
I gave you the DoubleClick and Facebook examples to show how companies track us across the internet. These are just two cases; many companies track us on the internet with these same techniques.
The logic is simple: convince as many websites as possible to include a piece of your code, so that every visitor’s browser connects to your servers. Facebook and other social media like Twitter, LinkedIn, etc. have an easy time with this, because many websites want to link to them through things like the Facebook Like button, share buttons, etc. Google also has an easy job here, because many websites rely on Google for advertising or Google Analytics.
If you think a website sets just one first-party cookie and at most one third-party cookie, please have a look at the image above, or visit https://webcookies.org/ to check more. The list shows the number of third-party cookies delivered by each site; some send as many as 140. This is just a sample of websites, and many sites set even more third-party cookies than that. All this means is that a single site can link to many third-party domains. The site you visit may have more or fewer than these.
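To get a feel for how a single page ends up talking to many third parties, here is a small sketch that takes a page URL and the URLs of its embedded resources and lists the distinct third-party sites among them. All URLs are hypothetical, and the `site_of` helper naively treats the last two labels of a host name as the site; real browsers consult the Public Suffix List (which handles cases like `.co.uk`).

```python
from urllib.parse import urlsplit


def site_of(host: str) -> str:
    """Naive 'site': the last two labels of a host name (no Public Suffix List)."""
    return ".".join(host.lower().split(".")[-2:])


def third_party_sites(page_url: str, resource_urls: list[str]) -> set[str]:
    """Distinct sites, other than the page's own, that embedded resources come from."""
    page_site = site_of(urlsplit(page_url).hostname or "")
    sites = {site_of(urlsplit(u).hostname or "") for u in resource_urls}
    return {s for s in sites if s and s != page_site}


if __name__ == "__main__":
    resources = [
        "https://cdn.news.example/app.js",           # same site: first party
        "https://ads.tracker-one.example/ad.js",     # ad network
        "https://widgets.social.example/like.js",    # social "Like" button
    ]
    print(sorted(third_party_sites("https://news.example/article", resources)))
```

Every site in that result receives its own request from your browser, and each one can read and set its own cookies, which is how a single page visit can produce dozens of third-party cookies.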
Here is an example to help you visualize what I said above.
This is the case of dictionary.com.