<h2 class="wp-block-heading">🤔 What is Self-Attention?</h2>

<p><strong>Self-attention</strong> is a mechanism that lets a model <strong>look at other parts of the same input</strong> when interpreting any one part.</p>

<p>👀 In short:</p>

<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Each word (or patch, in images) decides <strong>how much attention to pay</strong> to every other word in the same input, including itself!</p>
</blockquote>

<p>📖 Think of it like this:<br>To understand the meaning of the word "<strong>bank</strong>", the model looks at the <strong>surrounding words</strong> to decide:</p>

<ul class="wp-block-list">
<li>Is it 🏦 (a financial bank) or 🏞️ (a riverbank)?</li>
</ul>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<h2 class="wp-block-heading">🧠 Simple Example</h2>

<p><strong>Input sentence:</strong></p>

<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>"The cat sat on the mat."</p>
</blockquote>

<p>Let's focus on "<strong>cat</strong>" 🐱</p>

<p>With <strong>self-attention</strong>, the model asks:</p>

<ul class="wp-block-list">
<li>"What do I need to know about the other words in the sentence to better understand 'cat'?"</li>
</ul>

<p>It might give
<strong>attention scores</strong> like:</p>

<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Word</th><th>Attention Score (to "cat")</th></tr></thead><tbody><tr><td>The</td><td>0.1</td></tr><tr><td>cat</td><td>0.4 ✅ (itself)</td></tr><tr><td>sat</td><td>0.3 🪑</td></tr><tr><td>on</td><td>0.05</td></tr><tr><td>the</td><td>0.05</td></tr><tr><td>mat</td><td>0.1 🧺</td></tr></tbody></table></figure>

<p>So "cat" mostly pays attention to itself and to "sat", because the two words are closely related. Note that the scores sum to 1. Weighting words this way helps the model <strong>understand relationships</strong> between them 🧩</p>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<h2 class="wp-block-heading">🔍 Why Is It Called <em>Self</em>-Attention?</h2>

<p>Because the model is <strong>attending to itself</strong>: each word (or input token) <strong>looks at all the other tokens in the same sequence</strong>, including itself 🔁</p>

<p>It's like each word is thinking:</p>

<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>"Hey, what do the rest of us mean together?" 🧠💭</p>
</blockquote>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<h2 class="wp-block-heading">🧪 Where Is Self-Attention Used?</h2>

<p>✅ Transformers (BERT, GPT, etc.)
🤖<br>✅ Vision Transformers (ViT) 🖼️<br>✅ Text translation 🌍<br>✅ Chatbots &amp; summarization ✍️</p>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<h2 class="wp-block-heading">⚙️ How It Works (Quick Look)</h2>

<p>Each word is turned into <strong>three vectors</strong>:</p>

<ul class="wp-block-list">
<li><strong>Query (Q)</strong> ❓ (what this token is looking for)</li>

<li><strong>Key (K)</strong> 🗝️ (what this token offers for matching)</li>

<li><strong>Value (V)</strong> 📦 (the information this token passes along)</li>
</ul>

<p>The model computes attention like this:</p>

<pre class="wp-block-code"><code>Attention(Q, K, V) = softmax(Q × Kᵀ / √d) × V
</code></pre>

<p>📊 The softmax turns these scaled dot products into weights that decide how much <strong>focus</strong> to give each word in the sentence. Here d is the dimension of the key vectors; dividing by √d keeps the dot products from growing too large before the softmax.</p>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<h2 class="wp-block-heading">🏰 Summary Table</h2>

<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Feature</th><th>Self-Attention 🔁</th></tr></thead><tbody><tr><td>🧠 Focuses on</td><td>All other tokens in the same input</td></tr><tr><td>👁️ Learns</td><td>Word relationships &amp; context</td></tr><tr><td>📍 Used in</td><td>Transformers (text &amp; vision)</td></tr><tr><td>💡 Helps with</td><td>Meaning, context, dependencies</td></tr></tbody></table></figure>

<hr class="wp-block-separator has-alpha-channel-opacity"/>

<h3 class="wp-block-heading">✅ TL;DR</h3>

<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Self-attention</strong> lets each word or token in a sequence <strong>pay attention</strong> to all the others, to understand context, relationships, and meaning better.
🧠✨</p>
</blockquote>

<hr class="wp-block-separator has-alpha-channel-opacity"/>
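<p>The attention formula above can be sketched in a few lines of NumPy. Everything here is a toy: the six embeddings for "The cat sat on the mat" are random, and the projection matrices <code>Wq</code>, <code>Wk</code>, <code>Wv</code> stand in for learned parameters, so the resulting weights are illustrative rather than the ones in the table.</p>

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence.

    X:           (seq_len, d_model) token embeddings.
    Wq, Wk, Wv:  (d_model, d_k) projection matrices (learned in a real model).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) dot products
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 6 tokens ("The cat sat on the mat"), 8-dim embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)      # (6, 4): one context-aware vector per token
print(weights.shape)  # (6, 6): each token's attention over all 6 tokens
```

<p>Each row of <code>weights</code> is one token's attention distribution over the whole sentence, including itself, exactly like the "cat" row in the table above.</p>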