Nodes fail to start or get stuck in NotReady state with log nf_conntrack_max: permission denied
<li class="md-nav__item">
<a href="#workaround" class="md-nav__link">
Workaround
</a>
</li>
<li class="md-nav__item">
<a href="#fix" class="md-nav__link">
Fix
</a>
</li>
</ul>
</nav>
</li>
</ul>
@ -1134,7 +1178,7 @@
<h2 id="running-behind-a-corporate-proxy">Running behind a corporate proxy<a class="headerlink" href="#running-behind-a-corporate-proxy" title="Permanent link">¶</a></h2>
<p>Running k3d behind a corporate proxy can cause several problems, most of which have already been reported in more than one issue.<br/>
Some can be fixed by passing the <code>HTTP_PROXY</code> environment variables to k3d, some have to be fixed in Docker’s <code>daemon.json</code> file, and some are as easy as adding a volume mount.</p>
<h3 id="pods-fail-to-start-x509-certificate-signed-by-unknown-authority">Pods fail to start: <code>x509: certificate signed by unknown authority</code><a class="headerlink" href="#pods-fail-to-start-x509-certificate-signed-by-unknown-authority" title="Permanent link">¶</a></h3>
<h2 id="pods-fail-to-start-x509-certificate-signed-by-unknown-authority">Pods fail to start: <code>x509: certificate signed by unknown authority</code><a class="headerlink" href="#pods-fail-to-start-x509-certificate-signed-by-unknown-authority" title="Permanent link">¶</a></h2>
<ul>
<li>
<p>Example Error Message:</p>
@ -1147,7 +1191,7 @@ Some can be fixed by passing the <code>HTTP_PROXY</code> environment variables t
<li>Possible Solution: Mounting the CA Certificate from your host into the node containers at start time via <code>k3d cluster create --volume /path/to/your/certs.crt:/etc/ssl/certs/yourcert.crt</code></li>
<h3 id="spurious-pid-entries-in-proc-after-deleting-k3d-cluster-with-shared-mounts">Spurious PID entries in <code>/proc</code> after deleting <code>k3d</code> cluster with shared mounts<a class="headerlink" href="#spurious-pid-entries-in-proc-after-deleting-k3d-cluster-with-shared-mounts" title="Permanent link">¶</a></h3>
<h2 id="spurious-pid-entries-in-proc-after-deleting-k3d-cluster-with-shared-mounts">Spurious PID entries in <code>/proc</code> after deleting <code>k3d</code> cluster with shared mounts<a class="headerlink" href="#spurious-pid-entries-in-proc-after-deleting-k3d-cluster-with-shared-mounts" title="Permanent link">¶</a></h2>
<ul>
<li>When you create and delete a cluster multiple times with the <strong>same cluster name</strong> and <strong>shared volume mounts</strong>, <code>grep k3d /proc/*/mountinfo</code> shows many spurious entries</li>
<li>Problem: Because of these leftover entries, you will occasionally see <code>no space left on device: unknown</code> when a pod is scheduled to the nodes</li>
@ -1155,13 +1199,17 @@ Some can be fixed by passing the <code>HTTP_PROXY</code> environment variables t
<li>According to the conversation on <a href="https://github.com/rancher/k3d/issues/594#issuecomment-837900646">rancher/k3d#594</a>, this issue had not been reported or known before, so it is likely not universal.</li>
</ul>
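<p>To gauge how many leftover entries have accumulated, the <code>grep k3d /proc/*/mountinfo</code> check described above can be sketched in Python. This is a minimal illustration, not part of k3d: the function name <code>count_mount_entries</code> and its parameters are hypothetical, and the search string <code>k3d</code> is simply the one used in the <code>grep</code> command.</p>

```python
import glob


def count_mount_entries(needle="k3d", pattern="/proc/*/mountinfo"):
    """Return {mountinfo path: number of lines containing `needle`}.

    Hypothetical helper for illustration; it mirrors
    `grep k3d /proc/*/mountinfo` but groups the hits per file.
    """
    counts = {}
    for path in glob.glob(pattern):
        try:
            with open(path, "r", errors="replace") as fh:
                hits = sum(1 for line in fh if needle in line)
        except OSError:
            # The process may have exited, or the file may not be readable.
            continue
        if hits:
            counts[path] = hits
    return counts


if __name__ == "__main__":
    counts = count_mount_entries()
    total = sum(counts.values())
    print(f"{total} matching lines in {len(counts)} mountinfo files")
```

<p>Running it before and after a create/delete cycle gives a rough indication of whether entries are piling up on your system.</p>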
<h2 id="nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied">Nodes fail to start or get stuck in <code>NotReady</code> state with log <code>nf_conntrack_max: permission denied</code><a class="headerlink" href="#nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied" title="Permanent link">¶</a></h2>
<h2 id="solved-nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied">[SOLVED] Nodes fail to start or get stuck in <code>NotReady</code> state with log <code>nf_conntrack_max: permission denied</code><a class="headerlink" href="#solved-nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied" title="Permanent link">¶</a></h2>
<li>When: This happens when creating a new cluster with k3d on a Linux system running kernel version &gt;= 5.12.2 (and some others, such as &gt;= 5.11.19)<ul>
<li>the node(s) stop or get stuck with a log line like this: <code>&lt;TIMESTAMP&gt; F0516 05:05:31.782902 7 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied</code></li>
</ul>
</li>
<li>Why: The issue was introduced by a change in the Linux kernel (<a href="https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.12.2">Changelog 5.12.2</a>: <a href="https://github.com/torvalds/linux/commit/671c54ea8c7ff47bd88444f3fffb65bf9799ce43">Commit</a>) that altered the netfilter_conntrack behavior so that <code>kube-proxy</code> can no longer set the <code>nf_conntrack_max</code> value</li>
<li>Fix: This is going to be fixed “upstream” in k3s itself in <a href="https://github.com/k3s-io/k3s/pull/3337">rancher/k3s#3337</a> and backported to k3s versions as low as v1.18.</li>
<li><strong>Note</strong>: k3d v4.4.5 already uses rancher/k3s:v1.21.1-k3s1 as the new default k3s image, so no workarounds needed there!</li>
</ul>
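<p>As a rough aid, the kernel condition described above can be checked with a short Python sketch. It is a heuristic under stated assumptions: it encodes only the two thresholds named here (&gt;= 5.12.2, and &gt;= 5.11.19 within the 5.11 series), the function names are hypothetical, and it says nothing about whether your k3s image already contains the fix.</p>

```python
import platform
import re


def parse_kernel_release(release):
    """Parse a release string such as '5.12.4-arch1-2' into (5, 12, 4)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '5.12-rc1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def kernel_maybe_affected(release):
    """Heuristic covering only the two thresholds named in this section."""
    major, minor, patch = parse_kernel_release(release)
    if (major, minor) == (5, 11):
        return patch >= 19
    return (major, minor, patch) >= (5, 12, 2)


if __name__ == "__main__":
    release = platform.release()
    verdict = "may be" if kernel_maybe_affected(release) else "is probably not"
    print(f"kernel {release} {verdict} affected")
```

<p>Other stable kernel series may also carry the change, so a negative result from this sketch is not conclusive.</p>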
<p>This is going to be fixed “upstream” in k3s itself in <a href="https://github.com/k3s-io/k3s/pull/3337">rancher/k3s#3337</a> and backported to k3s versions as low as v1.18.</p>
<ul>
<li><strong>The fix was released and backported in k3s, so you don’t need to use the workaround when using one of the following k3s versions (or later ones)</strong><ul>