Katana: An ELF/DWARF Manipulation Tool with Hotpatching Capabilities
====================================================================

Author: James Oakley <james.oakley@dartmouth.edu>
Date: 2011-04-13 00:01:54 EDT


Table of Contents
=================
1 Introduction 
2 General Usage Information 
    2.1 Shell 
        2.1.1 Syntax and Data Model 
        2.1.2 Available Commands 
        2.1.3 History 
3 Hotpatching 
    3.1 Other Systems 
    3.2 What Katana Does 
    3.3 What Katana Does Not Do (Yet) 
    3.4 What Katana May Never Do 
    3.5 How to Use Katana For Hotpatching 
        3.5.1 Preparing a Package for Patching Support 
        3.5.2 Source Code Practices 
        3.5.3 Compilation/Linking 
        3.5.4 To Generate a Patch 
        3.5.5 To Apply a Patch 
        3.5.6 To View a Patch 
        3.5.7 Options 
        3.5.8 Configuration Files 
        3.5.9 See Also 
    3.6 Patch Object Format 
    3.7 Patch Generation Process 
    3.8 Configuration 
    3.9 Initializing the patch object 
    3.10 Comparing source trees 
    3.11 Type Diffing 
    3.12 Function Diffing 
    3.13 Patch Application Process 
    3.14 Roadmap 
4 DWARF Manipulation 
5 Credits and Licensing 


1 Introduction 
~~~~~~~~~~~~~~~
  Katana is a research system for ELF/DWARF manipulation. It was
  originally developed for research into hotpatching. It was later
  revised for research into the security implications of gcc/C++
  exception handling, which is implemented primarily using DWARF call
  frame information. Therefore, if you are interested in
  vulnerabilities related to exception handling/DWARF, you can
  probably ignore the parts of this manual that discuss hotpatching.
  If you are instead interested in hotpatching, you can probably
  ignore the parts of this manual that deal with manipulating
  exception handling structures.
  
  Katana aims to provide a hot-patching system for userland. Further,
  it aims to work with existing toolchains and formats so as to be
  easy to use and, hopefully, to pave the way for incorporating
  patching as a standard part of the toolchain. Because of this aim,
  Katana operates at the object level rather than requiring any access
  to the source code itself. This has the added bonus of making it, in
  theory, language agnostic (although no work has been done to test it
  with anything besides programs written in C). A diagram of the
  software lifecycle with hotpatching is shown below (unless you are
  reading this in plain text).


  This document is intended to provide a users' guide to Katana,
  insight into its inner workings, and discussion of its flaws and
  plans for the future. As the software is not complete, making use of
  Katana without understanding the inner workings and technical
  shortcomings is not recommended. Nevertheless, the only sections of
  this document necessary for "Users' Guide" purposes are 
  ["What Katana Does"], ["What Katana Does Not Do (Yet)"], and most importantly 
  ["How to Use Katana For Hotpatching"].
 
  This document is a work in progress. It is not a polished guide yet.


  ["What Katana Does"]: sec-3.2
  ["What Katana Does Not Do (Yet)"]: sec-3.3
  ["How to Use Katana For Hotpatching"]: sec-3.5

2 General Usage Information 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2.1 Shell 
==========
   If Katana is not passed an argument indicating one of the
   hot-patching commands (described later in
   [*How to Use Katana For Hotpatching]), then it is assumed to be
   operating as a shell. If it is provided an argument, that argument
   is taken as the name of a file to read shell commands from.
   Otherwise, commands are read from stdin using the readline library.

   [*How to Use Katana For Hotpatching]: sec-3.5

2.1.1 Syntax and Data Model 
----------------------------
    The Katana shell syntax is very simple. There are no control flow
    structures, only commands and variables. A line is terminated by a
    semicolon (;) or a newline character. Each line may be either
    blank, contain exactly one COMMAND, or contain an ASSIGNMENT.

    A COMMAND is of the form COMMAND_IDENTIFIER PARAM PARAM PARAM ...,
    where tokens are separated by spaces and the number of PARAMs
    depends on the command.

    An ASSIGNMENT is currently of the form VARIABLE=COMMAND although
    in the future it may be possible to write other sorts of
    assignments.

    A VARIABLE reference consists of a dollar-sign ($) followed by a
    letter or underscore followed by any number of letters,
    underscores, or digits.

    A COMMAND_IDENTIFIER is one or more words which identify a
    COMMAND. In many cases a command is identified by only one word,
    but sometimes similar commands are grouped by sharing the first
    word in their identifier.

    A PARAM is a VARIABLE reference, a STRING, or a NUMBER.

    A STRING is any literal beginning and ending with the character ".

    A NUMBER is a decimal, hex, or float literal.
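
    The grammar above can be sketched as a small tokenizer. This is
    only an illustration of the rules as described, not Katana's
    actual parser. It assumes that string literals contain no escaped
    quotes or semicolons, that the left side of an ASSIGNMENT is
    written with its dollar-sign, and that hex literals start with 0x;
    the manual does not specify any of these. All function names are
    hypothetical.

```python
import re

# Patterns follow the prose description of the shell syntax.
# Assumption: no escape sequences inside STRING literals.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')
VARIABLE_RE = re.compile(r'\$[A-Za-z_][A-Za-z0-9_]*\Z')
# Assumption: hex literals use a 0x prefix; floats use a decimal point.
NUMBER_RE = re.compile(r'(0[xX][0-9A-Fa-f]+|[0-9]+(\.[0-9]+)?)\Z')
ASSIGNMENT_RE = re.compile(r'(\$[A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)\Z')


def split_lines(text):
    """Split input into logical lines on newlines and semicolons."""
    # Assumption: semicolons never occur inside string literals.
    return [part.strip() for part in re.split(r'[;\n]', text)]


def classify(token):
    """Return the grammar category of a single token."""
    if VARIABLE_RE.match(token):
        return "VARIABLE"
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return "STRING"
    if NUMBER_RE.match(token):
        return "NUMBER"
    return "WORD"  # one word of a COMMAND_IDENTIFIER


def parse_line(line):
    """Parse one logical line.

    Returns (target, tokens), where target is the assigned variable
    (or None for a plain COMMAND) and tokens is a list of
    (category, text) pairs. A blank line yields (None, []).
    """
    target = None
    match = ASSIGNMENT_RE.match(line)
    if match:
        target, line = match.group(1), match.group(2)
    return target, [(classify(tok), tok) for tok in TOKEN_RE.findall(line)]
```

    For example, parse_line('$e=load "a.out"') yields the target $e
    and two tokens: the word load and the string "a.out".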

* Data Types 
  The following types of variables exist:
  + string
  + elf
  + elf section
  + raw data
  + array
   

2.1.2 Available Commands 
-------------------------
* load 
  /Usage/: `load FILENAME'
  /Params/: FILENAME must be a string literal or a variable that can be
            interpreted as a string.
  /Function/: Loads the data in the given file as an ELF object if
            possible. If not, loads it as raw data.
* save 
  /Usage/: `save VAR FILENAME'
  /Params/: VAR must be a variable that can be interpreted as an ELF
            object or that can be interpreted as raw data. FILENAME must be a
            literal or variable that can be interpreted as a string.
  /Function/: Saves VAR to FILENAME.
* dwarfscript 
  + dwarfscript emit 
    /Usage/: `dwarfscript emit [SECTION] ELF OUTFILE'
    /Params/: SECTION must be the name (string) of the section to write as
              Dwarfscript. If not specified it defaults to
              ".eh_frame". ELF must be an ELF object. OUTFILE must
              be a string with the name of a file to write the resulting
              Dwarfscript to.
    /Function/: Writes the Dwarfscript representation of the given
                SECTION from the given ELF to OUTFILE.
  + dwarfscript compile 
    /Usage/: `dwarfscript compile INFILE'
    /Params/: INFILE must be a string containing the name of a file.
    /Function/: Interprets the contents of the file named by INFILE
                as Dwarfscript and compiles the Dwarfscript into
                binary form. Returns an array with 3 items:
                0: raw data for .eh_frame
                1: raw data for .eh_frame_hdr
                2: raw data for .gcc_except_table
* replace 
  + replace section 
    /Usage/: `replace section ELF SECTION_NAME NEW_SECTION'
    /Params/: ELF must be an ELF object. SECTION_NAME must be a
              string. NEW_SECTION must be either an ELF section or raw data.
    /Function/: Replaces the section with the name SECTION_NAME in
                the object ELF with the data from
                NEW_SECTION. Section headers are replaced if NEW_SECTION is
                able to provide them, but not if it is only raw data.
                 
  + replace raw 
    /Usage/: `replace raw ELF OFFSET NEW_DATA'
    /Params/: ELF must be an ELF object. OFFSET must be an
              integer. NEW_DATA must be raw data.
    /Function/: Replaces the raw data at OFFSET in the ELF object
                with NEW_DATA. OFFSET must refer to a location in an
                existing section.
* info 
  + info eh 
    /Usage/: `info eh ELF [OUTFILE]'
    /Params/: ELF must be an ELF object. OUTFILE, if present, must
              be the name of a writable file (which may or may not
              exist yet). 
    /Function/: Prints out information about the exception-handling
                structures in ELF. If OUTFILE is present, this
                information is written to it.
* hash 
  + hash elf 
    /Usage/: `hash elf STR'
    /Params/: STR must be a string.
    /Function/: Prints the result of running elf_hash (from libelf)
                on the string.
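
  For reference, the classic System V ELF symbol hash can be sketched
  as below. This is an illustration of the well-known algorithm, not
  Katana's or libelf's source. It assumes the string is hashed as its
  UTF-8 bytes and that the result is kept to 32 bits; behavior for
  other encodings or wider integer types is not covered.

```python
def elf_hash(name: str) -> int:
    """Compute the System V ELF hash of a string, kept to 32 bits."""
    h = 0
    for byte in name.encode("utf-8"):
        # Shift in the next byte, wrapping as a 32-bit unsigned value.
        h = ((h << 4) + byte) & 0xFFFFFFFF
        # Fold the top four bits back into the low bits, then clear them.
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h
```

  With this sketch, elf_hash("printf") evaluates to 0x077905a6.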
                
* patch 
  + patch gen 
    /Usage/: `patch gen OLD_OBJECTS_DIR NEW_OBJECTS_DIR EXECUTABLE'
    /Params/: All three params are strings. The first two are the
              old and new object file directories respectively. The
              last is the name of the executable that can be found
              in both directories.
    /Function/: Generates (and returns) a patch object ELF.
  + patch apply 
    /Usage/: `patch apply PO PID'
    /Params/: The PO parameter should be an ELF patch object. PID
              should be the (integer) pid of the process that PO is
              to be applied to.
    /Function/: Applies the patch object PO to the running process
                described by PID.
* ! (shell command) 
  The rest of the line following the ! is executed in a shell.

2.1.3 History 
--------------
    Command history is saved using libreadline in `$HOME/.katana_history'.

3 Hotpatching 
~~~~~~~~~~~~~~

3.1 Other Systems 
==================
   There are other hotpatching systems in existence. The curious are
   invited to explore Ginseng and Polus. Both of these systems parse
   the source code, which adds significant complexity to them and
   results in significant programmer annotation of the code to give
   hints to the systems. Ginseng uses complicated type-wrappers
   when patching variables, which do not fit cleanly with existing
   executables and have some impact on the performance of the
   software. Ginseng is considerably more mature than Katana,
   however. Neither system is production ready, but Ginseng is probably
   closer than Katana at the moment.

   The system most like Katana in many ways is KSplice, and the curious
   reader is definitely invited to investigate. KSplice patches the
   kernel and not userland, does not attempt to patch variables, and
   creates patches as kernel modules rather than working towards a
   general ELF-based patch format.

3.2 What Katana Does 
=====================
   + Runs on x86 and x86-64
   + Generates patches for simple programs
   + Applies simple patches

3.3 What Katana Does Not Do (Yet) 
==================================
   + Patch any major programs: it has not yet been demonstrated on
     anything more than toy examples
   + Provide any method to handle opaque data it cannot patch (void*,
     situations where which action a user would prefer is unclear, etc)
   + Patch previously patched processes
   + Provide robust operation
   + Run on any architectures other than x86 and x86-64
   + Have any testing on operating systems besides GNU/Linux
   + Allow for calls in patched code to previously unused functions
   + Work for programs which actually make use of some of the large
     code model features of the x86-64 ABI.
   + And much more

   See [Roadmap] for more things which are not complete.


   [Roadmap]: sec-3.14

3.4 What Katana May Never Do 
=============================
   + Work on any binary formats besides ELF

3.5 How to Use Katana For Hotpatching 
======================================
   Katana is intended to be used in two stages. The first stage
   generates a patch object from two different versions of an object
   tree. By an object tree, we mean the set of object files (.o files)
   and the executable binary they comprise. Katana works completely at
   the object level, so the source code itself is not strictly
   required, although all objects must be compiled with debugging
   information. This step may be done by the software vendor. In the
   second stage, the patch is applied to a running process. The
   original source trees are not necessary during patch application, as
   the patch object contains all information necessary to patch the
   in-memory process at the object level. It is also possible to view
   the contents of a patch object in a human-readable way for the
   purposes of sanity-checking, determining what changes the patch
   makes, etc.

3.5.1 Preparing a Package for Patching Support 
-----------------------------------------------
    Katana aims to be much less invasive than other hot-patching
    systems and to require minimal work to be used with any project.
    It does, however, have some requirements.

3.5.2 Source Code Practices 
----------------------------
    Katana does not look at the source code, therefore unlike several
    other hotpatching systems, it does not require any annotation in
    the source code. There are, however, some best practices to
    follow.
    + Avoid the use of `void*', at least for global variables. (Katana
      does not currently patch local variables, preferring to wait
      until any functions using changed variables are no longer on
      the stack.) Because a `void*' is typeless and opaque, the data
      behind it is very hard to analyze and patch.
    + Avoid unnamed types. i.e., instead of `typedef struct {...} Foo;'
      use `typedef struct Foo_ {...} Foo;'. 
    + Avoid accessing structure members by offsets instead of by the
      member names. As long as you keep all the code where you do this
      up to date, it should not be a problem, but Katana cannot detect
      when you do this.

3.5.3 Compilation/Linking 
--------------------------
    Required CFLAGS:
    + -g

    Recommended CFLAGS:
    + -ffunction-sections
    + -fdata-sections
      
    Recommended LDFLAGS:
    + --emit-relocs (a linker option; when linking through the gcc
      driver, pass it as -Wl,--emit-relocs)

3.5.4 To Generate a Patch 
--------------------------
    Let the location of your project be /project. You must have two
    versions of your software available: the version identical to the
    running software which must be hotpatched, call it v0, and the
    version to which you wish to hotpatch the running software, call it
    v1. Let foo be the name of your program. Then /project/v0/foo must
    exist and /project/v0 must also contain (possibly in
    subdirectories) all of the object files which contributed to
    /project/v0/foo. The source code itself is immaterial, as Katana
    does not parse it. Similarly, /project/v1/foo must exist and
    /project/v1 contain all of the object files contributing to
    /project/v1/foo. Katana is then invoked as

    `katana [OPTIONS] -g [-o OUTPUT_FILE] /project/v0 /project/v1 foo'

    or more formally

    `katana [OPTIONS] -g [-o OUTPUT_FILE] OLD_OBJECTS_DIR NEW_OBJECTS_DIR EXECUTABLE_NAME'

    If `-o OUTPUT_FILE' is not specified, the output file will be `OLD_OBJECTS_DIR/EXECUTABLE_NAME.po'.

3.5.5 To Apply a Patch 
-----------------------
    Suppose the process to be patched is running with a pid of PID,
    and that it can be patched from its current version to a more
    recent version by the Patch Object (PO) file PATCH. Katana is then
    invoked as

    `katana [OPTIONS] -p [-s] PATCH PID'

    If all goes well, the patcher will run, print out some status
    messages, and leave your program in a better state than it found
    it. The optional -s flag tells Katana to stop the target program
    after patching it and detaching from it. This is mostly of use for
    debugging Katana.

3.5.6 To View a Patch 
----------------------
    One of the goals of Katana and its Patch Object (PO) format is to
    increase the transparency of patches: a user about to apply a patch
    should know what it will do. This goal is not yet fully realized,
    but it is possible to view some information about a patch with

    `katana [OPTIONS] -l PATCH'

3.5.7 Options 
--------------
    The following options may be passed to katana regardless of whether
    one is generating, applying, or viewing a patch:
    + -c CONFIG
      where CONFIG is the name of a configuration file to load
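
    For scripting, the three invocations above can be assembled as
    argument lists. The sketch below only rearranges the flags
    documented in this section (-g, -o, -p, -s, -l, -c). The helper
    names are hypothetical, and it assumes the binary is installed as
    `katana' on the PATH.

```python
import posixpath


def _base(config):
    """Start an argument list; general options come before the mode flag."""
    return ["katana"] + (["-c", config] if config else [])


def generate_argv(old_dir, new_dir, executable, output=None, config=None):
    """Arguments for generating a patch (-g)."""
    argv = _base(config) + ["-g"]
    if output is not None:
        argv += ["-o", output]
    return argv + [old_dir, new_dir, executable]


def default_patch_path(old_dir, executable):
    """Where the patch is written when -o is not given."""
    return posixpath.join(old_dir, executable + ".po")


def apply_argv(patch, pid, stop_after=False, config=None):
    """Arguments for applying a patch (-p), optionally with -s."""
    argv = _base(config) + ["-p"]
    if stop_after:
        argv.append("-s")
    return argv + [patch, str(pid)]


def view_argv(patch, config=None):
    """Arguments for viewing a patch (-l)."""
    return _base(config) + ["-l", patch]
```

    Each list can be handed to subprocess.run from the Python standard
    library. Applying a patch generally needs permission to trace the
    target process.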

3.5.8 Configuration Files 
--------------------------
    Katana loads configuration files as follows. Configuration files
    loaded later in the sequence may overwrite settings from files
    earlier in the sequence.
    + /etc/katana
    + ~/.katana
    + ~/.config/katana
    + ./katana
    + any file specified with -c

    Configuration files are written in JSON. The JSON requirement that
    strings be quoted is relaxed (i.e. anything is assumed to be a
    string unless it can be interpreted otherwise). The following
    properties are recognized:
    + maxWaitForPatching <INTEGER>
      This value specifies the maximum number of seconds to wait for
      the target to enter a safe state.
    + flags <OBJECT>
      The value of flags should be an object which may contain the
      following properties, all of which should be bool-valued:
      + checkPtraceWrites
        Whenever something is written into the target memory, read the
        value back out and verify that it was written correctly. This
        has a performance penalty, but does provide some more robust
        error checking, although it should not be necessary.
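
    The layering rule (later files override earlier ones) can be
    sketched as follows. This is not Katana's loader. It assumes
    strict JSON, so the relaxed quoting described above is not
    handled. It also assumes that missing files are skipped and that
    overriding happens per top-level property; the manual does not say
    how a nested object such as flags is merged. The function name is
    hypothetical.

```python
import json
import os


def load_layered_config(paths):
    """Merge JSON configuration files in order; later files win."""
    merged = {}
    for path in paths:
        path = os.path.expanduser(path)
        if not os.path.isfile(path):
            continue  # assumption: a missing file is silently skipped
        with open(path, encoding="utf-8") as handle:
            settings = json.load(handle)  # strict JSON only
        # Assumption: a later file replaces whole top-level properties.
        merged.update(settings)
    return merged
```

    The caller passes the search list in loading order, ending with
    any file named by -c.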

3.5.9 See Also 
---------------
    the katana manpage (although the information in this document is
    considerably more extensive than in the manpage)

3.6 Patch Object Format 
========================
   This section of the document is not yet written. It will provide a
   description and specification of the PO format used by Katana.

3.7 Patch Generation Process 
=============================
   This section of the document is still under construction. When
   complete, it will provide a description of the internal process that
   Katana uses to generate a patch. Understanding it is not necessary
   for using Katana.
   

3.8 Configuration 
==================
   Katana reads configuration files, in order and with later
   configuration files overriding options found in earlier ones, from
   `/etc/katana', `~/.katana', `~/.config/katana', and `./.katana'.

3.9 Initializing the patch object 
==================================
   Katana sets up a patch object ELF file with the necessary sections;
   see [Patch Object Format].

   [Patch Object Format]: sec-3.6

3.10 Comparing source trees 
============================
   + Katana compares the old and new source trees, looking at the
     object (.o) files.
   + For object files which exist only in the new tree, their contents
     are added to the patch object being created.
   + For object files which exist only in the old tree, a warning
     about their removal is issued and nothing further is done.
   + For object files which exist in both trees, type diffing and
     function diffing are performed and the differences are written
     to the patch object being created.
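
   The three cases above can be sketched as a directory comparison.
   This illustrates the classification step only, not Katana's
   implementation. It assumes that an object file in the old tree
   corresponds to the one at the same relative path in the new tree;
   the manual does not say how files are matched. The function names
   are hypothetical.

```python
import os


def object_files(root):
    """Return the set of .o paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".o"):
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, root))
    return found


def classify_objects(old_root, new_root):
    """Sort object files into new-only, old-only, and shared groups."""
    old = object_files(old_root)
    new = object_files(new_root)
    return {
        "added": sorted(new - old),    # contents go into the patch
        "removed": sorted(old - new),  # warn, nothing further is done
        "common": sorted(old & new),   # candidates for type/function diffing
    }
```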

3.11 Type Diffing 
==================
   This section of the document still needs to be written. The general
   idea is that structures are examined for added members, moved
   members, and changed members.
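
   As a rough illustration of that idea, the sketch below compares two
   versions of a structure. It is not Katana's algorithm. The layout
   representation (a mapping from member name to offset and type name)
   is hypothetical, as are the definitions used here: a member is
   "changed" if its type differs and "moved" if only its offset
   differs. Members that were removed are not reported.

```python
def diff_struct(old_members, new_members):
    """Compare struct layouts given as {member_name: (offset, type_name)}."""
    added, moved, changed = [], [], []
    for name, (new_offset, new_type) in new_members.items():
        if name not in old_members:
            added.append(name)
            continue
        old_offset, old_type = old_members[name]
        if new_type != old_type:
            changed.append(name)  # assumption: a type change takes priority
        elif new_offset != old_offset:
            moved.append(name)
    return {"added": added, "moved": moved, "changed": changed}
```

   For instance, inserting a new first member into a structure shows
   up as one added member, with the later members reported as moved.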

3.12 Function Diffing 
======================

3.13 Patch Application Process 
===============================
   This section of the document is not yet written. It will provide a
   description of the internal process that Katana uses to apply a
   patch. Understanding it is not necessary for using Katana.

3.14 Roadmap 
=============
   This section is highly incomplete. Future goals include:
   + Better interaction with the heap and dynamically allocated variables
   + Better interaction with void*
   + More efficient use of .rodata
   + Patching already patched processes
   + Patch composition
   + Patch safety checking: make sure a patch actually corresponds to
     the process it's being applied to
   + Storing warnings from generation inside a patch

4 DWARF Manipulation 
~~~~~~~~~~~~~~~~~~~~~

5 Credits and Licensing 
~~~~~~~~~~~~~~~~~~~~~~~~
  Katana is under development at Dartmouth College and Copyright 2010
  Dartmouth College. It may be distributed under the terms of the GNU
  General Public License with attribution to Dartmouth College as
  specified in the file COPYING distributed with Katana. This document
  is Copyright 2010-2011 Dartmouth College and may be distributed
  under the terms of the GNU Free Documentation License as found in
  the file FDL which should have been distributed with this
  documentation. If it was not, it may be found at
  [http://www.gnu.org/licenses/fdl.txt].

  Katana is being written by James Oakley and was designed by Sergey
  Bratus, Ashwin Ramaswamy, James Oakley, Michael Locasto, and Sean
  Smith.