Capture fleeting thoughts with org-roam

13. January 2021

I frequently find myself wanting to capture a new idea or tidbit into an org-roam entry for future reference and then immediately resume what I was doing, e.g. while taking notes during a meeting. org-capture and helm-org make this simple.

My workflow:

  1. Invoke org-capture (C-c c r) and select my "Roam Random" template
  2. Choose a Roam entry using org-roam-find-file
  3. Filter through the entry’s headline candidates with helm-org and finalise the capture

The capture template:

("r" "Roam Random" item
  (function (lambda () (my/helm-in-org-buffer (org-roam-find-file-name))))
  "- %U %?"
  :prepend t
  :kill-buffer t)

The helper functions:

(defun my/helm-in-org-buffer (filename &optional preselect)
  "Display and filter headlines in an org file with `helm'.
FILENAME is the org file to filter PRESELECT is the default entry"
  (helm :sources (helm-org-build-sources
                  (list filename))
        :candidate-number-limit 99999
        :truncate-lines helm-org-truncate-lines
        :preselect preselect
        :buffer "*helm org in buffers*"))

(defun org-roam-find-file-name ()
  "Find and return the path to an org-roam file
with the `org-roam-find-file' interface"
  (interactive)
  (save-window-excursion (org-roam-find-file) buffer-file-name))
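
For completeness, here’s how the template gets registered (a minimal sketch; it assumes org-capture is bound to C-c c as in step 1):

;; Make the template available under C-c c r.
(add-to-list 'org-capture-templates
             '("r" "Roam Random" item
               (function (lambda () (my/helm-in-org-buffer (org-roam-find-file-name))))
               "- %U %?"
               :prepend t :kill-buffer t))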

eww-reload-on-save – Hot Reloading for eww

5. September 2020

I frequently use eww and shr-mode when writing blog posts and HTML email to confirm layout and formatting are accessible to text-only user agents. Having grown enamored with hot reloading after developing ClojureScript apps with Figwheel, I wanted to create a similar workflow for writing, and have now released a package that does exactly that:

eww-reload-on-save is a minor mode that refreshes a chosen eww buffer whenever you save a non-eww buffer in which the mode is active.

You can change the buffer to be reloaded with eww-reload-on-save-choose-eww-buffer, and optionally add a post-save delay before the target eww buffer is reloaded by changing the value of eww-reload-on-save-delay-seconds; the latter’s default can also be altered with customize.
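
Wiring it up might look like this (a sketch: eww-reload-on-save-mode is my guess at the minor mode’s name, and hooking it into html-mode is purely illustrative; the delay variable is the one described above):

;; Enable hot reloading when editing HTML buffers.
;; eww-reload-on-save-mode is an assumed name for the package's minor mode.
(add-hook 'html-mode-hook #'eww-reload-on-save-mode)
;; Wait one second after saving before the eww buffer reloads.
(setq eww-reload-on-save-delay-seconds 1)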

Piping Shell Output to an emacs Buffer

2. September 2020

Every few months, I see someone on #emacs mention new eshell functionality, or re-read ambrevar’s excellent post on using it as their main shell and am inspired to give it another try. I dream of replacing long pipelines of sed and awk invocations with simple elisp, of experiencing the best parts of Plan9’s UX, seamlessly redirecting output to buffers, and being freed from the tyranny of readline‘s limited vi bindings by the wonderful evil!

Sadly, as a dyed-in-the-wool bash user who’s committed more scripting knowledge to memory than is healthy, I always come back to shell-mode. Without fail, the feature I miss most from my dalliances with eshell is piping command output into a buffer. With this alias, you can have the best of both worlds.

alias eless='(f=$(mktemp "/tmp/pipe.XXX"); cat > "$f"; emacsclient "$f"; rm -v "$f")'

Invoke it by simply piping to it: cat some-big-file.txt | eless. When you finish with the buffer via C-x #, your emacsclient session will end, returning you to your shell buffer, and the temp file created will be deleted.

Fetching GIFs from GIPHY with bash

25. August 2020

As someone who can only be described as a reaction GIF ‘connoisseur’, I rarely pass up the opportunity to add a new GIF to my collection. Unfortunately, the proliferation of hosts like GIPHY and Tenor, who re-encode GIFs as WebP or MPEG to save bandwidth and rely on JavaScript or the user agent to handle looping, has made this more complicated than a Right Click -> Save As.

Tired of trawling through Developer Tools to grab the source image, I wrote a bash script that does the heavy lifting of extracting the GIF’s ID from a URL, generating a filename based on the GIF’s title, and downloading it.

You can grab it from a gist here. Plop it on your PATH, chmod +x, and invoke giphy-grab:

$ giphy-grab
giphy-grab [-f|--force] [-s|--skip-title] [-o|--output] <url>
            --force: overwrite output file if it exists
            --skip-title: do not fetch GIF name from GIPHY
            --output: filename or destination folder for GIF

$ export url="https://giphy.com/gifs/cat-cats-OmK8lulOMQ9XO"
$ giphy-grab ${url}
Destination URL: ${url} | Output: Cute Cat.gif

# Output to a folder with title from Giphy - perfect for scripting!
$ mkdir gifs && giphy-grab -o gifs/ ${url}
Destination URL: ${url} | Output: gifs/Cute Cat.gif
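
The core trick is that the ID is the last hyphen-separated segment of the page URL. A rough sketch of the extraction and download (the direct-download URL pattern is my assumption, not necessarily the script’s exact logic):

# The ID is everything after the last hyphen in the page URL.
id="${url##*-}"
# Assumed direct-download pattern; the real script may differ.
curl -fLo "${id}.gif" "https://media.giphy.com/media/${id}/giphy.gif"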

diff small changes (and long lines!) by character

4. August 2020

I recently had to work with some test fixtures containing extremely long lines. Using a traditional line-based diff on these to ensure only the smallest changes necessary were made and committed was difficult, and so I delved into the depths of the git-diff man page to see if there was a better way.

Combining two options, --word-diff-regex and --word-diff, it’s possible to see changes inline, with old and new characters side by side. For example, changing AaA to BbB will show as ABabAB.

--word-diff-regex allows you to specify a regular expression to determine what a word is. This specific use case is even called out on the manpage, which notes: --word-diff-regex=. will treat each character as a word and, correspondingly, show differences character by character.

--word-diff=color forces the diff to be shown using only colors, without any +, -, or bracket characters demarcating changes (plain --word-diff uses the bracketed [-removed-]{+added+} style).
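
Putting the two together:

git diff --word-diff=color --word-diff-regex=. path/to/fixture
# or, equivalently, using the built-in shorthand:
git diff --color-words=. path/to/fixture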

Hopefully this saves you some squinting.

org-capture Refile Functions & Helm Sources

18. February 2019

For a few years, predating my adoption of emacs, I’ve kept a Quotes file with phrases I want to remember or come back to. Originally formatted as Markdown, converting it to org was easy enough with macros (pandoc would’ve been a great alternative if the structure was more complex).

read more

Skip LaTeX Title Page with org-export

16. February 2019

I was recently writing a paper where the submission requirements mandated using a custom LaTeX class while omitting the title page that class typically produces.

After unsuccessfully experimenting with various org options to try and remove the title page, I realised you could drop down to LaTeX and redefine the class’ \maketitle definition instead:

#+LATEX_HEADER: \renewcommand\maketitle{}

This turns \maketitle into a no-op and has the benefit of retaining the title during export to other formats, a boon for my purposes. I hope this post saves some poor soul the forty-five-plus minutes I spent reading org source to find a solution!

mkDocs on Azure App Service + CD with VSTS

10. April 2018

Our team uses mkDocs as an internal wiki; a simple static site generator, it’s easy to host anywhere. At Microsoft, Azure is our go-to for internal services like this, and so we use Azure App Service as our deployment target. Though a PaaS offering, it works quite well for static sites too - files deployed to the server root are served up by the IIS proxy that sits in front of the site.

Our mkDocs builds and deployments are managed by Visual Studio Team Services using the new YAML Builds. Every time a commit is made to the master branch, a new build is triggered; if successful, the contents of the site directory on the build server are pushed to our docs site, ensuring it’s always up-to-date.

This is how we got everything up and running.

read more

A Better bash Prompt

1. April 2018

Recent work has included a significant amount of time spent ssh‘d into different hosts; to avoid confusion about which machine I was executing commands on, I decided to update my default bash prompt to include some additional information. If everything’s going well, my prompt looks like this:

[1722][Cate@prozess: ~]$

Though simple at first glance, it’s quite information dense. The first bit, [1722], tells me this is the 1722nd command in my bash_history, meaning I can easily rerun it with !1722.

The colour coding of the next square bracket delimited section means that the last command I ran executed successfully ($? == 0), and I also know my username, my hostname, and my relative path. If I ssh into my bounce box or my Windows build machine, it’s now immediately clear which environment I’m operating in. Interested in trying it yourself?
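
Here’s a minimal sketch of such a prompt, rebuilt before each prompt via PROMPT_COMMAND (the colours and exact layout are my approximation of the behaviour described; the full post has the real version):

# Capture the last exit status first, then rebuild PS1 around it.
__prompt() {
    local last=$?
    local c='\[\e[0;32m\]'                # green: last command succeeded
    (( last != 0 )) && c='\[\e[0;31m\]'   # red: it failed
    PS1="[\!]${c}[\u@\h: \w]\[\e[0m\]\\\$ "
}
PROMPT_COMMAND=__prompt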

read more

Setting Environment Variables on an HDInsight Spark Cluster with Script Actions

20. March 2018

It’s a common pattern in web development to use environment variables for app configuration in different environments. My team wanted to use the same pattern for our Spark jobs, but documentation on how to set environment variables for an HDInsight cluster was hard to come by. We eventually found a solution for HDInsight 3.6.

tl;dr: You need to modify spark.executorEnv and spark.yarn.appMasterEnv in $SPARK_HOME/conf/spark-defaults.conf. For example, if you want the variable SERVICE_PRINCIPAL_ENDPOINT available from a sys.env call inside your Spark app, you’d add the following lines to spark-defaults.conf:

spark.yarn.appMasterEnv.SERVICE_PRINCIPAL_ENDPOINT <val>
spark.executorEnv.SERVICE_PRINCIPAL_ENDPOINT <val>

These settings will now be used whenever a job is submitted to the cluster, whether that be via spark-submit or Livy. Though only the head nodes of the cluster need to have their spark-defaults.conf updated, it’s recommended you automate deployment of your customizations through Script Actions.

I wrote a small bash script that generates another bash script, uploads it to Azure Blob Storage, and submits it to the HDInsight Script Action REST endpoint for execution across all the nodes on the cluster. Because a Script Action may be executed on a node multiple times, it’s important to make sure they’re idempotent, lest you end up with the same key set dozens of times in the same file.

I get around this by enclosing our Script Action-deployed section between two distinct comments, and using grep and sed to delete any preëxisting block before it’s added back to the file.

env_block() {
    declare -A ClusterEnv=(
        ["ADLS_ROOT_URL"]="adl://nib-dl.azuredatalake.net"
    )

    local concat=""
    local NL=$'\n'
    for key in "${!ClusterEnv[@]}"; do
        concat+="spark.yarn.appMasterEnv.${key} ${ClusterEnv[$key]}$NL"
        concat+="spark.executorEnv.${key} ${ClusterEnv[$key]}$NL"
    done

    local header="### START SCRIPTACTION ENV BLOCK ###"
    local footer="### END SCRIPTACTION ENV BLOCK ###"
    echo "$header$NL$concat$footer"
}

generate_script_action() {
    #Escaped \$ keeps SPARK_HOME literal here; it expands on the node at run time
    CLUSTER_FILE=\$SPARK_HOME/conf/spark-defaults.conf
    #Filename to write the generated Script Action to
    local output=$1
    cat > "$output" <<EOF
env_vars=\$(cat << END
$(env_block)
END
)

#Load /etc/environment so we have \$SPARK_HOME set
source /etc/environment

#Delete existing Script Action block
lines=\$(cat $CLUSTER_FILE | grep -nE "###.+SCRIPTACTION" \
    | cut -f1 -d":" | tr '\n' ',' | sed -e 's/,$//')

#If the section isn't found, skip deleting from the target file
if ! [ -z "\$lines"  ]; then
    sudo sed -i'' -e "\${lines}d" $CLUSTER_FILE
fi

echo "\$env_vars" | sudo tee -a $CLUSTER_FILE
EOF
}

generate_script_action is a bash function that takes a single argument, a filename, and writes out a Script Action that sets the appropriate keys in spark-defaults.conf as described above. In our production version, we use Azure KeyVault to fetch fresh secrets and re-execute the script action on each build and deploy to prod.
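
If you’d rather not call the REST endpoint directly, submitting the generated script via the Azure CLI looks roughly like this (a sketch from memory of az hdinsight script-action execute; the resource, storage, and container names are hypothetical):

# Generate the script, upload it to Blob Storage, then run it on every node.
generate_script_action set-spark-env.sh
az storage blob upload --account-name nibstorage -c scripts \
    -f set-spark-env.sh -n set-spark-env.sh
az hdinsight script-action execute -g my-rg --cluster-name my-cluster \
    --name set-spark-env --roles headnode workernode --persist-on-success \
    --script-uri "https://nibstorage.blob.core.windows.net/scripts/set-spark-env.sh"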

Capturing Regions with Org Capture

28. February 2018

I’m an avid user of org-capture for recording TODOs, code snippets, and reference material without interrupting my train of thought. A quick C-c c t lets me rattle off a note-to-self while capturing a link back to the buffer I’m in for later reference.

I recently found myself frequently repeating the pattern of calling kill-ring-save on a region to yank it into my org-capture buffer. By taking advantage of org-capture-templates and a little elisp, this pattern can be reduced to simply highlighting the target region before invoking capture.
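
The key ingredient is the %i placeholder, which expands to the active region when capture is invoked. A minimal sketch (the key, file, and headline are hypothetical):

;; %i expands to the active region; %a links back to the source buffer.
(add-to-list 'org-capture-templates
             '("s" "Snippet" entry
               (file+headline "~/org/notes.org" "Snippets")
               "* %?\n#+BEGIN_SRC\n%i\n#+END_SRC\n%a"))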

read more

Remote Debugging Spark Jobs

11. January 2018

As we move past kicking the tires of Spark, we’re finding quite a few areas where the documentation doesn’t quite cover the scenarios you’ll run into day after day in a production environment. Today’s exemplar? Debugging jobs running in our staging cluster.

While taking a functional approach to data transformations allows us to write easily testable and composable code, we don’t claim to be perfect, and the ability to set breakpoints and inspect values at runtime is an invaluable tool in a programmer’s arsenal when println just won’t do.

The what of debugging a JVM application in production is simple, and among the most powerful (and impressive!) capabilities of the Java platform. The Java Debug Wire Protocol (JDWP) enables attaching to a remote JVM as if it were running on your machine. Just launch the JVM for your Spark driver with debug flags and attach. Easy, right?
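
In its naive form, that’s a single extra conf flag (a sketch using the standard JDWP agent syntax, not our exact invocation; the class and jar names are placeholders):

# suspend=y makes the driver JVM wait for a debugger to attach before starting.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class com.example.MyJob my-job.jar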

Not quite.

read more