Open sourcing work experiments and libraries
Recommendations I have for open sourcing work-related codebases
At Schibsted, one of the areas I help with is the open source office. The office exists to encourage open source internally, both through upstream contributions and by open sourcing things we find useful.
One of the most common questions we get is “how can I open source a library or code I’ve worked on?”, which is the cousin of “why should I open source a small bit of code I wrote?”
Here are some of the guidelines I try to follow and recommend, both for production-ready libraries and for small experiments, along with a case study of an experiment I open sourced.
Why open source an experiment? Why open source anything?
Open source is all about sharing solutions to problems that people have; code that solves a problem for one person is valuable even if it never becomes an industry-standard framework. It benefits both the employee and the employer. An experiment is usually a little different from a production-ready library: libraries benefit from getting external contributors, while experiments benefit from sharing ideas and knowledge with others. Collaboration in open source doesn’t need to be purely code, sometimes sharing a concept is as influential as sharing code.
Some ways it can benefit:
In the future, you’ll be able to revisit the code and borrow anything you might need.
When you tell someone about something you worked on, it’s very handy to be able to grab the repo and send them a link.
Others may use your code and come to better solutions for problems, or others may contribute to your repo. Both help the community.
Potential employees might be drawn to join your company by seeing what you’re working on.
If you’re looking for a job, open source examples help a lot in giving people a feel for what you’re interested in.
Before you begin
A good experimental project has a realistic scope and ambition: what can you get done feasibly in a short time, and what purpose does it serve? What do you hope to learn from it? What problems does it solve?
It helps if you've spent some time thinking about it beforehand, so that the remaining unknowns are about implementation rather than about the idea itself. Identify those unknowns, and make sure you set enough time aside to look into them properly. You may have to try out lots of different solutions to get there.
These concepts also apply to libraries, though I’d add that the best libraries often come from observing real patterns and problems. In my mind, experiments are there to help you gather knowledge about a topic you currently don’t have, while libraries are there to package known solutions to common problems.
Make your README useful
Since the README is the entry point to your repo, it’s worth spending a lot of time on it, both for potential users and for your future self. Don’t be afraid to use some of the more complex features of GitHub Flavoured Markdown, like tables, Mermaid.js, or HTML. A more visually broken-down README can add a lot.
Explain in simple terms:
A very short tl;dr of what the code does.
A picture or screen recording helps a lot. The easiest way to add an image or video file is to use Github’s web UI and drag it to the markdown file, and it’ll handle the upload for you.
How does someone install or use it?
What’s the API, or where can docs be found?
Mermaid.js support is built into GitHub’s markdown renderer; use it to make great visualisations of how the code works.
A label stating whether the code is production-ready or experimental.
Any known bugs or plans for the future.
How to contact the maintainers?
How to ask questions, report bugs, or submit pull requests?
Do you have any learnings to share?
Did it work as expected?
What would you do differently?
Are there better alternatives?
Licenses and legal
Which license you choose is generally up to you, but your company may have some opinions. GitHub has a list of licenses, and a handy website with a breakdown of each one. It’s generally recommended that you clear the code with your manager if you think it could be sensitive or competitive for your company. If you’re not sure, ask; if you don’t, you may accidentally get into trouble with your employer. Your contract may also spell out those terms. Bigger companies will ask for a contributor license agreement (CLA) to handle the legal side of having external contributors. There’s a tool to help streamline this for you.
My personal default is BSD-3-Clause, though I do like MIT. If your repo is small and not intended for many external contributions, you probably don’t need a CLA. If you expect others to contribute code, it’s worth setting up a CLA for your organization.
Hosting and repo ownership
There are a few options out there for hosting a code repo, but the default is GitHub. If the code you’re releasing is work-related or built by people on your team, you may want it to be under your company’s GitHub organization. This allows people from the company to contribute easily, which is vital if it’s a critical library used by one of your products. It’s also a good draw to pull people into your company.
List the maintainers, how active the repo is, and whether it can be scheduled for deletion or archiving. Keeping track of who owns what, and how important each repo is to the company, makes off-boarding a ton easier.
Archiving is almost always the better option. You never know when you might need some old code for inspiration. However, if a repo is empty or not useful, deleting it will help reduce the noise.
Tests
Whether you have a production-ready codebase or an experimental one, having tests is a good idea. If you’re taking a test-driven approach, you may have started with a bunch of tests. I typically wait until I have the shape of the API coded up before introducing tests, as an API can change a lot while you figure out how it should work.
There’s some basic principles I follow for tests:
They should be quick to run, locally and in CI.
They should be comprehensive enough that future refactors will be reliable.
There should be one command to run the tests.
If you’re using a compiled language, the test should include the build step.
If you’re using an auto-formatter, the test should verify that committed code is properly formatted.
CI should run the tests on each push.
If you’re using GitHub, GitHub Actions is the default option for running tests in CI. GitHub Actions’ pricing model includes 2000 minutes monthly for all accounts. In other words, if your tests take 2 minutes to run, and they run on every commit, you could commit 1000 times a month. 2 minutes is a good time to aim for. For reference, Derw has about 2000 tests over roughly 130 files, and on CI it runs in about 1m 20s. Other codebases using tech like end-to-end tests, visual regression testing, or building native binaries will likely take a lot longer. There are many events you can configure to trigger a GitHub Action, but typically you’ll want to stick to pull_request and push. The use cases for the other events are things like scheduled runs, handling issue discussions, or building binaries. It’s also possible to make a job run only when triggered manually.
Security
Dependabot is a handy tool, if you’re active at merging the PRs it generates. It’s hard to stay up to date with all the library and security releases, so automating that is super handy.
It’s worth checking which libraries your repo depends upon. Are all of them needed? Do they have large dependency chains that open up more potential attack vectors? There’s not much point in reinventing the wheel every time you face a problem, but if you’re only using a couple of small functions from a large library, it might make sense to drop the library and implement them yourself. On the other hand, if you’re using a giant framework which itself has many dependencies, you’re probably not able to coherently vet all of them.
By default, forks of a repo do not trigger GitHub Actions workflows; the runs must be manually approved. This is pretty important, as otherwise users could do bad things with their forks of your projects, so any time you get a new contributor, make sure to vet their code and/or profile.
Understanding the community culture
Depending on the community your project fits in, there will be a specific community culture. Some typical things to consider are:
Where do the community congregate?
e.g. Discord, Slack, Discourse
You’ll want to share your library with your audience, and be approachable when people use it.
How do the community talk to each other?
This is about the writing style. Do people use full sentences with proper grammar? What emojis are used, if any? You don’t need to fully change your writing style, but adapting it for smoother communication will help you fit in.
What are the recommended practices?
e.g. linting, formatting, test frameworks, security, distribution platforms, versioning.
You’ll want to align with these as best you can, unless you have a good reason not to. It makes it easier for people to contribute, and it gives people less reason to disparage your project if the code is hard to complain about.
Who are the people most active in the community? Who are the maintainers?
You’ll want to keep them in mind and heed any guidance they give closely. But don’t idolise them.
Interacting with them will often lead to your projects getting more attention, or a better direction.
Are you being irritating?
If you publish a lot of low-quality projects and spam them everywhere, it’ll likely annoy the community. Follow the other steps in this article and you should be in good shape, but don’t just spam links to your project everywhere.
Attend meetups and conferences
If there’s not a meetup, start one! If you don’t think the community is big enough for a dedicated meetup, it’s possible to give a talk at a semi-related meetup. Or perhaps just meet a couple of people for socialising once in a while. Remote meetups are much easier to do these days.
These are a great way not only to learn new things, but also to expand your network and meet new people. You’ll often find new collaborators, employees, or employers here.
Handling pull requests and issues
If you’re expecting a lot of issues and pull requests, it can be a good idea to set up templates. Typical templates will provide contributors with checklists or guidance on how issues are managed in that repo. You can set up a Github Action to automatically respond to issues, if there’s something you want to do there. It’s worth investing in some kind of board management, typically setting up labels and removing ones you don’t need.
You’ll want to look over your notifications, and ensure you’re subscribed to the repos you still care about. If you have a repo where you were the only maintainer but are no longer maintaining it, turn off notifications other than mentions, and put a note in the README (and then potentially archive it).
Use Git appropriately
Start using Git early, as soon as you’ve written some code or know what code to write. You don’t need to create a remote repo at this point, a simple git init will do, but then you’ll be able to fearlessly iterate on your solution without worrying about losing old alternative solutions. git commit -p --amend is your friend.
A simple one, but I see it missed a lot - use git tags to ensure that published versions have a tag in git. This is as simple as:
git tag -a 1.0.3 -m 'fixes x'
git push --tags
If you’re using npm, using npm version will help manage this for you, including everything from major/minor/patch to prerelease.
Likewise, good commit practices will help your users find where their bugs were fixed or introduced, along with making changelogs much easier to put together.
Give a working example
Most new libraries don’t have great docs. A lot of established libraries have poor docs. Users will often have to dive into the implementation code to understand all the edge cases, and hopefully when that happens, they submit pull requests to improve the docs. Having some example code that they can actually run and explore will really help bridge that gap. Make sure that the example compiles (it should be included in your test phase on CI), because it’s painful when there’s an example that once worked but is no longer compatible with the library code.
I would recommend having one good, fleshed out example, over many small examples.
Checklist
☐ Update your README with all the info someone might need.
☐ Clear the licensing with your company.
☐ Figure out whether you want to put it on your company’s org or your own.
☐ Write enough tests to make sure that when you revisit it in a couple of years, you can be confident it works and enable CI.
☐ Review the security practices.
☐ Clearly label the repo for contribution guidelines.
☐ Embrace the community and get involved.
☐ Tag releases in git.
☐ Have an example of usage.
Set your expectations wisely
You’ve made it this far, and probably put a good couple of hours more into your project than you thought you would. There are tons of repos out there, most with fewer than 10 stars. Your goal shouldn’t be to reach a thousand people, but to put out useful code or ideas. It can be useful to you, your company, or the wider community.
So, don’t be sad if your project doesn’t go anywhere. It’s the most likely outcome, after all.
Now, onto a case study of a recent experiment I open sourced. If you prefer to just dive into the code, check the repo for yourself.
Making a simple link aggregator
I sometimes get a burning idea rolling around my head. I let it stew for a while, and if I see places where the idea would be useful, I keep a note of it. I talk to people about it, half for rubber-ducking, half to get feedback. Once the idea hits a critical combination of me understanding the problem-solution fit and it being useful enough, I experiment.
When I start implementing, I’ve already been thinking about it quite a lot, so implementation is usually quite easy. I identify the biggest unknowns, and tackle them once I have the rough setup working. I begin with the types, then move onto the functions.
My primary learning objective was to figure out how to use AppsScript for more complicated tasks. AppsScript supports creating Web Apps, which have API integration into your Google Drive documents. I’m interested in figuring out how to export Google Drive documents into formats that are more discoverable than through Drive’s search, and I’d experimented with a Google Doc export, so I knew it was possible. While I’ve done some cool stuff with AppsScript, I hadn’t pushed it to the limits. I also was curious about how the user permissions setup worked with a web app that authenticated to specific documents.
In this specific case, I’d been thinking a lot about news and link aggregators. My experiment in summarizing news for the company is pretty successful, but I’ve built it all around a spreadsheet, which a script combines into the newsletter I send once a week. Readers provide feedback through emoji reactions. The idea I really liked was a news aggregator for the same purpose, where others could submit the stories (with a summary and a description of how they’re relevant), and have custom feeds to just see the topics they’re interested in.
The plan
Combine those objectives, and the plan became a link aggregator web app in Apps Script that would use Google Sheets as the backend database, Apps Script as the backend server, and regular HTML/CSS/JavaScript as the frontend.
The UI would be simple. A list of stories, where each story has:
A title
An upvote button and a count of upvotes
Topics
A hand-written summary by the submitter
A hand-written relevancy summary by the submitter
There would then be an algorithm that would sort these stories into:
A normal frontpage, showing recent stories by score, then older stories.
Collection pages for viewing the stories by age or by score.
Collection pages for viewing the stories with a certain topic or from a certain domain.
Additionally, the upvote button would tell the server to increment the score for that article.
The sorting logic for the frontpage or collection pages ended up looking something like this:
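Roughly, as a simplified sketch (the Story type, field names, and the seven-day cutoff here are illustrative, not the exact code from the repo):

type Story = {
  title: string;
  url: string;
  topics: string[];
  score: number;
  submittedAt: Date;
};

// Recent stories first, sorted by score, followed by older stories, also by score.
function frontpageSort(stories: Story[], now: Date = new Date()): Story[] {
  const ageInDays = (story: Story): number =>
    (now.getTime() - story.submittedAt.getTime()) / (1000 * 60 * 60 * 24);

  return [...stories].sort((a, b) => {
    const aIsRecent = ageInDays(a) < 7;
    const bIsRecent = ageInDays(b) < 7;
    if (aIsRecent !== bIsRecent) return aIsRecent ? -1 : 1;
    return b.score - a.score;
  });
}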
I wrote all of the code for this in about 2 or 3 hours, with a working frontend and a trivial backend. I knew this wasn’t the hard part, so didn’t want to spend more time than necessary on it. I did make sure to have a dark and light theme, though, just in case it was successful later. Colours and buttons were heavily inspired by other sites I use frequently.
Apps Script limitations
Next up was to figure out how to deploy TypeScript projects on AppsScript, with dependencies. I knew it was possible since I had done some prior research, but I wasn’t aware of the exact limitations. To create a web app in AppsScript, all you need to do is expose a doGet(event) or doPost(event) function which returns HTML or API data. Once you’ve got those, you can handle routing more or less the same as you would in any other web framework. The HTML returned does need to be run through Google’s sanitizer, but I found that most code I sent through still worked. There’s also a remote-call setup, where client-side JavaScript in the web app is able to directly call Apps Script functions, and it handles all the message passing for you.
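A minimal sketch of those entry points, assuming a hypothetical renderPage function (HtmlService and ContentService are the real Apps Script APIs):

// doGet and doPost must be available at the top level of the deployed script.
function doGet(event: GoogleAppsScript.Events.DoGet) {
  const page = event.parameter.page || "frontpage";
  // renderPage is a hypothetical function that returns an HTML string.
  return HtmlService.createHtmlOutput(renderPage(page));
}

function doPost(event: GoogleAppsScript.Events.DoPost) {
  const body = JSON.parse(event.postData.contents);
  // Return JSON for API-style requests.
  return ContentService.createTextOutput(JSON.stringify({ ok: true, received: body }))
    .setMimeType(ContentService.MimeType.JSON);
}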
I had an extra self-imposed limitation: I wanted the library code to work both on Apps Script and Node. The reasoning was that I figured Apps Script would be a high-risk target, and a regular Node environment would be far more likely to see actual usage.
Here’s some problems I encountered, with the solution I landed on:
Apps Script provides a different standard library from Node’s, though they’re both V8 based, and is missing some simple things like the URL constructor (there is a Google alternative API instead).
Solution: recreate the URL parser via a regex that works on both Node and Apps Script.
fs is missing, so there’s no reading files without using Google’s APIs. Solution: put unique keys in the source, then replace them with the contents of another file when building. For example, when you have separate JavaScript files to insert into the client-side HTML, just inline them via a build script instead.
There’s no module system for handling imports.
Bundling is a must, though that introduces a new problem: doGet and doPost need to be publicly available at the top level (so export them and unpack from the IIFE if using esbuild), and remote-call functions must be declared as top-level functions, with no re-exporting tricks.
Separating code between different files in Apps Script is handy to reduce complexity and separate your Docs-related code from the web app related code.
Apps Script is pretty slow whenever you need to call a Google API (e.g. reading data from sheets), with my naive implementation taking 3.5 seconds to process a request.
Solution: use the CacheService to cache the generated HTML response for 5 minutes (see the sketch after this list).
Caveat: it’s still pretty slow (1.5 second processing time for a relatively simple cache).
Authenticating to the backend from the client code is non-trivial.
Solution: upvoting via the remote call helper google.script.run, rather than a normal fetch request, works quite easily.
Caveat: due to wanting to support both Node and Apps Script environments, I had to add logic to figure out if a script is in Apps Script or not, with two different handlers.
The iframe is sandboxed, and links do not work as you’d think they would.
Solution: relative links need to get the parent's URL and use target="_top" for them to work (set this in the <base> html tag).
Caveat: this was the most problematic issue, and the one I couldn’t find a consistent solution for. Sometimes it worked okay, sometimes it would randomly break for some links and not others.
Clasp, Google's CLI tool for managing AppsScript, is pretty nice with a familiar push/pull/deploy setup.
Make sure to add a .claspignore file: the default is to include all files when you upload to Google, including node_modules and .git/. If clasp is running slowly, that might be why.
Deployments are confusing to manage, the UI for managing it is slow, and clasp works weirdly with web app deploys.
Solution: use "Test development" during development and then deploy to production via modifying an existing deployment rather than making a new deployment.
Google's terminology is confusing: AppsScript, scripts, JavaScript, AppsScript Web Apps, and web apps are all used interchangeably.
It does not seem like AppsScript is heavily used by people for web apps, so finding info or tips was difficult.
A lot of the documentation out there doesn’t seem to target developers, or is missing extra details that would’ve answered many of the questions I had.
<script-id>/exec and <script-id>/dev work, while <script-id>/exec/ and <script-id>/dev/ give comprehensible errors.
Security roles are pretty good, being able to limit it to specific people.
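For the caching point above, the pattern looks roughly like this (a sketch; the cache key and renderFrontpage are illustrative, while CacheService is the real Apps Script API):

// Cache the generated HTML so repeated requests skip the slow Sheets reads.
function getFrontpageHtml(): string {
  const cache = CacheService.getScriptCache();
  const cached = cache.get("frontpage-html");
  if (cached !== null) {
    return cached;
  }
  const html = renderFrontpage(); // slow: reads rows from the spreadsheet
  cache.put("frontpage-html", html, 300); // keep for 5 minutes (300 seconds)
  return html;
}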
Learnings
Once I’d got all that sorted, though, it worked! I had data from my Google Sheet populating the frontend I’d made, fully hosted on AppsScript. All the different URLs worked as expected, and upvoting worked. The main problem, though, was that the AppsScript Web App itself was just super slow, with no real way to get around it. As a result, I won’t be using it for actually sharing the deployed link aggregator with people, though I will be taking some learnings away:
AppsScript is slow, very slow. For a Web App, it’s too slow, even with the cache.
Converting from Google Docs format into Markdown is pretty easy.
Deploying non-trivial code to AppsScript requires a bundler and to be aware that it’s not got all of Node’s or even the web’s APIs.
Running things on demand might be slow, but running timed events is easy.
AppsScript lets you have a backend without all the work required to deploy one.
The remote-call setup (google.script.run) is very powerful and can really simplify a lot of your code, though the API is an old, pre-async/await style. Still, being able to directly call server functions without needing to write any request-handling code is awesome.
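For example, on the client the callback-style API can be wrapped in a Promise so the rest of the frontend code can use async/await (a sketch; upvoteStory is a hypothetical top-level server function):

// Client-side code running inside the web app's HTML.
function upvote(storyId: string): Promise<number> {
  return new Promise((resolve, reject) => {
    google.script.run
      .withSuccessHandler((newScore: number) => resolve(newScore))
      .withFailureHandler((error: Error) => reject(error))
      .upvoteStory(storyId);
  });
}

// Usage from a click handler: const newScore = await upvote(storyId);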
Problematic Playwright
The trickiest part when I was getting ready to open source it was Playwright — a modern library for end-to-end testing. I’ve used Playwright before a bunch of times, mostly to test projects where all the interactions happen on the client with no side-effects on the backend. Once I introduced testing of the upvote system (which would change the score in-memory on the server), I discovered that several of the tests were flaky, caused by:
All tests are parallel by default.
Possible solution: limit workers to 1, but then it’s super slow.
No way to specify that only one browser should run at a time (i.e. don’t test this file with multiple browsers at once).
Possible solution: the only granular control at that level is to mark the tests in a specific file as non-parallel, but that doesn’t stop multiple browsers running at the same time; it only stops one browser from running 2 tests at the same time. The only way to block that is to use 1 worker.
Some expect() calls return a promise, others a primitive value.
Spot the problem? Sometimes you await an expect, other times you don’t, and because you aren’t looking at the return value, the compiler doesn’t warn you.
Possible solutions: await every expect, or always assign an expect to a variable. I went for awaiting everything.
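Concretely, the trap looks like this (a sketch; the URL and selectors are illustrative):

import { test, expect } from "@playwright/test";

test("upvote increments the score", async ({ page }) => {
  await page.goto("http://localhost:3000/");

  // expect() on a plain value is synchronous.
  expect(await page.title()).toContain("Links");

  // expect() on a locator returns a promise that retries until it passes or
  // times out. Forget the await and the test carries on before the assertion
  // has actually run.
  await expect(page.locator(".score").first()).toHaveText("1");
});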
What I ended up doing was splitting my tests into parallel ones (those without side-effects) and serial ones (those with side-effects), with two separate configs and directories. The parallel ones can use as many workers as they want without impacting how the tests run, and the serial ones use just one worker. What I ideally would’ve wanted is the ability to run certain tests in parallel and others serially, not just within one project (Playwright’s term for the browsers to run against), but across all projects: so don’t start Firefox on the serial tests at the same time as the Chrome tests. The issue requesting this was closed, but I may open a new one once I’ve created a minimal reproduction.
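The split itself is just two configs pointing at different test directories, roughly like this (a sketch; the file and directory names are illustrative):

// playwright.serial.config.ts - the side-effectful tests get one worker.
// The parallel config is the same idea with testDir: "./tests/parallel",
// fullyParallel: true, and no workers limit.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests/serial",
  fullyParallel: false,
  workers: 1, // no two tests (and so no two browsers) run at the same time
});

Each suite then gets run with npx playwright test --config=<config file>.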
Open sourcing
It was time to write up my learnings, share them internally, then open source the relevant part of the codebase. I spent a good while doing many things on the checklist above. Check the repo for yourself, if you’re curious what I would consider to be a good example of an open sourced experiment. It has a video of how it works. It’s not perfect, and doesn’t address every item on the above list, but I think it’s good enough. That’s the thing - spend time on what you think is valuable, rather than making everything perfect.
🗹 Update your README with all the info someone might need.
🗹 Clear the licensing with your company.
🗹 Figure out whether you want to put it on your company’s org or your own.
🗹 Write enough tests to make sure that when you revisit it in a couple of years, you can be confident it works and enable CI.
🗹 Review the security practices.
🗷 Clearly label the repo for contribution guidelines.
Since I think it’s unlikely anyone will use this code other than for the learnings, I didn’t spend time setting up contribution guidelines.
🗹 Embrace the community and get involved.
🗹 Tag releases in git.
🗹 Have an example of usage.
Since this is an npm package doing some weird things, some tricks to remember:
npm version is handy for tagging and bumping versions
npm pack will let you see what files will be included with your package. Put those you don’t want to include (e.g. large images) into .npmignore
Publishing a package under your username (e.g. @eeue56/simple-link-aggregator) is probably better than publishing it at the top level.
npm’s package site doesn’t support all the same markdown rendering features as GitHub.
So, that’s it! A long article for sure, but I hope that it can help you connect the abstract advice with a real example of how to put it into action.