Archiving a Discourse Forum

Discourse is a forum platform, which allows threaded discussions. It looks nice, and works smoothly. However it is somewhat hard to archive such a forum.

There are a couple of posts showing how to archive Discourse:

In the end I made a new wget script to download a Discourse forum. The key thing which the other solutions lacked was that they did not include all page pre-requisites like the pace css. In order to do that, I tweaked the wget script as:

time wget --mirror \
      --page-requisites \
      --span-hosts \
      --domains=PRIVATE-DISCOURSE.COM,discourse-cdn.com \
      --convert-links \
      --adjust-extension \
      --compression=auto \
      --reject-regex "/search" \
      --no-if-modified-since \
      --no-check-certificate \
      --execute robots=off \
      --random-wait \
      --wait=1 \
      --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" \
      --no-cookies \
      --tries=3 \
      https://YOUR.PRIVATE-DISCOURSE.COM

The key thing missing in the other scripts was --span-hosts to enable downloading CSS and other static content, and adding --domains=PRIVATE-DISCOURSE.COM,discourse-cdn.com to limit downloading content to domains directly associated with your own Discourse instance.

When you use the script, you need to replace YOUR.PRIVATE-DISCOURSE.COM and PRIVATE-DISCOURSE.COM with the URL and second level domain of your own instance. For Example: discussion.example.com and example.com.

The script will take it’s time, and you easily need to wait a couple of hours for the download to complete. This is by design to not overload your Discourse instance.

Good luck!

Atlassian Connect Security – April 2021 changes

Jira and Confluence app security and external apps [updated – see below].

Photo by Liam Tucker on Unsplash

This article describes how security for apps for Jira and Confluence works. Jira and Confluence are the two main Atlassian applications, and in use by large and small teams. A large ecosystem of apps working together with and enhancing Jira and Confluence is built by external partners and available at https://marketplace.atlassian.com . Atlassian needed to built an integration system (called Connect) in order to secure the interaction of the external apps and their own systems against unauthorized access. I was part of a group of programmers who discussed Atlassian security, and we decided to engage with Atlassian in order to improve external app security. We found a potential issue in an Atlassian protocol, and as a result of our findings Atlassian has published new requirements for external app security: https://community.developer.atlassian.com/t/action-required-atlassian-connect-vulnerability-allows-bypass-of-app-qsh-verification-via-context-jwts/47072/

The Connect security system relies on multiple security layers. In this article I describe some of the technical aspects of the system. Atlassian requires transport encryption through TLS for all data. That means that all data flowing between Atlassian and an external app is encrypted. However this is not enough to guarantee security. Additionally most data are secured by a JWT (JSON Web Token). A JWT is a cryptographically signed token, which contains claims / data. For example, a JWT could contain an expiration time, and a data payload like a customer ID, and data from e.g. a Jira issue or data from a Confluence document. The data in the JWT is not encrypted, and if an adversary could hack TLS, they could read the JWT data. However the JWT is cryptographically signed, and if an adversary would tamper with the JWT, an external app must detect this and reject the JWT.

Atlassian and the external app both have access to a (kind of) secret key to sign the JWT’s. As long as nobody else has access to this secret key, only Atlassian and the external app can sign the JWT which is then mutually accepted by Atlassian and the external app.

However not all interaction between an app and Atlassian is through a JWT. Some data are passed as URL arguments, and others are passed through POST requests. In order to secure the data passed as URL arguments, Atlassian calculates a cryptographic hash of the URL (called QSH – Query String Hash), and adds this hash to the JWT claims. When an external app receives such a JWT with a signed URL hash, the app can check if the URL has been tampered with. In principle it is straightforward to check all URL’s with this method. However in practice there are multiple edge cases, where you can’t calculate a hash of the URL securely.

Atlassian has published a notice on April 14 2021 (https://community.developer.atlassian.com/t/action-required-atlassian-connect-vulnerability-allows-bypass-of-app-qsh-verification-via-context-jwts/47072/), which requires external apps to improve their security. Here are the steps an external app developer has to take:

  1. If you use Atlassian Connect Express (ACE) or Atlassian Spring Boot: Update to the latest available version. These versions contain enhancements which enable additional security.
  2. Make sure you signal your readiness with the new requirements to Atlassian by adding:
    "apiMigrations": {
    "context-qsh": true
    }

    To your app descriptor (usually atlassian-connect.json).
  3. Check all code paths were you either send or receive a request to or from Atlassian’s servers, the Connect iframe, or your frontend. All direct server to server communication (i.e. you app server to/from an Atlassian server) and Connect calls for the iframe MUST check for and verify the QSH. If you use the Atlassian node sdk called ACE, your code must use:
    app.get('/path/to/webpage', addon.authenticate(), (req, res) => {
    // return your response / html
    });
    However if your app server is called by frontend code, you can’t check for the specific QSH, and you must do a different check, by calling either:
    app.get('/path/to/frontend/page', addon.authenticate(true), (req, res) => {
    // return your response / html / JSON
    });
    Or:
    app.get('/path/to/frontend/page', addon.checkValidToken(), (req, res) => {
    // return your response / html / JSON
    });
    Both methods are equivalent.
  4. If you use your own implementation of the Atlassian JWT signing,
    make sure you verify the QSH on the server. Do NOT attempt to sign
    a JWT with a QSH on your frontend, this can make you app insecure!

Update 2021-04-21: You can read more about the CVE score here: https://confluence.atlassian.com/pages/viewpage.action?pageId=1051986099 and here https://nvd.nist.gov/vuln/detail/CVE-2021-26073

Feel free to contact me if you have further questions.

Trello limits: Cards, Boards, …

What are the limits for Trello? It seems not many users know (or care). Per 2020-04-22 the limits for Trello are:

actions
totalPerMove
500000
admin
support
twoFactorReset
prefix
twoFactorReset
limit
10
bucketWidthMs
43200000
attachments
perApiRequest
36000
perCard
1000
perBoard
36000
size
regular
10485760
large
262144000
boards
totalPerMember
4500
perMembersApiRequest
10000
totalMembersPerBoard
1600
cards
openPerList
5000
totalPerList
1000000
openPerBoard
5000
totalPerBoard
2000000
perApiRequest
20000
totalPerMove
10000
withActionsPerApiRequest
300
checkItems
perChecklist
200
checklists
perApiRequest
5000
perCard
500
perBoard
2000000
customFieldOptions
perField
50
customFields
perBoard
50
email
e2bPerHour
1000
organizations
perApiRequest
5000
input
text
maxLength
16384
labels
perBoard
1000
lists
openPerBoard
500
totalPerBoard
3000
orgs
totalPerMember
850
totalMembersPerOrg
4000
freeBoardsPerOrg
10
reactions
perAction
1000
uniquePerAction
17
routes
enterprise
export
prefix
enterprise-export
limit
100
bucketWidthMs
120000
stickers
perCard
70

Why do I list these limits here? There are a couple of questions in the Trello community about this. Some of the answers to the questions are not correct.

However I have not found an easily accessible website listing the Trello limits.

Photo by Matthew Guay on Unsplash.

How to apply Atlassian’s Security Requirements for addons

Around November 2019 Atlassian has published new security requirements for cloud applications. The requirements come into force on January 1, 2020. However Atlassian has not published guidance on how to apply these security requirements in their officially supported javascript framework ACE (Atlassian Connect Express).

In this post I share my interpretation of the security requirements in the form of javascript code for ACE. I’ll only cover the requirements which need to be handled in a typical ACE installation on e.g. Heroku. Heroku already provides TLS, and things like not exposing secrets in source repositories fall outside the scope of this article. The code uses two npm packages: helmet and express-sslify. You normally need to add the code below to your app.js.

// security configuration https://developer.atlassian.com/platform/marketplace/security-requirements/
import helmet from 'helmet';
import enforce from 'express-sslify';
import nocache from 'nocache';

After importing these, you need to configure the libraries:

...
// define helmet middleware early
// use helmet only in poduction/prodtest
let prodEnv = false;
if ( app.get('env') == 'production' || app.get('env') == 'prodtest') {
    prodEnv = true;
}
console.log('prodEnv is: '+prodEnv+' env is: '+app.get('env'));

// Sets "Strict-Transport-Security: max-age=34560000; includeSubDomains".
// HSTS minimum is one year, use 400 days
// fourhundredDaysInSeconds := 400 * 24 * 60 * 60 = 34560000
const fourhundredDaysInSeconds = 34560000;
if (prodEnv) {
    // for Heroku: trustProtoHeader: true, otherwise not necessarily
    app.use(enforce.HTTPS({ trustProtoHeader: true }));
    app.use(helmet.hsts({
        maxAge: fourhundredDaysInSeconds,
        includeSubDomains: false
    }));
    app.use(helmet.referrerPolicy({
        policy: ['no-referrer']
    }));
    app.disable('x-powered-by');
    console.log('started security policy');
}

...

// Mount the static files directory
const staticDir = path.join(__dirname, 'public');
app.use(express.static(staticDir));

// use the nocache after mounting the staticDir
app.use(nocache());

A short note on using nocache after mounting the staticDir: the static files usually are resource files like css and javascript and don’t contain changing data, so caching is allowed. I think this use cases falls under the exemption in article 5. of the security requirements. Using noCache after staticDir does not set the Cache-Control header for these resources.

The requirement that the application must authenticate and authorize all requests can be easily handled by ACE:

app.get('/configure', addon.authenticate(), function(req, res) {
     ...
    });

I.e. use addon.authenticate() in your route definitions. However make sure you don’t require authentication for atlassian-connect.json.

One further hint: Don’t use helmet.frameguard. ACE apps run as an iframe in the Atlassian application, and must be embeddable in an iframe.

Photo by REVOLT on Unsplash

How to upgrade ACE to AUI 8.5.1

Atlassian has developed AUI, the Atlassian User Interface. This is a set of UI components and a front end library do develop applications according to the Atlassian Design Guidelines. As of 2019-11, version 8.5.1 is the most recent version.

A second component to build apps on Atlassian is ACE, or Atlassian Connect Express, a framework in javascript. This framework currently uses an outdated ACE version by default. In the head of the html delivered by ACE, AUI 5.8.12 is used:

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="ap-local-base-url" content="{{localBaseUrl}}">
<title>{{title}}</title>
<link rel="stylesheet" href="//aui-cdn.atlassian.com/aui-adg/5.8.12/css/aui.css" media="all">
<link rel="stylesheet" href="//aui-cdn.atlassian.com/aui-adg/5.8.12/css/aui-experimental.css" media="all">
<!--[if IE 9]><link rel="stylesheet" href="//aui-cdn.atlassian.com/aui-adg/5.8.12/css/aui-ie9.css" media="all"><![endif]-->
<link rel="stylesheet" href="/css/addon.css" type="text/css" />
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script src="//aui-cdn.atlassian.com/aui-adg/5.8.12/js/aui-soy.js" type="text/javascript"></script>
<script src="//aui-cdn.atlassian.com/aui-adg/5.8.12/js/aui.js" type="text/javascript"></script>
<script src="//aui-cdn.atlassian.com/aui-adg/5.8.12/js/aui-datepicker.js"></script>
<script src="//aui-cdn.atlassian.com/aui-adg/5.8.12/js/aui-experimental.js"></script>
<script src="https://connect-cdn.atl-paas.net/all.js" type="text/javascript"></script>

The update to AUI 8.5.1 is not described in the documentation. Here is the header you need to use:

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="ap-local-base-url" content="{{localBaseUrl}}">
<title>{{title}}</title>
<script src="{{hostScriptUrl}}" type="text/javascript"></script>
  <link rel="stylesheet" type="text/css" href="https://unpkg.com/@atlassian/aui@8.5.1/dist/aui/aui-prototyping.css"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.js"></script>
<script src="https://unpkg.com/@atlassian/aui@8.5.1/dist/aui/aui-css-deprecations.js"></script>
<script src="https://unpkg.com/@atlassian/aui@8.5.1/dist/aui/aui-prototyping.js"></script>
<link rel="stylesheet" href="/css/addon.css" type="text/css" />

It’s simple once you know what to do.

How to manage and grow a technical remote team

When you need to grow your team, be it from just yourself or from a few collaborators, you need to make sure the new team members fit in. This is even more important if you grow a remote team.

My experience in growing smallish teams, mostly in software development and scientific collaboration, is positive. I think most technical teams can work remotely. And I also believe that hiring without being in the same room is possible for many roles.

How to manage a remote team? If you work on a project with much coordination, I recommend to have a short daily (video) standup. Just let every team member tell what they did the day before, and what they plan to do today. That works surprisingly well to keep everybody up to date. And this also creates accountability for each team member.

Remote work can be lonely. Some team members thrive in such an environment, whereas others prefer much direct communication. Just be aware of the personal team member preferences, and help each team member find a role that fits their personality.

Pair team members on tasks: For each major task, pair two team members to work on this task. As an example, for a web application, let at least two people work on the front end, and also at least two people on the back end. Pairing helps team members to learn from each other, and helps the team to keep knowledge if a team member leaves.

Hiring and getting new members into the team should be handled by a proven, standard process. Be prepared that some hires don’t work out! Sometimes it’s not your fault, nor the fault of the new hire. Maybe it’s just the interaction between the new hire and the old team. What worked very well for me is to hire new team members on a project base first. When such a first project works out, you can hire them longer term.


10 Elements of Highly Effective SaaS Landing Pages

How can you build a Software as a Service (SaaS) website? A website which is an effective tool to acquire customers? This article lines out advice from Rob Walling and Mike Taber from their podcast Startupsfortherestofus, “Episode 256 – The 10 Elements of Highly Effective SaaS Landing Pages”.

Before diving into this advice, remember that this is just about the technicalities of the landing page.  Even more important is to write for the benefit of the user/reader/customer.  A very good introduction is Amy Hoy’s post “6 Critical Mistakes You’re Making with Your Landing Page” .

The method described below – using your main website as a landing page for new visitors – is targeted at smaller SaaS companies. If you are Google or Trello, a different set of rules apply. As with most advice, the method presented below depends on the context.


10 elements for effective SaaS landing pages

The elements are described as a form of recipe (or algorithm for the developers under us) to get to an effective landing page easily. These elements go from the top of the page to the bottom. It is a prescriptive order of starting with the headline, having a visual element either next to it or below. And then go on to elements 4, 5, … Once you have everything in place, and your website is live, observe, analyse and adjust to the actual visitors and users you get.

1: Design the landing page for first time visitors.

You want to guide the visitor down a specific path. For first time visitors, try to educate them, get their interest, have them figure out if it’s for them, and then get them to take the next step in the action. “The purpose of your website is not to get them to buy your app, it’s to get them to come back to your website.” Don’t make your login box prominent, your existing users will know how to log in.

2: Gripping headline.

Write your headline with this simple, three part formula:

  • make a promise in the headline
  • have an action word, e.g. a verb
  • either have a directly stated “You” or “implied you”, talk directly to the person reading the headline

Here’s an example headline from hittail.com: “Guaranteed to increase your organic traffic” and the promise is that it’s guaranteed to increase organic traffic, the action (verb), is “Increase”, and then there’s a “You”, it’s “your”. Don’t talk about your app! Talk about the user. Another example headline for HitTail that’s less good: “The best long tail SEO keyword tool.” It’s worse because there’s no promise, no action and no you.

3: Have at least one visual element at the top of the page.

You can use a short video, less than 90 seconds, or an image. For example an image that describes what your app does just with a couple of circles and some arrows edited onto a screenshot of your app. Don’t use an image of a random person sitting there staring at a screen.

Another option is to have an e-mail capture form related to the functionality of your app. That’s a lot harder to do, because most apps don’t have this functionality. The e-mail capture must actually start a demo of the functionality of your app. An example of this is the bidsketch.com homepage. There is a form where you enter your e-mail, and it sends you a proposal basically demoing the functionality of the app.

4: Benefits.

List your app’s benefits using the rule of three, either 3, 6 or 9 benefits. The fewer benefits, the better. An example benefit is “Get more estimates”, together with one sentence describing what that means. Another example is “Deliver better projects and grow your business”. If your app serves separate, identifiable market segments like SEOs, internet marketers, and e-commerce, you can have multiple sets of benefits, one set for each specific market segment.

Keep your benefits down to earth. The biggest mistake with benefits is to write them so high level that it doesn’t make sense, like “Saves you time, makes you more money”. This is to generic, because every app does this. Be specific, write to the goals your users have, e.g. “Win more estimates.”

If you have trouble stating benefits, it’s often because of a confusion between features and benefits. For example if you make a list of benefits and a lot of them seem like features, you then take that feature and say: “We built XYZ feature so that …” “So that” is the key phrase here and by completing that sentence you transform the feature into a benefit statement.

5: Social proof.

A Social proof element is not optional. Most SaaS websites use social proof because it is known that it enhances your app’s credibility. When you have many (happy) users, it’s quite easy to get social proof. However it’s hard when you get started. Social proof takes different forms:

  • Testimonials from customers, preferably with head shots and the name linked to their website. Don’t have anonymous testimonials. Make sure to have the testimonial edited down to just the core part, 10 to 15 words maximum.
  • Press logos, like a logo and “featured on …”, “used by …”.
  • Vanity metric: If your app has analysed a billion keywords, sent out $100 million of proposals, … You can be creative here, as long the metric is directly related to the overall benefits of your app.

6: Features (optional).

Traditionally, a list of features was central to most landing pages. Nowadays such a list is mostly placed on a “Features” page. When you put features on the landing page, use very specific features that set your app apart. And specifically say “No other app has this feature.” or “These are the features that our customers like most.” On the landing pages features are optional, but you have to be very specific somewhere by linking to a tour or a features page with the specific mechanics of your app. Because if you’re just talking about benefits, no one knows what your app actually does.

7: Top nav with four or fewer items.

The top navbar or menu should just include four or fewer items (including your home link!). For example you could have “Home” on the left. The next item would be one of “Tour”, “How it works”, “Features”, explaining the basic flow and workings of your app. Then you should have a link to “Pricing”. Optionally you can have one of “Why [App Name]?” or “Contact”.

8: Have an exit path at the bottom.

Having an exit path means having e.g. a button to a next step you want the user to take. It could a specific journey that you are leading a visitor through (“Home”, “Tour”, “Pricing”, “Free Trial”). Have one main and maybe a secondary call to action at the bottom of your page. By not having this exit path at the bottom, it makes your reader get to the bottom, look around, probably click on something in your footer or scroll all the way back to the top and then you’ve lost them. An alternative is to embedded a call-to-action or exit path in the text in the middle/end of the page, if it makes sense from a visitor flow perspective. Make such an exit obvious by e.g. using a big orange button that says “Increase your traffic” or “Take a tour”.

Such an exit path should be on all pages, not just the Home page.

9: Limit the number of links and buttons.

Basically limit the number of links and buttons that you have on the page. This limits the number of decisions that visitors need to make as they’re reading and navigating your landing page.

10: Put everything else into your footer.

Your app’s footer should contain links to important pages like “About”, “Contact”, “Terms of service”, “Blog”, “Affiliates” etc. Don’t put an e-mail sign-up form in the footer. Nobody will use it. It’s much better to use a little JavaScript widget and have it pop up at certain times and control when to make it visible and when not to. At last, a sitemap may make sense from an accessibility and SEO point of view.

Further reading


How to remove mojibake from mysql dump files

Sometimes a mysql dump file contains latin-1 text encoded wrongly as UTF-8 unicode encoding. That leads to some characters out of the ASCII range being garbled and displayed as two UTF-8 characters. This is called mojibake. E.g. “l’Oreal” for “L’Oreal”.
When you dump such a database using:

mysqldump --opt -h localhost -u dbuser -p db > dump.sql

You get the mojibake in this dump file.

How to correct the mojibake

Luckily there is a python3 tool called ftfy to remove mojibake and replace it with the garbled characters.  Ftfy is a nifty work of programming and educated guesswork, because ftfy must guess what the original encoding was, and which transformation to apply to the dump file.  To make this guessing work, ftfy uses a line by line approach to guessing any encoding mistakes.  However this approach does not work with standard mysqldump files, because they can contain extremely long lines. This causes the ftfy guessing algorithm to not work effectively, because the algorithm assumes that lines are not very long in order to guess if there are wrong character sequences in a line.

The right mysqldump

Reduce the length of the lines in mysqldump by using the command line parameter –skip-extended. This parameter writes multiple SQL INSERT statements per table, such that each INSERT is on a new line. The drawback is that dumping and restoring a database is slower.

mysqldump --opt --skip-extended -h localhost -u dbuser -p db > dump.sql

This dump still contains the mojibake and is now ready to be processed further.

Using ltfy

First install lftfy:

virtualenv -p python3 venv
source ./venv/bin/activate
pip install ftfy

Then prepare a file to call ftfy on your mysql dump file (with thanks to Pielo):

import ftfy
# Set input_file
input_file = open('dump.sql', 'r', encoding='utf-8')
# Set output file
output_file = open('dump.utf8.sql', 'w')

# Create fixed output stream
stream = ftfy.fix_file(
	input_file,
	encoding=None,
	fix_entities=False,
	remove_terminal_escapes=False,
	fix_encoding=True,
	fix_latin_ligatures=False,
	fix_character_width=False,
	uncurl_quotes=False,
	fix_line_breaks=False,
	fix_surrogates=False,
	remove_control_chars=False,
	remove_bom=False,
	normalization='NFC'
)

# Save stream to output file
stream_iterator = iter(stream)
while stream_iterator:
	try:
		line = next(stream_iterator)
		output_file.write(line)
	except StopIteration:
		break

Then you just need to call:

python dbconvert.py

Thereafter you can just restore the dump file into mysql:

mysql -h localhost -u dbuser -p db < backup.sql

Beware of duplicates introduced by removing mojibake in SQL

Sometimes removing the mojibake can result in duplicate rows in the database, even when there was a UNIQUE KEY constraint or UNIQUE index.  The reason is that different two character encodings for a single UTF-8 character can be recognized by ftfy.  This then leads to duplicate rows.

Normalize Python DB API calls between SQLite and MySQL/PGSQL

Somehow there is no single formatting for parameters/variables in python DB API calls.  Looking at the DB API specification, it seems that specific database drivers can be written  with ‘?’ or ‘%s’, or even other conventions.

paramstyleMeaning
qmarkQuestion mark style, e.g. ...WHERE name=?
numericNumeric, positional style, e.g. ...WHERE name=:1
namedNamed style, e.g. ...WHERE name=:name
formatANSI C printf format codes, e.g. ...WHERE name=%s
pyformatPython extended format codes, e.g. ...WHERE name=%(name)s

How do you know the ‘paramstyle’ of your database connection?  For SQLite use:

>>> import sqlite3
>>> sqlite3.paramstyle
'qmark'

Using a library like sqlalchemy circumvents this problem.  In case you need to use the DB API drivers, a simple function which formats SQL strings can be used to enhance portability. The function below does this. It is a quick 80% solution. Adapt as needed.

def fs(sql_string):
""" format sql string according to db engine used """
# normalize dbapi parameters, always use %s (MySQL, PG) in sql_string,
# set escape_string to r'?' for sqlite
# e.g. %s and ?
# escape_string = r'%s '
escape_string = r'? '
# autoincrement_string = "AUTO_INCREMENT" # for MySQL
autoincrement_string = "AUTOINCREMENT" # for sqlite
return_string = sql_string
return_string = re.sub(r'%s ', escape_string, return_string)
return_string = re.sub(r'AUTO_INCREMENT', autoincrement_string, return_string)
return return_string

It is possible (and likely) that the SQL for the DDL will be different between e.g. SQLite and MySQL.  For example, in MySQL you’d use ‘AUTO_INCREMENT’, whereas in SQLite you’d use ‘AUTOINCREMENT’.  By extending the above approach to also replacing these strings, you can further abstract the database code.