How to manage and grow a technical remote team

When you need to grow your team, be it from just yourself or from a few collaborators, you need to make sure the new team members fit in. This is even more important if you grow a remote team.

My experience in growing smallish teams, mostly in software development and scientific collaboration, is positive. I think most technical teams can work remotely. And I also believe that hiring without being in the same room is possible for many roles.

How to manage a remote team? If you work on a project with much coordination, I recommend to have a short daily (video) standup. Just let every team member tell what they did the day before, and what they plan to do today. That works surprisingly well to keep everybody up to date. And this also creates accountability for each team member.

Remote work can be lonely. Some team members thrive in such an environment, whereas others prefer much direct communication. Just be aware of the personal team member preferences, and help each team member find a role that fits their personality.

Pair team members on tasks: For each major task, pair two team members to work on this task. As an example, for a web application, let at least two people work on the front end, and also at least two people on the back end. Pairing helps team members to learn from each other, and helps the team to keep knowledge if a team member leaves.

Hiring and getting new members into the team should be handled by a proven, standard process. Be prepared that some hires don’t work out! Sometimes it’s not your fault, nor the fault of the new hire. Maybe it’s just the interaction between the new hire and the old team. What worked very well for me is to hire new team members on a project base first. When such a first project works out, you can hire them longer term.


How to remove mojibake from mysql dump files

Sometimes a mysql dump file contains latin-1 text encoded wrongly as UTF-8 unicode encoding. That leads to some characters out of the ASCII range being garbled and displayed as two UTF-8 characters. This is called mojibake. E.g. “l’Oreal” for “L’Oreal”.
When you dump such a database using:

mysqldump --opt -h localhost -u dbuser -p db > dump.sql

You get the mojibake in this dump file.

How to correct the mojibake

Luckily there is a python3 tool called ftfy to remove mojibake and replace it with the garbled characters.  Ftfy is a nifty work of programming and educated guesswork, because ftfy must guess what the original encoding was, and which transformation to apply to the dump file.  To make this guessing work, ftfy uses a line by line approach to guessing any encoding mistakes.  However this approach does not work with standard mysqldump files, because they can contain extremely long lines. This causes the ftfy guessing algorithm to not work effectively, because the algorithm assumes that lines are not very long in order to guess if there are wrong character sequences in a line.

The right mysqldump

Reduce the length of the lines in mysqldump by using the command line parameter –skip-extended. This parameter writes multiple SQL INSERT statements per table, such that each INSERT is on a new line. The drawback is that dumping and restoring a database is slower.

mysqldump --opt --skip-extended -h localhost -u dbuser -p db > dump.sql

This dump still contains the mojibake and is now ready to be processed further.

Using ltfy

First install lftfy:

virtualenv -p python3 venv
source ./venv/bin/activate
pip install ftfy

Then prepare a file to call ftfy on your mysql dump file (with thanks to Pielo):

import ftfy
# Set input_file
input_file = open('dump.sql', 'r', encoding='utf-8')
# Set output file
output_file = open('dump.utf8.sql', 'w')

# Create fixed output stream
stream = ftfy.fix_file(
	input_file,
	encoding=None,
	fix_entities=False,
	remove_terminal_escapes=False,
	fix_encoding=True,
	fix_latin_ligatures=False,
	fix_character_width=False,
	uncurl_quotes=False,
	fix_line_breaks=False,
	fix_surrogates=False,
	remove_control_chars=False,
	remove_bom=False,
	normalization='NFC'
)

# Save stream to output file
stream_iterator = iter(stream)
while stream_iterator:
	try:
		line = next(stream_iterator)
		output_file.write(line)
	except StopIteration:
		break

Then you just need to call:

python dbconvert.py

Thereafter you can just restore the dump file into mysql:

mysql -h localhost -u dbuser -p db < backup.sql

Beware of duplicates introduced by removing mojibake in SQL

Sometimes removing the mojibake can result in duplicate rows in the database, even when there was a UNIQUE KEY constraint or UNIQUE index.  The reason is that different two character encodings for a single UTF-8 character can be recognized by ftfy.  This then leads to duplicate rows.

Normalize Python DB API calls between SQLite and MySQL/PGSQL

Somehow there is no single formatting for parameters/variables in python DB API calls.  Looking at the DB API specification, it seems that specific database drivers can be written  with ‘?’ or ‘%s’, or even other conventions.

paramstyleMeaning
qmarkQuestion mark style, e.g. ...WHERE name=?
numericNumeric, positional style, e.g. ...WHERE name=:1
namedNamed style, e.g. ...WHERE name=:name
formatANSI C printf format codes, e.g. ...WHERE name=%s
pyformatPython extended format codes, e.g. ...WHERE name=%(name)s

How do you know the ‘paramstyle’ of your database connection?  For SQLite use:

>>> import sqlite3
>>> sqlite3.paramstyle
'qmark'

Using a library like sqlalchemy circumvents this problem.  In case you need to use the DB API drivers, a simple function which formats SQL strings can be used to enhance portability. The function below does this. It is a quick 80% solution. Adapt as needed.

def fs(sql_string):
""" format sql string according to db engine used """
# normalize dbapi parameters, always use %s (MySQL, PG) in sql_string,
# set escape_string to r'?' for sqlite
# e.g. %s and ?
# escape_string = r'%s '
escape_string = r'? '
# autoincrement_string = "AUTO_INCREMENT" # for MySQL
autoincrement_string = "AUTOINCREMENT" # for sqlite
return_string = sql_string
return_string = re.sub(r'%s ', escape_string, return_string)
return_string = re.sub(r'AUTO_INCREMENT', autoincrement_string, return_string)
return return_string

It is possible (and likely) that the SQL for the DDL will be different between e.g. SQLite and MySQL.  For example, in MySQL you’d use ‘AUTO_INCREMENT’, whereas in SQLite you’d use ‘AUTOINCREMENT’.  By extending the above approach to also replacing these strings, you can further abstract the database code.