[BBLISA-jobs] Part time contracting in Marlborough, Massachusetts
Konrad Rzeszutek Wilk
bblisa at darnok.org
Wed May 6 20:47:37 EDT 2015
The Xen Project has a number of internet-facing systems and also an
automated test facility.
We need someone to do our system administration. We have several
weeks' worth of setup/fixup work, with more later in the year, and
probably ~0.5-1 day a week of ongoing admin tasks.
We would like to contract a small company, ideally. If we hire an
individual we will need some kind of backup for when they're
away/sick/whatever, because we don't have anyone else physically
nearby.
We need someone who is physically near enough Marlborough, MA, that
they can easily visit our rack there.
[...]
The resources to be managed in more detail:
1. Internet facing systems
About half a dozen Rackspace VMs (Rackspace give us complimentary
hosting), running Debian (mostly wheezy right now), performing a
variety of tasks:
* source code hosting (shell accounts, git, gitweb, some hg,
homegrown commit email generator)
* blog hosting (wordpress)
* wiki (mediawiki)
* role mail aliases, dns, etc.
* mailing lists (mailman, mhonarc)
* a few other minor VMs
We currently have no automation (!) for these and due to a shortage of
effort have been doing essential things only.
Obvious tasks that need doing:
* Review how everything is set up, fix any obvious problems
* Set up some automation
* Set up some monitoring
* Backups have so far been done by Citrix IT staff, an arrangement
which ought to be fixed
* Ongoing maintenance (eg upgrades to jessie in due course),
firefighting, etc.
2. The test facility:
This is a single rack in Earthlink's data centre in Marlborough.
We plan to add a second rack this year.
In terms of hardware:
- Two moderate-sized servers running Xen with VMs for
infrastructure services, test controller, database, etc.
These servers have Rocketport multiport serial cards to
provide serial console logging etc.
- External (8-port) and internal (48-port) managed switches
- APC PDUs
- 24 x86 test boxes: 12 different kinds of machine, in pairs
- One 4U homebrew ARM crate containing 4x arndale and 4x cubietruck
devboards, PDU relay board, etc. etc.
- Rack is not very tidy; previous hardware installers were less than
ideal.
Software:
- VMs are Debian wheezy.
- Controller VM runs homegrown test system called `osstest'
- Also: postgresql, dhcp, apache2 (for publishing logs)
- I have a minimal ansible setup to do the things that I
needed to do
Tasks that need doing:
- Help with the procurement of a second round of test hardware,
including specifying support equipment (more switches, cable
management, PDUs), and perhaps help with specification of
test boxes.
- Install the second round of test hardware in the second rack
- Review the physical organisation of the rack and decide what needs
to be fixed and how, and work with us to fix it (given that this
is a live system).
- Double-check the switch and PDU configurations
- Review the software organisation and decide what needs fixing,
etc.
- Make sure that we have proper backups (!) and do a test restore.
- Check our arrangements for manual failover to running on only one
of the two servers if one of the servers should die. (Our uptime
requirement is fairly low but we wouldn't want to be down for days
or weeks.) Actually test the failover.