
Automate WMF wiki creation
Open, Needs Triage, Public

Description

Wiki creation is quite an involved process, documented on wikitech. I think, at least for certain common cases, the task could be almost completely automated.

For uncomplicated creation of new language editions under existing projects, with default configuration, the following tasks need to be done, none of which require complex human decision-making:

  • Reconfigure many services by pushing configuration changes to Gerrit and deploying those commits
    • mediawiki-config: wikiversions, *.dblist
    • WikimediaMessages
    • DNS
    • RESTBase
    • Parsoid
    • Analytics refinery
    • cxserver
    • Labs dnsrecursor
  • Run addWiki.php. This script aims to automate all tasks which can be executed with the privileges of a MW maintenance script.
  • Run Wikidata's populateSitesTable.php. It should probably be incorporated into addWiki.php.
  • Run labsdb maintain-views
  • Update wikistats labs

So at a minimum, you need to write and deploy commits to 8 different projects, run three scripts, and manually insert some rows into a DB in a labs instance.
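For illustration, here is roughly what the mediawiki-config part of that list amounts to, as a minimal sketch. It assumes the usual dblists/*.dblist and wikiversions.json layout; the wiki name and version string are placeholders, and in reality these edits go through Gerrit review and scap rather than a script like this:

```
import json
from pathlib import Path

def add_wiki_to_config(repo, db_name, dblists, version):
    """Append a new wiki ID to some plain dblists and record its MW version.

    repo is a checkout of operations/mediawiki-config; dblists names the
    simple (non-expression) lists the wiki should join.
    """
    repo = Path(repo)
    for name in dblists:
        path = repo / "dblists" / f"{name}.dblist"
        lines = path.read_text().splitlines()
        if db_name not in lines:
            path.write_text("\n".join(lines + [db_name]) + "\n")

    versions_path = repo / "wikiversions.json"
    versions = json.loads(versions_path.read_text())
    versions[db_name] = version  # e.g. "php-1.35.0-wmf.30" (illustrative)
    versions_path.write_text(json.dumps(versions, indent=4, sort_keys=True) + "\n")

# add_wiki_to_config("/srv/mediawiki-config", "smnwiki",
#                    ["all", "wikipedia", "s3"], "php-1.35.0-wmf.30")
```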

Despite there being no human decision-making in this process, the documentation requires that you involve people from approximately four different teams (services, ops, wikidata, analytics).

In my opinion, something is going wrong here in terms of development policy. The problem is getting progressively worse. In July 2004, I fully automated wiki creation and provided a web interface allowing people to create wikis. Now, it is unthinkable.

Obviously services are the main culprits. Is it possible for in-house services to follow pybal's example, by polling a central HTTP configuration service for their wiki lists? As with pybal, the service could just be a collection of static files on a webserver, or etcd. Even MediaWiki could profitably use such a central service for its dblists, with APC caching.
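As a rough sketch of that pybal-style approach (the endpoint URL and polling interval are invented for the example; a real service would presumably hook this into its existing config-reload path):

```
import time
import urllib.request

CONFIG_URL = "https://config.example.wmnet/dblists/all.dblist"  # hypothetical endpoint

def poll_wiki_list(on_change, interval=60):
    """Poll a central HTTP config service and call on_change when the wiki list changes."""
    current = None
    while True:
        with urllib.request.urlopen(CONFIG_URL) as resp:
            wikis = set(resp.read().decode("utf-8").split())
        if wikis != current:
            current = wikis
            on_change(wikis)  # e.g. rebuild routing tables or reload workers
        time.sleep(interval)
```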

So let's suppose we could get the procedure down to:

  1. Commit/review/deploy the DNS update
  2. Commit/review/deploy a configuration change to the new central config service.
  3. Run addWiki.php

Labs instances needing to know about the change would either poll the config service, or be notified by addWiki.php. WikimediaMessages could be updated in advance via translatewiki.net.
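If it really did come down to those three steps, the whole procedure could plausibly be one small wrapper. A sketch under that assumption: the deploy commands are placeholders, mwscript is the usual maintenance-script wrapper, and addWiki.php's real arguments are whatever the wikitech documentation says:

```
import subprocess

def create_wiki(db_name, domain):
    """Hypothetical wrapper around the three-step procedure sketched above."""
    # Steps 1 and 2: assume the DNS and central-config commits are already
    # reviewed and merged, and only need deploying (placeholder commands).
    subprocess.run(["deploy-dns-change"], check=True)
    subprocess.run(["deploy-central-config", db_name, domain], check=True)

    # Step 3: addWiki.php does the MediaWiki-side work and, in this proposal,
    # notifies any consumers that do not poll the config service themselves.
    # Further arguments (language, site, etc.) are elided here.
    subprocess.run(["mwscript", "extensions/WikimediaMaintenance/addWiki.php",
                    "--wiki=aawiki", db_name, domain], check=True)
```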

(Thanks to Milos Rancic for raising this issue with me.)

Event Timeline

Change 339144 had a related patch set uploaded (by Reedy):
Run populateSitesTable.php on other wikidata client wikis

https://gerrit.wikimedia.org/r/339144

Much of this should be doable with our regular config management system. With puppet, however, the deployment part is not that easy to control and automate. Kubernetes might improve the situation in that regard.

Probably better to use etcd than a separate web host, since that seems to be the current standard solution. See above related task and also T149617: Integrating MediaWiki (and other services) with dynamic configuration

@tstarling I agree, dblists are one of the things that could be stored in etcd and read from there. On the other hand, it's such a simple and relatively stable list that we could also decide to maintain it as a simple configuration file that we distribute across the cluster in a standard format, and expect every application to read it from disk.

Say we create /etc/wmf/dblists.yaml on every node (just a random name/format), containing all the info that we need for each and every application; all apps then read it and autoconfigure themselves based on those values.
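A minimal sketch of the consumer side of that idea; the path follows the example above, and the "wikis"/"tags" keys are invented for illustration:

```
import yaml  # PyYAML, assumed to be available

def load_wiki_config(path="/etc/wmf/dblists.yaml"):
    """Read the hypothetical per-wiki config file and return it as a dict."""
    with open(path) as f:
        return yaml.safe_load(f)

def wikis_with_tag(config, tag):
    """Return the wiki IDs carrying a given tag, e.g. 'wikipedia' or 'closed'."""
    return [db for db, info in config.get("wikis", {}).items()
            if tag in info.get("tags", [])]

# config = load_wiki_config()
# my_targets = wikis_with_tag(config, "wikipedia")
```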

I think that a "rolling restart of applications to pick up the new config" is an acceptable step here (ops need to be involved anyways).

Aren't dblists already in a standard format (newline delimited plain text) that we distribute across the cluster via scap?

> Aren't dblists already in a standard format (newline delimited plain text) that we distribute across the cluster via scap?

Yes, but scap only sends them to MW-related hosts. If we moved them to something like etcd or /etc/wmf/dblists.yaml as suggested above, every application would have this data. This could be useful for services that don't need to know or speak MediaWiki, or have its code, but want to know the list of all wikis they need to care about.

>> Aren't dblists already in a standard format (newline delimited plain text) that we distribute across the cluster via scap?

> Yes, but scap only sends them to MW-related hosts. If we moved them to something like etcd or /etc/wmf/dblists.yaml as suggested above, every application would have this data. This could be useful for services that don't need to know or speak MediaWiki, or have its code, but want to know the list of all wikis they need to care about.

Also: unless you have scap/multiversion on your system as well, the format for doing dblist math (all - something, etc) isn't available to you and you have to replicate the logic. A standard distribution/format for this avoids that issue.
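To illustrate the "dblist math" point: expression dblists define one list in terms of others, so any consumer outside scap/multiversion either reuses that resolver or reimplements something like the following (the expression syntax here is a simplified stand-in, not the exact mediawiki-config grammar):

```
def resolve_dblist(expression, lists):
    """Resolve a simplified dblist expression such as "all - closed - private".

    lists maps a list name to a set of wiki IDs. Only +/- set operations are
    modelled; the real format has more to it.
    """
    tokens = expression.split()
    result = set(lists[tokens[0]])
    for op, name in zip(tokens[1::2], tokens[2::2]):
        if op == "-":
            result -= lists[name]
        elif op == "+":
            result |= lists[name]
        else:
            raise ValueError(f"unknown operator {op!r}")
    return result

# open_wikis = resolve_dblist("all - closed - private",
#                             {"all": all_set, "closed": closed_set, "private": private_set})
```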

The point is that updating dblists via gerrit and running scap is one of the avoidable steps in the task description. I imagine etcd would have structured data about each wiki, and the canonical map from domain name to wiki ID. To figure out exactly what structured data should be in there, we need to survey all the services in my list above, but for mediawiki-config it is dblist membership (e.g. $wikiTags in CommonSettings.php line 165) and wikiversions.json.
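As a strawman for what that structured data could look like (the field names are invented; the survey of consumers would determine the real schema):

```
# Hypothetical per-wiki record, keyed by wiki ID.
WIKI_RECORDS = {
    "smnwiki": {
        "domain": "smn.wikipedia.org",       # canonical domain -> wiki ID mapping
        "tags": ["all", "wikipedia", "s3"],  # replaces membership in *.dblist
        "version": "php-1.35.0-wmf.30",      # replaces the wikiversions.json entry
    },
}

def wiki_id_for_domain(domain, records=WIKI_RECORDS):
    """Reverse lookup from canonical domain to wiki ID."""
    for db, info in records.items():
        if info["domain"] == domain:
            return db
    return None
```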

> I think that a "rolling restart of applications to pick up the new config" is an acceptable step here (ops need to be involved anyways).

I don't think there should be any intelligence involved in the technical process of creating a wiki. I'm not sure what you mean by "a rolling restart of all services" -- if you mean stopping each service and starting it again, then I suspect that would require a human to consider the consequences.

Looking at Parsoid for a case study, I see that it re-reads sitematrix.json on worker startup, and service-runner responds to SIGHUP by doing a rolling restart of local workers. So all we need is a way to replace sitematrix.json and send SIGHUP. Other service-runner users could be reconfigured similarly. If we can have a button labelled "send SIGHUP all services", and a brainless server monkey is allowed to press it at any time, then I guess that would be a solution. But ideally, the brainless server monkey would be replaced by a line in a script.
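A sketch of what replacing that button with "a line in a script" could look like on a single host; the config path is illustrative, and the PID lookup assumes a systemd unit named after the service:

```
import os
import signal
import subprocess
import tempfile

def replace_config_and_sighup(new_content, config_path, service):
    """Atomically replace a config file, then SIGHUP the service so that
    service-runner does a rolling restart of its workers."""
    # Write to a temp file in the same directory and rename over the original,
    # so readers never see a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(config_path))
    with os.fdopen(fd, "w") as f:
        f.write(new_content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.rename(tmp_path, config_path)

    # Find the service's main PID via systemd and send SIGHUP.
    pid = int(subprocess.check_output(
        ["systemctl", "show", "--property=MainPID", "--value", service]
    ).decode().strip())
    os.kill(pid, signal.SIGHUP)

# replace_config_and_sighup(new_sitematrix_json, "/etc/parsoid/sitematrix.json", "parsoid")
```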

> The point is that updating dblists via gerrit and running scap is one of the avoidable steps in the task description. I imagine etcd would have structured data about each wiki, and the canonical map from domain name to wiki ID. To figure out exactly what structured data should be in there, we need to survey all the services in my list above, but for mediawiki-config it is dblist membership (e.g. $wikiTags in CommonSettings.php line 165) and wikiversions.json.

So my idea was to transform the wiki list into this structured data, and store it on disk, via puppet, on every machine that might need it. As I said before, it should be easy enough to distribute a changed list that way.

My point about not storing this info in etcd is that we try to use etcd to manage dynamic state, not static configurations that will change a few times a year at most.

But either that or a file distributed via puppet to every relevant machine is ok anyways.

>> I think that a "rolling restart of applications to pick up the new config" is an acceptable step here (ops need to be involved anyways).

> I don't think there should be any intelligence involved in the technical process of creating a wiki. I'm not sure what you mean by "a rolling restart of all services" -- if you mean stopping each service and starting it again, then I suspect that would require a human to consider the consequences.

Well human supervision is useful, but I'd expect the process to be as simple as doing a scap deploy. Ops are building a distributed execution framework (https://github.com/wikimedia/cumin) that seems like a perfect candidate for this role.

> Looking at Parsoid for a case study, I see that it re-reads sitematrix.json on worker startup, and service-runner responds to SIGHUP by doing a rolling restart of local workers. So all we need is a way to replace sitematrix.json and send SIGHUP. Other service-runner users could be reconfigured similarly. If we can have a button labelled "send SIGHUP all services", and a brainless server monkey is allowed to press it at any time, then I guess that would be a solution. But ideally, the brainless server monkey would be replaced by a line in a script.

The idea would be "do a controlled rolling restart (or send a SIGHUP, depending on the software) of these services", and yes it should be a line in a script.
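For what it's worth, a sketch of that single line, assuming a cumin-style remote executor; the host alias, batching and command are placeholders rather than a tested invocation:

```
import subprocess

def rolling_reload(target="A:parsoid", command="systemctl reload parsoid"):
    """Ask cumin (or any comparable remote executor) to run a reload/SIGHUP
    command across a set of hosts, a couple at a time."""
    subprocess.run(["cumin", "--batch-size", "2", target, command], check=True)
```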

I have created around ten wikis so far; the process of creating wikis is extremely fragile, complicated and stressful. That has to be fixed before automating anything. Let me give you some examples:

  • The first time I created a wiki, someone had removed --wiki=aawiki from the documentation on wikitech, so when I ran addWiki.php it didn't work, because it needs an existing "dummy" wiki to run against. Since it was my first time, I went with my home wiki (fawiki), thinking it didn't matter (it shouldn't, right?). In the middle of creation it exploded, because it had created the database on s7 (fawiki's section) but tried to read it from s3 (where the dblists and config put the new wiki). Someone told me it has to be aawiki, so I ran it again and it worked fine-ish, until replication to labs broke because we now had two hywwikis, one on s3 and one on s7.
  • The second time, I thought "okay, I'll go with a wiki that's on s3: mediawikiwiki" (I needed mediawikiwiki because it was in group0 and I didn't want to backport a change to addWiki.php). The creation exploded because mediawikiwiki has something special going on with OAuth or CentralAuth; it took me hours to find out what was wrong.
  • For the next four or five wikis I created, I ran into T212881: addWiki.php broken creating ES tables and had to manually change the pointer of the text table to some random thing so it wouldn't fatal on the main page. You can't do anything there: you can't edit the main page, you can't delete it (I made myself admin and then had to ask stewards to demote me). There's nothing you can do.
  • The only time I created a wiki after the fix was deployed, it broke at the very end, on updating the interwiki cache, because the new wiki was an RTL wiki and we hadn't added it to rtl.dblist (1: it wasn't documented that you need to do this or things explode; 2: I didn't even know that the wiki I was creating was an RTL wiki). Thankfully fixing this wasn't hard.
  • Every time wiki creation breaks, it halts in the middle of the long list of actions; you need to fix the broken bit and then copy-paste everything remaining into eval.php or shell.php (a sketch of a more resumable approach follows these notes). It's extremely dangerous to run arbitrary code in production unless you know exactly what you're doing, which brings me to the next point:
  • In order to handle these issues in real time, in production, you need very good knowledge of the infrastructure, and everyone has their strengths and weaknesses. If something breaks with Elasticsearch, which addWiki.php also handles, I have no idea how to fix it and proceed.

Hope these notes help.
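For the copy-paste-into-eval.php failure mode mentioned above, one mitigation would be to structure the creation work as named, resumable steps. A hedged sketch of the idea (this is not how addWiki.php works today):

```
import json
from pathlib import Path

def run_steps(steps, state_file):
    """Run a list of (name, callable) steps, recording progress so that a
    failed run can be resumed from the broken step instead of re-pasting the
    remaining code into eval.php or shell.php."""
    state_path = Path(state_file)
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    for name, step in steps:
        if name in done:
            continue
        step()  # raises on failure, leaving the completed steps recorded
        done.add(name)
        state_path.write_text(json.dumps(sorted(done)))

# run_steps([("create-db", create_db), ("es-tables", create_es_tables),
#            ("main-page", create_main_page)], "/tmp/addwiki-smnwiki.state")
```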

@Amire80 I suppose it would be useful to summarise the outcome of the T238255 session on this task :)

An update from the subtask which might be of interest here:

So with some recent changes the bot creates sub-tickets for data storage, and parent tickets for RESTBase, pywikibot and wikidata (it doesn't automatically close them though). The bot also creates patches automatically for analytics refinery, DNS, WikimediaMessages and CX server. You can see the artwork of the bot in T264859: Create Inari Sámi Wikipedia. The only thing that's not done yet is automating the initial config patch, which I want to avoid as it's extremely complicated; we can pick it up once wiki configs are moved to YAML files (T223602: Define variant Wikimedia production config in compiled, static files). I created a follow-up for that one.

Change 339144 abandoned by Reedy:

[mediawiki/extensions/WikimediaMaintenance@master] Run populateSitesTable.php on other wikidata client wikis

Reason:

https://gerrit.wikimedia.org/r/339144