Talk:DPL (Dynamic Page List)

What is it?
Basically, DPL is an extension that enables editors to write lists and tables that are automatically ("dynamically") updated with data extracted from one or more wiki pages. It does other things too, but those can wait. :-)

Cool, what do you (Aldyron) want?
I've done some practical experimentation with it, and since it can be a bit daunting at first and has a few pitfalls as well, I thought I'd share what I've discovered here, both to give it a little more visibility and to jump-start the brave ones who will try to wield it. This post takes a practical approach; the theory can and will follow if interest in the subject shows up.

What do I (an editor) need?
First of all, the manual. You won't get out of dpl alive without it. You don't need it just yet, but you will. When you do, also use its search function, the FAQ, the bug reports and everything else. It's a wiki manual.

OK, where do we start?
Let's start with a working example of what you can do with dpl, the most polished one I have available at the moment: User:Aldyron/Sandbox/Category Waist items experiment, which I now consider production-ready. That page is intended as a replacement for the standard waist item category page as it was some time ago. Check out the two pages' sources. What's the difference? Creating a similar page for feet items, wrist items or head items is just a matter of copying the waist items page and changing a couple of words. Since all ten pages are essentially identical, a template could easily be written to take care of that, so that the entire content of each of those pages would be little more than a single template call. In short:
 * The old page needs to be updated by hand after every game change. That is, a human editor has to check and possibly edit each item in the list after every update and patch, in addition to doing basically the same work (let's call it "duplicate work") on each individual item page. That's onerous for the editor(s), historically unrealistic, and error-prone due to the sheer amount of data to check and update.
 * The new page removes the need for the duplicate work. Human editors still have to update the individual item pages, but then dpl does the other half of the job and automatically compiles the table *from* the individual item pages. Since the table is cached for a (configurable) maximum of 1 day, the new page will always be at most 1 day out of date with respect to the rest of the wiki. Besides, since it is a sortable table rather than a list, each user can click and shift-click the column headers to sort the data any way they need.

I noticed it works like a database
In case you didn't notice, dpl queries like the one in the new page basically treat the wiki's whole collection of pages as a database: selecting records (pages) and rows (template instances and other things), pulling out columns (template fields and other things), and joining tables (dpl queries).
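To make the analogy concrete, here is a toy Python model of such a query. It is only an illustration under stated assumptions, not how DPL works internally: the `Page` class, the `query` function, the category names and the `Level` field are all made up; only the template name Named Clothing comes from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """A wiki page reduced to what a query needs (toy model, not DPL internals)."""
    title: str
    categories: set[str]
    # One entry per template instance on the page: (template name, {field: value}).
    templates: list[tuple[str, dict[str, str]]] = field(default_factory=list)


def query(pages, category, template, columns, sort_key=None):
    """Select pages in `category`; emit one row per `template` instance with `columns`."""
    rows = []
    for page in pages:                         # records: pages
        if category not in page.categories:
            continue
        for name, fields in page.templates:    # rows: template instances
            if name != template:
                continue
            row = {"Title": page.title}
            for col in columns:                # columns: template fields
                row[col] = fields.get(col, "")
            rows.append(row)
    if sort_key is not None:
        rows.sort(key=lambda r: r[sort_key])
    return rows


if __name__ == "__main__":
    pages = [
        Page("Zebra Belt", {"Waist items"}, [("Named Clothing", {"Level": "10"})]),
        Page("Alpha Belt", {"Waist items"}, [("Named Clothing", {"Level": "4"})]),
        Page("Some Ring", {"Finger items"}, [("Named Jewelry", {"Level": "8"})]),
    ]
    for row in query(pages, "Waist items", "Named Clothing", ["Level"], sort_key="Title"):
        print(row)
```

The point of the sketch is the shape of the work: filter records, expand each into rows, project columns. Every item page contributes its own row, which is also why the page limits discussed below grow with the number of items.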

Now what?
Now let's get a bit more ambitious. Instead of having ten pages for ten item slot lists, we can add one column ("Slot") to the tables and have only two pages, one for clothing and one for jewelry, each listing its own five categories in one table.
 * All belt, boots, cloak, gloves and helm pages employ Template:Named Clothing.
 * All bracers, goggles, necklace, ring and trinket pages employ Template:Named Jewelry.

Give me more
Now a bit more ambitious still.

Check out both templates' Usage sections: barring one field, the two field lists are the same, and they're functionally identical. We can merge the two pages into just one, listing all ten categories in one single table.
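A rough sketch of that merge, in hypothetical Python: rows from each slot's table are combined into one, a `Slot` column is added, and the one field the two templates don't share is left blank wherever it is missing. The function name and the field names in the example are illustrative only; in practice this would be done by the dpl query, not by external code.

```python
def merge_tables(tables):
    """Merge per-slot tables into one table with an extra 'Slot' column.

    `tables` maps a slot name (e.g. "Belt", "Ring") to a list of row dicts.
    Fields that exist in only some tables are filled with "" so that every
    row of the merged table has the same columns.
    """
    all_fields = []
    for rows in tables.values():
        for row in rows:
            for key in row:
                if key not in all_fields:
                    all_fields.append(key)
    merged = []
    for slot, rows in tables.items():
        for row in rows:
            merged.append({"Slot": slot, **{f: row.get(f, "") for f in all_fields}})
    return merged


example = merge_tables({
    "Belt": [{"Title": "Alpha Belt", "Level": "4"}],
    "Ring": [{"Title": "Some Ring", "Level": "8", "Extra": "x"}],  # hypothetical extra field
})
for row in example:
    print(row)
```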

If we fill in two particular fields consistently on every monster page, we can write a template that, when transcluded into a quest page, will automatically find the relevant monster pages and output only the monsters appearing in that quest, with full details (possibly even a picture thumbnail).
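To show why those fields need care, here is a hypothetical Python sketch of the filtering such a quest template would perform. It assumes one of the fields is called `quests` and holds a comma-separated list of quest names; both the field name and the format are invented for illustration. A monster page that spells a quest name even slightly differently simply won't match.

```python
def monsters_in_quest(monster_pages, quest):
    """Return rows for monsters whose (hypothetical) 'quests' field lists `quest`.

    `monster_pages` maps page title -> {field: value}. Matching is exact,
    so inconsistent spelling or capitalization on a page hides that monster.
    """
    hits = []
    for title, fields in monster_pages.items():
        quests = {q.strip() for q in fields.get("quests", "").split(",") if q.strip()}
        if quest in quests:
            hits.append({"Title": title, **fields})
    return sorted(hits, key=lambda r: r["Title"])


pages = {
    "Kobold": {"quests": "Quest A, Quest B"},
    "Orc": {"quests": "Quest B"},
    "Troll": {"quests": "quest b"},  # inconsistent capitalization: not matched
}
print([r["Title"] for r in monsters_in_quest(pages, "Quest B")])
```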

We can write a self-compiling master list of all monsters in the wiki, similar to this one.

We can?
No, we can't.

Not yet, at least. That's where the pitfalls I talked about at the start of this post come into play.
 * dpl queries are limited to 500 results: 500 monsters, 500 jewels, 500 weapons. There are almost 800 named weapons in this wiki, and more than 2500 monsters. There are ways around and under this (configurable) limit, but they either need work or petitioning, or they defeat the purpose, IMHO. Keep it short.
 * I'm not 100% sure (I'm a four-month newbie, as far as wikiware goes), but there's a time limit on wiki page generation. Slow query, time-out, no page. Keep it fast.
 * All wiki pages are generated in a certain way (see template expansion limits), which has its own set of limitations: page too complex, page too wide, page too long, or all three. Hit those limits and the rest of the page doesn't come out well, if at all. Compound that with a dpl table like the waist item category experiment I linked above: all the complexity of a single item (well under the limits), multiplied by n items on one single page, can easily hit the complexity or expansion limit. Keep it simple.
 * Some tables I have in my mind would require some manual work on the raw data (the individual pages) in order to show up "all straight and aligned", so to speak, in lists or tables. Several pages that deal with the same kind of topic (say, consumables) are way too heterogeneous right now to query via dpl. Keep it consistent.
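One generic way under the 500-result cap is to split a big list into several smaller queries, for instance one per alphabetical range, each below the cap. The hypothetical Python sketch below only shows the grouping arithmetic; it is not DPL syntax, and whether splitting a master list like this defeats its purpose is exactly the judgment call above.

```python
from itertools import groupby

LIMIT = 500  # the per-query result cap discussed above


def split_for_limit(titles, limit=LIMIT):
    """Pack titles, grouped by first letter, into chunks of at most `limit` titles.

    Each chunk would become one separate query (for instance, one per letter
    range on its own page or page section). Letters are never split across
    chunks, so a single letter with more than `limit` titles is an error.
    """
    ordered = sorted(titles, key=lambda t: (t[0].upper(), t))
    chunks, current = [], []
    for letter, group in groupby(ordered, key=lambda t: t[0].upper()):
        group = list(group)
        if len(group) > limit:
            raise ValueError(f"letter {letter!r} alone has {len(group)} titles")
        if current and len(current) + len(group) > limit:
            chunks.append(current)
            current = []
        current.extend(group)
    if current:
        chunks.append(current)
    return chunks
```

With almost 800 named weapons this means at least two queries; the 2500-plus monsters would need at least six.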

So why are you bothering me with this useless thing?
In my experiments (here and here), which I consider largely successful, I've nonetheless obtained mixed and sometimes confusing results (e.g.: a more complex query turning out lighter on the parser; one blasted little page that disrupts a table, and I can't find anything out of place in it). I'm tracking down the causes, and in the process I'm learning a lot about templates, categorization and wikiware in general.

dpl was fairly easy to start with, and I got IMHO great results in a surprisingly short time, with just a few flaws here and there, but killing those flaws is taking a lot of time. That's why I still haven't put anything in production for the general public's use. Il meglio è l'inimico del bene (the best is the enemy of the good). I was trying to streamline and polish everything before publishing, but then I realized the only way to see whether dpl can serve our users well is a user test: putting it at the users' disposal and hoping they like it. You editors are users too, so you are involved in the outcome. dpl can take the way you look at your pages' sources and turn it inside out, for a purpose.

I believe it's worth it. Do you? No, I changed my mind, don't tell me. ;-)

Discussion

 * To be totally honest, I'm hoping to kill the post template before too long. It's old, no longer effective, and no longer used properly for its original purpose (it was only meant to be used in forums, not on all talk pages). I'm currently waiting for the new Flow extension to come out and become stable, to see whether we should switch to that on talk pages or to the already existing LiquidThreads extension, which is what the MediaWiki wiki itself currently uses. The limits of DPL are what kill this for me. I don't like the idea of breaking the master lists down to the point where they'd be usable. I would prefer to write a pywikibot script to compile the lists and post them, or write a JavaScript that uses near-blank pages (just a general description for those without JS) and creates the table in real time (similar to DPL, but using JS instead of PHP). I'll have to dig through the guts of DPL and see if it can be made to work, but I'm not optimistic at this time. ShoeMaker (Contributions • Message) 07:39, November 26, 2013 (EST)

No Post template, aye. Fine with me. @ Cru121 Aldyron (Contributions • Message) 10:50, November 26, 2013 (EST)
 * @ ShoeMaker
 * 1) Column selection: yes, you can, and I can tell you that, after a couple of head-scratching tries, it takes more time to explain it than to do it.
 * 2) Since doing that requires editing and saving a template (a shared resource), I encourage you to copy User:Aldyron/Sandbox/Template Named Clothing phantom (i.e. the phantom template) and User:Aldyron/Sandbox/Category Waist items experiment (i.e. the query page) to your sandbox, and to edit the include parameter in the dpl query so that it points at your copy of the phantom template (explanation follows). This will a) let you learn by doing instead of (not) learning by watching, and b) avoid unexpected editing clashes if two or more editors experiment on the phantom template at the same time.
 * 3) Edit the query to remove the last four column *headers* (not the columns themselves, yet) from the table parameter. That is, delete ",Is an Epic version of?..,Crafting,Crafting upgrade,N". If you request a preview now, dpl will detect that the code was modified and refresh the query cache with the results of the new code, and you'll see that the headers are gone but the columns are still there.
 * 4) Edit the phantom template to remove the fourth-, third- and second-to-last columns. That is, delete the last three rows. N.B.: do not assume that, in phantom templates, each field (a.k.a. column) always sits alone on its own row; for example, the special abilities field (the first one) spans two rows for formatting purposes. If you request a preview *of the query page* now, you most likely won't see any difference, because the query was cached the last time you edited *the query code itself* (not the phantom template), so you'll see either the original query or the last query preview you requested. In that case, scroll down the page and click the refresh link immediately after the table (if you can't see the words "result from cache" and there's no "refresh" link on that line, the table you're seeing is fresh; when in doubt, request another preview). Uh, if this is not clear enough, I'll try to explain again. :-)
 * 5) Edit the query to remove the format parameter. Delete the whole row. That will remove the additional formatting I had put in: "in every row, add a column containing a center-aligned progressive record number to the right of all other columns". BTW, it was there to check whether I was hitting the system-wide record limit (500 belts in this case... we're not even close); otherwise, I agree it's superfluous and undesirable.
 * 6) Would a page with fewer columns load faster: yes, but in most cases not noticeably. If the table or list is both long (many rows) and complex (many templates for the MediaWiki parser to expand), then yes, it is noticeable. In theory, you can measure how much time is spent in dpl and how much outside of it.
 * 7) The line placed just after a freshly cached table shows the time spent by the server in dpl, e.g. "time elapsed = 12.306780099869". Every time *the query* gets refreshed, the server spends some resources (CPU, RAM, disk I/O, time) to refresh it. On the other hand, every time *the page containing the query* (and not the query itself) gets refreshed, the query comes from dpl's own cache, and the server needs far fewer resources to do that. AFAIK, there's no indication of the time needed to get the query from the cache. BTW, the cache can be disabled on a query-by-query basis, but I wouldn't do it except for very small queries or experiments; of course, if you do, every time the query or the page gets refreshed, the server re-assembles the table from scratch.
 * 8) Almost at the end of the HTML source (not the wikitext source) of the complete end-user page, the HTML comment (e.g. "Served in 15.158 secs.") shows the time spent by the server both in and out of dpl.
 * 9) N.B.: the two times are not necessarily comparable. For example, if you use Firefox to refresh the query (which implies the server putting a fresh query in the dpl cache and sending Firefox a new HTML page) and then ask Firefox for the HTML page source, Firefox sends another HTTP request to the server, which sends back another, newer copy of the requested page, which will contain the query taken from the refreshed-a-moment-ago dpl cache. Now, if you instead make sure that the HTML source you are seeing is of *the very same page* Firefox is currently showing you, then you can meaningfully compare the two times. See here for a pointer; I can expand the concept if needed.
 * 10) N.B.: the times mentioned above depend on some factors you have no control over whatsoever, such as the number of users on the server, the kind of user activity, whatever else the (physical) server is doing, and ultimately the instantaneous server load: the more loaded the server, the slower anything coming out of it, including your pages and queries. That means the times will vary somewhat no matter what you do (or don't do).
 * 11) Can you combine fields: yes, you can. In particular, can you show me a hand-crafted mock-up of what you have in mind? In general, there are a few ways to do that, too.
 * 12) See the source of User:Aldyron/Diagnostics/Items consistency/Pages using Template Parrying. In that query, I used a technique that is awkward but spares you from writing a phantom template; I find it adequate until it gets too unwieldy. I employ the tablerow parameter, which gives you a quick way to define the formatting and content of each field. It needs a comma-separated list of arguments. %% means "the current field's content as it is", so the first %% represents the Namespace field, the second %% is Title, and the third %% is Parameter 1 (Magnitude). Now, just before outputting the third %% (center-aligned), I store Parameter 1 in a variable called "P Magnitude", using a parser function that contributes no visible output to the table. That variable stays valid until it is re-set (say, on the next table row) or cleared. I then use the variable in the next field (which thus is not taken directly from individual page data) to insert Template:Parrying with the right value in the table row.
 * 13) User:Aldyron/Tutorial/DPL parameters sports a step-by-step, fully-walked-through, more advanced example of that, including a sub-query. That example is also more powerful, more awkward and decidedly more unwieldy. That was my main motivator to explore phantom templates. ;->
 * 14) User:Aldyron/Sandbox/Quests experiment is a work in progress, so it's not fully functional, but see the second column, "Adventure pack". That's a combined/compound/remixed column resulting from User:Aldyron/Sandbox/Template Adpack phantom (see the source, please). Reason? Check out User:Aldyron/Diagnostics/Quest consistency/Pages using Template Adpack (no need to read the source, if you don't want to), and note that there are two different parameters you can use in Template:Adpack to show the same result: no problem in an individual quest page, problem in a table of quest rows. I'm not sure what I did was the right way to solve that, but it works, it's reasonable and I still haven't found anything better. :->
 * 15) Does refreshing take a lot of resources: no, it doesn't. But.
 * 16) The largest query in my mind (complete and unabridged master monster table) surely would. I can't currently see any non-crazy way to achieve that particular goal.
 * 17) The medium ones (c.a.u.m. item table, c.a.u.m. weapon table) likely would. Possibly achievable by optimizing my experiments. Indeed, that's my main line of investigation.
 * 18) Anything else goes from practically instant to barely above the MediaWiki timeout limit (so, no good). *I personally* as a user would raise the timeout (I assume that's configurable, but haven't had the time to check). *I personally* as a programmer would heartily concur. *I personally* as a system administrator would try anything else to slim the queries down before allowing that. :-D
 * 19) The limitations are there for three main reasons, all three important: a) keeping the user experience snappy (or at least as snappy as possible); b) preventing intentional (vandals) and unintentional (thoroughly non-perfect queries) denial-of-service situations; c) limiting server billing (someone pays for the power line). That said, I believe writing useful queries that stay within the bounds is possible (or will be in the near future), except for the system-wide 500-record limit. That one I'll petition to get raised, when everything else is taken care of (otherwise it would be kind of pointless to raise it).
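The caching behavior in points 3, 4 and 7 can be modeled with a toy Python cache. This is a sketch under assumptions, not DPL's code: it assumes results are keyed on the query's own source text and expire after a maximum age (1 day here, matching the configurable maximum mentioned earlier), and `run_query` is a hypothetical stand-in for the real query engine. That is enough to show why editing only the phantom template leaves a stale table until it expires or someone clicks refresh.

```python
import hashlib
import time


class QueryCache:
    """Toy model of the caching described above (not DPL's actual implementation)."""

    def __init__(self, run_query, max_age=24 * 3600, clock=time.time):
        self.run_query = run_query  # hypothetical: query text -> rendered table
        self.max_age = max_age      # seconds; the "(configurable) maximum of 1 day"
        self.clock = clock
        self._store = {}            # key -> (timestamp, result)

    def _key(self, query_text):
        # Assumption: the key depends only on the query's own text, so edits
        # to templates the query *uses* do not change it.
        return hashlib.sha256(query_text.encode("utf-8")).hexdigest()

    def render(self, query_text, force_refresh=False):
        """Return (result, from_cache); force_refresh plays the "refresh" link."""
        key = self._key(query_text)
        now = self.clock()
        entry = self._store.get(key)
        if entry and not force_refresh and now - entry[0] < self.max_age:
            return entry[1], True
        result = self.run_query(query_text)
        self._store[key] = (now, result)
        return result, False
```

For example, with a `run_query` that reads the phantom template, render once, edit the template, and render again: the second call still returns the old table from the cache. Only `force_refresh=True`, a change to the query text, or the entry ageing past `max_age` produces a fresh one.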

Will reply tomorrow: daily allotment of so-called "free time" depleted. :-) Aldyron (Contributions • Message) 10:50, November 26, 2013 (EST)
 * @ ShoeMaker

My points: --Cru121 (Contributions • Message) 12:46, November 26, 2013 (EST)
 * Thanks for the detailed explanation. However, I'd prefer you to do the DPL, I was just wondering what is/is not possible. :) I am not a developer, I'd prefer you to figure out the complicated stuff and I'll fix typos.
 * As for the limits: I don't care about the limits. Let's use DPL where the article is within limits. Let's not use it if it's off limits. I frankly don't care about list of all monsters. List of trinkets for example is useful though.
 * Regarding Shoemaker's plans: When we have a better tech, we test it, if it's better, let's use the new thing. Until then, let's use what we have.
 * Would template substitution instead of transclusion help alleviate some load issues? Is it even possible?

Not a developer, OK; then let's say what I wrote above was intended for whoever wants to give dpl a try. :-) Regarding substitution, read my reply to Shoemaker below, coming soon. Aldyron (Contributions • Message) 16:57, November 27, 2013 (EST)
 * @ Cru121

Aldyron (Contributions • Message) 18:18, November 27, 2013 (EST)
 * @ Shoemaker
 * 1) Limits of DPL kill it: agreed, sort of. I *love* master lists, especially because I have a hard time getting the big picture without them. Indeed, I'm doing everything I can (within a reasonable free-time frame :-) to make them work as needed. Experimenting was the only way to find out about the limits in the first place and then to try either to overcome them or to work within them. After banging my head on them for a while, I "sense" there is room for improvement in my queries (my developer sense is tingling, if that makes any sense to you), but I still have to find out what it is I'm not considering. Some things don't add up, and fortunately they point in the right direction. I'll find out, eventually.
 * 2) JS instead of PHP: I'm game, if you think that would work well. My only objection (based on absolutely nothing but general considerations) is that whatever the language chosen, assembling times couldn't be lower than the total database I/O time, unless I'm mistaken. Wouldn't that bring us to the same time-limit pitfall as dpl's?
 * 3) Since I am (as I may have already said once or twice) a wikibeginner, instead of waiting until I've become more wikiproficient myself, I'd like to throw around a few ideas (some of them just crazy longshots) that take off from everything written above, and see if any sound feasible to you.
 * 4) In an effort to lighten server load, I've experimented a bit with static data in order to check its viability. I copied and pasted up to 6000 rows with no problem at all, and everything stays well under all limits, both parser-wise and time-wise. That table is admittedly very simple, and thus not significant enough. I was planning to concoct a version of that page full of templates to expand (which would of course exceed parser limits), and another one with those same templates already expanded (by substitution, maybe, as Cru121 suggested), to check whether a table which is static both in the database and in the parser stays within bounds. Trusting my developer sense, my *guess* is it would, but guessing isn't enough; verification is needed. I haven't found any quick-and-dirty way to do so, and the only promising lead I have is Extension ExpandTemplates. Do you have any other ideas on how to do that quickly? Otherwise, I'm thinking of a couple of slow ways, which I'll gladly take care of, not wanting to burden anyone else.
 * 5) Assuming the previous idea turns out good, maybe dpl itself could be exploited to churn out a table or list with templates already expanded. That would require juggling subst clauses inside dpl: that's an example of me throwing longshots, since I have no idea if that's even possible, yet. Since I believe the idea has its benefits for any wiki anyway, I'm going to file an RFC and possibly a feature request to Gero (dpl's current maintainer).
 * 6) Regardless, I believe applying that to our data set would exceed reasonable time limits for a well-behaved on-the-fly page... but not for a schedule-based page. Say, trigger once every 61 minutes, check whether the pages referenced by the static table are newer than the static table itself, and if so, reassemble the static table. Is that something akin to what you were talking about? I have no idea how MediaWiki schedules (if any) work, and I haven't searched for it yet. Could you jump-start me by giving me a pointer? In particular, assuming they do exist, can scheduled pages be allowed to exceed the normal page-generation time limit?
 * 7) Ah, thanks in advance. :-)
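A minimal sketch of the schedule-based check from point 6, in Python. It assumes the last-edit timestamps of the static table and of its source pages can be fetched somehow (that part, e.g. through pywikibot or the MediaWiki API, is deliberately left out), and `needs_rebuild`, `scheduled_check` and the `rebuild` callback are hypothetical names, not an existing MediaWiki feature.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=61)  # "trigger once every 61 minutes"


def needs_rebuild(table_updated: datetime, source_updated: dict) -> bool:
    """True if any source page was edited after the static table was last built."""
    return any(ts > table_updated for ts in source_updated.values())


def scheduled_check(table_updated, source_updated, rebuild, now):
    """One scheduler tick.

    `source_updated` maps page title -> last edit time. If any page is newer
    than the table, call `rebuild` with the changed titles and stamp the table
    with `now`. Returns (table timestamp, time of the next tick).
    """
    if needs_rebuild(table_updated, source_updated):
        changed = sorted(t for t, ts in source_updated.items() if ts > table_updated)
        rebuild(changed)
        table_updated = now
    return table_updated, now + INTERVAL
```

The cheap timestamp comparison runs every tick; the expensive reassembly runs only when something actually changed, which is what would keep such a job light on the server.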