Suggestions for Drupal core

From yesterday until March 24, Google is accepting student applications for the Google Summer of Code 2007. Google pays each successful student 4,500 USD for their work on Drupal during the summer, and we try to assign each student two top-class Drupal mentors to work with.

I'm particularly interested in students who want to work on Drupal core, Drupal's main distribution, so I decided to share five ideas that I think are particularly core-worthy:

  1. Improved file handling. Make it possible to mix database and file system storage on a per-post basis, further abstract the storage model so we can support distributed storage solutions (like Amazon S3), and apply node-level permissions to a node's files through a lightweight file.php layer. Do not implement files as nodes.
  2. Hierarchical page structuring. Remove the "book page" node type, and turn the book module into a module that specializes in creating hierarchical content trees. The module should automatically extract a menu from the node hierarchy, set the correct breadcrumbs, help you define a hierarchical URL structure with clean URLs, etc. This module will be the de facto standard to structure pages and to create hierarchical navigation schemes in Drupal.
  3. Lightweight image/asset management. Make it easier to embed images in posts: write a lightweight asset manager to make it easy to re-use previously uploaded images, and provide a lightweight WYSIWYG solution to drag-and-drop (position) images into posts. More advanced WYSIWYG editors should be able to overwrite the one in core, and more advanced document management solutions should be able to overwrite or extend the basic asset management solution in core.
  4. Streamlined installation procedure. Remove the Drupal welcome page, and replace it with additional installation steps in the installer. The additional installation steps should query the user for basic site settings (e.g. site name, site slogan), and should provide a dedicated and simpler user interface to create the administrative user account. I'd also like to ship core with an optional install profile optimized for jump-starting your installation -- for example, one that sets up a working contact form and creates a dummy about page with a clean and human-readable URL.
  5. Improved data models. If you look around, it's quite obvious that websites are becoming a selection of independent components: OpenID, Amazon S3, etc. If we have well-defined data models in Drupal core, integration with web services (like those built with Adobe's Flex) will become easier. As a first step, we need a data API that we leverage internally, so we can get better at distributed search, import/export functionality, user profiles, custom content types, internationalization, and basic scaffolding. Start with Drupal's form API and massage it into the beginning of a data API. It's crucial for Drupal's future, but the work might not be for the faint-hearted.
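
To make the first idea a little more concrete, here is a rough, language-neutral sketch in Python (Drupal itself is PHP) of what "abstracting the storage model" with a thin permission check in front of it might look like. Every name in it (FileStorage, InMemoryStorage, FileManager, may_view) is hypothetical and invented for illustration; none of this is an existing Drupal API, and a real implementation would need to be worked out in the proposal.

```python
from abc import ABC, abstractmethod


class FileStorage(ABC):
    """Hypothetical backend interface; not an actual Drupal API."""

    @abstractmethod
    def save(self, key, data):
        """Store the bytes in `data` under `key`."""

    @abstractmethod
    def load(self, key):
        """Return the bytes stored under `key`."""


class InMemoryStorage(FileStorage):
    """Stand-in for a real backend (database, local disk, remote store)."""

    def __init__(self):
        self._blobs = {}

    def save(self, key, data):
        self._blobs[key] = data

    def load(self, key):
        return self._blobs[key]


class FileManager:
    """Routes each file to a backend and checks post access before serving."""

    def __init__(self, backends, default):
        self.backends = backends  # backend name -> FileStorage
        self.default = default    # backend used when a post does not choose one
        self._index = {}          # file key -> (backend name, post id)

    def save(self, key, data, post_id, backend=None):
        name = backend or self.default
        self.backends[name].save(key, data)
        self._index[key] = (name, post_id)

    def load(self, key, may_view):
        # `may_view` is an assumed callback: post id -> bool. It plays the
        # role of the node-level permission check in a thin file.php layer.
        name, post_id = self._index[key]
        if not may_view(post_id):
            raise PermissionError(key)
        return self.backends[name].load(key)
```

The point of the sketch is only that callers talk to one interface, the backend is chosen per post, and the access check lives in one small layer in front of all backends.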

Keep in mind that each of these ideas needs to be fleshed out more before it can be considered a solid Google Summer of Code proposal. Also note that this is not an exhaustive list of ideas, so feel free to submit additional ideas for consideration.

Comments

xamox (not verified):

A file manager would be awesome!

zoon_unit (not verified):

I might suggest three other projects (one of which would be a continuation of last year's project).

1) A final and robust import/export capability for all Drupal datasets. Started last summer, this utility would be a HUGE advantage for developers who want to migrate data from other sources. But it needs to be comprehensive. Views and CCK are already showing the worth of the import/export model.

2) Some way to streamline and reduce the heavy query load when rendering pages for logged-in users. It's no secret that Drupal can render slowly on systems with moderately fast MySQL access. (197 queries for a single page!) Perhaps adding selective node and block caching to core would be a great start. Another approach might be staged rendering. (Render the underlying page first, then the nodes, then the blocks.)

3) A robust theming project. Take the integration of the Garland theme and the color module and push it to another level: the creation of a truly flexible, business-level theme for core that can use sophisticated techniques such as drop-down menus, icons, layout resizing, and an increased number of regions to add flexibility. Advanced theming is the single biggest hurdle Drupal faces to gain the public acceptance it so richly deserves.

Philippe Jadin (not verified):

Hierarchical content trees are important for the mass adoption of Drupal IMHO. Most people are used to hierarchical file managers, and they usually understand quickly how a CMS UI works if it provides a tree management system.

I'd be interested to provide UI help. (We've made this for Thinkedit, and tried a lot of solutions, including "AJAX"-trees.)

On a side note, it would be great if this content tree allowed anything (like views, contact forms, files, ...) in order to have a complete navigation available.

venkat-rk (not verified):

So much work has been done on the category module to bring hierarchical content structures into Drupal.

It may be worthwhile for anyone thinking of coding this to work with bdragon, the current module maintainer, to implement a slightly stripped-down version of category (minus some of its complex logic) for inclusion in core. Or at least consider starting with its code.

Dries:

The category module implements categories as nodes, which is not something I want to consider for inclusion in core. I don't think the category module is a good base to start from -- much better to start from the book module, as outlined in my post.

venkat-rk (not verified):

Oh, yes, I saw that part about files not being nodes too. Thanks for the clarification.

Benj (not verified):

As mentioned above, I think serious effort should be put into reducing load for logged-in users.

I suggest integrating a robust, simple-to-use block caching mechanism.

Seth (not verified):

The blockcache module does this already, in 4.7.x and 5.x versions. Works quite well, and for one site I'm using it on, it was the only way to stop the site from crawling, due to massive DB queries on every page load.

pwolanin (not verified):

Do you envision this new module including the navigation elements that the current book module offers?

I put time into working on a better version of book module that (as you may remember) wasn't quite core worthy by the time the last code freeze hit.

Anyhow, I have sandbox code that could be a start. As for the nice URLs -- would that mean adding support for the pathauto API into this core module? Is pathauto being added to core?

Dries:

I know that you've worked on this, and I'd like to see us/you pursue this further. If I remember correctly, your patch was still trying to do too much (i.e. book permissions) so I think we'll want to break this down into smaller steps, and get them included in core.

As for handling URL aliases, it's been a while since I looked at the pathauto module's code. What I'd like to see happen is that when you create an "about" page (i.e. ?q=about) with a child page called "management", the new module automatically prefixes the child's alias with about/ (i.e. ?q=about/management).
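
The alias rule described above can be sketched in a few lines. This is an illustrative Python sketch, not pathauto's or the book module's actual code: the function name hierarchical_alias and the two lookup tables (parent_of, slug_of) are assumptions made up for the example.

```python
def hierarchical_alias(page, parent_of, slug_of):
    """Build an alias by joining the slugs of a page and all its ancestors.

    parent_of maps a page id to its parent's id (top-level pages are absent);
    slug_of maps a page id to its own short alias. Both are hypothetical
    lookup tables standing in for whatever the hierarchy module would store.
    """
    parts = []
    seen = set()
    while page is not None:
        if page in seen:
            raise ValueError("cycle in page hierarchy")
        seen.add(page)
        parts.append(slug_of[page])
        page = parent_of.get(page)
    # Slugs were collected from child up to root, so reverse them.
    return "/".join(reversed(parts))


slug_of = {1: "about", 2: "management"}
parent_of = {2: 1}  # "management" is a child of "about"
alias = hierarchical_alias(2, parent_of, slug_of)  # "about/management"
```

A top-level page simply gets its own slug, so hierarchical_alias(1, parent_of, slug_of) gives "about".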

pwolanin (not verified):

I think part of the problem was not making clear enough what the proposed patch *did* as opposed to what it *enabled*. The altered module does very little more than the current book module, but enables per-book (per-hierarchy) permissions, etc. since each is now readily distinguishable. I'm hoping it will also have much better performance since it would let you load all the hierarchy info in a single query.
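
As a rough illustration of the "single query" point (not the actual sandbox code), the sketch below loads one hierarchy with one SELECT and builds the tree in memory. It is written in Python with SQLite for brevity; the table name outline and its columns are hypothetical.

```python
import sqlite3
from collections import defaultdict


def load_hierarchy(conn, book_id):
    """Fetch every row of one hierarchy in a single query, grouped by parent."""
    rows = conn.execute(
        "SELECT nid, parent, title FROM outline "
        "WHERE book = ? ORDER BY weight, title",
        (book_id,),
    ).fetchall()
    children = defaultdict(list)
    for nid, parent, title in rows:
        children[parent].append((nid, title))  # parent is None for the root
    return children


def render(children, parent=None, depth=0):
    """Walk the in-memory tree; no further queries are needed."""
    lines = []
    for nid, title in children.get(parent, []):
        lines.append("  " * depth + title)
        lines.extend(render(children, nid, depth + 1))
    return lines


def demo_connection():
    """Create a throwaway in-memory database with two small hierarchies."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outline (book INT, nid INT, parent INT, weight INT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO outline VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, None, 0, "About"),
            (1, 2, 1, 0, "Management"),
            (1, 3, 1, 1, "Contact"),
            (2, 4, None, 0, "Handbook"),
        ],
    )
    return conn
```

Because each hierarchy carries its own identifier (the book column here), render(load_hierarchy(demo_connection(), 1)) only ever touches the rows of that one hierarchy.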

Dries:

One can argue whether per-book permissions should be implemented by the book module -- it is better to use node-level permission for that. So rather than adding things on top of books, we need to fundamentally rethink the concept of books (and remove the "book page" node type) ...

Dries:

I haven't looked at the relativity module yet, but based on its description, it looks like a good starting point.

pwolanin (not verified):

Yes, I agree with removing the node type. My sandbox code is already node-type agnostic.

I also don't think any permissions should be implemented by this module. Rather, by being able to distinguish two books/hierarchies -- for example the "customization and theming" handbook vs. "About Drupal documentation" -- you can either use different node types for each hierarchy or use a contrib access control module to give different users permission to post in the different hierarchies.

greggles (not verified):

What you describe about paths and pathauto is currently possible with the pathauto module (and has been since before I took it over).

When I saw the outline module get created, I immediately recognized that it would be ideal for it to replace the book module, and I submitted a reminder that, when it is ready, outline should implement the pathauto hook and provide patterns similar to [book] and [bookpath], which do exactly what you desire.

I believe the outline module is about what you are talking about here. Having a SoC applicant take the outline module to completion would probably be a good way to get this project implemented.

Dries:

Looks like the outline module would be a great module to start from. We'd still need a UI review, a performance analysis (does it use vancode?), a migration path, a bit of a rewrite, etc.
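
For readers unfamiliar with the term: vancode is the encoding Drupal's comment module uses for thread positions, where each number is written in base 36 and prefixed with a character encoding its length, so that plain string sorting matches numeric order. The Python sketch below mirrors that idea as best understood from the comment module's int2vancode() and vancode2int() helpers; treat it as an illustration rather than a port, and the helper to_base36 is made up for the example.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n):
    """Convert a non-negative integer to a lowercase base-36 string."""
    if n < 0:
        raise ValueError("vancode is only defined for non-negative integers")
    if n == 0:
        return "0"
    out = ""
    while n:
        n, remainder = divmod(n, 36)
        out = DIGITS[remainder] + out
    return out


def int2vancode(n):
    """Prefix the base-36 number with a character encoding (length - 1)."""
    num = to_base36(n)
    return chr(len(num) + ord("0") - 1) + num


def vancode2int(code):
    """Drop the length prefix and parse the rest as base 36."""
    return int(code[1:], 36)
```

With this encoding 5 becomes "05", 36 becomes "110" and 100 becomes "12s", so sorting the strings also sorts the numbers. That property is what makes it attractive for retrieving a whole tree in display order with a single ORDER BY.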

dldege (not verified):

#1 and #2 are basically what I was proposing in my submission - https://drupal.org/node/120677.

File management would use a file API (S3, local, db, etc.).
Document was controlling the hierarchy - as proposed it was just for files and folders but could be a general node hierarchy.

Files were nodes, but this was to facilitate document sharing and allow RSS, taxonomy, etc. on files. What is the difference between a file node and a node with a file attachment if it's done in such a way that you can still just get the file when you need it without a node load?

Then #3 was going to be addressed by some type of file node reference to allow you to put any file(s) into other nodes using some jQuery interface or similar.

I'm not getting much feedback on the proposal and I'd be happy to keep refining it based on input. Please help me understand the flaws of my current proposal.

Thanks.

Dries:

Let me bounce the question: what is your motivation to implement files as nodes? I can't find that in the proposal.

Here are two reasons why not to implement files as nodes:

  1. For high traffic websites, serving files should be fast and lightweight. Implementing files as nodes adds an additional layer of complexity.
  2. From a user point of view, it also complicates various node listing pages.

I've read your proposal and it looks good, although it wouldn't hurt to flesh it out a bit more. It would make it easier to provide constructive feedback.

This work is important, and much of it would be eligible for inclusion in core. In fact, submitting incremental or intermediate patches for inclusion in core should be part of your proposal (if that's what you intend to do). It will take time and energy to get your patches committed, and that should be taken into account. It also makes the proposal more valuable as it is guaranteed to affect more end-users.

With the 'files as nodes' discussion in mind, here is one suggestion: make 'files as nodes' your last goal, rather than your first goal. Only start working on 'files as nodes' once the other parts of the proposal have made it into the development version of Drupal. This buys us more time to weigh the advantages and disadvantages of 'files as nodes'.

Hope that helps, and good luck with your proposal!

dldege (not verified):

Dries, thanks for the feedback. The motivation for files as nodes is as follows:

  1. There are many times when the file content (image, video, audio, etc.) is the focus of the node - that is, the file is the content and as such isn't really an attachment to another node. This is seen currently with the audio, video, image, and other modules that implement node types just to provide specialized handling of media types. So the idea was: why not just make all files nodes and provide APIs to allow developers to do the custom viewing, etc. without needing to implement file upload, storage, etc. as each currently does?
  2. With files and folders as nodes, it's now easy to take advantage of node facilities - RSS, Taxonomy, Search, etc. For instance, you might want to subscribe to a feed of the latest audio files (file nodes) added to a site.
  3. With files as nodes and a transparent (flat) backend file storage solution (fileapi with S3, etc.), it's easier to implement the hierarchical node management since the files in the store need not care about this hierarchy.

Conceptually, this just seems to make sense in the Drupal idea of content = node. The fact that there are so many contrib modules implementing file type X as node seems to back that up to some degree.

Now, I've struggled with this - where is the line? Should user pictures be nodes? Conceptually it might make sense, but in practical terms, I'd say no.

Regarding your two reasons against files as nodes.

1. For high traffic websites, serving files should be fast and lightweight. Implementing files as nodes adds an additional layer of complexity.

I thought about this, and while you could serve files directly from S3, etc., you would need some processing during page requests that had file nodes on them to map the information about the file (node_load()?) to an S3 path. I can see this as a scaling issue.

2. From a user point of view, it also complicates various node listing pages.

Could you elaborate on this?

It sounds like a better approach might be more in line with what you suggest.

  1. A file manager that allows for files in Drupal that aren't nodes but aren't just node attachments either.
  2. A Drupal interface for managing the above.
  3. A method for attaching files to a node from the above file store - not the current upload attach method.
  4. A specialized file node (my current proposal) that, instead of having you upload the file there, lets you browse the file store, attach a single file (see #3), and then uses the document API hooks I'm proposing to allow document extension modules to specialize the display of the file. Maybe 3 and 4 are all part of one file attachment module and a file node is still not necessary?
  5. Remove the folder node from the proposal, don't address the file/folder hierarchy, and allow that to be addressed by a separate project which is a generalized node tree module.

styro (not verified):

What do others think of 'multi-hierarchies' for book navigation? (Just like taxonomy terms can do.)

That way a node could appear in more than one book. This would help with content reuse, and allow books to be better focused on a specific topic without needing to duplicate content (i.e. background information) between them. I'm thinking of DITA topic maps as an example of that concept supporting content reuse.

It does complicate things a bit though: if you load a node that is in multiple books, what context do you show for it? A node could show the specific book navigation when accessed through a book URL, e.g. /book/bookid/nodeid, while loading /node/nodeid without context (e.g. from a search or taxonomy listing) could show which books it is in.

Rick (not verified):

I know that this has been beaten to death, but what about better forums in core? Nothing fancy, just something that is more forum-like and doesn't turn people off right off the bat.

Rick

Dries:

I support patches that improve Drupal core's forum and comment module. It makes sense to integrate concepts from the flatform module and forum access module, especially if we can refactor the comment module's display modes at the same time. E-mail notifications would be useful too -- both for forums and blogs.

Jakob Petsovits (not verified):

The subscriptions module already handles e-mail notifications for forums, blogs and/or other content types and taxonomy terms.

I'm not sure if providing e-mail notifications in the forum and blog modules themselves is the best idea ever; I'd prefer a separate notification API (be it in core or in contrib) that those other modules can easily hook into. That increases code sharing and consistency.

webchick (not verified):

Just want to point out a couple things:

  1. This is NOT the optimal place to be talking about this, since most students aren't going to see it. ;-)
  2. There are two places to put SoC ideas, where students are being directed: proposal ideas (proposals that need work) for vague concepts, ideas, etc. and proposed projects for actual fleshed-out project proposals. You might also want to cross-post on the SoC-2007 group when you have a full-fledged proposal.
  3. You need to hurry if you want to propose an idea for a student to take on, as applications are already coming in, and applications are only accepted through March 24 (a week from today).

Robert Douglass (not verified):

I want to second webchick's point in #3: students interested in participating in GSoC this year need to concentrate very intensely on getting applications turned in.

Drupal is, in my opinion, one of the very best OS projects to work with when doing GSoC, so it really is worth your effort to write a good application and work hard at finishing your project. I'm not sure how aware students are of the timeline. There isn't a lot of time left to apply.

Thanks for the great ideas and direction, Dries! We'll keep our eyes out for project proposals that address any of the points you've made.

Barry Jaspan (not verified):

Dries, on the same day you published this post, I published my article on my Schema module:

http://jaspan.com/schema-project-database-abstraction-reflection-and-mi…

My goals for the initial version of schema.module are (1) to provide inspection/reflection of the database model by representing tables in a data structure and (2) to eliminate the need for maintaining separate CREATE TABLE statements for every separate database. There are of course many more things that can be accomplished in later versions.
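
To illustrate the two goals for readers who have not seen the article, here is a minimal Python sketch of the general technique: describe a table once as a plain data structure, then both inspect it and generate per-database CREATE TABLE statements from it. The structure and the names below (PAGE_TABLE, TYPE_MAP, column_names, create_table_sql) are assumptions invented for this example and are not schema.module's actual format.

```python
# One hypothetical table definition, shared by every database backend.
PAGE_TABLE = {
    "fields": {
        "nid": {"type": "serial"},
        "title": {"type": "varchar", "length": 128, "not null": True},
    },
    "primary key": ["nid"],
}

# Generic column types mapped to each database's own spelling.
TYPE_MAP = {
    "mysql": {"serial": "INT NOT NULL AUTO_INCREMENT", "varchar": "VARCHAR"},
    "pgsql": {"serial": "SERIAL", "varchar": "VARCHAR"},
}


def column_names(table):
    """Reflection: list the columns without asking the database."""
    return list(table["fields"])


def create_table_sql(name, table, dialect):
    """Generate a CREATE TABLE statement for one database dialect."""
    types = TYPE_MAP[dialect]
    parts = []
    for column, spec in table["fields"].items():
        sql_type = types[spec["type"]]
        if "length" in spec:
            sql_type += "(%d)" % spec["length"]
        if spec.get("not null"):
            sql_type += " NOT NULL"
        parts.append("%s %s" % (column, sql_type))
    parts.append("PRIMARY KEY (%s)" % ", ".join(table["primary key"]))
    return "CREATE TABLE %s (%s)" % (name, ", ".join(parts))
```

Calling create_table_sql("page", PAGE_TABLE, "mysql") and create_table_sql("page", PAGE_TABLE, "pgsql") yields two different statements from the same definition, which is the duplication the module aims to remove.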

Your feedback will be valuable. What do you want to see in a data API? Does my approach meet your goals?

Dries:

I only glanced over it but it looks like a good start. We might want to look at other database schema definition languages for ideas on syntax, or even to adopt them.

Personally, I'd prefer to approach this from the opposite direction: let us start by figuring out the internal representation of the data (as used by themes and modules), and then work towards a database abstraction layer. The internal representation should be optimized for speed and ease of use, and might not necessarily map directly onto the proposed database abstraction definition, although it would be nice if it did.

Phil Ewert (not verified):

Just a gut feeling: how about Ruby (on Rails) integration, or support for Zend or other frameworks?

More peripheral: how about a 'wiki' with what I would call 'micro-discussion features with multi-decision-modes'? It would work like this:

Someone starts a discussion with a proposal wiki text. So he writes a proposal text of, let's say, a headline, an introduction, the argument as a text of, say, 3 paragraphs, and a summary.

Now everybody is allowed to make suggestions on every level of the text: e.g. on a single letter (for spelling), a single word, a half-sentence, a complete sentence, a paragraph, a part of the complete proposal (e.g. the summary or the introduction), and on the complete text.

In this way, discussions can be held at every level of granularity.

Out of this a text version for every single suggestion would emerge.

One could have a certain decision mode for all suggestions (like democratic voting, total agreement, etc.), or one mode for some suggestions and a different mode for others.

That way, discussions should become much more efficient.

Sorry for my daydreams ;o) and all the best to you dudes !

bismigalis (not verified):

What we now know as a 'node' in Drupal isn't really a node but a 'page'.
If 'node' were really a 'node', we would have no problem having "node as file", "node as term", "node as image", etc.

Goutam Dey (not verified):

I would like to suggest a project for the installer.

Problem
- Creating Install Profile from current installation

Why
- Not all Drupal users are developers, and we should take care of the requirements of all those non-developers and/or implementers of Drupal (I personally think it is because of them that Drupal achieved its fame as a best-of-breed CMS)

Proposal
- The module should be capable of the following:
-- Export the entire bundle of the Drupal distro as a tarball
-- Take care of the distro's settings, like marking it as default, etc.