Bug: TOC Fails for Identical Heading Names

Moved the content below over from SectionTitles.

-- PeterThoeny - 16 Jun 2001

Found a bug in this feature: as the tag is generated from the actual text of the header, if there are several identically named headers then only the first one is a valid target, the rest are ignored.

Might want to change it to use consecutively numbered tags (the html doesn't need to be human readable right??), or just add a sequence number to the actual tag text.

-- EdgarBrown - 11 May 2001

Is that a bug? I simply assumed that there are no identical headers in a document.

I prefer to have human-readable URLs, so in case you want to support duplicate headers you could add a consecutive number to the existing anchor name.

-- PeterThoeny - 17 May 2001

Yes, it is a bug, just consider the following case found in: DataStorageForMultiLevelWikiWebs#Table_of_contents

[This topic appeared in this TWiki page, so I felt compelled to re-factor this to make clear what I meant EB: 15 Jun, 2001]

Note: The headings have been numbered (manually) as a workaround since the time this example was first incorporated on this page — to understand the original problem, imagine that "Advantages" appears three times in the TOC without any numbers to distinguish one occurrence from another. -rhk

The second set of labels makes a lot of sense in a document: these are not really duplicated headers, just the same name used in sub-headers. However, you cannot tell them apart in the HTML.

And yes, consecutive numbering would solve the problem.

-- EdgarBrown - 17 May 2001

How about a combination: in the sample given, "Advantages" is used several times so TWiki could emit "Advantages1", "Advantages2" etc.

-- DavidLeBlanc - 19 Jul 2001

How about Subwebs_Incorporated_in_FileNames/Advantages, Database/Advantages etc?

-- MartinCleaver - 20 Jul 2001
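
A minimal sketch of the path-qualified idea, to make it concrete. It is not TWiki code: the `qualified_anchors` function, the `[level, text]` input format and the space-to-underscore normalisation are all assumptions for illustration. Whether "/" is acceptable inside an anchor name would also need checking.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TWiki): qualify each anchor with the
# names of its parent headings. Input is a list of [ level, text ] pairs
# in document order, with level 1 as the outermost heading.
sub qualified_anchors {
    my ($headings) = @_;
    my ( @path, @anchors );
    for my $h (@$headings) {
        my ( $level, $text ) = @$h;
        ( my $name = $text ) =~ s/\s+/_/g;    # assumed normalisation
        $#path = $level - 2;                  # keep only the ancestors
        push @anchors, join( '/', ( grep { defined $_ } @path ), $name );
        $path[ $level - 1 ] = $name;          # this heading is now the parent at its level
    }
    return @anchors;
}

my @anchors = qualified_anchors( [
    [ 1, 'Subwebs Incorporated in FileNames' ],
    [ 2, 'Advantages' ],
    [ 1, 'Database' ],
    [ 2, 'Advantages' ],
] );
print "$_\n" for @anchors;
```

Two identically named subsections under the same parent would still collide, so this reduces the problem rather than removing it.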

(Darn, I didn't realize until now that the TOC on that page wasn't working properly (in that you can't jump to the correct subsection). Guess I need to do more testing.)

The TOC in question and all the topics are on one TWiki page (DataStorageForMultiLevelWikiWebs). While on that page, I wouldn't want the display of either the TOC or the headings for sections to be burdened with any additional text beyond, for example, "Advantages". But, maybe, each entry in the TOC can be one of those links with a display different than the actual link, like:

[ [Subwebs Incorporated in FileNames/Advantages][Advantages]]

(Extra space added between first two "["s to defeat automatic linking.)

And of course, then the linking mechanism has to be able to find and link to the correct "Advantages" section -- the first one after "Subwebs Incorporated in FileNames".

Aside:

It will be nice someday when the automatic numbering feature works; then the first "Advantages" will be 1.1 (or I.A.), and the second will be 2.1 (or II.A.), and so on. (IIRC, such an automatic numbering feature is being discussed.) When that happens, maybe there will be a way to simplify the link from the TOC.

If the page in question were INCLUDEd in another page, under the first higher level heading on that page, it would be nice if the sections were then numbered as 1.1.1 (I.A.i.), 1.2.1 (I.B.i.), and 1.3.1 (I.C.i.).

-- RandyKramer - 20 Jul 2001

I haven't tried this, and I won't be able to try it today, but a workaround might be to put the three sections with identical subsection names on three separate pages and then include them on one page and put the "main" TOC on that page.

20010720 Update: Nope, this does not work! And, in retrospect, it seems there was no good reason to expect it to have worked. Oh, well.

-- RandyKramer - 21 Jul 2001

Did I miss a resolution to this bug? If not, could the line number the header occurs on be used in some way to create a unique name? Is the line number in the original source (or included source files) available to makeAnchorHeading and handleToc which call makeAnchorName? If the line number is available, and the same number can be retrieved when these functions get called then this could be solved by adding the line number as a parameter to makeAnchorHeading.

If that is not possible, how about having makeAnchorName use an associative array to store all of its headings when called from handleToc (add a parameter to makeAnchorName to tell it how it's being called)? Something like this may work:
# For each call to makeAnchorName that comes from handleToc, execute:
$anchorName =~ s/^(.{30})(.*)$/$1/o;  # limit to 30 chars
$SectionNames{$anchorName}++;         # count occurrences of this anchor name
push( @anchornames, $anchorName . $SectionNames{$anchorName} );  # queue it for makeAnchorHeading
return $anchorName . $SectionNames{$anchorName};

When makeAnchorName gets called from makeAnchorHeading, it will just shift the name off the left side of the @anchornames array, leaving it empty (we hope) for the next go-around. We need to clear the associative array after the page is rendered. If we can guarantee that all the anchor names will be shifted off the stack by makeAnchorHeading, then we can just erase the %SectionNames array when the @anchornames stack is empty. It would still make me feel better if there were some positive way of clearing these variables any time a new page is viewed. This variable clearing is critical for proper operation under mod_perl, but doesn't matter for CGI operation.

This method also assumes that every call to makeAnchorName made by handleToc is repeated in exactly the same order by makeAnchorHeading; otherwise the links from the TOC will be really messed up. I think this is true, since they both operate off the same input data and seem (in my limited viewing of the code) to have the same decision criteria, but I am probably missing something.

-- JohnRouillard - 26 Aug 2002
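
The queue idea above can be sketched as a small standalone script. This is only an illustration under assumptions: `anchor_for_toc`, `anchor_for_heading` and `reset_anchor_state` are hypothetical stand-ins for the two call paths into makeAnchorName, and the `\W+` normalisation is a guess, not TWiki's actual rule.

```perl
use strict;
use warnings;

my %seen;     # anchor name => occurrences so far in the TOC pass
my @queue;    # unique names produced by the TOC pass, in document order

# Stand-in for the call path from handleToc: make the name unique and queue it.
sub anchor_for_toc {
    my ($text) = @_;
    ( my $name = $text ) =~ s/\W+/_/g;    # assumed normalisation
    $name = substr( $name, 0, 30 );       # limit to 30 chars, as in the snippet above
    my $unique = $name . ++$seen{$name};
    push @queue, $unique;
    return $unique;
}

# Stand-in for the call path from makeAnchorHeading: reuse the queued name.
sub anchor_for_heading {
    return shift @queue;
}

# Explicit reset at the start of every page view, so that nothing leaks
# between requests in a persistent interpreter such as mod_perl.
sub reset_anchor_state {
    %seen  = ();
    @queue = ();
}

reset_anchor_state();
my @toc  = map { anchor_for_toc($_) } ( 'Advantages', 'Drawbacks', 'Advantages' );
my @body = map { anchor_for_heading() } @toc;
print "TOC:  @toc\nBody: @body\n";
```

Resetting at the start of each view, rather than waiting for the queue to drain, sidesteps the worry about whether every queued name really gets shifted off.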

Here's a workaround: Use invisible HTML comments to make the headings unique. Example:

<H4>Advantages<!--SubWebs Are Directories--></H4>
[...]
<H4>Advantages<!--SubWebs Incorporated in File Names--></H4>

Seems to work for me (still using the ol' 2001 TWiki code, though).

-- ClausBrod - 06 Aug 2003

This problem has far-reaching implications in the code. I would advise that if you really want to do this, you use something like the TocPlugin, which is designed to handle these sorts of complexity. TOC headers are really very simple.

-- CrawfordCurrie - 17 Feb 2005

Hm, guess I should file this as a bug on develop, right? ;-)

-- FranzJosefSilli - 20 Sep 2005

This should be solved together with RelativeHeadingLevelsforINCLUDE. Maybe some kind of hidden AutomaticallyNumberedHeadings would help to make TOC headers unique.

Why do we need special markup for different hierarchy levels anyway? Couldn't this be made more generic by introducing start and stop elements?
%STARTSECTION{name="sectionname" title="Heading 1" numbered="on"}%
This is a named and numbered section.

%STARTSECTION{title="Heading 2" numbered="on"}%
This is a numbered subsection.

%STARTSECTION{"Heading 3"}%
This is an unnumbered subsubsection.
%STOPSECTION% <!-- End of subsubsection -->
%STOPSECTION% <!-- End of subsection -->
%STOPSECTION% <!-- End of section -->
This would render (depending on the heading level at which it is included), for example, like this:

0.1 Heading 1

This is a named and numbered section.

0.1.1 Heading 2

This is a numbered subsection.

Heading 3

This is an unnumbered subsubsection.
Of course there could/should be a more Wiki-like short-notation and the TopicObjectModel should be kept in mind.

Hm, I'm dreaming of something like:
===#.sectionname Heading 1
This is a named and numbered section.
===# Heading 2
This is a numbered subsection.
===+ Heading 3
This is an unnumbered subsubsection.
+===
#===
#===
While dreaming: why not additionally provide markup for starting the numbering at special characters?
===7 Heading starts at number 7
==={-}C Heading starts with "-C" (only useful to support special characters between numbers of different heading levels)
==={Chapter }ix Heading starts with "Chapter ix"
Well, I guess the START- and STOPSECTION approach is more realistic. ;-)

-- FranzJosefSilli - 30 Nov 2006
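
To illustrate how start/stop elements could drive the numbering, here is a rough sketch. It is purely hypothetical: `number_sections` and its event-list input are invented for the example, no such TWiki function is implied, and it numbers from 1 instead of modelling the enclosing heading level of an INCLUDE.

```perl
use strict;
use warnings;

# Hypothetical sketch: derive heading numbers from nested start/stop events.
# Each event is [ 'start', $title, $numbered ] or [ 'stop' ].
sub number_sections {
    my (@events) = @_;
    my @counters = (0);    # one counter per currently open nesting depth
    my @out;
    for my $e (@events) {
        if ( $e->[0] eq 'start' ) {
            my ( undef, $title, $numbered ) = @$e;
            if ($numbered) {
                $counters[-1]++;
                push @out, join( '.', @counters ) . " $title";
            }
            else {
                push @out, $title;    # unnumbered sections get no prefix
            }
            push @counters, 0;        # children of this section count from 0
        }
        else {
            pop @counters if @counters > 1;    # tolerate an unbalanced stop
        }
    }
    return @out;
}

print "$_\n" for number_sections(
    [ 'start', 'Heading 1', 1 ],
    [ 'start', 'Heading 2', 1 ],
    [ 'start', 'Heading 3', 0 ],
    ['stop'], ['stop'], ['stop'],
    [ 'start', 'Heading 4', 1 ],
);
```

A numbered section nested inside an unnumbered one would get a 0 component in this sketch, so a real implementation would need a rule for that case.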

The ExplicitNumberingPlugin provides autonumbered headings, as described above. I shall verify whether the heading number is included in the label...

Your dreams can easily be made reality, using that plugin...

-- ThomasWeigert - 30 Nov 2006

FreetownReleaseMeeting2007x03x12

Agreement on implementing this. The spec was agreed which maintains compatibility for people that have linked to the old Anchors. See below.

[22:55] <Lavr> A topic which we have not touched much is TOC and there is the Codev/TocFailsForIdenticalHeadingNames
[22:56] <PeterThoeny> we have very few people and have thus an informal meeting
[22:56] <HaraldJoerg> Hi.  Sorry - I still tried to connect to #twiki_edinburgh
[22:56] <Lavr> http://twiki.org/cgi-bin/view/Codev/TocFailsForIdenticalHeadingNames
[22:56] <HaraldJoerg> Hehehe.  That would qualify as a bug under normal circumstances
[22:57] <HaraldJoerg> I haven't looked at it yet, though
[22:57] <PeterThoeny> it's a no brainer that it needs to be fixed
[22:57] <PeterThoeny> question is compatibility
[22:57] <Lavr> One principle problem to address is HOW to fix this. An obvious fix is to just add a counter suffixed. The BIG question is - do we - and if we do - how do we - maintain compatibility
[22:57] <Lavr> ?
[22:57] <PeterThoeny> people send out links based on toc
[22:57] <HaraldJoerg> Compatibility to *what*
[22:58] <ArthurClemens> just add a counter at the next duplicate
[22:58] <HaraldJoerg> If you send broken links based on broken TOC, what do you get?
[22:58] <Lavr> People abuse the autogenerated TOC anchors.
[22:58] <PeterThoeny> if the old link was like #Description, and the new link would be #Description_25
[22:58] <PeterThoeny> it would break links
[22:59] <HaraldJoerg> We must note in the docs that it is a really, really bad idea to rely on autogenerated labels
[22:59] <ArthurClemens> no
[22:59] <ArthurClemens> the first link will still be #Description
[22:59] <HaraldJoerg> PeterThoeny: If the new link would be #Description_25, then the old has been broken anyway
[22:59] <Lavr> But if - as Arthur suggests - we append the numbers only on duplicate then we do not break any compatibility.
[22:59] <HaraldJoerg> Yes, of course.
[22:59] <ArthurClemens> the second header Description will be linked as #Description_2
[22:59] <PeterThoeny> yes, that was the point i was going to make
[22:59] <HaraldJoerg> What's the problem?
[23:00] <ArthurClemens> none
[23:00] <HaraldJoerg> OK
[23:00] <ArthurClemens> I don't know how expensive it is in Perl to keep track of the processed header names and check for duplicates each next header
[23:00] <PeterThoeny> keep it compatible for non-duplicates, autonumber for duplicates, starting from second one
[23:00] <HaraldJoerg> Yep
[23:00] <Lavr> So it seems we quickly agree that appending a counter on 2nd, 3rd etc duplicate is the spec to follow.
[23:00] <PeterThoeny> yes, that is how i would do it
[23:01] <PeterThoeny> this is not so expensive at run time
[23:01] <PeterThoeny> simply feed a hash
[23:01] <ArthurClemens> a yes
[23:01] <PeterThoeny> key is heading name, value is number of occurrence
[23:02] <Lavr> I will copy paste this fraction of the meeting to the topic for reference.
[23:03] <HaraldJoerg> Key is heading key, isn't it?
[23:03] <HaraldJoerg> Sometimes different headings will collapse to the same key due to normalisation of the href attribute
[23:04] <PeterThoeny> key could be heading name or anchor name
[23:04] <ArthurClemens> there have been discussions to move out TOC as plugin
[23:04] * RickMach has joined #twiki_release
[23:04] <Lavr> Welcome Rick
[23:04] <HaraldJoerg> ArthurClemens: For what benefit?
[23:04] <RickMach> Hello
[23:05] <PeterThoeny> yes, unique anchor name is probably better as key
[23:05] <PeterThoeny> anyway, we do not need to discuss implementation details here

-- KennethLavrsen - 12 Mar 2007
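
For reference, the spec agreed in the log (first occurrence unchanged, counter appended from the second duplicate on, hash keyed by anchor name) might look roughly like this. It is a sketch under assumptions, not the eventual implementation: `unique_anchors` is a hypothetical name and the `\W+` normalisation only stands in for whatever makeAnchorName really does.

```perl
use strict;
use warnings;

# Hypothetical sketch of the agreed spec: the first occurrence keeps its
# plain anchor; the 2nd, 3rd, ... duplicates get _2, _3, ... appended.
sub unique_anchors {
    my (@headings) = @_;
    my %count;    # normalised anchor name => occurrences so far
    my @anchors;
    for my $text (@headings) {
        ( my $name = $text ) =~ s/\W+/_/g;    # assumed normalisation, not TWiki's actual rule
        my $n = ++$count{$name};
        push @anchors, $n == 1 ? $name : "${name}_$n";
    }
    return @anchors;
}

my @anchors = unique_anchors( 'Description', 'Advantages', 'Description', 'Description' );
print "$_\n" for @anchors;
```

Keying the hash on the normalised anchor, as suggested in the log, also catches different headings that collapse to the same name. One case this sketch does not handle: a literal heading such as "Description 2" would normalise to the same anchor as the generated `Description_2`.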

About two years ago CrawfordCurrie wrote: "This problem has far reaching implications in the code.", and after looking into the code I agree with him. On the other hand, while TocPlugin doesn't have the problems with identical anchors, it would require radical changes in topics from "traditional" TWiki headings to TocPlugin's variables. So I still think something ought to be done with the traditional TOC handler. Since this topic is pretty crowded now, and I've found other issues with TOC, I'll collect the observations in a separate Topic FixAnchorHandling.

-- HaraldJoerg - 13 Apr 2007

As far as I know this was never implemented. And we are past the feature freeze for Freetown. So this is deferred to GeorgetownRelease.

-- KennethLavrsen - 03 Jun 2007

This bug is also being tracked in Bugs:Item1607.

-- SvenDowideit - 11 Jan 2008

This item was accepted a long time ago, but is not yet implemented. I removed the names in the ConcernRaisedBy since we agreed on the spec (keep anchor links compatible). Anyone interested in coding this?

-- PeterThoeny - 28 May 2008

Yes. :-) Cf. attachment(s) to Bugs:Item1607

-- MarkusUeberall - 18 Aug 2008
 
Topic revision: r3 - 17 Feb 2012, CrawfordCurrie