webdevRefinery Forum: Parsing BBCode and substr() - webdevRefinery Forum


Marked

  • Group: Members
  • Posts: 114
  • Joined: 21-May 10
  • LocationChch, NZ
  • Expertise:HTML,CSS,PHP,SQL

Posted 13 July 2012 - 07:14 AM (#1)

Parsing BBCode and substr()


Hi all,

I've got an issue I'm unsure how to tackle. I have a system where announcements are made on the forums, and these announcements are also displayed on the homepage. I want to display the topics like a blog, one after another, but cut to a certain length with a "read more" link.

The issue is that these topic posts are full of BBCode, so how do I cut them to a certain length without breaking HTML tags?

I can cut the post and then parse the BBCode, or vice versa, but the posts are saved to the database with <br /> tags. That means I risk cutting one in half, and this REALLY messes with the HTML. At least in Firefox.

Any ideas as to how to go about safely cutting a post with bbcode?

Thanks in advance,
Mark.


NeilHanlon

  • Group: Members
  • Posts: 886
  • Joined: 08-July 10
  • LocationRowley, Massachusetts
  • Expertise:HTML,CSS,PHP,Java,Graphics

Posted 13 July 2012 - 08:47 AM (#2)

Try this:

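	/**
	 * Limits a string to a number of words (not characters).
	 * Note: it is not tag-aware, so it can still cut inside HTML or BBCode.
	 */
	function word_limiter($str, $limit = 100, $end_char = '…')
	{
		// Nothing to limit in an empty or whitespace-only string.
		if (trim($str) === '')
		{
			return $str;
		}

		// Match leading whitespace plus up to $limit words with their trailing whitespace.
		preg_match('/^\s*+(?:\S++\s*+){1,'.(int) $limit.'}/', $str, $matches);

		// The whole string fit within the limit, so no end character is needed.
		if (strlen($str) == strlen($matches[0]))
		{
			$end_char = '';
		}

		return rtrim($matches[0]).$end_char;
	}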


Edit:

After reading Kyek's post, I'll admit I didn't think about what happens if you're in the middle of a tag.
Thanks,
兄ニール



Cyril

  • Group: Members
  • Posts: 2544
  • Joined: 03-August 10
  • Expertise:HTML,CSS,PHP,Javascript,Graphics

Posted 13 July 2012 - 08:50 AM (#3)

I'd suggest cutting it after a punctuation mark (a period, question mark, or exclamation mark), ideally one that is followed by a line break. So basically, look through your post until you find something like this:
.<br>


Get the index of it, and split the string there. I'll code up something a little later.
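
A minimal sketch of that idea, assuming the post has already been rendered to HTML with <br>-style line breaks. The function name cut_at_sentence_break and the 300-byte default are made up for illustration, and offsets are counted in bytes, so multibyte text would need more care.

```php
<?php
/**
 * Hypothetical helper: cut $html right after the last ".", "?" or "!"
 * that is directly followed by a <br> tag and lies before $limit bytes.
 * Returns the string unchanged if it is short enough or no such break exists.
 */
function cut_at_sentence_break($html, $limit = 300)
{
    if (strlen($html) <= $limit) {
        return $html;
    }

    // Find every sentence-ending mark immediately followed by <br>, <br/> or <br />.
    $found = preg_match_all('/[.?!](?=<br\s*\/?>)/i', $html, $matches, PREG_OFFSET_CAPTURE);
    if (!$found) {
        return $html;
    }

    $cut = null;
    foreach ($matches[0] as $match) {
        // $match[1] is the byte offset of the punctuation mark.
        if ($match[1] < $limit) {
            $cut = $match[1] + 1; // keep the punctuation mark itself
        }
    }

    return $cut === null ? $html : substr($html, 0, $cut);
}
```

Because the cut lands just before a <br>, it cannot split that tag, but it can still leave an element such as <strong> open if one spans the break.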



Kyek

  • Founder of wdR
  • Group: Administrators
  • Posts: 5078
  • Joined: 20-February 10
  • LocationPhiladelphia, PA, USA
  • Expertise:HTML,CSS,PHP,Java,Javascript,Node.js,SQL

Posted 13 July 2012 - 08:50 AM (#4)

Interesting issue! This DEFINITELY warrants more thought than I've given it, but here's the logical process I'd follow:
  • Determine the character position at which you'd like to cut the string (this might be an entire topic in itself, since the string contains HTML and markup characters probably shouldn't be counted)
  • Call strrpos() to search backward from that position, looking for the last occurrence of '<'. Then do it again, looking for the last occurrence of '>'.
  • If both return false, you're safe to cut.
  • If the index of the last '<' is greater than the index of the last '>', use strpos (with an offset) to find the next index of >. Move your cut point after that.
  • Now you need to make sure you're not cutting in between an opening tag and a closing tag. For example, this would be bad:
    <strong>How to code [CUT] in PHP</strong>
    You have a couple of different options here.
    • Use an HTML parsing library. This is easiest.
    • Cut the string at your cut point. Use preg_match_all to count the number of <tags> WITHOUT a '/' character after the <, then use it again to find the number of </tags> WITH a slash character after the <. Make sure to ignore tags that self <close/>. If the numbers aren't equal, you're probably cutting before a closing tag.

  • If you used an HTML parser to do the above, you may be able to find out what tags weren't closed and close them automatically. Otherwise, you're stuck doing another really ugly round of preg_match to figure that out on your own.
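
The first few steps above (finding a cut point that does not land inside a tag) can be sketched roughly as follows. The function name safe_cut_position is hypothetical, positions are byte offsets, and the sketch deliberately ignores the harder problem of unclosed element pairs.

```php
<?php
/**
 * Hypothetical helper: move a proposed cut position so that it never
 * falls inside a tag such as <br /> or <strong>.
 * It does NOT check for elements that are opened but not closed.
 */
function safe_cut_position($html, $cut)
{
    if ($cut >= strlen($html)) {
        return strlen($html);
    }

    // Look only at the part we intend to keep.
    $head = substr($html, 0, $cut);
    $lastOpen  = strrpos($head, '<');
    $lastClose = strrpos($head, '>');

    // No '<' before the cut: we cannot be inside a tag.
    if ($lastOpen === false) {
        return $cut;
    }

    // The last '<' comes after the last '>' (or there is no '>' at all),
    // so the cut lands inside a tag. Move it to just past the next '>'.
    if ($lastClose === false || $lastOpen > $lastClose) {
        $next = strpos($html, '>', $cut);
        // If the tag is never closed, cut before the stray '<' instead.
        return $next === false ? $lastOpen : $next + 1;
    }

    return $cut;
}
```

Usage would look something like substr($html, 0, safe_cut_position($html, 300)).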

All in all, this is a really tough one. I've never looked for or used an HTML parser in PHP, but it sounds like that might be the better way to go if you can find one.

Edit: Just read Cyril's post above. That's probably the best idea so far, but *only* if you can guarantee that you won't have opened tags that span <br/>s.


Cyril

  • Group: Members
  • Posts: 2544
  • Joined: 03-August 10
  • Expertise:HTML,CSS,PHP,Javascript,Graphics

Posted 13 July 2012 - 09:16 AM (#5)

Kyek, on 13 July 2012 - 08:50 AM, said:

Interesting issue! This DEFINITELY warrants more thought than I've given it

:lol:

Kyek, on 13 July 2012 - 08:50 AM, said:

Edit: Just read Cyril's post above. That's probably the best idea so far, but *only* if you can guarantee that you won't have opened tags that span <br/>s.


Well, that's just a question of doing a check; as you said, simply search backward from the position where it's going to cut to see if there are any open tags. (Also, you could simply close the tag instead of trying to find another cutting point. Shouldn't be an issue, as you'd usually find punctuation and line breaks in paragraph tags.)
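
A rough sketch of the "simply close the tag" approach, assuming reasonably well-formed HTML that has already been cut at a point outside any tag. The function name close_open_tags and the short list of void elements are illustrative assumptions, not a complete HTML implementation.

```php
<?php
/**
 * Hypothetical helper: append closing tags for any elements still open
 * at the end of $html. Assumes the string does not end mid-tag.
 */
function close_open_tags($html)
{
    // Elements that never take a closing tag (partial list).
    $void = array('br', 'hr', 'img', 'input', 'meta', 'link');
    $stack = array();

    // Capture an optional leading slash, the tag name, and an optional
    // trailing slash for self-closing tags like <br />.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>#i', $html, $tags, PREG_SET_ORDER);

    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[3] === '/' || in_array($name, $void, true)) {
            continue; // self-closing or void element
        }
        if ($tag[1] === '/') {
            // Closing tag: pop it if it matches the most recent open tag.
            if (end($stack) === $name) {
                array_pop($stack);
            }
        } else {
            $stack[] = $name;
        }
    }

    // Close whatever is left, innermost first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }

    return $html;
}
```

Keeping a stack rather than just comparing counts means nested tags get closed in the right order.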



callumacrae

  • {{ post.author }}
  • Group: Members
  • Posts: 2862
  • Joined: 20-January 11
  • LocationWarwickshire, England
  • Expertise:HTML,CSS,PHP,Javascript,Node.js,SQL

Posted 13 July 2012 - 09:18 AM (#6)

Use this to convert to HTML, then this to convert to tree. Cycle through elements counting characters, cut text from the element that the text ends on, and then delete all nodes after that.

You'll want to cache that.
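
A minimal sketch of the tree-walking step, assuming the BBCode has already been converted to HTML and using PHP's bundled DOM extension in place of the libraries linked above. The names truncate_html and trim_node are hypothetical, and the limit counts characters of visible text only.

```php
<?php
/**
 * Hypothetical helper: truncate an HTML fragment to $limit characters of
 * visible text using DOMDocument, so the resulting markup stays well-formed.
 */
function truncate_html($html, $limit)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // The XML declaration hints at UTF-8; the wrapper div gives us one root.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    $remaining = $limit;
    trim_node($root, $remaining);

    // Serialise only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

function trim_node(DOMNode $node, &$remaining)
{
    // Copy the child list first because nodes are removed while iterating.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child); // everything after the cut goes
        } elseif ($child instanceof DOMText) {
            if ($child->length > $remaining) {
                // Cut the text inside the node where the limit is reached.
                $child->deleteData($remaining, $child->length - $remaining);
                $remaining = 0;
            } else {
                $remaining -= $child->length;
            }
        } elseif ($child->hasChildNodes()) {
            trim_node($child, $remaining);
        }
    }
}
```

Parsing on every page view is relatively expensive, which is where caching the truncated result comes in.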


Rob

  • Group: Members
  • Posts: 207
  • Joined: 08-March 10
  • Expertise:HTML,CSS,PHP,Javascript,SQL,Graphics

Posted 18 July 2012 - 12:57 AM (#7)

Another, perhaps cleaner, way would be to render the text (including the BBCode) in memory, strip all of the tags, and then determine where you want to insert the break. You could then optionally re-insert the BBCode tags or leave them out.
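
A minimal sketch of the "leave them out" variant, assuming the BBCode has already been rendered to HTML. The function name plain_excerpt, the 300-byte default, and the '...' suffix are illustrative assumptions; re-inserting the original tags is not attempted here.

```php
<?php
/**
 * Hypothetical helper: build a plain-text excerpt from rendered HTML.
 * All markup is discarded and the cut is moved back to a word boundary.
 */
function plain_excerpt($html, $limit = 300, $suffix = '...')
{
    // Turn line breaks into spaces so words on adjacent lines stay apart.
    $text = preg_replace('/<br\s*\/?>/i', ' ', $html);
    $text = strip_tags($text);
    // Decode entities so a cut can never land in the middle of "&amp;".
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) <= $limit) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    $cut = substr($text, 0, $limit);
    // Back up to the last space so we do not split a word.
    $space = strrpos($cut, ' ');
    if ($space !== false) {
        $cut = substr($cut, 0, $space);
    }

    // Re-escape for output, then add the "read more" marker.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8') . $suffix;
}
```

Since no tags survive, there is nothing left to break, at the cost of losing all formatting in the excerpt.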

Rob


callumacrae

  • {{ post.author }}
  • Group: Members
  • Posts: 2862
  • Joined: 20-January 11
  • LocationWarwickshire, England
  • Expertise:HTML,CSS,PHP,Javascript,Node.js,SQL

Posted 18 July 2012 - 02:05 AM (#8)

That's not as easy as you make it sound, though.

