webdevRefinery Forum: PHP Levenshtein - webdevRefinery Forum

Jump to content

Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

Rate Topic: -----

User is offline TheMaster 

  • *-c0de master-*
  • Group: Members
  • Posts: 764
  • Joined: 24-May 10
  • LocationAustralia
  • Expertise:HTML,CSS,PHP,Java

Posted 25 June 2012 - 06:14 AM (#1)

PHP Levenshtein


I just found this the other day, while working on an AI chat type thing. Wow. This is really cool.

REALLY cool.

You know, one thing I really like about PHP is the number of methods and functions it has, "right out of the box", making a programmer's life easier. Anyway - getting to the point.

-----------

Is there anyway to use this function in Sqlite3 - specifically using PDO (if that makes a difference - oh and I finally learnt PDO!! wow.....how on EARTH was I using mysql_ :P).

I want to find for instance - the closest match in a database to a given string. This function seemed to be the sort of thing I was looking for. I know that it can be done with MySQL (I think...), but I'm looking for a solution with Sqlite. If there are any other ways to do this - please let me know!

-----------
0


User is offline gibbonweb 

  • 兄ヨハネス
  • Group: Members
  • Posts: 2063
  • Joined: 23-June 10
  • LocationMunich(DE)
  • Expertise:HTML,CSS,PHP,Javascript,Python,SQL,Graphics

Posted 25 June 2012 - 06:29 AM (#2)

WAT are you talking about.
0


User is offline TheMaster 

  • *-c0de master-*
  • Group: Members
  • Posts: 764
  • Joined: 24-May 10
  • LocationAustralia
  • Expertise:HTML,CSS,PHP,Java

Posted 25 June 2012 - 06:41 AM (#3)

View Postgibbonweb, on 25 June 2012 - 06:29 AM, said:

WAT are you talking about.


http://php.net/manua...levenshtein.php

This. I found it while looking through the PHP docs, and realised it was exactly what I needed. Now I just need to implement it into a database somehow.
0


User is online TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5167
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 25 June 2012 - 07:11 AM (#4)

Oh, that's.cool. It's an auto correct algorithm. I believe this is also what aspell does? Really should check out what spell does since our app is it XD but yeah there we a lot of "better" ways to do it because they are more cross platform so they have more work done on them.
Reserved.
0


User is offline gibbonweb 

  • 兄ヨハネス
  • Group: Members
  • Posts: 2063
  • Joined: 23-June 10
  • LocationMunich(DE)
  • Expertise:HTML,CSS,PHP,Javascript,Python,SQL,Graphics

Posted 25 June 2012 - 07:13 AM (#5)

Ah! Cool.

Apparently, your best bet would be to implement the levenshtein distance as a MySQL function.

Edit: Dang, you were asking for sqlite... not sure if it's possible to implement functions like this in sqlite? IF it is, it must be quite similar but I'm not sure if it's possible at all.
0


User is online Lemon 

  • I have a dream...
  • Group: Members
  • Posts: 694
  • Joined: 24-February 11
  • Expertise:HTML,CSS,PHP,Javascript,Node.js,SQL

Posted 25 June 2012 - 08:59 AM (#6)

I'd actually say the inclusion of this function in the global namespace is actually a problem with PHP rather than a nicety. I agree it's a handy function to have built into the language, but consider how often do you think you are likely to use it in everyday coding? Obscure, rarely used functions like this just shouldn't be in the global namespace. A lot of other languages have similar algorithm functions, but organise it all far neater with namespaces, modules or includes; so it puzzles me that PHP continues to fail to sort out how its functions are loaded.

As for how to implement this function with SQLite (without a query all), the only example I've found so far requires compiling a custom function into an SQLite extension and then loading it at runtime. Judging from the code though, it's just a simple download, make and then load, so as long as you have a compiler this should be really simple.
2


User is offline Ruku 

  • I do Linux and that Internet thing.
  • Group: Members
  • Posts: 1367
  • Joined: 17-April 10
  • Location/root
  • Expertise:HTML,CSS,PHP,Javascript,Python,SQL

Posted 25 June 2012 - 02:35 PM (#7)

This kind of bollocks is the exact reason I hate working with PHP. It has an awful lot of useless junk cluttering up the global namespace because it didn't start out with a proper module system (hell, it began life as a procedural language...). It seems that some of this junk doesn't even belong in the core distribution!

PHP doesn't make a programmer's life easier. Its lame attempt at extensions makes code difficult to integrate (dependency hell, especially when you're working with ionCube encoded shite written by competitors) and the wildly varying defaults between different versions even on the same OS mean that much boilerplate "unfuckery" is required to make the damn language usable. THANKS FOR THE MAGIC QUOTES, PHP.NET!!!!!11
Luke Carrier
I poke fun at IE and do web stuff. Find me on my blog, my code on GitHub or my angry rants on Twitter.
1


User is offline Daniel15 

  • dan.cx
  • Group: Moderators
  • Posts: 3435
  • Joined: 17-April 10
  • LocationMelbourne, Australia
  • Expertise:HTML,CSS,PHP,Java,Javascript,Node.js,SQL

Posted 25 June 2012 - 10:39 PM (#8)

You could try Soundex instead. For each text column you want to closest matching for, add a new database column for the soundex key. Whenever that column is updated, also update the Soundex field.

Then you just need to
SELECT ... FROM table WHERE soundex = 'soundex of what you're looking for'
to find similar stuff.
Daniel15! :D
Posted Image

Repeat after me: jQuery is not JavaScript. It is not the answer to every JavaScript-related question. When you have to write some JavaScript, do not instantly react with "Oh, I'll do that with jQuery!"

Spoiler
0


User is offline TheMaster 

  • *-c0de master-*
  • Group: Members
  • Posts: 764
  • Joined: 24-May 10
  • LocationAustralia
  • Expertise:HTML,CSS,PHP,Java

Posted 26 June 2012 - 02:13 AM (#9)

View PostDaniel15, on 25 June 2012 - 10:39 PM, said:

You could try Soundex instead. For each text column you want to closest matching for, add a new database column for the soundex key. Whenever that column is updated, also update the Soundex field.

Then you just need to
SELECT ... FROM table WHERE soundex = 'soundex of what you're looking for'
to find similar stuff.


Hey that looks interesting! Might have a look as I think it would be useful for other stuff too.

Just wondering though - seeing as it indexes phonetics of sounds - would it work as well as something like Levenshtein which is used for spell checking?

---------------

EDIT: Well I guess if people spell something phonetically in an attempt to spell it correctly - this will work.

Thanks for the find! Will try it out!
0


User is offline Daniel15 

  • dan.cx
  • Group: Moderators
  • Posts: 3435
  • Joined: 17-April 10
  • LocationMelbourne, Australia
  • Expertise:HTML,CSS,PHP,Java,Javascript,Node.js,SQL

Posted 26 June 2012 - 02:34 AM (#10)

Something like Soundex is more appropriate for search. If you use something like the Levenshtein distance, far as as I know, you'd need to calculate the distance between the searched word and every single word in the database, which is quite expensive. On the other hand, all the Soundex keys are precomputed so they're a lot faster to search on.

If it's search you're doing, and you think your database may become large, maybe look into something like Sphinx, Lucene or Apache Solr. These are designed for search and have implemented some very good algorithms internally.
Daniel15! :D
Posted Image

Repeat after me: jQuery is not JavaScript. It is not the answer to every JavaScript-related question. When you have to write some JavaScript, do not instantly react with "Oh, I'll do that with jQuery!"

Spoiler
0


User is offline TheMaster 

  • *-c0de master-*
  • Group: Members
  • Posts: 764
  • Joined: 24-May 10
  • LocationAustralia
  • Expertise:HTML,CSS,PHP,Java

Posted 26 June 2012 - 03:18 AM (#11)

View PostDaniel15, on 26 June 2012 - 02:34 AM, said:

Something like Soundex is more appropriate for search. If you use something like the Levenshtein distance, far as as I know, you'd need to calculate the distance between the searched word and every single word in the database, which is quite expensive. On the other hand, all the Soundex keys are precomputed so they're a lot faster to search on.

If it's search you're doing, and you think your database may become large, maybe look into something like Sphinx, Lucene or Apache Solr. These are designed for search and have implemented some very good algorithms internally.


Hmmm. Hadn't thought of that. And yes, the database will become very large. Possibly over 5,000 entries. Now that you mention it, calculating the Levenshtein distance would be ridiculous. Unless of course, the Levenshtein distance was calculated before hand, and put into anothe table - the same way Soundex would have worked - although Soundex looks like a much better way of doing what I was wanting.

Yep, searching was what I was going to do. Thanks for the tips. Sphinx and Solr look quite interesting. I'll check 'em out! Have you had any experience with either - and could pass on any tips or problems you had with them? (Things to look out for etc.?)
0


User is online TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5167
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 26 June 2012 - 04:44 AM (#12)

View PostDaniel15, on 25 June 2012 - 10:39 PM, said:

You could try Soundex instead. For each text column you want to closest matching for, add a new database column for the soundex key. Whenever that column is updated, also update the Soundex field.

Then you just need to
SELECT ... FROM table WHERE soundex = 'soundex of what you're looking for'
to find similar stuff.

:S
SELECT ... FROM table WHERE soundex = 'soundex of what you\'re looking for'

Reserved.
0


User is offline TheMaster 

  • *-c0de master-*
  • Group: Members
  • Posts: 764
  • Joined: 24-May 10
  • LocationAustralia
  • Expertise:HTML,CSS,PHP,Java

Posted 26 June 2012 - 05:24 PM (#13)

View PostThatRailsGuy, on 26 June 2012 - 04:44 AM, said:

:S
SELECT ... FROM table WHERE soundex = 'soundex of what you\'re looking for'



Or

SELECT ... FROM table WHERE soundex = "soundex of what you're looking for" 


------

Having said that - is there any proper use? Is "escaping" quotes the best way to go?
0


User is offline Daniel15 

  • dan.cx
  • Group: Moderators
  • Posts: 3435
  • Joined: 17-April 10
  • LocationMelbourne, Australia
  • Expertise:HTML,CSS,PHP,Java,Javascript,Node.js,SQL

Posted 26 June 2012 - 10:07 PM (#14)

You guys. My code was very obviously not working code. No need to escape the apostrophe :P

Quote

And yes, the database will become very large. Possibly over 5,000 entries

5,000 isn't that large for a database, and anything should be able to handle that number of records. 500,000 is large :P

Quote

Have you had any experience with either - and could pass on any tips or problems you had with them? (Things to look out for etc.?)

I haven't personally used Solr, but I've seen people use it with data sets containing hundreds of thousands of records. It's very fast.
Daniel15! :D
Posted Image

Repeat after me: jQuery is not JavaScript. It is not the answer to every JavaScript-related question. When you have to write some JavaScript, do not instantly react with "Oh, I'll do that with jQuery!"

Spoiler
0


User is offline gibbonweb 

  • 兄ヨハネス
  • Group: Members
  • Posts: 2063
  • Joined: 23-June 10
  • LocationMunich(DE)
  • Expertise:HTML,CSS,PHP,Javascript,Python,SQL,Graphics

Posted 27 June 2012 - 01:20 AM (#15)

View PostDaniel15, on 26 June 2012 - 10:07 PM, said:

5,000 isn't that large for a database, and anything should be able to handle that number of records. 500,000 is large :P


when you need unsigned BIGINT for the ID, THAT's large :P
0


User is offline TheMaster 

  • *-c0de master-*
  • Group: Members
  • Posts: 764
  • Joined: 24-May 10
  • LocationAustralia
  • Expertise:HTML,CSS,PHP,Java

Posted 27 June 2012 - 05:29 AM (#16)

View PostDaniel15, on 26 June 2012 - 10:07 PM, said:

5,000 isn't that large for a database, and anything should be able to handle that number of records. 500,000 is large :P


:o LOL well the biggest database I've ever had was <100 :(

Far out, 500,000 entries. That is a crap ton.

View Postgibbonweb, on 27 June 2012 - 01:20 AM, said:

when you need unsigned BIGINT for the ID, THAT's large :P


I don't even want to imagine that....
0


User is offline Rob 

  • Group: Members
  • Posts: 207
  • Joined: 08-March 10
  • Expertise:HTML,CSS,PHP,Javascript,SQL,Graphics

Posted 20 July 2012 - 08:59 PM (#17)

I have used Sphinx and it works quite well for full text searches. However, I do not believe that Sphinx supports SQLite, though the developers of Sphinx could have added support since I last read the documentation. As far as I know, Sphinx requires the use of the INNODB table type and both INNODB and Sphinx would probably be overkill for such a small number of records. Is there a reason that you chose to use SQLite instead of MySQL or Percona?

If you were to use MySQL and the ISAM table type, you could use the built in full text indexing of the MySQL database engine. One site that I have worked on has approximate 20M records in a single table and Sphinx allows search results to be returned in less than 2 seconds, but as with all things having to do with Databases, time spent designing, planning, testing and tuning will determine the performance... In other words, YMMV.
0


Share this topic:


Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

3 User(s) are reading this topic
0 members, 3 guests, 0 anonymous users


Enter your sign in name and password


Sign in options
  Or sign in with these services