Discussion Forums



Thread: Possible workaround for sorting / ordering problem.

Welcome, Guest Help
Login Login


Permlink Replies: 6 - Pages: 1 - Last Post: Mar 3, 2008 9:14 AM by: Mocky Habeeb
mrtwig

Posts: 2
Registered: 2/17/08
Possible workaround for sorting / ordering problem.
Posted: Feb 24, 2008 10:11 PM PST
  Click to reply to this thread Reply

I was playing around with simpleDB trying to figure out how to do sorting. Here are my findings based on the few test cases I had:

1. The values of a multivalued attribute are already sorted in ascending order.

2. If an attribute say "views" contains the no of views, then doing a query ['views' > '0']
returns the item numbers in ascending order of views.

3. If you want to run a query and sort it by "views" column just add "intersection ['views' > '0'] " to the end of the query. This seems to return the result in the ascending order of the "views" column.

This seems to solve ascending part of ordering. Now for Descending:

A.)  for number types have another column say "viewsDesc" and the value should be whatever is the highest positive number of the "views" column is MINUS the value in the "views" column. Amazon forces you to know the highest positive integer beforehand and prefix smaller  numbers with zero's. This actually helps in keeping the sort order. Then you can use this column to sort in the descending order.

B.)  for Date types I used This format "20071024". this format seems to be stored in the ascending order.  For descending use another column and use a larger date or number say "20200000" and MINUS the date with this number. Then you have descending date order for your queries.

C.) For alphabets they are also already sorted in ascending order. For descending order you might need to know how long the values are because sorting for whole sentences is not good. I have'nt exactly figured this out yet may u can use the same principle as above. Like if u have a word "zzzzzzzzzz" find some way to invert it to "aaaaaaaaaa" or "0000000000". Then that should give descending order.

The queries ran at the same speed with or without sorting. May be we will notice a difference when there 1000's of rows.

Please correct me if I am wrong and these are based on few test cases with limited data.







David Butler

Posts: 28
Registered: 2/26/07
Re: Possible workaround for sorting / ordering problem.
Posted: Feb 25, 2008 7:43 AM PST   in response to: mrtwig
  Click to reply to this thread Reply

After some initial testing, this astoundingly actually seems to work.  I used a data set of 300, and the results were returned in correct order, even when the data was split into several result sets with NextTokens.

That said, I would caution against relying on this mechanism at the moment, as Amazon does not guarantee a sorted order right now.  It may be that the algorithm used to run the queries has the side-effect of sorting the data, but if a faster method should happen to be implemented behind the scenes, that ordering could go straight out the window.

Very cool discovery though.


Mocky Habeeb
RealName(TM)


Posts: 548
Registered: 12/28/07
Re: Possible workaround for sorting / ordering problem.
Posted: Feb 25, 2008 12:41 PM PST   in response to: mrtwig
  Click to reply to this thread Reply

I have noticed a number of these things myself. In addition, listDomains also comes back with a sorted list. 

I wouldn't be surprised to hear that internally the data is stored sorted, given the query interface they have exposed. That said, I have seen this assumption break down when fetching a query with multiple next tokens, which did surprise me.

These are interesting and useful undocumented features, but I wouldn't suggest that anyone rely on them. I would love to see them officially supported. Sorting a large query result is something sdb is probably much better suited for than clients, and something it can do today with little (apparent) trouble in the tightly controlled phase of beta testing.



Adam Fisk

Posts: 32
Registered: 2/8/06
Re: Possible workaround for sorting / ordering problem.
Posted: Feb 27, 2008 5:46 AM PST   in response to: mrtwig
  Click to reply to this thread Reply

This is a fantastic discovery.  I'll likely be using it for data where sorting would be nice, but where it's not the end of the world if the technique breaks down.

I hope you don't mind I blogged about your post last night at:

http://adamfisk.wordpress.com/2008/02/27/simple-db-sorting-solution-from-forums/

Let me know if you'd like me to give you credit by name.  I really see this as the biggest missing piece in the Simple DB puzzle!!

Thanks.

-Adam Fisk


justinetheredge

Posts: 10
Registered: 1/2/08
Re: Possible workaround for sorting / ordering problem.
Posted: Feb 28, 2008 6:16 AM PST   in response to: Adam Fisk
  Click to reply to this thread Reply

Keep in mind that if Amazon does not advertise that their platform is supposed to operate in this way then you are relying on something that can and probably will change in the future.


D. C Smith
RealName(TM)


Posts: 100
Registered: 11/25/07
Re: Possible workaround for sorting / ordering problem.
Posted: Mar 2, 2008 11:17 AM PST   in response to: Adam Fisk
  Click to reply to this thread Reply

I have found some ColdFusion specific remedies to sorting.  One of the cool things in ColdFusion is being able to perform a query of queries: http://www.sitepoint.com/article/query-queries

So I am able to take the entire simpleDB output and put it in a ColdFusion query, then make SQL queries against it and manipulate it any which way I want.  I honestly don't know if other languages have this (I'd be surprised if they didn't since ColdFusion is no longer king of the dynamic app space).

Another nice thing is that CF automatically caches queries so that data is available as long as I need it without making more calls to SDB.  For an interesting look at using ColdFusion cached queries vs. memcache (and how you can actually use both) go here: http://www.flexablecoder.com/blog/index.cfm/2007/12/11/external-caching-for-coldfusion-with-memcached

I'm still testing, and not sure if this defeats the purpose of SDB, but in my mind the only real advantage to SDB (so far) is that it removes a point of failure in my system.  Aside from that -- and perhaps I'm not thinking big enough! -- I haven't found a reason why it would be better/smarter/faster than just using MySQL replication for a relatively "standard" web appplication.  Maybe for scientific research?  I dunno...I am continually learning, however, and happy to be educated by the more seasoned developers on this board.

Mocky Habeeb
RealName(TM)


Posts: 548
Registered: 12/28/07
Re: Possible workaround for sorting / ordering problem.
Posted: Mar 3, 2008 9:14 AM PST   in response to: D. C Smith
  Click to reply to this thread Reply

The real weakness with sdb's current query solution and the real benefit explained by this threads OP is in cases where you need sorted query results from a data set too large to pull down all the data.

It's great in some situations if your query results are cached, but if all you need is the top few there comes a point where you don't want to spend either the bandwidth or the time or the memory to pull down the the whole query and do a get on each item and sort it yourself, caches not withstanding. 

50,000 results, 100,000, 1,000,000 at some point it becomes too much. And since they already have sorted indices internally for every attribute, they might as well return the sorted result for you while they are in there. And in fact they do, albeit in a undocumented, unofficial and implicit way.

I haven't followed your links, and I'm not bashing CF, cached queries or memcache. I'm just pointing out that sdb internal sorting solves a super-set of the problems that client side caching solves with regard to sorting.




Point your RSS reader here for a feed of the latest messages in all forums