path: root/hiper/STATUS
author Daniel Stenberg <daniel@haxx.se> 2008-12-20 17:16:45 +0000
committer Daniel Stenberg <daniel@haxx.se> 2008-12-20 17:16:45 +0000
commit 19c9b7c8034498d925672bcb0a5c316250a75dc4 (patch)
tree 5ad05ad3987c4ddf1fd5ef5df77750b070e3128c /hiper/STATUS
parent ffd08df863b13ac8ca276e5d6f0a74ed28b55423 (diff)
remove outdated stuff
Diffstat (limited to 'hiper/STATUS')
-rw-r--r--  hiper/STATUS  300
1 file changed, 0 insertions, 300 deletions
diff --git a/hiper/STATUS b/hiper/STATUS
deleted file mode 100644
index 918611b8f..000000000
--- a/hiper/STATUS
+++ /dev/null
@@ -1,300 +0,0 @@
-Date: January 5, 2006
-Author: Daniel Stenberg
-
- Status of project Hiper - high performance libcurl modifications
- ================================================================
-
-What is Hiper
-
- You won't find such a description in this document. See
- http://curl.haxx.se/libcurl/hiper/ for further details.
-
-Live Progress Info
-
- During my work, I've posted occasional updates on the curl-library mailing
- list but, more importantly, made frequent updates of
- http://curl.haxx.se/libcurl/hiper/schedule.html
-
-Schedule
-
- I took time off my regular job during December 2005 and the first week of
- January 2006 to work on hiper full-time.
-
-Step 1 - Measure the Existing Solution
-
- I started full-time work on project Hiper on December 1st 2005. I began by
- putting together a test application that used the existing API to allow me
- to accurately measure execution and transfer speeds when doing a large
- number of transfers.
-
- I soon discovered that it was impossible to do any sensible measurements by
- using live and actual URLs since the transfers were too unreliable and
- uncontrolled. I then enhanced the current HTTP server in the curl test suite
- and made it support a large number of transfers and some extra magic
- "commands" that would make the server either just sit "idle" or "stream"
- (continuously sending data in a never-ending stream). I then wrote up two
- files using the curl test suite file format, and by accessing the properly
- formatted URLs on my localhost the HTTP server would either run "idle" or
- run "stream".
-
- Having this working, I patched libcurl to always only recv() a single byte
- off the network each time, just to make sure that the time spent on reading
- data is constant and never very long.
-
- I adjusted the test application (actually called 'hiper') to create Y idle
- transfers and Z stream transfers, had it run for N seconds and then quit and
- produce a summary on stdout. Now I got very solid and repeatable results. I
- started to run repeated tests and save the results when I ran into the
- dreaded 1024 socket maximum limit.
-
- One side of the problem is that the fd_set type only allows 1024 file
- descriptors (on my Linux), which I had to solve by simply making my own type
- with room for more connections and doing ugly typecasts in the code. The
- other side of the problem is that user applications have a limit imposed by
- the system on the maximum number of file descriptors they can have open, and
- I had to work around that by writing a special tool that runs setuid root,
- increases the limit, downgrades to a normal user again and then runs the
- command line of your choice. This second approach has to be used for both
- 'hiper' and the test HTTP server. (You need to build the HTTP server with
- CURL_SWS_FORK_ENABLED defined to have it do forks since it isn't desirable
- to do so when running the normal curl tests.)
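-
- The limit-raising helper is conceptually simple. A minimal sketch of such a
- setuid tool could look like the following (this is an illustration, not the
- actual tool from the curl tree, and the limit of 10000 is an arbitrary
- example value):
-
-   #include <sys/resource.h>
-   #include <stdio.h>
-   #include <unistd.h>
-
-   int main(int argc, char **argv)
-   {
-     struct rlimit rl;
-
-     if(argc < 2) {
-       fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
-       return 1;
-     }
-
-     /* raise the file descriptor limit while still running as root */
-     rl.rlim_cur = 10000;
-     rl.rlim_max = 10000;
-     if(setrlimit(RLIMIT_NOFILE, &rl) != 0) {
-       perror("setrlimit");
-       return 1;
-     }
-
-     /* downgrade to the real (invoking) user again */
-     if(setuid(getuid()) != 0) {
-       perror("setuid");
-       return 1;
-     }
-
-     /* run the requested command; it inherits the raised limit */
-     execvp(argv[1], &argv[1]);
-     perror("execvp");
-     return 1;
-   }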
-
- Now I could run my test program without problems. I decided to run the tests
- with 1 stream connection and a varying amount of idle ones. I did 1001,
- 2001, 3001, 5001 and 9001 connections and measured how long select() and
- curl_multi_perform() (including the curl_multi_fdset() call) would take on
- average, over a period of 20 seconds. I ran each test 5-6 times and used
- the average time of all the runs.
-
- The times in number of microseconds:
-
-   Connections   multi_perform   select
-          1001            3504      951
-          2001            7606     1988
-          3001           11045     2715
-          5001           16406     4024
-          9001           32147     8030
-
- Test system
- CPU: Athlon XP 2800
- RAM: 1 GB
- Linux: 2.6
- glibc: 2.3.5
- libcurl: 7.15.1
-
- The only reason I stopped at 9001 connections is that my test machine ran
- out of available memory by then as I ran the test server on the same machine,
- and I didn't want to risk the test result accuracy by having it start using
- the swap during the tests.
-
- It means that at 9000 connections we spend roughly 40 ms (about 32 ms in
- curl_multi_perform() plus 8 ms in select()) for each socket action, even
- when only one socket ever has action.
-
- During the 32000 microseconds curl_multi_perform() takes for 9000
- connections, it runs 18000 laps, which makes for less than 2 microseconds
- per lap. (Of course counting time/laps is an oversimplification, but anyway.)
- Hopefully we should achieve less than 10 microseconds for each call to
- curl_multi_socket() for an active connection.
-
- The timing graph displayed on the libevent site (duplicated on the hiper
- project page) suggests that libevent is pretty much fixed at 50 microseconds
- (although I don't know what test box was used in their testing, comparing
- the select() times from my tests suggests that they are at least reasonably
- close).
-
- Summing up, the current ~40 ms spent at 9000 connections could then possibly
- be lowered to something around 60 us!
-
-Step 2 - Implement curl_multi_socket API
-
- Most of the design decisions and debates about this new API have already
- been held on the curl-library mailing list a long time ago so I had a basic
- idea on what approach to use. The main ideas of the new API are simply:
-
- 1 - The application can use whatever event system it likes as it gets info
- from libcurl about what file descriptors libcurl waits for what action
- on. (The previous API returns fd_sets which is very select()-centric).
-
- 2 - When the application discovers action on a single socket, it calls
- libcurl and informs it that there was action on this particular socket, and
- libcurl can then act on that socket/transfer only and not care about
- any other transfers. (The previous API always had to scan through all
- the existing transfers.)
-
- The idea is that curl_multi_socket() calls a given callback with information
- about what socket to wait for what action on, and the callback only gets
- called if the status of that socket has changed.
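-
- As an illustration of that callback concept, such a callback would receive
- the socket, the action libcurl waits for, and the application's private
- pointers. The sketch below uses the CURL_POLL_* names from the API as it was
- later published; treat the exact names and signature as an assumption rather
- than a quote of the draft:
-
-   #include <curl/curl.h>
-
-   static int socket_callback(CURL *easy,      /* transfer owning the socket */
-                              curl_socket_t s, /* the socket itself */
-                              int what,        /* action libcurl waits for */
-                              void *userp,     /* app's private pointer */
-                              void *socketp)   /* app's per-socket pointer */
-   {
-     (void)easy; (void)userp; (void)socketp;
-
-     switch(what) {
-     case CURL_POLL_IN:     /* start watching 's' for readability */
-       break;
-     case CURL_POLL_OUT:    /* start watching 's' for writability */
-       break;
-     case CURL_POLL_INOUT:  /* watch for both directions */
-       break;
-     case CURL_POLL_REMOVE: /* stop watching 's' entirely */
-       break;
-     }
-     return 0;
-   }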
-
- In the API draft from before, we have a timeout argument on a per socket
- basis and we also allowed curl_multi_socket() to pass in an 'easy handle'
- instead of socket to allow libcurl to shortcut a lookup and work on the
- affected easy handle right away. Both these turned out to be bad ideas.
-
- The timeout argument was removed from the socket callback since after much
- thinking I came to the conclusion that we really don't want to handle
- timeouts on a per socket basis. We need it on a per transfer (easy handle)
- basis and thus we can't provide it in the callbacks in a nice way. Instead,
- we have to offer a curl_multi_timeout() that returns the largest amount of
- time we should wait before we call the "timeout action" of libcurl, to
- trigger the proper internal timeout action on the affected transfer. To get
- this to work, I added a struct to each easy handle in which we store an
- "expire time" (if any). The structs are then "splay sorted" so that we can
- add and remove times from the linked list and yet somewhat swiftly figure
- out 1 - how long time there is until the next timer expires and 2 - which
- timer (handle) should we take care of now. Of course, the upside of all this
- is that we get a curl_multi_timeout() that should also work with old-style
- applications that use curl_multi_perform().
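-
- In practice this means an application's event loop can ask libcurl how long
- it may wait before the next internal timeout needs handling. A rough sketch
- of that, using curl_multi_timeout() with a plain select()-based, old-style
- loop (error handling omitted, details are illustrative):
-
-   #include <curl/curl.h>
-   #include <sys/select.h>
-
-   static void wait_and_run(CURLM *multi)
-   {
-     long timeout_ms = -1;
-     struct timeval tv;
-     fd_set rd, wr, exc;
-     int maxfd = -1;
-     int running = 0;
-
-     /* the longest time we may wait before libcurl needs to act */
-     curl_multi_timeout(multi, &timeout_ms);
-     if(timeout_ms < 0)
-       timeout_ms = 1000;   /* no timer set, pick a default */
-
-     tv.tv_sec = timeout_ms / 1000;
-     tv.tv_usec = (timeout_ms % 1000) * 1000;
-
-     FD_ZERO(&rd);
-     FD_ZERO(&wr);
-     FD_ZERO(&exc);
-     curl_multi_fdset(multi, &rd, &wr, &exc, &maxfd);
-
-     if(select(maxfd + 1, &rd, &wr, &exc, &tv) >= 0)
-       curl_multi_perform(multi, &running);  /* act on activity or timeout */
-   }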
-
- The easy handle argument was removed from the curl_multi_socket() function
- because having it there would require the application to do a socket to easy
- handle conversion on its own. I find it very unlikely that applications
- would want to do that, and since libcurl would need such a lookup on its own
- anyway (we didn't want to force applications to do that translation code; it
- would be optional), it seemed like an unnecessary option. I also realized
- that when we use underlying libraries such as c-ares (for asynchronous DNS
- resolving) there might in fact be more than one transfer waiting for action
- on the same socket, which makes the lookup even trickier and even less
- likely to ever get done by applications. Instead I created an internal
- "socket to easy handles" hash table that, given a socket (file descriptor),
- returns a list of easy handles that wait for some action on that socket.
-
- To make libcurl be able to report plain sockets in the socket callback, I
- had to re-organize the internals of the curl_multi_fdset() etc so that the
- conversion from sockets to fd_sets for that function is only done in the
- last step before the data is returned. I also had to extend c-ares to get a
- function that can return plain sockets, as that library too returned only
- fd_sets and that is no longer good enough. The changes done to c-ares have
- been committed and are available in the c-ares CVS repository destined to be
- included in the upcoming c-ares 1.3.1 release.
-
- The 'shiper' tool is the test application I wrote that uses the new
- curl_multi_socket() in its current state. It seems to be working and it uses
- the API as it is documented and supposed to work. It is still using
- select(), because I needed that during development (like until I had the
- socket hash implemented etc) and because I haven't yet learned how to use
- libevent or similar.
-
- The hiper/shiper tools are very simple: they initiate lots of connections,
- keep them running for the test period and then kill them all.
-
- Since I wasn't done with the implementation until early January I haven't
- had time to run very many measurements and checks, but I have done a few
- runs with up to a few hundred connections (with a single active one). The
- curl_multi_socket() invoke then takes 3-6 microseconds on average (using the
- read-only-1-byte-at-a-time hack). If this number does not increase much when
- we add connections, it certainly matches my, in my opinion, very ambitious
- goal. We are now below the 60 microseconds "per socket action" goal. It is
- bound to be somewhat higher the more connections we have, since the hash
- table gets more populated and the splay tree will grow etc.
-
- Some tests at 7000 and 9000 connections showed that the socket hash lookup
- is somewhat of a bottleneck. Its current implementation may be a bit too
- limiting. It simply has a fixed-size array, and on each entry in the array
- it has a linked list with entries. So the hash only checks which list to
- scan through. The code I had used so far used a hash with merely 7 slots (as
- that is what the DNS hash uses) but with 7000 connections that would make an
- average of 1000 nodes in each list to run through. I upped that to 97 slots
- (I believe a prime is suitable) and noticed a significant speed increase. I
- need to reconsider the hash implementation or use a rather large default
- value like this. At 9000 connections I was still below 10us per call.
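-
- For reference, the kind of structure described above is roughly the
- following; this is a simplified sketch of the idea (fixed-size array of
- linked lists, indexed by socket number modulo a prime slot count), not the
- actual libcurl hash code:
-
-   #include <stdlib.h>
-
-   #define SLOTS 97   /* a prime; 7 turned out far too small at 7000+ sockets */
-
-   struct entry {
-     int sockfd;          /* key: the socket (file descriptor) */
-     void *easy_handles;  /* value: easy handles waiting on this socket */
-     struct entry *next;  /* next entry hashing to the same slot */
-   };
-
-   struct sockhash {
-     struct entry *slot[SLOTS];
-   };
-
-   static struct entry *sockhash_find(struct sockhash *h, int sockfd)
-   {
-     struct entry *e = h->slot[sockfd % SLOTS];
-     while(e) {
-       if(e->sockfd == sockfd)
-         return e;          /* found the entry for this socket */
-       e = e->next;         /* walk the (hopefully short) chain */
-     }
-     return NULL;
-   }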
-
-Status Right Now
-
- The curl_multi_socket() API is implemented according to how it is
- documented. The man pages for curl_multi_socket and curl_multi_timeout are
- both committed to CVS and are available online for easy browsing:
-
- http://curl.haxx.se/libcurl/c/curl_multi_socket.html
- http://curl.haxx.se/libcurl/c/curl_multi_timeout.html
-
- The hiper-5.patch I made available early morning January 5th, 2006 should
- apply fine on a recent CVS checkout (at the time of this writing curl 7.15.1
- is the latest public curl release, but the hiper patch does not apply
- cleanly to that release).
-
-What is Left for the curl_multi_socket API
-
- 1 - More measuring with more extreme number of connections
-
- 2 - More testing with actual URLs and complete start-to-end transfers.
-
- I'm quite sure we don't set expire times properly everywhere in the code, so
- there are bound to be some timeout bugs left.
-
- What it really takes is for me to commit the code and to make an official
- release with it so that we get people "out there" to help out testing it.
-
-What is Left for project Hiper
-
- 1 - Add HTTP pipelining support
-
- 2 - Add a zero (or at least close to zero) copy interface
-
- Neither of these points has been planned or detailed in terms of exactly
- how it will be implemented.
-
-Roadmap Ahead
-
- I plan and hope to return to full-time hiper work later this spring or
- possibly in the summer, to continue where I am pausing now. Of course, some
- spare time might also be spent until then to keep things moving forward.
-
----------------------------------------------------------------------------
-
-April 11, 2006
-
- While sitting staring at my screen trying to write up a *nice* sample script
- using libevent, it strikes me that since libevent is pretty much based around
- the structs that you set up for each event/file descriptor, my application
- wants to figure out the correct struct that is associated with the file
- descriptor that libcurl provides in the socket callback.
-
- This feels like an operation most applications will need when using the
- multi_socket API, so it feels like I should try to figure out a decent way to
- offer this basic functionality in libcurl itself - and since we already have
- the file descriptors in a hash, we can probably just as well extend it
- somewhat and store some custom pointers as well.
-
- We need to offer the app a way to set a private pointer to be associated with
- the particular file descriptor, and then be able to provide that pointer on
- subsequent callback calls.
-
----------------------------------------------------------------------------
-
-April 20, 2006
-
- I was wrong when I previously claimed we could have more than one easy handle
- using the same socket. I've now cleaned up and simplified the code to adjust
- to this.
-
----------------------------------------------------------------------------
-
-July 9, 2006
-
- TODO: We need to alter how we use c-ares for getting info about its sockets,
- as c-ares now provides a callback approach very similar to how libcurl is
- about to work.
-
- I'm adding a function called curl_multi_assign() that sets a private pointer
- in the internal libcurl hash table for the particular socket passed in to
- this function:
-
-   CURLMcode curl_multi_assign(CURLM *multi_handle,
-                               curl_socket_t sockfd,
-                               void *sockp);
-
- 'sockp' being a custom pointer set by the application to be associated with
- this socket. The socket has to already exist and be in use by libcurl, i.e.
- libcurl must already have called the callback telling about its existence.
-
- The set 'sockp' pointer will then be passed on to the callback in upcoming
- calls when this same socket is used (in the brand new 'socketp' argument).
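-
- A typical use, as I imagine it, is to call curl_multi_assign() from within
- the socket callback the first time a socket shows up. The sketch below
- assumes the application passed its multi handle as the callback's user
- pointer and uses a made-up per-socket struct; it is an illustration of the
- intended usage (error handling omitted), not code from the curl tree:
-
-   #include <curl/curl.h>
-   #include <stdlib.h>
-
-   struct sock_ctx {          /* hypothetical per-socket application data */
-     curl_socket_t sockfd;
-     int action;              /* what we currently wait for on this socket */
-   };
-
-   static int socket_callback(CURL *easy, curl_socket_t s, int what,
-                              void *userp, void *socketp)
-   {
-     CURLM *multi = (CURLM *)userp;  /* assumed to be the multi handle */
-     struct sock_ctx *ctx = (struct sock_ctx *)socketp;
-     (void)easy;
-
-     if(what == CURL_POLL_REMOVE) {
-       free(ctx);                    /* the socket is going away */
-       return 0;
-     }
-
-     if(!ctx) {
-       /* first time we hear about this socket: create our context and let
-          libcurl store the pointer so we get it back in 'socketp' later */
-       ctx = calloc(1, sizeof(*ctx));
-       ctx->sockfd = s;
-       curl_multi_assign(multi, s, ctx);
-     }
-
-     ctx->action = what;             /* remember what to wait for */
-     /* ...update the event system subscription for 's' here... */
-     return 0;
-   }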
-
----------------------------------------------------------------------------
-
-July 30, 2006
-
- Shockingly stupid of me not to have realized this before, but we really need
- to add a 'running_handles' argument to the curl_multi_socket() and
- curl_multi_socket_all() prototypes so that the caller can get to know when
- all the transfers are actually done!
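-
- For reference, with that argument added the prototypes would look something
- like the following (this matches how the functions were eventually shipped,
- but treat the exact form here as my reading rather than a quote from the
- patch):
-
-   /* libcurl fills in the number of still-running transfers, so the
-      caller can tell when all transfers are actually done */
-   CURLMcode curl_multi_socket(CURLM *multi_handle,
-                               curl_socket_t sockfd,
-                               int *running_handles);
-
-   CURLMcode curl_multi_socket_all(CURLM *multi_handle,
-                                   int *running_handles);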