Somethings to rejoice about
We’re really excited to check out the new erlang R13A release , here’s a summary of what caught our attention.
-
Highlight #1
OTP-7500 The runtime system with SMP support now uses multiple, scheduler specific run queues, instead of one globally shared run queue.
Not inducing concurrency into your programs can hinder the effectiveness and optimal utilization of your machine . That said - just because erlang can spawn a million process’s doesn’t quite warrant making every call concurrent. Most experiments don’t take into consideration that the same system may well be yet another web server, building your own crons using flowcontrol through message queues , loading data into or out of the database - at the same instant of time when your benchmarking algorithm wants you to start a multi-node concurrent version of every function that you come across. Infact it’s quite a misnomer. Sure you can spawn a mighty - concurrent sorting algorithm across 100 nodes. But what if each of those 100 nodes tried to do the same thing. Would it be more efficient to spawn more process’s on the same node versus RPC‘ing and handling latencies from different nodes. Now I have nothing against location transparency, and mnesia’ ability to decide which tables need to be on RAM, completely transparent from any node, or only accessible form a third node - that granularity is from another planet. Same goes with gen_server’s being in memory, being able to configure to use trees, vs lists vs queue’s vs sets. it rocks. period. But here’s where improvements in SMP and infact the very emphasis of your code relying on being able to add jobs to nodes, but internally making sure that X node run’s process’s at 2X courtesy of concurrency helps, as oppose to spawning process’s for every element in the list on every node in the cluster. It’s simply not controlled enough , and sometimes good old serially running jobs , but making sure they run faster - makes a hell of a difference. Well thats what we’ve been trying to do, and has held us in amazing stead . Something like replacing every lists:map with a parallel version is left to when you have way too much money to spend buying extra nodes that you can replicate your data on. With that in mind that said, even with the R13A we intend to continue mnesia reads even being made synchronous for things like crons, while the handling of that data immediately after - that massages the data into whatever form it wants - is encourages to be made more concurrent. This works out well when we want to compare datasets stored in DB. One way we’ve tried to optimize the cores is to go through our code to see if We can typically avoid unnecessary list fold’s , in favour of list map’s which are much more concurrent friendly since the input to each map function doesn’t depend on the previous or next values ( as opposed to foldl / foldr ). But wouldn’t it be nice to know when the last map function ( by making a lists:map concurrent , each concurrent process may take different time and complete in any order, which sort of resonates with what i meant by asynchronous.
I’ve compared some techniques from many pmap implementations out there, and here are some notes…
While it’s common practice to send the origin pid or to make a reference to the callee pid, most could easily reuse a variable .
lists:map fun(Msg)->
Pid ! { self() ,Msg }
end , Data ).
vs
%% Neater and should be faster , since reducing recomputing self N times
Self = self(),
lists:map fun(Msg)->
Pid ! { Self ,Msg }
end , Data ).
Now an interesting part wrt receives is where it is placed, and there are a fair share of different ways of placing it, while the most commonly used method is :
Pids = lists:map fun(Msg)->
Pid ! { Self ,Msg }
end , Data ),
lists:map ( fun(Pid) ->
receive
Pid -> ok
end
end, Pids).
or
Pids = lists:map fun(Msg)->
Pid ! { Self ,Msg }
end , Data ),
[ receive Pid -> ok end || Pid <- Pids].
I’ve always been content on avoiding the extra lists:map, and receiving results back form the concurrent process’s - within the same scope / lists:map
lists:map fun(Msg)->
Pid ! { Self ,Msg },
receive
Pid -> ok
end
end, List ) .
And to testing the above on R12B on a 32bit, R12B on a 64bit & the spanking new R13A on a 64-bit node, I asked hacked up a simple module that took a list of elements, and attempted :
UPDATE To reiterate what the module is for …“How would you get a counter of how many times some success() happened for each element of a list , and also store the results of each as well . Typically this would be a lists:foldl . Could there be someway faster than a lists:fold ( approach serial_something1) ?”
So, my explorations were that if you were to use a lists:map where one spawned process is used to maintain state, I found that results were much better. ( approach something1, something2 ) .
- something1 concurrently get the number of attempts and status of each attempt, using receive inside the first map
- something2 concurrently get the number of attempts and status of each attempt, using receive in a separate lists:map
- serial_something normal lists:map that synchronously get the number of attempts and status of each attempt ( no concurrency )
I then attempted to pass in elements of various sizes , 10, 100, 1000, and 10000.
erl , on R12B-1 on a 32bit node
Erlang (BEAM) emulator version 5.6.1 [source] [smp:2] [async-threads:0] [kernel-poll:false]
Eshell V5.6.1 (abort with ^G)
1> something:multi_bench().
[{[{161,something1}, {29,something2}, {7,serial_something}]}, %% 10 elements
{[{543,something1}, {347,something2}, {80,serial_something}]}, %% 100 elements
{[{5672,something1}, {12840,something2}, {5227,serial_something}]}, %% 1000 elements
{[{50898,something1}, {64534,something2}, {617422,serial_something}]}] %% 10000 elements
erl , on R12B on a 64bit node
Erlang (BEAM) emulator version 5.6.1 [source] [64-bit] [smp:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.6.1 (abort with ^G)
1> something:multi_bench_something().
[{[{40,something1}, {15,something2}, {5,serial_something}]}, %% 10 elements
{[{260,something1}, {243,something2}, {83,serial_something}]}, %% 100 elements
{[{2633,something1}, {2600,something2}, {5794,serial_something}]}, %% 1000 elements
{[{23505,something1}, 28484,something2}, {622124,serial_something}]}] %% 10000 elements
erl , on R13A on a 64bit node
Erlang R13A (erts-5.7) [smp:4:4][rq:4] [async-threads:0]
Eshell V5.7 (abort with ^G)
1> something:multi_bench().
[{[{1,something1}, {1,something2}, {1,serial_something}]}, %% 10 elements
{[{1,something1}, {1,something2}, {1,serial_something}]}, %% 100 elements
{[{1,something1}, {1,something2}, {1,serial_something}]}, %% 1000 elements
{[{15000,something1}, {30999,something2}, {531999,serial_something}]}] %% 10000 elements
{[{234000,something1}, {265000,something2}, {68109999,serial_something}]}] %% 100000 elements [BONUS]
What’s surprising is that with R13A, the first approach of avoiding an extra lists:map fn for receive’s seems to working to consistently work out better across all platforms for large datasets. Here could be some key takeaways:
if the list is < 1000 elements then it’s better to infact use another lists:map ( doesn’t make a difference on R13A though )
if the list > 10000 elements, expecting the receive in another lists:map could turn out twice as slower ( across all releases )
The code used to run the tests , including the two different approaches to the receives is something.erl on github.
-
Highlight #2
OTP-7648 Support for Unicode is implemented as described in EEP10. Formatting and reading of unicode data both from terminals and files is supported by the io and io_lib modules. Files can be opened in modes with automatic translation to and from different unicode formats. The module 'unicode' contains functions for conversion between external and internal unicode formats and the re module has support for unicode data. There is also language syntax for specifying string and character data beyond the ISO-latin-1 range. The interactive shell will support input and output of unicode characters when the terminal and operating system supports it.
And with the awesomeness of binary pattern matching, there’s nothing stopping us now . That said we’ve already had a good run with unicode ( despite the myths ) . Our prayers were answered through a common glue between erlang,python as well as erlang. It’s dreadfully simple, yet a life-saver.
%% how we handled unicode pretty printing prior to R13A ( 3 cheers for Base64 encoding ! )
out(Arg)->
B64 = base64:encode( SomeBinaryKeyword () ),
Kw = {script, [], [
"document.write(Base64.decode('",
B64,
"'));"
]},
{ehtml, Kw }.
Since Binary can be spitted out by the yaws ehtml structure , there is no need to convert from binary back to string . Wrt unicode, I hope this also means that we will be able to actually work with unicode in the erlang shell, inside code, and so on. Erlang’s nature of storing strings as integers turned out to be a boon for us, because it now does’nt matter if out python crawler crawls english or swahili - it;s all integers stored as binary - which makes it cheaper than strings, and more parser friendly. Having a look at leex/yecc for building unicode compilers might be a hack for another day.
-
Highlight #3
Message passing has been further optimized for parallel execution. Serial message passing is slightly more expensive than before, but parallel send to a common receiver is much cheaper.
I’ll be looking forward to also comparing gen_server2 which has been doing the rounds in some projects.
-
Highlight #4
OTP-7804 The BIFs atom_to_binary/2, binary_to_atom/2, and binary_to_existing_atom/2 have been added.
-
Highlight #5
OTP-7826 Nodes belonging to different independent clusters can now co-exist on the same host with the help of a new environment variable setting ERL_EPMD_PORT.
Naming nodes ( hostname vs fully qualified ip ) tends to get little tricky if you don’t have static IP ’s. Especially when different hostnames try to interconnect. For those who are’nt familiar - creating a distributed cluster is as simply as
net_adm:ping( 'name@ip').
or
net_adm:ping('justhostname') .
and if the nodes set the same cookie ( just a security identifier ) and if all goes well - it should return pong. If there are problems, pang. If the nodes were all connected on the same Lan and sharing the same cookie - there would ‘nt be too many hassles. And until now , I don’t think nodes from different clusters could communicate. I wonder if this means that we could have nodes talking to each other from different lan networks. This is big.
-
Highlight #6
OTP-7641 When chunk reading a disk log opened in read_only mode, bad terms could crash the disk log process.
Some time back, we noticed that the nodes were crashing, mostly “out of memory” errors which would bring down the VM, sometimes even the node , which isn’t pretty at at all. We followed it by more logging, to no avail. I don’t know how we missed it -but it turned out that the “error_logger” module will registered one and only one process to receive and handle all log messages. ( at least till 1.77 ) had a bottleneck ,so infact logging to find out what was wrong kept forcing more crash’s . Since then we’re simply cut down logging to almost nil - and we haven’t faced the same issue since then. It was quite embarrassing when I did try it out locally. Simply running [ error_logger:info_msg( "writing ", [ X ] ) || X<- lists:seq(1, AHighEnoughNumber) ] brought yaws down. I hope this is fixed either will the latest yaws release or R12B, if that’s what they meant by this fix. Pity we did’nt debug this earlier.
-
Highlight #7 (mnesia)
OTP-7753 With bad timing several api functions could return or exit with a bad error message when mnesia was shutting down. OTP-7835 mnesia:clear_table/1 cleared all nodes table content even if the table was local_content only type.
We’ve also noticed inconsistencies in R12B, wrt clearing fragments of mnesia tables. What our module which we’ll talk about later - frag_mnesia does is spawn process’s that will address each fragment explicitly and clear it. We’ve got similiar functions for querying disk-only tables as well. Here are some results
%% wrt time taken an mnesia dirty_match_object on a fragmented table ( eg: 8 fragments + the base table ) %%is slower than spawning as many process's as there are fragments and dirty matching over them %%is slower than mnesia reads
When you have data that changes over time, - perhaps it could change from more that one point of entry, or you’re perhaps accepting a whole bunch of tags. Now the smart thing to do for editing is infact making the mnesia table of type set, and simply overwriting. Infact the create & edit records are often the same function at some point . But when you have data that is one is to many - eg: meta tags of a webpage . There maybe multiple meta tags for the same webapge. We decided to have the key as
%Approach1 : Table is type set, Key as tuple , no conflict resolution, but slower
{ Url , Kw }
%Approach2 : Table is type bag, Key is either Kw or Url , conflict resolution, but much quicker
Kw
%sample entries for a mnesia set table
{ <<"unitednations.com/">> , <<"india">> } , MetaData, TimeT1
{ <<"unitednations.com/">> , <<"india">> } , NewMetaData, TimeT2 %% overwrites TimeT1 which is great
%sample entries for a mnesia bag table
<<"unitednations.com/">> , <<"india">> , MetaData, TimeT1
<<"unitednations.com/">> , <<"india">> , NewMetaData, TimeT2 %% two verions of data
For a crawler, recrawls of webpages are something you need to cater to. If the table above was of tpye set, then the record TimeT2 would simply overwrite the first row with new data which is ideal. if the table above was of type bag, then there would have been two rows for the same url and keyword combination. We did’nt want to have to deal with the inconsistent rows , so we kept the table as sets. Which drastically reduced query speeds on large fragmented tables. There are two approaches that we might look at from here on. Keeping another table, where the index infact is the keyword or the other approach could be storing the tables as bags and dealing with conflict resolution of different versions of data. Amazon’s paper on dynamo talks about reading multiple versions of data (infact an S3 object is typically replicated on 3 nodes, reads are only allowed if two nodes respond, writes are always accpted. But they say this ratio changes from webapp to webapp within Amazon itself. But the interesting part was that of the conflict resolution was done during the reads. not during the writes. infact vector clocks are used to merge two different versions of data.
Another design decision we took early was how we sharded the data. I guess we broke every rule in the book, when we decided that every user would have their own set of tables. Wont there be too many tables ? Yes, and that many lesser table locks to deal with , letting us have more workers accessing multiple tables that have absolutely no transaction locks between them or drastically reduce the number of process’s trying to access the same data. Tell me more ! Backing up is easier, having user specific rule are easier, analysing and awarding more workers on the fly to high-priority publishers becomes a charm ( much like spawning ec2 instances i assume ) . Wait did you say you stored crawler data in mnesia ? That’’s right , sharded from the word go, disc-only fragmented from the word go ( allowing us to spawn process’s to even search each fragment which turns out to be several magnitude times faster that non-concurrent match’s. What’ll be interesting is how far we can go with mnesia. But how do you get in so much data so fast ? It is’nt lighting speed, but it takes around 4 seconds for everything from crawling/creating a whole bunch of index’s , and loading into mnesia from heat seeking ( will crawl based on popularity of pages ) hybridqueue . We also have another singleton crawler that’s more of a traditional techniques. User specific Metadata is often replicated across all nodes , and are disc_copies when are required in most calls. Stat tables are typically disc-only tables, and we’re currently have awstats configured over yaws logs to spruce us some http analytics juice. We’re also working on an rrdstats implementation that uses a gen_server to fill in data within the step frequency , still need to spend some more time before it’s production ready. And the rest of the table are mainly disc_copies or in RAM? Where does that leave us ? Denormalization - works great you need to render amazing amounts of data to some of India’s biggest sites in under 200-300 ms .Loads of inverted index’s being generated through extremely well-controlled and predictive flowcontrols. But we’re still exploring our own custom cache worker’s that are typically ring buffers of fixed size , but we’d love to spruce things up with perhaps squid servers, memcached , merle , or a distributed key-value store ?
-
Highlight #8 ( stdlib )
OTP-7230 The functions lists:seq/1,2 return the empty list in a few cases when they used to generate an exception, for example lists:seq(1, 0). See lists(3) for details. (Thanks to Richard O'Keefe.)
-
OTP-7740 When a process started with proc_lib, gen_server, or gen_fsm exits with reason {shutdown,Term}, a crash report will no longer be generated (to allow a clean shutdown, but still provide additional information to process that are linked to the terminating process).
All in all, it’s interesting times at hover.in with R13A coming up, and the buzz is definitely in the air wrt erlang. Too bad I could’nt make the CFP for the erlang factory’s Bay Area 2009 conference, it looks like it’s going to be the biggest and most widely anticipated erlang conference yet! Can’t wait for the videos of the talks to roll out. There are also few talks by companies like Facebook, and Mochiweb. and ofcourse the amazing guys at ProcessOne.
And there are some more things to rejoice about at hover.in too :
- Peaking at 50,000 hovers a single day, where a hover is everytime a viewer moves their cursor over a word that the publisher / blogger chooses to see content . We’re now happy to see around 15,000 - 20,000 hovers a day from the few publishers that have signed up with us.
- Interesting our scaling experiences with erlang & yaws has been …..reducing one node from our 4 node cluster : ) We ‘re more excited about pushing how much we can do with how less. ( 2-3 quad- core machines serving up as production servers ) with anywhere between 3 and 30 requests per second, we’re just touched quarter of a million hovers in under 2 months, with zilch marketing or PR, and established large-size Indian portals ( Sify Finance , Oneindia and more to be announced )
- full support for unicode languages, which means we’re the first intext company to hoverize regional languages. Go have a look. http://thatstamil.oneindia.in . A big shoutout to the folks at OneIndia for making it happen, and believing in the potential of the product. We however are looking for an NLP hacker to help us move especially in parts of speech recognition, classification and clustering for not just english, but pioneer with regional languages as well.
- Building the next generation of contextual , and in-text engaging content , be it a crunchbase hoverlet , a map hoverlet, music from last.fm or videos from youtube, a deck from slideshare, graphs from compete, conversations from twitter, and virtually any widget or 3rd party gadget out there. We typically work on erlang, python , javascript ( based on the excellent YUI ), and will be launching our facebook / wordpress / firefox plugins soon to be launched.
- the 26th alliance, was announced in our typical low-key affair, on January 26th and attempts to bring together a growing set of Indian consumer- web companies, trying to get together on common ground to pursue a singular cause - to develop and distribute widgets , and help expose and utilize upcoming API’s .
It’s been a fantastic ride through the past 18months , and hopefully we’ll be able to keep in touch within the eco-system of bloggers, content-providers and developers in the near future. take a look at the product demo’s on the the main hover.in blog, keep a track of hover.in on twiitter .
Thank’s for dropping in, and a big shoutout to everyone behind erlang, everyone at #erlang and github and the blogosphere promoting erlang - on behalf of our team of 5 developers, including myself : )
~
Keep Clicking,
Bhasker V Kode , aka bosky
Co-Founder & CTO
- Posted by Bosky at 08:18 am
- Permalink for this entry
- Filed under: posts
- RSS comments feed of this entry
- TrackBack URI

I think that this code snippet doesn’t do what suspected.
Pids = lists:map ( fun(Msg)->
Pid ! { Self ,Msg }
end , Data ),
lists:map ( fun(Pid) ->
receive
Pid -> ok
end
end, Pids).
Pids will contain [{Self, Msg} | ...] but not Pids. Then receive is not guaranteed in same order (If expected) but only if Data members are unique.
Anyway I don’t understand what comes concurrently. There is not any spawn, messages are send in serial order and received in serial order.
Your example is (totally) flawed. I stared at it for a while to see how it could possible be concurrent, and then I realized it’s not
You can verify it by adding a timer:sleep(1) inside the do_something(_X) function.
You are serially sending your messages and serially receiving them again. Just like hynek mentions above. Instead, you should spawn your ‘workers’ inside the map loop and then receive outside the map loop separately.
thanks for the inputs thijs & hynek!
Pid = spawn( Fun ) creates a new concurrent process that evaluates Fun. Moreover Joe Armstrong explains that in Pid ! Msg, message sending is asynchronous.
Well i should have done a better job of explaining what the example intended to do.
“How would you get a counter of how many times some success() happened for each element of a list , and also store the results of each as well . Typically this would be a lists:foldl . Could there be someway faster than a lists:fold ( approach serial_something1) ?”
So, my explorations were that if you were to use a lists:map where one spawned process is used to maintain state, I found that results were much better. ( approach something1, something2 ) .
The method of spawning one process for each data element, like both of you have suggested - works out great when the results don’t are not inter-connected or share some relationship. But if you can find some way how they can ,let me know.
That said - I think I must test out how the benchmarks would be suppose I did spawn a new process for each element with a more trivial problem definition.
Keep Clicking,
~B
PS: have updated the mentioned problem set into the post as well.
Thanks Bosky for your explanation. Your example is indeed correct for the case you wanted to use it
After reading your italic explanation two times, it became clear to me what you wanted to do. Quite unusual
I think Hynek and me were expecting you want to do some concurrent computations in this code and the current code was just a placeholder for some real work. Keep up the good Erlang posts!
dear bosky
nice post
For today, it’s a new, simple web framework for Erlang called BeepBeep. Written by Dave Bryson, it uses MochiWeb, which most Erlang Web developers are familiar with, and ErlyDTL, which is an Erlang port of the Python Template Language. Travis Slicegood has a demo application written in BeepBeep grabbing photos from Flickr, and Dave Bryson has a sample Blog Application as well.
“We however are looking for an NLP hacker to help us move especially in parts of speech recognition, classification and clustering for not just english, but pioneer with regional languages as well.”
Very tempting
If I weren’t so busy I’d probably give it a shot
Good work guys, It is great to see solid technology at work in an Indian startup (for whatever values of “indian” is true at Hover.
@thijs
cheers
@LB
thanks , the beepbeep post on tsung was a great primer as well, our experiences with tsung are over at http://developers.hover.in/blog/2009/yui-tsung-to-choose-a-cdn/
@ravimohan
pleasure to have you drop in, we’re very much an indian startup. too bad we couldnt meet at devcamp bangalore yesterday, maybe we could have talked more about your AI work, erlang at hover.in, and the NLP & parsing challenges upahead. Looking forward to bump into you in the near future…
Anyhow, here’s the link to the talk I gave…
http://www.slideshare.net/bosky101/erlang-at-hoverin-devcamp-blr-09
~B
[...] built Tsung 1.3.0 from source on erlang R13A ( built from source, catch the earlier post on first impressions ) on an AMD 64-bit , 4 GB Ram machine running ubuntu. Beebole has a nice intro on how to setup [...]