Home Forums Hive / HCatalog Hive: unable to create thread – HDP2.1

This topic contains 7 replies, has 5 voices, and was last updated by Noam Cohen 1 month ago.

  • Creator
    Topic
  • #53552

    Steve
    Participant

    After upgrading to HDP 2.1, I’m seeing the HiveServer2 thread count increase continually over time as we use it (a periodic cron job that loads data, runs, and then exits).

    We ended up hitting the default(?) CentOS 6.5 limit of 1024 processes per user for the hive user, at which point Hive started throwing:
    java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:84)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)

    The same cron process was running the same way against the HDP 2.0 cluster, and the thread count in HiveServer2 never budged. (We’ve got graphs!)

    This appears to be an HDP 2.1 regression. Does anyone have tips on troubleshooting (for example, what are these threads doing)?
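
    For anyone else bumping into that 1024 cap, a quick way to confirm it really is the per-user process/thread limit (rather than heap) is to compare the hive user’s nproc limit with the HiveServer2 thread count. The main-class pattern and limits file path below are assumptions based on a stock CentOS 6 / HDP layout, so adjust for your install:

    # Per-user process/thread cap for the hive service user
    sudo -u hive bash -c 'ulimit -u'

    # Thread count of the HiveServer2 JVM (main-class pattern is an assumption; adjust if your launcher differs)
    HS2_PID=$(pgrep -f org.apache.hive.service.server.HiveServer2 | head -1)
    ps -o nlwp= -p "$HS2_PID"

    # On stock CentOS 6 the 1024 default usually comes from this file; raising it only buys time, it doesn't fix the leak
    cat /etc/security/limits.d/90-nproc.conf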



  • Author
    Replies
  • #57910

    Noam Cohen
    Participant

    I encountered this problem as well. Just wondering – do you have a bug opened for it? Is it solved in HDP2.1.3?

    #56955

    Yang Yang
    Participant

    we added

    -hiveconf hive.exec.pre.hooks="" \
    -hiveconf hive.exec.post.hooks="" \
    -hiveconf hive.exec.failure.hooks="" \

    to hive command line, and added

    set hive.exec.pre.hooks="";
    set hive.exec.post.hooks="";
    set hive.exec.failure.hooks="";

    to the front of the hive script, but we still got the same problem.

    #56954

    Yang Yang
    Participant

    we are also seeing the ATS problem.

    2014-07-08 12:40:46,144 INFO hooks.ATSHook (ATSHook.java:run(120)) – Failed to submit plan to ATS: com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.fireAndForget(ATSHook.java:173)
    at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:110)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
    Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)

    #53903

    Steve
    Participant

    Thanks – that workaround fixed it. If you can, let me know when it’s fixed, and/or point me at a bug to track.

    As a side note, Ambari wouldn’t actually let me set this to empty (it complained that “it must be set” – this may be an Ambari bug), so I set it to a comma (,) instead. Because the comma is the split character between the hook names and nothing is on either side of it, Hive seems to treat it as equivalent to empty.
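
    In case it helps anyone else, this is roughly what the resulting hive-site.xml entries look like with the comma workaround (a sketch, not something copied verbatim from Ambari); since the comma is the list separator and both sides of it are empty, Hive ends up with no hooks to run:

    <property>
      <name>hive.exec.pre.hooks</name>
      <value>,</value>
    </property>
    <property>
      <name>hive.exec.post.hooks</name>
      <value>,</value>
    </property>
    <property>
      <name>hive.exec.failure.hooks</name>
      <value>,</value>
    </property>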

    #53785

    Thejas Nair
    Participant

    This looks like an issue with the ATS (Application Timeline Server) integration that reports Hive query progress.
    You can disable it by setting the following configurations to empty:
    hive.exec.pre.hooks
    hive.exec.post.hooks
    hive.exec.failure.hooks

    Thanks for reporting this. We are looking into the issue.
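
    If you want to double-check that the override took effect, you can print the three values back from a client session; they should come out empty. The host and port below are placeholders for your HiveServer2 instance:

    beeline -u jdbc:hive2://<hs2-host>:10000 \
      -e "set hive.exec.pre.hooks; set hive.exec.post.hooks; set hive.exec.failure.hooks;"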

    #53709

    Steve
    Participant

    Thanks, that helped (at least to get to the next thing).

    Both values are at their defaults:
    hive.server2.async.exec.threads = 100
    hive.server2.thrift.max.worker.threads = 500

    Jstack at this point (the thread count has been climbing steadily over the past ~48 hours) shows 546 threads named “ATS Logger 0”. (All other named threads have only one copy running.)

    Example entry (they all basically look the same):
    "ATS Logger 0" daemon prio=10 tid=0x00007f6e11a65800 nid=0x35a4 waiting on condition [0x00007f6dd994b000]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    – parking to wait for <0x00000000ee0fcac8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:662)

    Also, I do see entries like the following in the hiveserver2.log:
    2014-05-14 02:10:23,699 INFO [ATS Logger 0]: hooks.ATSHook (ATSHook.java:run(120)) – Failed to submit plan to ATS: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:90)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

    Also …

    2014-05-14 00:10:25,367 INFO [pool-5-thread-5]: impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(86)) – Timeline service address: http://{resourcemanager}:8188/ws/v1/timeline/

    However, nothing is listening on resourcemanager:8188.

    This config comes from the “Custom yarn-site.xml” specified here:

    http://docs.hortonworks.com/HDPDocuments/Ambari-1.5.1.0/bk_upgrading_Ambari/content/ch02s03.html

    The cluster is managed via Ambari (v1.5.1.110), and according to the Ambari console everything is running – but apparently it’s not starting the ATS/history(?) server.

    Is Ambari expected to start the history server?
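
    For what it’s worth, here is a rough way to check the timeline server side of this (the port comes from the log line above; the script path and property names are the usual Hadoop 2.4 / HDP 2.1 ones, so treat them as assumptions and adjust for your install):

    # Is anything actually listening on the timeline web port?
    netstat -tlnp | grep 8188

    # yarn-site.xml properties that control where ATS listens:
    #   yarn.timeline-service.enabled
    #   yarn.timeline-service.hostname
    #   yarn.timeline-service.webapp.address   (defaults to <hostname>:8188)

    # If Ambari hasn't started it, the timeline server can be brought up by hand as the yarn user
    sudo -u yarn /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh start timelineserver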

    #53672

    Vaibhav Gumashta
    Participant

    Hi Steve,

    In HiveServer2, we use the config hive.server2.thrift.max.worker.threads (default = 500) to set the maximum number of handler threads, and hive.server2.async.exec.threads (default = 100) to set the number of async (background) threads. What are your config values for these two parameters? Can you also take a jstack of the HiveServer2 process when the issue occurs and attach the results here?

    Thanks,
    –Vaibhav
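
    For collecting that, something along these lines works (the pgrep pattern assumes the usual HiveServer2 main class; adjust if your launcher wraps it differently):

    # Thread dump of the HiveServer2 JVM
    HS2_PID=$(pgrep -f org.apache.hive.service.server.HiveServer2 | head -1)
    jstack "$HS2_PID" > /tmp/hs2-jstack.txt

    # Total thread count, plus a per-name breakdown to spot whatever is multiplying
    grep -c '^"' /tmp/hs2-jstack.txt
    grep '^"' /tmp/hs2-jstack.txt | cut -d'"' -f2 | sort | uniq -c | sort -rn | head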
