Thursday 2 June 2016

How to resolve the Flume Twitter Streaming issue

Problem: the Flume Twitter agent fails to establish a streaming connection and keeps logging a 404 error:

2016-06-02 12:36:21,765 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist.
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api

2016-06-02 12:36:21,766 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Waiting for 20000 milliseconds
2016-06-02 12:36:21,766 (Twitter Stream consumer-1[Waiting for 20000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Twitter Stream consumer-1[Waiting for 20000 milliseconds]
2016-06-02 12:36:38,685 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:conf/flume-twitter.conf for changes
2016-06-02 12:36:41,766 (Twitter Stream consumer-1[Waiting for 20000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist.
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api

2016-06-02 12:36:41,766 (Twitter Stream consumer-1[Waiting for 20000 milliseconds]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.
2016-06-02 12:36:41,766 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Twitter Stream consumer-1[Establishing connection]





Solution for the Flume Twitter issue:

This 404 typically appears when the Flume Twitter source was built against an old twitter4j version that still points to the retired pre-v1.1 streaming endpoint. Replacing the source and the twitter4j jars, as in the steps below, resolves it.

1. Create a "kalyan-twitter-agent.conf" file with the below content

kalyan-twitter-agent.sources = Twitter
kalyan-twitter-agent.channels = MemChannel
kalyan-twitter-agent.sinks = HDFS
 
kalyan-twitter-agent.sources.Twitter.type = com.orienit.kalyan.hadoop.training.flume.KalyanTwitterSource
kalyan-twitter-agent.sources.Twitter.channels = MemChannel
kalyan-twitter-agent.sources.Twitter.consumerKey = **********
kalyan-twitter-agent.sources.Twitter.consumerSecret = **********
kalyan-twitter-agent.sources.Twitter.accessToken = **********
kalyan-twitter-agent.sources.Twitter.accessTokenSecret = **********
 
kalyan-twitter-agent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
 
kalyan-twitter-agent.sinks.HDFS.channel = MemChannel
kalyan-twitter-agent.sinks.HDFS.type = hdfs
kalyan-twitter-agent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/%Y-%m-%d-%H-%M
kalyan-twitter-agent.sinks.HDFS.hdfs.fileType = DataStream
kalyan-twitter-agent.sinks.HDFS.hdfs.writeFormat = Text
kalyan-twitter-agent.sinks.HDFS.hdfs.batchSize = 1000
kalyan-twitter-agent.sinks.HDFS.hdfs.rollSize = 0
kalyan-twitter-agent.sinks.HDFS.hdfs.rollCount = 10000
kalyan-twitter-agent.sinks.HDFS.hdfs.useLocalTimeStamp = true

kalyan-twitter-agent.channels.MemChannel.type = memory
kalyan-twitter-agent.channels.MemChannel.capacity = 10000
kalyan-twitter-agent.channels.MemChannel.transactionCapacity = 100
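
For reference, the custom source class named above ("com.orienit.kalyan.hadoop.training.flume.KalyanTwitterSource") ships inside "kalyan-twitter.jar". Its actual implementation is not shown here, but a minimal sketch of what a twitter4j-based Flume source of this kind might look like is below (package and class names are placeholders; the property names mirror the agent configuration above):

// Hypothetical sketch only: package/class names are placeholders, not the
// actual contents of kalyan-twitter.jar.
package com.example.flume;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

import twitter4j.FilterQuery;
import twitter4j.Status;
import twitter4j.StatusAdapter;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.ConfigurationBuilder;
import twitter4j.json.DataObjectFactory;

public class ExampleTwitterSource extends AbstractSource
        implements EventDrivenSource, Configurable {

    private TwitterStream twitterStream;
    private String[] keywords;

    @Override
    public void configure(Context context) {
        // Property names match the agent configuration above
        ConfigurationBuilder cb = new ConfigurationBuilder()
                .setOAuthConsumerKey(context.getString("consumerKey"))
                .setOAuthConsumerSecret(context.getString("consumerSecret"))
                .setOAuthAccessToken(context.getString("accessToken"))
                .setOAuthAccessTokenSecret(context.getString("accessTokenSecret"))
                .setJSONStoreEnabled(true);   // keep the raw JSON so the sink writes full tweets
        twitterStream = new TwitterStreamFactory(cb.build()).getInstance();
        keywords = context.getString("keywords", "").trim().split("\\s*,\\s*");
    }

    @Override
    public void start() {
        twitterStream.addListener(new StatusAdapter() {
            @Override
            public void onStatus(Status status) {
                // Forward each tweet's raw JSON to the configured channel
                String json = DataObjectFactory.getRawJSON(status);
                Event event = EventBuilder.withBody(json.getBytes());
                getChannelProcessor().processEvent(event);
            }
        });
        // Track only the configured keywords on the v1.1 streaming endpoint
        twitterStream.filter(new FilterQuery().track(keywords));
        super.start();
    }

    @Override
    public void stop() {
        twitterStream.shutdown();
        super.stop();
    }
}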


2. Copy the "kalyan-twitter-agent.conf" file into the "$FLUME_HOME/conf" folder

3. Copy the "kalyan-twitter.jar", "twitter4j-core-3.0.3.jar", "twitter4j-media-support-3.0.3.jar" and "twitter4j-stream-3.0.3.jar" files into the "$FLUME_HOME/lib" folder

Download the above jar files from the below links:







4. Execute the below command to extract data from Twitter using Flume

$FLUME_HOME/bin/flume-ng agent -n kalyan-twitter-agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-twitter-agent.conf -Dflume.root.logger=DEBUG,console

5. Verify the data in the console

 

6. Verify the data in the HDFS location "hdfs://localhost:8020/user/flume/tweets"
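
For example, with the Hadoop client configured on the same machine, the generated files can be listed and sampled like this (the date directory comes from the %Y-%m-%d-%H-%M pattern in the sink path, and "FlumeData" is the HDFS sink's default file prefix since "hdfs.filePrefix" was not set):

hdfs dfs -ls /user/flume/tweets
hdfs dfs -cat /user/flume/tweets/<date-dir>/FlumeData.* | head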

Saturday 30 April 2016

If MapReduce jobs fail to run, the issue may be with YARN having too little RAM

If MapReduce jobs cannot run because YARN has too little memory to allocate, update "yarn-site.xml" with the below properties and restart YARN. Here "yarn.scheduler.minimum-allocation-mb" and "yarn.scheduler.maximum-allocation-mb" bound the size of each container request (512 MB to 2048 MB), and "yarn.nodemanager.resource.memory-mb" is the total memory a NodeManager may hand out to containers (20480 MB).

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>20480</value>
</property>
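
A minimal restart sequence, assuming a standard Apache Hadoop layout with the start/stop scripts under $HADOOP_HOME/sbin (adjust paths for your distribution):

# restart YARN so the updated yarn-site.xml values take effect
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh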