Showing posts with label Flume. Show all posts

Thursday 2 June 2016

How to resolve Flume Twitter Streaming issue

Problem: the Flume Twitter source fails to connect and keeps retrying with a 404 error:

2016-06-02 12:36:21,765 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist.
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api

2016-06-02 12:36:21,766 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Waiting for 20000 milliseconds
2016-06-02 12:36:21,766 (Twitter Stream consumer-1[Waiting for 20000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Twitter Stream consumer-1[Waiting for 20000 milliseconds]
2016-06-02 12:36:38,685 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:conf/flume-twitter.conf for changes
2016-06-02 12:36:41,766 (Twitter Stream consumer-1[Waiting for 20000 milliseconds]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist.
Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api

2016-06-02 12:36:41,766 (Twitter Stream consumer-1[Waiting for 20000 milliseconds]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.
2016-06-02 12:36:41,766 (Twitter Stream consumer-1[Establishing connection]) [DEBUG - twitter4j.internal.logging.SLF4JLogger.debug(SLF4JLogger.java:67)] Twitter Stream consumer-1[Establishing connection]





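Solution for Flume Twitter issue:

The 404 most likely means that the twitter4j library on the Flume classpath is calling a retired Twitter Streaming API endpoint. The steps below run the agent with the twitter4j 3.0.3 jars instead.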

1. Create a "kalyan-twitter-agent.conf" file with the following content:

kalyan-twitter-agent.sources = Twitter
kalyan-twitter-agent.channels = MemChannel
kalyan-twitter-agent.sinks = HDFS
 
kalyan-twitter-agent.sources.Twitter.type = com.orienit.kalyan.hadoop.training.flume.KalyanTwitterSource
kalyan-twitter-agent.sources.Twitter.channels = MemChannel
kalyan-twitter-agent.sources.Twitter.consumerKey = **********
kalyan-twitter-agent.sources.Twitter.consumerSecret = **********
kalyan-twitter-agent.sources.Twitter.accessToken = **********
kalyan-twitter-agent.sources.Twitter.accessTokenSecret = **********
 
kalyan-twitter-agent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
 
kalyan-twitter-agent.sinks.HDFS.channel = MemChannel
kalyan-twitter-agent.sinks.HDFS.type = hdfs
kalyan-twitter-agent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/%Y-%m-%d-%H-%M
kalyan-twitter-agent.sinks.HDFS.hdfs.fileType = DataStream
kalyan-twitter-agent.sinks.HDFS.hdfs.writeFormat = Text
kalyan-twitter-agent.sinks.HDFS.hdfs.batchSize = 1000
kalyan-twitter-agent.sinks.HDFS.hdfs.rollSize = 0
kalyan-twitter-agent.sinks.HDFS.hdfs.rollCount = 10000
kalyan-twitter-agent.sinks.HDFS.hdfs.useLocalTimeStamp = true

kalyan-twitter-agent.channels.MemChannel.type = memory
kalyan-twitter-agent.channels.MemChannel.capacity = 10000
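# transactionCapacity should be at least the HDFS sink's hdfs.batchSize (1000),
# otherwise the sink can fail with a "Take list ... full" ChannelException
kalyan-twitter-agent.channels.MemChannel.transactionCapacity = 1000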
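The four credential values in the config above are shown as asterisks and have to be replaced with your own Twitter application keys. A minimal sketch of a check for that, assuming a POSIX shell with grep and sed; the function name `unfilled_credentials` is hypothetical and not part of Flume:

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): print the credential keys in a
# Flume properties file whose value is still a run of asterisks.
unfilled_credentials() {
  conf_file="$1"
  # Match non-comment lines such as "<agent>.sources.<name>.consumerKey = ****"
  grep -E '^[^#]*\.(consumerKey|consumerSecret|accessToken|accessTokenSecret)[[:space:]]*=[[:space:]]*\*+[[:space:]]*$' "$conf_file" \
    | sed -E 's/[[:space:]]*=.*$//'
}

# Example: check the agent file when it exists in the current directory.
if [ -f kalyan-twitter-agent.conf ]; then
  unfilled_credentials kalyan-twitter-agent.conf
fi
```

If the function prints nothing, none of the four keys is still set to the placeholder.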


2. Copy the "kalyan-twitter-agent.conf" file into the "$FLUME_HOME/conf" folder

3. Copy "kalyan-twitter.jar", "twitter4j-core-3.0.3.jar", "twitter4j-media-support-3.0.3.jar", "twitter4j-stream-3.0.3.jar" files into "$FLUME_HOME/lib" folder
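After copying, it can help to confirm that all four jars really are in the lib folder. A small sketch, assuming a POSIX shell; the function name `missing_jars` is hypothetical, and the jar names are the ones listed in step 3:

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): print the name of every required
# jar that is NOT present in the given lib directory.
missing_jars() {
  lib_dir="$1"
  for jar in kalyan-twitter.jar \
             twitter4j-core-3.0.3.jar \
             twitter4j-media-support-3.0.3.jar \
             twitter4j-stream-3.0.3.jar; do
    [ -f "$lib_dir/$jar" ] || echo "$jar"
  done
}

# Example: report missing jars when FLUME_HOME is set.
if [ -n "${FLUME_HOME:-}" ]; then
  missing_jars "$FLUME_HOME/lib"
fi
```

No output means all four jars were found.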

Download the above jar files from the links below:







4. Execute the command below to extract data from Twitter using Flume:

$FLUME_HOME/bin/flume-ng agent -n kalyan-twitter-agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/kalyan-twitter-agent.conf -Dflume.root.logger=DEBUG,console
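The name passed with -n has to match the prefix used in the properties file, otherwise Flume starts but reports that it found no configuration for the agent. A minimal sketch of a check, assuming a POSIX shell with sed; the function name `agent_names` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): list the distinct agent names
# declared in a Flume properties file, taken from the top-level
# "<agent>.sources =", "<agent>.channels =" and "<agent>.sinks =" lines.
agent_names() {
  conf_file="$1"
  sed -nE 's/^([^#.[:space:]]+)\.(sources|channels|sinks)[[:space:]]*=.*/\1/p' "$conf_file" \
    | sort -u
}

# Example: show the agent names when FLUME_HOME is set and the file exists.
if [ -n "${FLUME_HOME:-}" ] && [ -f "$FLUME_HOME/conf/kalyan-twitter-agent.conf" ]; then
  agent_names "$FLUME_HOME/conf/kalyan-twitter-agent.conf"
fi
```

For the config in step 1, the only name printed should be kalyan-twitter-agent, which is the value given to -n in the command above.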

5. Verify the data in the console

 

6. Verify the data in the HDFS location "hdfs://localhost:8020/user/flume/tweets"
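One way to check the HDFS location from the command line is sketched below. It assumes the `hdfs` CLI is on the PATH and that the NameNode address matches the sink path in the config; the function name `current_bucket` is hypothetical. Because the sink path ends in %Y-%m-%d-%H-%M and uses the local timestamp, a new directory is expected for each minute in which tweets arrive.

```shell
#!/bin/sh
# Root of the sink path from the agent config (without the time escapes).
tweets_root="hdfs://localhost:8020/user/flume/tweets"

# Hypothetical helper: build the directory name the sink would use for the
# current minute, following the %Y-%m-%d-%H-%M pattern in hdfs.path.
current_bucket() {
  echo "$tweets_root/$(date '+%Y-%m-%d-%H-%M')"
}

# List everything under the root, but only if the hdfs CLI is available.
if command -v hdfs >/dev/null 2>&1; then
  echo "Newest directory should be close to: $(current_bucket)"
  hdfs dfs -ls -R "$tweets_root"
else
  echo "hdfs command not found on PATH" >&2
fi
```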























