Combining multiple data sources in SPL (2024)

Depending on your use case or what you are looking to achieve with your Search Processing Language (SPL), you may need to query multiple data sources and merge the results.

The most intuitive command to use in these situations is join, but it tends to consume a lot of resources, especially when joining large datasets. This article describes the following alternatives that can be applied when combining data from multiple sources, including their benefits and limitations.

  • OR
  • Append
  • Multisearch
  • Union

OR boolean operator

The most common use of the OR operator is to find multiple values in event data, for example, foo OR bar. This tells the Splunk platform to find any event that contains either word. However, the OR operator is also commonly used to combine data from separate sources, for example (sourcetype=foo OR sourcetype=bar OR sourcetype=xyz).

Additional filtering can also be added to each data source, for example, (index=ABC loc=Ohio) OR (index=XYZ loc=California). When used in this manner, the Splunk platform runs a single search that returns every event matching any of the specified criteria. The required events are identified early in the search, before calculations and manipulations are applied.

Learn more about using the OR operator in Splunk Docs for Splunk Enterprise or Splunk Cloud Platform.

Syntax for the OR operator

(<search1>) OR (<search2>) OR (<search3>)

Pros

  • Merges fields and event data from multiple data sources
  • Saves time since it does only a single search for events that match specified criteria and returns only the applicable events before any other manipulations

Cons

  • Only used with base searches
  • Does not allow calculations or manipulations per source, so any further calculations or manipulations need to be performed on all returned events

In the example below, the OR operator is used to combine fields from two different indexes, and the results are grouped by customer_id, which is common to both data sources.

[Screenshot: OR operator search and results]
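The search in the screenshot is not reproduced in the text. As a rough sketch of the same pattern, the following search assumes two hypothetical indexes, example_orders and example_support, whose events both carry customer_id. The index names and the order_status and ticket_status fields are illustrative assumptions, not taken from the original example.

```spl
(index=example_orders) OR (index=example_support)
| stats values(order_status) AS order_status values(ticket_status) AS ticket_status BY customer_id
```

Because both sources are retrieved by a single base search, the stats command runs once over all returned events and produces one row per customer_id.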

Append command

Append is a streaming command used to add the results of a secondary search to the results of the primary search. The results from the append command are usually appended to the bottom of the results from the primary search. After the append, you can use the table command to display the results as needed.

The secondary search must begin with a generating command. Append searches are not processed like subsearches where the subsearch is processed first. Instead, they are run at the point they are encountered in the SPL.

Learn more about using the append command in Splunk Docs for Splunk Enterprise or Splunk Cloud Platform.

Syntax for the append command

<primary search> ... | append [<secondary search>]

Pros

  • Displays fields from multiple data sources

Cons

  • Subject to a maximum result rows limit of 50,000 by default
  • The secondary search must begin with a generating command
  • It can only run over historical data, not real-time data

In the example below, the count of web activities on the Splunk user interface is displayed from the _internal index, along with the count per response from the _audit index.

The last four rows are the results of the appended search. Both result sets share the count field. You can see that the append command tacks on the results of the subsearch to the end of the previous search, even though the results share the same field values.

[Screenshot: append command search and results]
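The exact search behind the screenshot is not given in the text. A minimal sketch of a comparable search might look like the following. The splunk_web_access sourcetype filter and the status and action field names are assumptions chosen for illustration and may differ from the original example.

```spl
index=_internal sourcetype=splunk_web_access
| stats count BY status
| append
    [search index=_audit
    | stats count BY action]
```

Note that the secondary search starts with the search command, which satisfies the requirement that it begin with a generating command. Its rows are added after the rows of the primary search, and both sets share the count field.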

Multisearch command

Multisearch is a generating command that runs multiple streaming searches at the same time. It requires at least two searches and should only contain purely streaming operations such as eval, fields, or rex within each search.

One major benefit of the multisearch command is that it runs multiple searches simultaneously rather than sequentially as with the append command. This could save you some runtime especially when running more complex searches that include multiple calculations and/or inline extractions per data source. Results from the multisearch command are interleaved, not added to the end of the results as with the append command.

Learn more about using the multisearch command in Splunk Docs for Splunk Enterprise or Splunk Cloud Platform.

Syntax for the multisearch command

| multisearch [<search1>] [<search2>] [<search3>] ...

Since multisearch is a generating command, it must be the first command in your SPL. It is important to note that the searches specified in square brackets above are not actual subsearches. They are full searches that produce separate sets of data that will be merged to get the expected results. A subsearch is a search within a primary or outer search. When a search contains a subsearch, the Splunk platform processes the subsearch first as a distinct search job and then runs the primary search.

Pros

  • Merges data from multiple data sources
  • Runs searches simultaneously, thereby saving runtime with complex searches
  • There is no limit to the number of result rows it can produce
  • Results from the multisearch command are interleaved, allowing for a more organized view

Cons

  • Requires that the searches are entirely distributable or streamable
  • Can be resource-intensive due to multiple searches running concurrently. This needs to be taken into consideration since it can cause search heads to crash

In the example shown below, the multisearch command is used to combine the action field from the web_logs index and queue field from the tutorial_games index using the eval command to view the sequence of events and identify any roadblocks in customer purchases. The results are interleaved using the _time field.

[Screenshot: multisearch command search and results]
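The search shown in the screenshot is not available in the text. The following is a sketch of what such a search could look like, using the web_logs and tutorial_games indexes and the action and queue fields named above. The activity field name and the final table command are assumptions added for illustration.

```spl
| multisearch
    [search index=web_logs | eval activity=action]
    [search index=tutorial_games | eval activity=queue]
| table _time index activity
```

Each bracketed search contains only streaming operations (search and eval), which is what multisearch requires. The eval in each search copies the source-specific field into a common field so the interleaved events can be read as one sequence.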

Union command

Union is a generating command that is used to combine results from two or more datasets into one large dataset. The behavior of the union command depends on whether the dataset is a streaming or non-streaming dataset. Centralized streaming or non-streaming datasets are processed the same way as with the append command, while distributable streaming datasets are processed the same way as with the multisearch command.

Learn more about using the union command in Splunk Docs for Splunk Enterprise or Splunk Cloud Platform.

Syntax for the union command

| union [<search1>] [<search2>] ...
or
... | union [<search>]

However, with streaming datasets, instead of this syntax:
<streaming_dataset1> | union <streaming_dataset2>

Your search is more efficient with this syntax:
... | union <streaming_dataset1>, <streaming_dataset2>

Pros

  • Merges data from multiple data sources
  • Can process both streaming and non-streaming commands, though behavior will depend on the command type
  • Supports the maxout argument, which specifies the maximum number of results to return from the subsearch. The default is 50,000 results, which is the maxresultrows setting in the [searchresults] stanza in the limits.conf file.
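As a brief illustration of the maxout argument, the sketch below unions a non-streaming subsearch onto a primary search and caps the subsearch at 10,000 results. The value 10000 is an arbitrary illustrative choice, and the pairing of the web_logs and tutorial_games indexes with stats is an assumption made only for this sketch.

```spl
index=web_logs
| stats count BY action
| union maxout=10000
    [search index=tutorial_games
    | stats count BY queue]
```

Because stats is not a streaming command, this union would be expected to behave like append, which is the case where the result limit applies.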

The example below is similar to the multisearch example provided above and the results are the same. Both searches are distributable streaming, so they are “unioned” by using the same processing as the multisearch command.

[Screenshot: union command search with distributable streaming searches and results]
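The screenshot's search is not reproduced in the text. Assuming it mirrors the multisearch example, a sketch could look like this, where the activity field name and the table command are illustrative assumptions:

```spl
| union
    [search index=web_logs | eval activity=action]
    [search index=tutorial_games | eval activity=queue]
| table _time index activity
```

Both bracketed searches use only distributable streaming commands, so this is the case in which union is processed like multisearch.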

In the example below, because the head command is a centralized streaming command rather than a distributable streaming command, any subsearches that follow the head command are processed using the append command. In other words, when a command forces the processing to the search head, all subsequent commands must also be processed on the search head.

[Screenshot: union command search that includes the head command and results]
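The original search is not shown in the text. One hedged sketch of a union that includes head is given below. The placement of head, the limit of 10, and the activity field name are assumptions for illustration only.

```spl
| union
    [search index=web_logs | head 10 | eval activity=action]
    [search index=tutorial_games | eval activity=queue]
| table _time index activity
```

The only difference from a purely distributable streaming union is the head command, which is what moves processing to the search head as described above.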

Comparing OR, Append, Multisearch, and Union

The table below shows a comparison of the four methods:

| OR | Append | Multisearch | Union |
| --- | --- | --- | --- |
| Boolean Operator | Streaming command | Generating command | Generating command |
| Used in between searches | Used in between searches | Must be the first command in your SPL | Can be either the first command or used in between searches. Choose the most efficient method based on the command types needed |
| Results are interleaved | Results are added to the bottom of the table | Results are interleaved | Results are interleaved based on the time field |
| No limit to the number of rows that can be produced | Subject to a maximum of 50,000 result rows by default | No limit to the number of rows that can be produced | Default of 50,000 result rows with non-streaming searches. Can be changed using maxout argument. |
| Requires at least two base searches | Requires a primary search and a secondary one | Requires at least two searches | Requires at least two searches that will be “unioned” |
| Does not allow use of operators within the base searches | Allows both streaming and non-streaming operators | Allows only streaming operators | Allows both streaming and non-streaming operators |
| Does only a single search for events that match specified criteria | Appends results of the subsearch to the results of the primary search | Runs searches simultaneously | Behaves like multisearch with streaming searches and like append with non-streaming |

Next steps

Want to learn more about combining data sources in Splunk? Contact us today! TekStream accelerates clients’ digital transformation by navigating complex technology environments with a combination of technical expertise and staffing solutions. We guide clients’ decisions, quickly implement the right technologies with the right people, and keep them running for sustainable growth. Our battle-tested processes and methodology help companies with legacy systems get to the cloud faster, so they can be agile, reduce costs, and improve operational efficiencies. And with hundreds of deployments under our belt, we can guarantee on-time and on-budget project delivery. That’s why 97% of clients are repeat customers.

